b***@mnogosearch.org
2014-08-06 10:47:57 UTC
Author: Oliver
Email:
Message:
Hello,
First, thank you for the great search engine you have created!
For indexing a custom site, I want to create a custom section containing the filename extracted from the Content-Disposition HTTP header. This header is sent for "downloadable" files, and its value might look like this:
inline; filename="comments.doc"
For this, I've added two new sections to the indexer.conf file:
Section header.content-disposition 30 128
Section content_filename 31 128 cdoff "" "${header.content-disposition}" "^\w+; filename=(.+)$" "$1"
This does add a "header.content-disposition" variable, which I can use in the search.htm file and which contains the entire Content-Disposition header value.
However, the content_filename section is not created correctly; it is always empty.
Through experimenting, I found that ${header.content-disposition} is apparently not recognized as a variable in the "Section" command. Is there another way to access the Content-Disposition value when defining a new section? Also, is there an overview of the variables available in "Section" commands?
As a workaround, I now use the EREG command in search.htm to extract the filename when the results are displayed. However, this is probably less efficient, since the extraction runs every time the results are displayed instead of only once during indexing. It also adds the entire Content-Disposition header to the index, so searching for "inline" or "filename" finds every document that has a Content-Disposition header, which is not very desirable.
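Whether the extraction happens at index time (the second Section line) or at display time (EREG), the same pattern is applied to the header value. The small Python sketch below only checks what that pattern would capture from the example value; it uses Python's `re` module as a stand-in, and mnoGoSearch's own regex engine may behave differently (for example in how it treats `\w`). It suggests that, with the pattern as written, the captured filename would keep its surrounding double quotes. The function `extract_filename_unquoted` is a hypothetical variant that drops optional quotes; it is not taken from mnoGoSearch.

```python
import re
from typing import Optional

# Pattern copied from the content_filename Section line. Python's re is used
# only for illustration; mnoGoSearch's regex engine may differ.
PATTERN = r"^\w+; filename=(.+)$"

# Hypothetical variant: allow optional double quotes around the filename
# and capture only the text between them.
PATTERN_UNQUOTED = r'^\w+; filename="?([^"]+)"?$'


def extract_filename(header_value: str) -> Optional[str]:
    """Return group 1 of PATTERN, or None if the value does not match."""
    m = re.match(PATTERN, header_value)
    return m.group(1) if m else None


def extract_filename_unquoted(header_value: str) -> Optional[str]:
    """Return the filename without surrounding quotes, or None on no match."""
    m = re.match(PATTERN_UNQUOTED, header_value)
    return m.group(1) if m else None


if __name__ == "__main__":
    value = 'inline; filename="comments.doc"'
    print(extract_filename(value))           # prints "comments.doc" (quotes are part of the capture)
    print(extract_filename_unquoted(value))  # prints comments.doc
```

If the quotes matter for display or searching, adjusting the pattern along the lines of the second function may be worth trying, subject to what mnoGoSearch's regex syntax actually supports.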
Can you give me some hints on the variables available in Section commands in indexer.conf?
Thanks,
Oliver
Reply: <http://www.mnogosearch.org/board/message.php?id=21653>