g***@mnogosearch.org
2012-01-19 23:42:36 UTC
Author: dsbcpas
Email: ***@dsbcpas.com
Message:
We would like to index emails that are each saved as a single file (as opposed to an mbox, for example) and that contain multiple file types. Specifically, we save and file emails with a .eml suffix, which our POP3 email client (Thunderbird) can read. As I understand it, each one is basically a single file with nested attachments identified by MIME headers.
Since mnogosearch appears to select parsers based on file suffix rather than MIME type, it seems necessary to split each type into a separate file before indexing.
Any ideas on how one might index each embedded MIME type and still reference the original email URL rather than the various embedded files?
My preliminary idea: use the mpack package. It includes munpack, which unpacks a message by MIME header and writes each part to a separate file, named as embedded in the message, which normally includes the correct suffix for its MIME type. The text part of the message is a bit trickier: munpack can either ignore it or output it to files with no suffix. But I still have no idea how to pipe this into mnogosearch.
mpack and munpack are available at http://ftp.andrew.cmu.edu/pub/mpack/
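To make the unpacking step concrete, here is a rough sketch of the same idea using Python's standard email package instead of munpack (a swapped-in tool, not something mnogosearch provides). The function name split_eml, the fallback suffix table, and the partN naming scheme are my own assumptions for illustration. It only splits a .eml into named parts; how to hand those parts to mnogosearch while keeping the original email URL is still the open question.

```python
import email
import mimetypes
from email import policy

# Hypothetical fallback suffixes for parts that carry no filename,
# e.g. the text body that munpack writes out without a suffix.
FALLBACK_SUFFIX = {"text/plain": ".txt", "text/html": ".html"}


def split_eml(raw_bytes):
    """Return (name, content_type, payload_bytes) for each leaf MIME part."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    parts = []
    for part in msg.walk():
        # Containers (multipart/*, message/rfc822) hold other parts; skip them.
        if part.is_multipart():
            continue
        ctype = part.get_content_type()
        name = part.get_filename()
        if not name:
            # No filename in the headers: invent one with a suffix that
            # matches the MIME type so a suffix-based parser can be chosen.
            suffix = (FALLBACK_SUFFIX.get(ctype)
                      or mimetypes.guess_extension(ctype)
                      or ".bin")
            name = f"part{len(parts)}{suffix}"
        # decode=True undoes base64 / quoted-printable transfer encoding.
        payload = part.get_payload(decode=True) or b""
        parts.append((name, ctype, payload))
    return parts
```

Each returned tuple could then be written to a scratch directory next to a note of the originating .eml path, so that whatever indexes the parts can map them back to the original message.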
All ideas welcome. The solution might be a good addition to the documentation.
Reply: <http://www.mnogosearch.org/board/message.php?id=21397>