[Xapian-discuss] Indexing PDF, DOC etc.
Jim
jim at fayettedigital.com
Thu Nov 6 12:05:23 GMT 2008
Florian Beer wrote:
> I'm trying to index PDFs that are stored in a MySQL database (blob
> field) using omindex now.
> What's the exact call to tell omindex to index a byte stream (passed
> directly from my Python program) instead of specifying a directory on
> the command line?
>
> Is this even possible, or would I have to first write the PDF data out
> from the MySQL to a temporary file, let it index (supplying arbitrary
> metadata) and then delete the temp file?
>
From the man page:
omindex - Index static website data via the filesystem
Omindex reads a directory hierarchy of files which represent the data
accessible via a browser. It's not the tool that you will want to use
to index PDF files from within a MySQL database.
Scriptindex may be something that you could use. It processes a file at
a time. The other option is to use the Python Xapian package to
programmatically generate an index.
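
As a rough illustration of the second option, here is a minimal sketch in Python. It assumes the Xapian Python bindings and the pdftotext command-line tool are installed. The helper names (id_term, pdf_blob_to_text, index_pdf) and the idea of keying each document on its MySQL row id are my own assumptions for the example, not anything omindex or scriptindex does for you. It uses the temporary-file approach you described, since pdftotext wants a file to read.

```python
import os
import subprocess
import tempfile

try:
    import xapian  # Xapian Python bindings, assumed to be installed separately
except ImportError:
    xapian = None  # the pure-Python helpers below still work without it


def id_term(row_id):
    """Build a unique term for a database row (hypothetical helper).

    'Q' is the conventional Xapian prefix for a unique document id.
    """
    return "Q%s" % row_id


def pdf_blob_to_text(blob):
    """Write the blob to a temp file, run pdftotext on it, return the text."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(blob)
        # "-" as the output name makes pdftotext write to stdout.
        out = subprocess.check_output(["pdftotext", "-enc", "UTF-8", path, "-"])
        return out.decode("utf-8", "replace")
    finally:
        os.remove(path)  # always delete the temp file


def index_pdf(db, row_id, blob, title, extract=pdf_blob_to_text):
    """Add or replace one document in an open xapian.WritableDatabase."""
    text = extract(blob)
    term = id_term(row_id)

    doc = xapian.Document()
    doc.set_data(title)   # stored data, shown with search results
    doc.add_term(term)    # lets us find the MySQL row again later

    generator = xapian.TermGenerator()
    generator.set_stemmer(xapian.Stem("english"))
    generator.set_document(doc)
    generator.index_text(title, 1, "S")  # 'S' prefix: title/subject terms
    generator.index_text(text)           # unprefixed body text

    # replace_document() with a unique term updates an existing entry
    # instead of adding a duplicate when the same row is indexed again.
    db.replace_document(term, doc)
```

You would open the database with something like xapian.WritableDatabase("/path/to/db", xapian.DB_CREATE_OR_OPEN), call index_pdf() once per row fetched from MySQL, and then commit. Storing the row id as a term means a search hit can be mapped straight back to the blob in the database.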
Just curious, once you have the index and are searching, what mechanism
are you using to retrieve the documents? E.g., do you have a web page
that allows you to pass a document id that then retrieves the data from
the database?
Jim.