[Xapian-discuss] docx support

Olly Betts olly at survex.com
Thu Jul 24 03:53:04 BST 2008


On Thu, Jul 24, 2008 at 02:51:26AM +0100, Olly Betts wrote:
> Rather than writing a full guide here, I'm going to write this up as a
> wiki page, since that will be easier for others to find in the future.
> I'll reply again when I'm done.

http://trac.xapian.org/wiki/FAQ/OmegaNewFileFormat

> > Is there any option/procedure to add a new mime plugin?
> > For example, if you rename a docx to .zip you can retrieve text from
> > document.xml

That's quite easy to do - you should be able to base the code heavily
on the existing OpenDocument handling.  That code extracts XML files
from inside a Zip format file with extension .odt or similar and then
does simple parsing to extract the document text.
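
To illustrate the general idea (this is not Omega's actual code, just a
rough standalone sketch in Python), something along these lines pulls
the text out of a .docx.  It assumes the main text lives in
word/document.xml inside the Zip, with paragraphs in w:p elements and
text runs in w:t elements; the function name extract_docx_text is made
up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: the WordprocessingML namespace used by word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(source):
    """Return the plain text of a .docx, one line per paragraph.

    `source` may be a filename or a binary file-like object.
    (Hypothetical helper for illustration; not part of Omega.)
    """
    # A .docx is a Zip archive; read the main document part from it.
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Simple parsing: join the text runs (w:t) inside each paragraph (w:p).
    for para in root.iter(W_NS + "p"):
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Calling extract_docx_text("report.docx") would return the body text as
a single string.  The sketch ignores headers, footers and footnotes,
and text in nested paragraphs (e.g. inside text boxes) may be picked up
more than once, so treat it as a starting point only.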

Cheers,
    Olly


