[Xapian-discuss] docx support

Frank Bruzzaniti frank.bruzzaniti at gmail.com
Thu Jul 24 15:46:21 BST 2008


Yay it works.

I added a mime_map entry and a matching "else if" case to omindex.cc:

mime_map["docx"] =
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document"; // Word 2007

// Start: Word 2007 .docx
} else if (startswith(mimetype,
           "application/vnd.openxmlformats-officedocument.wordprocessingml.")) {
    // Inspired by http://mjr.towers.org.uk/comp/sxw2text
    string safefile = shell_protect(file);
    string cmd = "unzip -p " + safefile + " word/document.xml";
    try {
        XmlParser xmlparser;
        xmlparser.parse_html(stdout_to_string(cmd));
        dump = xmlparser.dump;
    } catch (ReadError) {
        cout << "\"" << cmd << "\" failed - skipping\n";
        return;
    }
// End: Word 2007 .docx
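
In case it helps anyone reading the handler above: the key step is running "unzip -p" and reading its standard output into a string. The snippet below is only a rough sketch of that idea using POSIX popen(). The name run_and_capture is made up for illustration, and I'm not claiming this is how omindex's own stdout_to_string is written (it signals failure with ReadError, whereas this sketch throws std::runtime_error).

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper (NOT omindex's stdout_to_string): run cmd through the
// shell with POSIX popen() and return everything it writes to stdout.
static std::string run_and_capture(const std::string& cmd) {
    FILE* pipe = popen(cmd.c_str(), "r");
    if (!pipe) throw std::runtime_error("popen failed: " + cmd);

    std::string out;
    char buf[4096];
    size_t n;
    // Read until EOF; fread returns 0 once the command closes its stdout.
    while ((n = fread(buf, 1, sizeof(buf), pipe)) > 0) {
        out.append(buf, n);
    }

    // pclose waits for the command; a non-zero status means it failed.
    int status = pclose(pipe);
    if (status != 0) throw std::runtime_error("command failed: " + cmd);
    return out;
}
```

With something like this, run_and_capture("unzip -p " + safefile + " word/document.xml") would hand back the raw XML of the document body, assuming unzip is installed and the filename has already been quoted safely for the shell.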



Olly Betts wrote:
> On Thu, Jul 24, 2008 at 11:12:31PM +0930, Frank Bruzzaniti wrote:
>   
>> I added a mime type in omindex.cc but when I run it I get this:
>>
>> Indexing "/Test.docx" as 
>> application/vnd.openxmlformats-officedocument.wordprocessingml.document 
>> ... unknown MIME type - skipping
>>
>> what other source files do I need to look at?
>>     
>
> None - this is all omindex.cc.
>
> It sounds like you've added it to mime_map so that .docx is converted to
> that mime-type, but not added an "else if" case to actually handle the
> new mime-type.  The new FAQ entry covers that too.
>
> If that's not it, send a patch of your change (diff -u format).
>
> Cheers,
>     Olly
>   
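
For anyone else who hits the same "unknown MIME type - skipping" message, here is a small standalone sketch of the two-step logic Olly describes: first the extension is mapped to a MIME type, then a chain of "else if" cases picks a handler for that MIME type. Only mime_map and startswith are names taken from this thread. The functions make_mime_map, mimetype_for and handler_for, and the have_docx_case flag, are made up for illustration and are not real omindex.cc code.

```cpp
#include <map>
#include <string>

// True if s begins with prefix (stand-in for the helper used in omindex.cc).
static bool startswith(const std::string& s, const std::string& prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical: build a tiny extension -> MIME type table.
static std::map<std::string, std::string> make_mime_map() {
    std::map<std::string, std::string> mime_map;
    mime_map["txt"] = "text/plain";
    mime_map["docx"] =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
    return mime_map;
}

// Step 1 (hypothetical): look up the MIME type from the file extension.
// Returns an empty string when the extension is not in the table.
static std::string mimetype_for(const std::map<std::string, std::string>& mime_map,
                                const std::string& filename) {
    std::string::size_type dot = filename.rfind('.');
    if (dot == std::string::npos) return "";
    std::map<std::string, std::string>::const_iterator it =
        mime_map.find(filename.substr(dot + 1));
    return it == mime_map.end() ? "" : it->second;
}

// Step 2 (hypothetical): choose a handler for the MIME type. With
// have_docx_case == false this mimics having only the mime_map entry.
static std::string handler_for(const std::string& mimetype, bool have_docx_case) {
    if (mimetype == "text/plain") {
        return "text";
    } else if (have_docx_case &&
               startswith(mimetype,
                   "application/vnd.openxmlformats-officedocument.wordprocessingml.")) {
        return "docx";
    }
    return "unknown MIME type - skipping";
}
```

Calling handler_for on the .docx MIME type with have_docx_case set to false falls through to the "unknown MIME type - skipping" result, which matches what I saw before adding the "else if" case.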

