[Xapian-discuss] [Xapian-devel] Dealing with image PDF's

Reini Urban rurban at x-ray.at
Thu Jul 31 12:40:13 BST 2008

2008/7/31 Richard Boulton <richard at lemurconsulting.com>:
> Reini Urban wrote:
>> 2008/7/30 Frank Bruzzaniti <frank.bruzzaniti at gmail.com>:
>>>   // Inspired by http://mjr.towers.org.uk/comp/sxw2text
>>>   string safefile = shell_protect(file);
>>>   string cmd = "tifftopnm " + safefile + " | gocr -f UTF8 -";
>>>   try {
>>>       dump = stdout_to_string(cmd);
>>>   } catch (ReadError) {
>>>       cout << "\"" << cmd << "\" failed - skipping\n";
>>>       return;
>>>   }
>> Can we finally please use configure checks for such weird helper apps,
>> to avoid runtime exceptions where the system clearly has no such app?
>> I once provided a huge patch to do that.
>> http://thread.gmane.org/gmane.comp.search.xapian.devel/783/
> Perhaps the patch should go in a ticket; that way, we're less likely to
> forget about it.

Ticket? Uh, my fault. I never thought about that. Sounds useful :)
It should probably be split into multiple tickets and patches.

>> Applied to 1.0.5 it is attached. But there's much more in this patch
>> so some parts may be stripped. See ChangeLog.
>> TEXTCAT support for language and charset detection and cached virtual
>> directories (zip, msg, pst, ...), to name a few. It has worked fine for me
>> for two years and I haven't touched it since 0.9.6.
> Sounds useful.  However, I'm not sure that configure time is the right place
> to check for the existence of helper apps.  In particular, quite often
> omindex is installed from a pre-compiled package (for example, in Debian),
> and the helper apps present at configure time need therefore bear no
> relation to those present at runtime.
> Perhaps omindex could be improved to handle missing helper applications -
> I've not actually looked at how it handles this recently, so I don't know if
> there's actually a problem, but if there is, the correct fix seems to me to
> be to handle missing helper applications gracefully, rather than disable
> them at configure time.  Perhaps omindex would keep a cache, during each
> run, of the helper applications which have been found to be missing, so it
> would only attempt to run each one once.
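
A minimal sketch of the per-run cache suggested above, not omindex's actual
code. The class name HelperRunner, the injected run_cmd callable and the
constant kCommandNotFound are hypothetical names used only for illustration.
It assumes run_cmd returns the decoded shell exit status, and that the shell
follows the POSIX convention of reporting 127 when a command is not found.

```cpp
#include <functional>
#include <set>
#include <string>
#include <utility>

// Exit status a POSIX shell reports when the command cannot be found.
const int kCommandNotFound = 127;

// Hypothetical wrapper: remembers which helpers were missing during this
// run so that each missing helper is attempted only once.
class HelperRunner {
  public:
    // run_cmd executes a shell command and returns its exit status.
    explicit HelperRunner(std::function<int(const std::string&)> run_cmd)
        : run_cmd_(std::move(run_cmd)) {}

    // Returns true only if the command ran and exited with status 0.
    bool run(const std::string& helper, const std::string& cmd) {
        if (missing_.count(helper) != 0)
            return false;  // already known to be missing: skip quietly
        int status = run_cmd_(cmd);
        if (status == kCommandNotFound)
            missing_.insert(helper);  // remember for the rest of the run
        return status == 0;
    }

    bool known_missing(const std::string& helper) const {
        return missing_.count(helper) != 0;
    }

  private:
    std::function<int(const std::string&)> run_cmd_;
    std::set<std::string> missing_;
};
```

With this shape, a file type whose helper is absent costs one failed attempt
per run rather than one per file, and nothing needs to be decided at
configure time.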

I solved the preconfigured binary package problem with packaging dependencies.
A cache would be overkill.

Another advantage of such a config setting would be to hardcode the
actual helper location and not search the whole PATH at runtime for it.
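
As a rough sketch of what a hardcoded location could look like on the C++
side: GOCR_PATH is a hypothetical macro that a configure check would define
(for example via -DGOCR_PATH='"/usr/bin/gocr"'), and ocr_command is an
illustrative function name, not something in omindex. If the macro is not
defined, the bare name is used and the shell searches PATH as before.

```cpp
#include <string>

// GOCR_PATH is assumed to be supplied by the build system; fall back to
// the bare command name so the PATH lookup still works without it.
#ifndef GOCR_PATH
# define GOCR_PATH "gocr"
#endif

// Builds the OCR pipeline for one TIFF file. safefile must already be
// shell-escaped by the caller.
std::string ocr_command(const std::string& safefile) {
    return "tifftopnm " + safefile + " | " GOCR_PATH " -f UTF8 -";
}
```
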
Reini Urban
http://phpwiki.org/ http://murbreak.at/
