[Xapian-discuss] Xapian with djvu files?

James Aylett james-xapian at tartarus.org
Mon Jan 14 07:25:45 GMT 2008


On Mon, Jan 14, 2008 at 05:33:38PM +1100, John Pye wrote:

> I was wondering if there was any support in Xapian for DJVU files. These
> are a nice alternative to PDF files -- much smaller file size, typically.

There isn't at the moment, but it would be fairly easy to add support
to omindex(1) by running djvutxt to extract the text for indexing.
djvutxt already outputs UTF-8, so something like the following in
omindex.cc:index_file() around line 308 *should* do the trick
(untested!):

----------------------------------------------------------------------
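    } else if (mimetype == "application/x-djvu") {
        // Extract the text layer with djvutxt, which writes UTF-8 to stdout.
        string cmd = "djvutxt " + shell_protect(file);
        try {
            dump = stdout_to_string(cmd);
        } catch (ReadError) {
            // Conversion failed, so report it and skip this file.
            cout << "\"" << cmd << "\" failed - skipping\n";
            return;
        }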
----------------------------------------------------------------------
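For anyone unfamiliar with the two helpers used there, here is a rough
standalone sketch of the job they have to do: quote the filename for the
shell, then run the command and collect its stdout. The names
shell_quote_sketch() and run_and_capture_sketch() are hypothetical
stand-ins, not omindex's actual shell_protect() and stdout_to_string(),
whose implementations may well differ. It assumes a POSIX system with
popen().

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for shell_protect(): wrap the filename in single
// quotes, escaping any embedded single quote, so that the shell treats it
// as one argument.
std::string shell_quote_sketch(const std::string& file) {
    std::string out = "'";
    for (char ch : file) {
        if (ch == '\'') {
            out += "'\\''";  // close quote, escaped quote, reopen quote
        } else {
            out += ch;
        }
    }
    out += "'";
    return out;
}

// Hypothetical stand-in for stdout_to_string(): run cmd through the shell
// and return everything it wrote to stdout. Throws if the command cannot
// be started or exits with a non-zero status.
std::string run_and_capture_sketch(const std::string& cmd) {
    FILE* pipe = popen(cmd.c_str(), "r");
    if (!pipe) {
        throw std::runtime_error("popen failed: " + cmd);
    }
    std::string output;
    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, pipe)) > 0) {
        output.append(buf, n);
    }
    if (pclose(pipe) != 0) {
        throw std::runtime_error("command failed: " + cmd);
    }
    return output;
}
```

With these, the djvu case would amount to something like
run_and_capture_sketch("djvutxt " + shell_quote_sketch(file)), with the
exception playing the role of ReadError.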

You'll need to map whichever file extension your djvu files use onto
application/x-djvu, with something like -M djvu:application/x-djvu as
an omindex option on the command line (or set it in omindex.cc as a
default, near line 649).
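
To illustrate the kind of lookup that mapping feeds, here is a minimal
sketch that assumes the extension table behaves like a std::map keyed on
the extension without its dot. The function names and the table contents
are illustrative only and are not copied from omindex.cc.

```cpp
#include <map>
#include <string>

// Illustrative defaults only; the real list lives in omindex.cc.
std::map<std::string, std::string> default_mime_map_sketch() {
    std::map<std::string, std::string> m;
    m["txt"] = "text/plain";
    m["pdf"] = "application/pdf";
    m["djvu"] = "application/x-djvu";  // the new default entry
    return m;
}

// Look up the text after the last '.' in filename. Returns an empty string
// if there is no dot or the extension is not in the table.
std::string mime_for_sketch(const std::map<std::string, std::string>& m,
                            const std::string& filename) {
    std::string::size_type dot = filename.rfind('.');
    if (dot == std::string::npos) {
        return std::string();
    }
    std::map<std::string, std::string>::const_iterator it =
        m.find(filename.substr(dot + 1));
    return it == m.end() ? std::string() : it->second;
}
```

A file whose extension isn't in the table comes back with an empty type
here, which is why the extension has to be mapped one way or the other
before the new branch can ever be reached.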

However, I have to wonder why you'd want to: djvu is primarily an
image file format, although it has support for mixed text and images.
I admit I hadn't heard of it before now though, so perhaps the website
[1] is a little misleading about the primary use.

[1] <http://djvu.org/resources/whatisdjvu.php>

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


