[Xapian-discuss] Re: Evaluating Xapian

Richard Boulton richard at tartarus.org
Mon Jan 31 13:44:42 GMT 2005


On Fri, 2005-01-28 at 20:56 +0100, Arne Georg Gleditsch wrote:
> Well, I'm fiddling with using Xapian for a source-code indexing system
> where I want to index several releases of the same source code base
> (the Linux kernel, primarily).

As a side point - you might want to take a look at the "cvssearch"
application in "xapian-applications/cvssearch", which is aiming at a
somewhat similar task.  I'm not sure exactly what state it is in - Olly
has been gradually bringing it up to scratch as a Xapian application.

> Where the same file exists in several
> releases in an identical revision (which is true for a lot of files,
> especially in a stable branch), I'd like to index this [file,revision]
> only once.  So I'm tagging the indexed documents with the releases
> they occur in, incrementally adding tags as I index new releases.
> It's not a performance-critical part of the system, but it seems to be
> slower than it needs to be.  I get the impression that it's actually
> slower than indexing a clean tree.  (I will try to do a more useful
> performance study, I'm just trying to eliminate stupid usage pattern
> errors here.)  Does replace_document cause an implicit flush of the
> database?

replace_document can cause an implicit flush of the database (but won't
always).  Specifically, if the document being replaced was itself added
or modified in the batch of changes currently buffered (that is, since
the last flush), the database is flushed.  This is because it's fiddly
to handle this case, and for most usage patterns it's a fairly uncommon
operation.
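
To make the rule concrete, here is a rough sketch in Python.  It is a
toy model written only for illustration, not Xapian's actual code: the
class ToyBufferedDB, its batch_size threshold and the count_flushes
helper are all made-up names, and the real backend does a good deal
more than this.

```python
class ToyBufferedDB:
    """Toy model of a write buffer. NOT Xapian's actual implementation."""

    def __init__(self, batch_size=1000):
        self.batch_size = batch_size  # hypothetical auto-flush threshold
        self.pending = {}             # docid -> data changed since last flush
        self.stored = {}              # docid -> data already flushed
        self.flushes = 0              # number of flushes performed so far

    def flush(self):
        # Move every buffered change into the "on disk" store.
        self.stored.update(self.pending)
        self.pending.clear()
        self.flushes += 1

    def replace_document(self, docid, data):
        # The rule described above: replacing a document that is already
        # part of the buffered batch forces a flush.
        if docid in self.pending:
            self.flush()
        self.pending[docid] = data
        # Ordinary flush once the buffer is full (assumed threshold).
        if len(self.pending) >= self.batch_size:
            self.flush()


def count_flushes(ops):
    """Apply (docid, data) pairs in order and return the flush count."""
    db = ToyBufferedDB()
    for docid, data in ops:
        db.replace_document(docid, data)
    return db.flushes
```

In this model, replacing the same document three times in a row costs
two flushes, while writing three different documents costs none until
the buffer fills up.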

In the short term, it might be worth your while to try to avoid this
type of access, perhaps by changing the order in which you index the
documents so that a given document is not replaced more than once
between flushes.
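
One way to reorder the work, assuming you can scan all the releases you
want to index before writing anything, is to collect the release tags
for each [file,revision] pair in memory first and then write each
document exactly once.  The sketch below is plain Python and doesn't
touch Xapian at all; group_releases and the shape of its input are
invented for the example.

```python
from collections import defaultdict


def group_releases(occurrences):
    """Collect release tags per (path, revision) before anything is written.

    `occurrences` is an iterable of (release, path, revision) tuples;
    this input format is an assumption made for the sketch.
    """
    tags = defaultdict(list)
    for release, path, revision in occurrences:
        # Keep releases in first-seen order, without duplicates.
        if release not in tags[(path, revision)]:
            tags[(path, revision)].append(release)
    return dict(tags)
```

Each key of the result then corresponds to one document, which can be
built with all of its release tags attached and added in a single
call, so nothing in the current batch needs to be replaced.  When a new
release is indexed in a later run, the documents it touches will
already have been flushed, so each one should only need replacing once
in that run.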

In the longer term, perhaps it would be worthwhile for us to try to
remove this constraint.

-- 
Richard Boulton <richard at tartarus.org>



