[Xapian-discuss] Re: BUG IN XAPIAN_FLUSH_THRESHOLD

Olly Betts olly at survex.com
Sat Sep 1 20:57:20 BST 2007


On Tue, Aug 28, 2007 at 11:17:22AM +0900, Sungsoo Kim wrote:
> I have the same experience with xapian 0.9.4 that Kevin described
> before. I am sure that XAPIAN_FLUSH_THRESHOLD is not working in 0.9.4.

Incidentally, you ought to consider upgrading: even if you aren't ready
to migrate to 1.0.x, 0.9.10 has a number of bug fixes and a few
performance tweaks too.

> I can see my indexer stops for a while every 10,000 records to flush
> the buffer after I set XAPIAN_FLUSH_THRESHOLD environment variable to
> 100,000.

I don't have 0.9.4 around, but with SVN HEAD, setting
XAPIAN_FLUSH_THRESHOLD to 1000 makes omega flush 6 times when indexing
the 5000-odd documents in /usr/share/doc, rather than just once as it
does when XAPIAN_FLUSH_THRESHOLD isn't set.
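The flush counts above follow from simple arithmetic. Here is a minimal sh sketch, assuming (as a simplification, not a description of Xapian's actual code) that a flush happens each time the number of pending changes reaches the threshold, plus one final flush for any leftover changes when indexing finishes. The function name count_flushes and the document count of 5100 (standing in for "5000-odd") are hypothetical.

```shell
#!/bin/sh
# Hypothetical, simplified model of threshold-based auto-flushing (not
# Xapian's actual code): one flush each time the pending-change count reaches
# the threshold, plus a final flush if any changes are left over at the end.
count_flushes() {
    docs=$1        # number of documents indexed
    threshold=$2   # effective flush threshold
    auto=$((docs / threshold))
    if [ $((docs % threshold)) -ne 0 ]; then
        final=1    # leftover changes get flushed when indexing finishes
    else
        final=0    # nothing pending at the end, so no extra flush
    fi
    echo $((auto + final))
}

count_flushes 5100 1000    # prints 6
count_flushes 5100 10000   # prints 1
```

By the same model, with XAPIAN_FLUSH_THRESHOLD=100000 an indexer shouldn't flush at all until it has seen 100,000 changes, so a pause every 10,000 records suggests the setting isn't reaching the process; assuming the default threshold is 10000, that interval is exactly what you'd see if the variable were never read.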

There was a bug (fixed in 0.9.7) which double-counted calls to
replace_document(docid, doc) when docid wasn't already in use, but
otherwise, as far as I can see, this code hasn't changed for ages.

The pauses could perhaps be due to other factors, but you can get a
reliable count of how many flushes there have been by running
quartzcheck on the record table:

quartzcheck /path/to/database/record_

The "revision" reported is how many times the database has been flushed
(implicitly or explicitly).
    
My best guess is that XAPIAN_FLUSH_THRESHOLD is misspelled, or that
it is set correctly but not exported. If you're using bash, you need:

export XAPIAN_FLUSH_THRESHOLD=100000

If you just do 'XAPIAN_FLUSH_THRESHOLD=100000', then it is set in the
shell but not passed on to the shell's child processes, so your
indexing process won't see it.
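To see the difference the export makes, here is a small sh sketch. It assumes, for illustration only, that `sh -c` can stand in for the indexing process (any child program reads its environment the same way); the variable names unexported, exported and one_shot exist only for this example. The third case shows that a prefix assignment on the command line also works, but only for that single command.

```shell
#!/bin/sh
# Sketch: why an unexported variable never reaches the indexer.
# `sh -c` stands in for the indexing process, which can only see
# variables that are in its environment.
unset XAPIAN_FLUSH_THRESHOLD

# 1. Plain assignment: a shell variable only, not passed to children.
XAPIAN_FLUSH_THRESHOLD=100000
unexported=$(sh -c 'echo "[${XAPIAN_FLUSH_THRESHOLD}]"')
echo "plain assignment: child sees $unexported"

# 2. After export, every child process started later inherits it.
export XAPIAN_FLUSH_THRESHOLD
exported=$(sh -c 'echo "[${XAPIAN_FLUSH_THRESHOLD}]"')
echo "after export:     child sees $exported"

# 3. A prefix assignment puts it in the environment of that one command only.
unset XAPIAN_FLUSH_THRESHOLD
one_shot=$(XAPIAN_FLUSH_THRESHOLD=100000 sh -c 'echo "[${XAPIAN_FLUSH_THRESHOLD}]"')
echo "prefix form:      child sees $one_shot"
```

The child sees an empty value after the plain assignment and 100000 in the other two cases.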

Anyway, this "works for me", so if there really is a bug here, someone
needs to diagnose it and supply either a patch or an explanation of the
problem, or at least give me a way to reproduce it...

Cheers,
    Olly


