[Xapian-discuss] xapian performance

Olly Betts olly at survex.com
Mon Nov 20 07:49:45 GMT 2006


On Mon, Nov 20, 2006 at 12:07:44AM +0000, Olly Betts wrote:
> Yes, it was an off-by-one error, so powers of two would get one bit more
> than actually needed.  This patch (applied on top of the previous one)
> should fix that:

Hmm, actually there's a can of worms here.

The old code is vulnerable to floating point rounding issues - it
divides log(X) by log(2.0), but log(2.0) can't be exactly represented
in binary floating point, so ceil() of the result is one higher than
it should be in a small number of cases.

Worse, those values affected vary between architectures, so flint
databases (prior to this patch) aren't quite portable.

For x86_64, the only "bad" values are 536870912 (2^29) and 2147483648
(2^31).  This is unlikely to be an issue in reality, since you're
unlikely to have term position differences that large.

For x86, the "bad" values are 2^5, 2^9, 2^10, 2^11, 2^17, 2^18, 2^19,
2^20, 2^21, 2^22, 2^23, 2^25, 2^27, 2^29, and 2^31.  This list could
conceivably vary depending on the code the compiler generates (x86 FP
registers have excess precision so storing and reloading FP values can
make a difference to the lowest few bits).  Many of these values could
be encountered in real databases.

So I think we're going to have to bump the format version of flint
databases before 1.0.  I'll have a quick look at how feasible it is
to update a database.

Cheers,
    Olly



More information about the Xapian-discuss mailing list