[Xapian-discuss] Compressed Btrees

Arjen van der Meijden arjen at glas.its.tudelft.nl
Mon Dec 13 13:23:00 GMT 2004


On 12-12-2004 23:44, Olly Betts wrote:
> On Sun, Dec 12, 2004 at 07:55:32PM +0000, Olly Betts wrote:
> 
>>Interpolative coding will work much better for the positions anyway.
>>I've written code to calculate the compression achievable for a given
>>position list, though I've not written actual compression and
>>decompression code yet.  I'll try to sort out something to let you see
>>how well they'll compress at least.
> 
> 
> OK, stick the attached file in your xapian-core tree instead of
> bin/quartzdump.cc and rebuild quartzdump (this is an easy way to get
> the right files included and linked!)
> 
> Then run it with the database directory as the argument.
> 
> It'll report the current total tag size and what the new total tag size
> would be.  I get just over 30% compression in most cases.
> 
> It also reports a theoretical max compression, which we usually actually
> exceed by a little because one of the assumptions in the theoretical
> value isn't correct.

This is on the non-compacted database (currently I don't have a 
compacted one):

entries: 293400883
Totals:
Before: 1680133099
After:  1189099066
Compressed by: 29.3%
Theoretical limit (assuming uniform): 1188233055
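
As a rough cross-check, the figures above can be recomputed by hand. The
sketch below is plain Python with the numbers copied from the output. The
name reduction_pct is made up for illustration, and the per-entry figure
assumes that "entries" counts the items whose tag sizes were summed, which
the output itself does not state. Plain rounding gives roughly 29.2% rather
than the printed 29.3%, so the tool presumably rounds differently.

```python
# Figures copied verbatim from the quartzdump output above.
entries = 293_400_883
before = 1_680_133_099   # current total tag size
after = 1_189_099_066    # total tag size with the new coding
limit = 1_188_233_055    # reported theoretical limit (uniform assumption)


def reduction_pct(old, new):
    """Percentage by which `new` is smaller than `old` (hypothetical helper)."""
    return 100.0 * (old - new) / old


print(f"achieved reduction: {reduction_pct(before, after):.2f}%")
print(f"limit reduction:    {reduction_pct(before, limit):.2f}%")
# Assumption: 'entries' is the number of items the tag sizes belong to.
print(f"size per entry:     {before / entries:.2f} -> {after / entries:.2f}")
```

For this database the "After" total is slightly larger than the reported
limit, so the two differ by well under 0.1 percentage points.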

If I understand correctly, this compression would come on top of the
compaction of the position table (which only yields an 8% reduction)?
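
For reference, the idea behind the interpolative coding mentioned in the
quoted message can be sketched as a cost estimate: code the middle position
of a sorted list within its known bounds, then recurse into each half with
tighter bounds. This is a simplified illustration using plain fixed-width
codes, not the code from the attached file. The function name and the lo/hi
parameters are assumptions made for the example.

```python
def interpolative_bits(positions, lo, hi):
    """Estimate the bits needed to code a strictly increasing list of
    integers, all known to lie in [lo, hi] (simplified sketch)."""
    n = len(positions)
    if n == 0:
        return 0
    mid = n // 2
    # `mid` values lie below positions[mid] and `n - mid - 1` lie above it,
    # so the middle value can only fall in [low, high].
    low = lo + mid
    high = hi - (n - mid - 1)
    choices = high - low + 1
    # Fixed-width code: ceil(log2(choices)) bits, and 0 bits if forced.
    bits = (choices - 1).bit_length()
    left = interpolative_bits(positions[:mid], lo, positions[mid] - 1)
    right = interpolative_bits(positions[mid + 1:], positions[mid] + 1, hi)
    return bits + left + right


# A dense run is fully determined by its bounds, a sparse list is not.
print(interpolative_bits([1, 2, 3, 4, 5], 1, 5))
print(interpolative_bits([2, 9, 12, 14], 1, 16))
```

A fully dense run such as positions 1..5 within [1, 5] costs 0 bits under
this estimate, which is why the scheme suits clustered position lists. A
real coder would also have to store the list length and the bounds.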

Best regards,

Arjen van der Meijden


