[Xapian-discuss] Compressed Btrees

Olly Betts olly at survex.com
Mon Dec 13 13:59:25 GMT 2004


On Mon, Dec 13, 2004 at 02:23:00PM +0100, Arjen van der Meijden wrote:
> This is on the non-compacted database (currently I don't have a 
> compacted one):

The results would be the same anyway.

> entries: 293400883
> Totals:
> Before: 1680133099
> After:  1189099066
> Compressed by: 29.3%
> Theoretical limit (assuming uniform): 1188233055
> 
> If I understand it correctly this will be the compression on top of the 
> compaction (which only yields 8% reduction) of the position-table ?

It's not totally obvious how to translate it - this figure is just for
the change in size of the tag values.  There's also storage for the keys
and general overhead from the tree structure.  But if the tags are
shorter then they'll generally be split into fewer items inside the
Btree, which means fewer keys need to be stored.  And the less there is
in the Btree, the less overhead there is.
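A rough model of why shorter tags save more than their own length. This sketch assumes that a tag is split into items that each hold at most `max_chunk` bytes of tag and each carry their own copy of the key. `max_chunk`, `key_len`, `items_needed` and `stored_bytes` are hypothetical illustration names and values, not Xapian's real Btree parameters, and per-item headers and block overhead are ignored.

```python
import math


def items_needed(tag_len: int, max_chunk: int) -> int:
    # Number of Btree items a tag of tag_len bytes is split into, assuming
    # each item carries at most max_chunk bytes of tag (hypothetical limit).
    return max(1, math.ceil(tag_len / max_chunk))


def stored_bytes(tag_len: int, key_len: int, max_chunk: int) -> int:
    # Tag bytes plus one copy of the key per item (hypothetical model that
    # ignores per-item headers and block-level overhead).
    return tag_len + items_needed(tag_len, max_chunk) * key_len


# Hypothetical example: a 5000-byte tag compressed to 3500 bytes,
# with 20-byte keys and at most 2000 tag bytes per item.
before = stored_bytes(5000, key_len=20, max_chunk=2000)  # 3 items
after = stored_bytes(3500, key_len=20, max_chunk=2000)   # 2 items
print(before - after)  # 1520: the 1500 tag bytes plus one fewer key
```

In this toy model the saving exceeds the reduction in tag size because one fewer key has to be stored.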

So you should expect the size of position_DB to decrease by somewhat
more than (1680133099 - 1189099066) = 491034033 bytes.  Is this the 6.3G
position_DB?  If so, I'm surprised it only has 1.6G of tags.

But assuming it is, you'd expect the filesize to go down by at least
29.3*1.6/6.3 or around 7.5%.  It will probably be substantially better
than that though.
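The back-of-envelope estimate above can be written out as a small sketch. It treats the 29.3% figure as applying only to the tag data and assumes keys and Btree overhead do not grow, which is why the result is a lower bound. The function name `min_file_reduction_pct` is hypothetical, and the 1.6G and 6.3G sizes are the rounded figures from the post, so both must be in the same unit.

```python
def min_file_reduction_pct(tag_compression_pct: float,
                           tag_size: float,
                           file_size: float) -> float:
    """Lower bound on the percentage shrink of the whole file.

    Only the tag data is assumed to shrink; keys and Btree overhead are
    assumed unchanged. tag_size and file_size must use the same unit.
    """
    return tag_compression_pct * tag_size / file_size


before = 1680133099  # total tag bytes before compression (from the post)
after = 1189099066   # total tag bytes after compression (from the post)
saved = before - after

print(saved)                                              # 491034033
print(round(min_file_reduction_pct(29.3, 1.6, 6.3), 2))   # 7.44
```

Since fewer items also means fewer stored keys and less tree overhead, the real reduction should come out above this figure.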

Cheers,
    Olly


