[Xapian-discuss] Compressed Btrees

Arjen van der Meijden arjen at glas.its.tudelft.nl
Sat Dec 11 09:53:39 GMT 2004


On 9-12-2004 11:03, Olly Betts wrote:
> On Thu, Dec 09, 2004 at 10:43:39AM +0100, Arjen van der Meijden wrote:
> 
>>I'll test it with our database, using your hybrid settings, perhaps 
>>position_DB is another good candidate to run in filtered-mode?
> 
> 
> Very likely.  If it's not too long a process for your databases, you can just
> compress each table each of the 3 ways and mix and match the results by copying
> (say) record_* from one compacted directory to another.
> 
> Incidentally, quartzcompact now reports some statistics for the size reduction
> achieved for each table.

But it only reports those statistics on record_ and value_.
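
The mix-and-match step Olly describes above could be scripted roughly as follows. This is only a sketch: `mix_tables` and the directory names in the usage note are hypothetical, and the only thing taken from the thread is the idea of copying every `<table>_*` file from the compacted directory that gave the best result for that table.

```python
import glob
import os
import shutil


def mix_tables(sources, dest):
    """Copy each table's files from the directory chosen for it into dest.

    sources maps a table prefix (for example "record") to the compacted
    directory whose copy of that table should be used.
    Returns the base names of the files that were copied.
    """
    os.makedirs(dest, exist_ok=True)
    copied = []
    for table, src_dir in sources.items():
        # "<table>_*" follows the "record_*" pattern from the quoted mail.
        pattern = os.path.join(src_dir, table + "_*")
        for path in sorted(glob.glob(pattern)):
            shutil.copy2(path, dest)
            copied.append(os.path.basename(path))
    return copied
```

A call such as `mix_tables({"record": "db-filtered", "termlist": "db-default"}, "db-hybrid")` would then assemble a hybrid directory; those three directory names are placeholders, not paths from this test.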

I'm done testing; here are the results. Compacting and compressing the 
database took about 9-10 hours on a machine with IDE disks. I'll see 
tomorrow how long it takes on the SCSI-based one. With 0.8.3 it took a 
bit over 2 hours 15 minutes.

Not compressed/compacted:
total 14G
-rw-r--r--  1 acm users 7343988736 Dec  9 18:55 position_DB
-rw-r--r--  1 acm users 3636887552 Dec  9 19:03 postlist_DB
-rw-r--r--  1 acm users  335282176 Dec  9 19:04 record_DB
-rw-r--r--  1 acm users 3188170752 Dec  9 19:18 termlist_DB
-rw-r--r--  1 acm users   73367552 Dec  9 19:19 value_DB

Normally compacted (this was with 0.8.3; I didn't record the exact byte sizes):
total 9.6G
-rw-r--r--  1 root root  6.3G Dec  9 08:09 position_DB
-rw-r--r--  1 root root  1.5G Dec  9 06:19 postlist_DB
-rw-r--r--  1 root root  228M Dec  9 06:00 record_DB
-rw-r--r--  1 root root  1.6G Dec  9 06:49 termlist_DB
-rw-r--r--  1 root root   56M Dec  9 08:09 value_DB

For comparison in the same units, the compressed tables listed below come 
to roughly: positions 6.3G, postlists 1.2-1.3G, termlists 1.1-1.2G, 
record 160M and value 49M.

Compacted and zlib in default mode:
total 8.8G
-rw-r--r--  1 root root  6729023488 Dec 10 04:58 position_DB
-rw-r--r--  1 root root  1298120704 Dec  9 20:53 postlist_DB
-rw-r--r--  1 root root   169009152 Dec  9 19:38 record_DB
-rw-r--r--  1 root root  1148092416 Dec  9 22:16 termlist_DB
-rw-r--r--  1 root root    50266112 Dec 10 05:03 value_DB

Compacted and zlib in filtered mode:
total 8.8G
-rw-r--r--  1 root root  6730301440 Dec 10 14:26 position_DB
-rw-r--r--  1 root root  1274216448 Dec 10 06:28 postlist_DB
-rw-r--r--  1 root root   167747584 Dec 10 05:13 record_DB
-rw-r--r--  1 root root  1177985024 Dec 10 07:50 termlist_DB
-rw-r--r--  1 root root    50610176 Dec 10 14:32 value_DB

Compacted and zlib in huffman mode:
total 8.9G
-rw-r--r--  1 root root  6736855040 Dec 11 00:21 position_DB
-rw-r--r--  1 root root  1274421248 Dec 10 16:13 postlist_DB
-rw-r--r--  1 root root   171991040 Dec 10 14:42 record_DB
-rw-r--r--  1 root root  1219551232 Dec 10 17:36 termlist_DB
-rw-r--r--  1 root root    52543488 Dec 11 00:26 value_DB
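
To get a feel for the three modes outside of quartzcompact, a small sketch like the one below compresses one buffer with each zlib strategy and reports the resulting sizes. It assumes that "default", "filtered" and "huffman" correspond to zlib's `Z_DEFAULT_STRATEGY`, `Z_FILTERED` and `Z_HUFFMAN_ONLY`; it uses Python's `zlib` module and says nothing about how the Btree code calls zlib. `compressed_sizes` is a made-up helper name.

```python
import zlib

# Assumed mapping from the mode names used in this mail to zlib strategies.
STRATEGIES = {
    "default": zlib.Z_DEFAULT_STRATEGY,
    "filtered": zlib.Z_FILTERED,
    "huffman": zlib.Z_HUFFMAN_ONLY,
}


def compressed_sizes(data, level=zlib.Z_DEFAULT_COMPRESSION):
    """Return {mode name: compressed size in bytes} for one buffer."""
    sizes = {}
    for name, strategy in STRATEGIES.items():
        comp = zlib.compressobj(level=level, strategy=strategy)
        sizes[name] = len(comp.compress(data) + comp.flush())
    return sizes
```

Feeding it a sample of real tag values from one table would show which strategy suits that kind of data, at a tiny fraction of the cost of a full compaction run.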

The differences in size are rather marginal. But the most compact 
results would be achieved by:
Record:   filtered
Postlist: filtered
Termlist: default
Position: default
Value:    default
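
The selection above can be rechecked mechanically from the byte sizes in the three listings. The sketch below only restates those numbers; the helper names `best_modes` and `hybrid_total` are made up for illustration.

```python
# Byte sizes copied from the three "Compacted and zlib" listings above.
SIZES = {
    "position": {"default": 6729023488, "filtered": 6730301440, "huffman": 6736855040},
    "postlist": {"default": 1298120704, "filtered": 1274216448, "huffman": 1274421248},
    "record":   {"default": 169009152,  "filtered": 167747584,  "huffman": 171991040},
    "termlist": {"default": 1148092416, "filtered": 1177985024, "huffman": 1219551232},
    "value":    {"default": 50266112,   "filtered": 50610176,   "huffman": 52543488},
}


def best_modes(sizes):
    """Pick, for every table, the mode that produced the smallest file."""
    return {table: min(modes, key=modes.get) for table, modes in sizes.items()}


def hybrid_total(sizes):
    """Total size in bytes if every table uses its smallest variant."""
    return sum(min(modes.values()) for modes in sizes.values())


if __name__ == "__main__":
    for table, mode in best_modes(SIZES).items():
        print(f"{table:9s} {mode}")
    print("hybrid total:", hybrid_total(SIZES), "bytes")
```

With these figures the hybrid comes to 9369346048 bytes against 9394511872 bytes for default mode everywhere, a saving of 24 MiB, which fits the remark that the differences are marginal.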

However, it may be more efficient to just not compress the position-db, 
since there seems to be only a small gain for the extra CPU time: 
rounded, all four variants (normally compacted and the three zlib modes) 
are 6.3G in size.

I didn't test with dictionaries and the like, since I don't fully 
understand how to obtain or build a good dictionary. (Olly, if you'd 
like to experiment with that yourself, contact me off-list.)
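
For anyone who wants to try the dictionary idea in isolation, here is a minimal sketch of zlib's preset-dictionary mechanism using Python's `zlib` module. It does not show how a dictionary would be wired into the Btree code, and how to choose a good one for a given table is left open; the zlib manual only advises filling it with byte strings that are likely to occur in the data, with the most common ones towards the end. The helper names are hypothetical.

```python
import zlib


def compress_with_dict(data, zdict):
    """Deflate data using a preset dictionary."""
    comp = zlib.compressobj(zdict=zdict)
    return comp.compress(data) + comp.flush()


def decompress_with_dict(blob, zdict):
    """Inflate a blob that was compressed with the same preset dictionary."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()
```

The reader must be given exactly the same dictionary as the writer, so a dictionary would effectively become part of the database format.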

Best regards,

Arjen van der Meijden


