[Xapian-discuss] Flint Backend

Sun Jun 26 09:43:32 BST 2005

Hi list,

Last Friday I downloaded our "working compacted database", generated 
with 0.8.4 using no special quartzcompact-parameters but with the 
zlib-patch applied.
I copydatabase'd it to a new quartz database to see how large a 
non-compacted database would be when regenerated from scratch. I ran 
quartzcompact without zlib-compressing the tables, and I ran 
quartzcompact with the -n -F parameters to see whether they gave any gains.

I also copydatabase'd a flint version from it and ran xapian-compact on 
that database: once with no options, once with -n -F and once with -F (the 
last two gave exactly the same result). The Xapian version I used was 
Thursday's 0.9.1_svn6307.

Here are the table sizes for the original working database on our 
production machine, the quartz copy I made from the compacted version of 
that db, and the flint copy:

          Qz 0.8.4    Qz – copy   Flint
Position 8341782528  7785979904  7456931840
Postlist 4038926336  3726647296  3726647296
Record    367075328   407076864   258154496
Termlist 3506757632  3455180800  1868873728
Value      92176384    94699520   124583936
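
For anyone who prefers totals over per-table figures, here is a minimal Python sketch that sums the columns of the table above. It assumes the sizes are in bytes (the unit is not stated above), and the names SIZES, total_bytes and relative_size are purely illustrative.

```python
# Table sizes copied from the table above; the unit is assumed to be bytes.
SIZES = {
    "Qz 0.8.4": {
        "Position": 8341782528,
        "Postlist": 4038926336,
        "Record": 367075328,
        "Termlist": 3506757632,
        "Value": 92176384,
    },
    "Qz copy": {
        "Position": 7785979904,
        "Postlist": 3726647296,
        "Record": 407076864,
        "Termlist": 3455180800,
        "Value": 94699520,
    },
    "Flint": {
        "Position": 7456931840,
        "Postlist": 3726647296,
        "Record": 258154496,
        "Termlist": 1868873728,
        "Value": 124583936,
    },
}


def total_bytes(db: str) -> int:
    """Sum of all table sizes for one database."""
    return sum(SIZES[db].values())


def relative_size(db: str, baseline: str) -> float:
    """Total size of `db` as a fraction of the total size of `baseline`."""
    return total_bytes(db) / total_bytes(baseline)


if __name__ == "__main__":
    for name in SIZES:
        gib = total_bytes(name) / 2**30
        print(f"{name:10s} {gib:6.2f} GiB")
    print(f"Flint / Qz copy: {relative_size('Flint', 'Qz copy'):.3f}")
```

The same dictionary layout can be filled with the compacted figures further down to compare those as well.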

Here are some results for quartzcompact: no options and no zlib, 
the original compacted database with zlib, and compacted with -n -F plus 
zlib. Please do note that the -n -F result is actually larger than the 
original, and that the position table is not zlib-compressed:

          Qz          Qz 084 gz    Qz -nF gz
Position 7424589824  7424589824  7432200192
Postlist 1708957696  1428889600  1535426560
Record    254222336   178831360   179888128
Termlist 1770250240  1249050624  1395597312
Value      61317120    53313536    53313536

Here are the xapian-compact results for the flint database. -n -F and -F 
produced exactly the same table sizes, and they were smaller than the 
first compaction attempt without options. Please do note that the 
position table is larger than in the compacted quartz cases.

          Flint       Flint -nF/-F
Position 7452794880  7451574272
Postlist 1644240896  1634279424
Record    255377408   254418944
Termlist 1772339200  1764106240
Value      62177280    62177280

The times to generate each new database:
Qz cpy        17:35:00
Fl cpy        12:16:00
Qz cpt        00:24:00
Qz cpt -nF gz 00:46:00
Fl cpt        00:40:00
Fl cpt -nF    00:44:00
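
To put those durations side by side, here is a small Python sketch that converts them to seconds and computes ratios. It assumes the values are in hh:mm:ss format; TIMES, to_seconds and speedup are illustrative names only.

```python
from datetime import timedelta

# Durations copied from the list above; assumed to be hh:mm:ss.
TIMES = {
    "Qz cpy": "17:35:00",
    "Fl cpy": "12:16:00",
    "Qz cpt": "00:24:00",
    "Qz cpt -nF gz": "00:46:00",
    "Fl cpt": "00:40:00",
    "Fl cpt -nF": "00:44:00",
}


def to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    delta = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return int(delta.total_seconds())


def speedup(slow: str, fast: str) -> float:
    """How many times longer the `slow` run took than the `fast` run."""
    return to_seconds(TIMES[slow]) / to_seconds(TIMES[fast])


if __name__ == "__main__":
    for label, hms in TIMES.items():
        print(f"{label:14s} {to_seconds(hms):6d} s")
    print(f"Qz cpy / Fl cpy: {speedup('Qz cpy', 'Fl cpy'):.2f}x")
```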

I noticed the 0.8.4 quartzcompacted database on our production machine 
was generated in 02:29:00, *much* longer than this 0.9.1 version.
The production machine may have been loaded a bit, but it has 
much faster disks and much more memory...
The only advantage the development machine has is its CPU. 
It has a 3.0GHz P4 CPU with 1MB cache (stepping 15 and probably a 533MHz 
FSB), while the production machine has dual Xeon 2.8GHz CPUs with only 
0.5MB cache (stepping 7, probably a 400MHz FSB). The other difference is 
that I read the compacted database instead of the working one.
So the production machine may have lost CPU-intensive tasks 
(zlib compression?) to the development machine, due to its lower CPU 
power, but it should have won I/O-intensive tasks (more memory, fast SCSI 
disks in RAID 0). That would indicate that the position table, which is 
not zlib-compressed, should have compacted much faster there, but it 
didn't (production: 01:29:00, development: 00:11:00 in the 
compact -> compact -n -F case).

Did the way quartzcompact works change that much from 0.8.4 to 
0.9.1? Is reading from the working database, instead of the compacted 
one, a cause? Or should we really worry about the configuration of the 
production machine?

I've also taken 1000 queries from our query log and 100 queries from the 
"slow query" log. I'll run them against each database I have on the 
test machine (which is all but the working database) and see which 
database is searched fastest. I'll post the results of that benchmark to 
the list sometime this week.
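
A rough Python sketch of how such a comparison could be structured. The `search` callables are hypothetical stand-ins for whatever actually runs one query against one database; nothing here uses the Xapian API, and this is not necessarily how the benchmark will be run.

```python
import time
from typing import Callable, Dict, Iterable, List


def time_queries(search: Callable[[str], object],
                 queries: Iterable[str]) -> List[float]:
    """Run every query through `search` and return per-query wall times."""
    timings = []
    for query in queries:
        start = time.perf_counter()
        search(query)  # hypothetical: executes the query against one database
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings: List[float]) -> Dict[str, float]:
    """Total, (upper) median and worst-case time for a list of timings."""
    if not timings:
        raise ValueError("no timings to summarize")
    ordered = sorted(timings)
    return {
        "total": sum(ordered),
        "median": ordered[len(ordered) // 2],
        "max": ordered[-1],
    }


def compare(backends: Dict[str, Callable[[str], object]],
            queries: List[str]) -> Dict[str, Dict[str, float]]:
    """Time the same query list against each named backend."""
    return {
        name: summarize(time_queries(search, queries))
        for name, search in backends.items()
    }
```

Running the whole query list more than once per database, and discarding the first pass, would help separate cold-cache from warm-cache behaviour.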

Best regards,

Arjen