[Xapian-discuss] Flint Backend
Arjen van der Meijden
acmmailing at tweakers.net
Sun Jun 26 09:43:32 BST 2005
Hi list,
Last Friday I downloaded our "working compacted database", generated
with 0.8.4 using no special quartzcompact-parameters but with the
zlib-patch applied.
I copydatabase'd it to a new quartz-database to see how large that'd be
when a non-compact database was regenerated from scratch. I ran
quartzcompact without zlib-compressing the tables and I ran
quartzcompact using -n -F parameters to see any gains from that.
I also copydatabase'd a flint version from it and ran xapian-compact on
that database, as well as xapian-compact -n -F and -F (those two had
exactly the same result). The xapian-version I used was Thursday's
0.9.1_svn6307.
Here are the table sizes in bytes for the original working database on
our production machine, the quartz copy I made from the compacted version
of that db, and the flint copy:
            Qz 0.8.4      Qz copy        Flint
Position  8341782528   7785979904   7456931840
Postlist  4038926336   3726647296   3726647296
Record     367075328    407076864    258154496
Termlist  3506757632   3455180800   1868873728
Value       92176384     94699520    124583936
Here are some results for quartzcompact: no options and no zlib, the
original compacted database with zlib, and the database compacted with
-n -F plus zlib. Please do note that the -n -F result is actually larger
than the original, and that the position table is not zlib-compressed:
                  Qz    Qz 084 gz    Qz -nF gz
Position  7424589824   7424589824   7432200192
Postlist  1708957696   1428889600   1535426560
Record     254222336    178831360    179888128
Termlist  1770250240   1249050624   1395597312
Value       61317120     53313536     53313536
Here are the xapian-compact results for the flint database. -n -F and -F
produced exactly the same table sizes, and those were smaller than the
original compaction-try. Please do note that the position table is larger
than in the quartz compacted cases.
               Flint  Flint -nF/-F
Position  7452794880    7451574272
Postlist  1644240896    1634279424
Record     255377408     254418944
Termlist  1772339200    1764106240
Value       62177280      62177280
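To compare the compacted databases as a whole rather than table by table, the per-table sizes can be summed. The following is only a small sketch of that arithmetic: the numbers are copied from the tables in this post, while the names `COMPACTED`, `totals` and `smallest` are made up for the illustration.

```python
# Per-table sizes in bytes, copied from the compaction results in this post.
# Tuple order: Position, Postlist, Record, Termlist, Value.
COMPACTED = {
    "Qz":           (7424589824, 1708957696, 254222336, 1770250240, 61317120),
    "Qz 084 gz":    (7424589824, 1428889600, 178831360, 1249050624, 53313536),
    "Qz -nF gz":    (7432200192, 1535426560, 179888128, 1395597312, 53313536),
    "Flint":        (7452794880, 1644240896, 255377408, 1772339200, 62177280),
    "Flint -nF/-F": (7451574272, 1634279424, 254418944, 1764106240, 62177280),
}

# Total on-disk size of the five tables for each variant.
totals = {name: sum(sizes) for name, sizes in COMPACTED.items()}

# Variant with the smallest total, used as the baseline for the comparison.
smallest = min(totals, key=totals.get)

# Print the variants from smallest to largest, relative to the baseline.
for name, total in sorted(totals.items(), key=lambda item: item[1]):
    relative = 100 * total / totals[smallest]
    print(f"{name:<14}{total:>13} bytes  {relative:6.1f}% of {smallest}")
```

Since the position table dominates every total and is not zlib-compressed, the overall differences are much smaller than the per-table differences suggest.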
The times to generate each new database:
Qz cpy          17:35:00
Fl cpy          12:16:00
Qz cpt          00:24:00
Qz cpt -nF gz   00:46:00
Fl cpt          00:40:00
Fl cpt -nF      00:44:00
I noticed the 0.8.4 quartzcompacted database on our production machine
was generated in 02:29:00, *much* longer than this 0.9.1 version.
The production machine may have been loaded a bit, but the machine has
much faster disks and much more memory...
The only advantages the development machine has are its CPU and the fact
that I read from the compacted database instead of the working one. It
has a 3.0GHz P4 CPU with 1MB cache (stepping 15 and probably a 533MHz
FSB), while the production machine has dual Xeons at 2.8GHz with only
0.5MB cache (stepping 7, probably a 400MHz FSB).
So the production machine may have lost CPU-intensive tasks
(zlib-compression?) to the development machine, due to its lower
CPU power, but it should have won I/O-intensive tasks (more memory, fast
SCSI disks in RAID 0). That would indicate that the position table
should have compacted much faster there, but it didn't (production
machine: 01:29:00, development machine: 00:11:00 in the
compact -> compact -n -F case).
Did that much change in the way quartzcompaction is done from 0.8.4 to
0.9.1? Is reading from the working database, instead of the compacted
one, a cause? Or should we really worry about the configuration of the
production machine?
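To put a number on the position-table difference, the two hh:mm:ss timings quoted above can be converted to seconds and divided. This is just a sketch of that arithmetic; the helper name `to_seconds` is made up for the example.

```python
def to_seconds(hms):
    """Convert an 'hh:mm:ss' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

# Position-table compaction times quoted in this post.
production = to_seconds("01:29:00")   # production machine
development = to_seconds("00:11:00")  # development machine

# How many times longer the production machine took.
ratio = production / development
print(f"production: {production}s, development: {development}s, "
      f"ratio: {ratio:.1f}x")
```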
I've also taken 1000 queries from our query log and 100 queries from the
"slow query" log. I'll run them against each database I have on the
test machine (which is all but the working database) and see which
database is searched fastest. I'll post the results of that benchmark to
the list sometime this week.
Best regards,
Arjen