[Xapian-discuss] Search performance issues and profiling/debugging
Ron Kass
ron at pidgintech.com
Thu Oct 25 19:27:33 BST 2007
We ran the same test with databases 2, 4, 6, 8 and 10 on a:
Dell 2950,
CentOS 5.0 64-bit
Two quad-core Xeon CPUs (8 cores total)
16GB RAM
6 disks in RAID 5.
(a nice, strong machine overall)
While speed was better in most cases, repeated searches were still slow:
around 0.2-0.3 seconds for sequential searches (versus around 1 second on
the other machine).
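For reference, the test loop is roughly the following (a simplified sketch
with placeholder paths, using the Search::Xapian bindings; not the exact
script we run):

use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);
use Search::Xapian;

# Open database 2 and stack databases 4, 6, 8 and 10 onto it
# (the paths here are placeholders).
my $db = Search::Xapian::Database->new('/data/xapian/db2');
$db->add_database(Search::Xapian::Database->new("/data/xapian/db$_"))
    for (4, 6, 8, 10);

my $enquire = Search::Xapian::Enquire->new($db);
my $qp      = Search::Xapian::QueryParser->new();
$enquire->set_query($qp->parse_query('no recip'));

# Run the identical search repeatedly and time each get_mset() call.
for my $run (1 .. 100) {
    my $t0   = [gettimeofday];
    my $mset = $enquire->get_mset(0, 25);   # top 25 results
    printf "search %3d: %.3fs, estimate %d\n",
        $run, tv_interval($t0), $mset->get_matches_estimated();
}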
Notes:
* There is a MySQL server running on that machine, but it's not doing much.
* Used the latest (as of today) Xapian from SVN HEAD (the other server was
running an earlier release, 1.0.3).
* The run you see the output for was not the first time the test was run,
so the data should already be cached.
* Times were about 0.20-0.21 seconds (for the 'no recip' search).
* Estimates still vary.
* Used the latest compiled OProfile (got some warnings when printing output
with --callgraph=12, though; see the output files).
Here is the debug info for the 'no recip' search:
Initial test:
http://www.pidgintech.com/other/fts/test/no_recip/test-initial.out
Summary:
First search takes 5.36sec, the 71st takes 12.5sec (maybe a one-time
glitch). Four other searches take about 1-2sec. Most other searches
(94 of them) take between 0.2 and 0.3sec. Estimates vary in the area of
7700-8300 results.
Second test: http://www.pidgintech.com/other/fts/test/no_recip/test.out
Summary:
All searches took very close to 0.2sec. Estimates vary in the area
of 7700-8300 results.
Oprofile output:
Simple: http://www.pidgintech.com/other/fts/test/no_recip/oprofile.txt
Detailed:
http://www.pidgintech.com/other/fts/test/no_recip/oprofile-d.txt
CallGraph=12:
http://www.pidgintech.com/other/fts/test/no_recip/oprofile-callgraph.txt
(Warnings seen while printing the CallGraph=12 output:
http://www.pidgintech.com/other/fts/test/no_recip/oprofile-callgraph.warn)
We THEN ran a search on a different term 'gle suspension':
Results were a little disturbing, as the estimates varied even though they
were all below 100, which is quite a low number for estimates to be
inaccurate at (and it is disturbing enough in itself that estimates vary
for the same query, with the same database open and no database changes).
However, some notes:
* The run you see the output for was not the first time the test was run,
so the data should already be cached. However, this output was captured a
while after the original searches, so the first search took a bit longer,
for obvious reasons.
* Times were about 0.015 seconds for all sequential searches (for the
'gle suspension' search). The difference could also be related to the
estimates: around 65 matches compared to around 8000 for 'no recip'.
* Estimates still vary, even in this below-100 range: from 59 to 62.
* We use a 'check at least' of 100 (see the sketch below).
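That is the checkatleast argument to get_mset(); a minimal sketch:

# Ask for the top 25 matches, but have the matcher consider at least
# 100 candidates before producing its estimate of the total match count.
my $mset = $enquire->get_mset(0, 25, 100);
printf "estimate: %d\n", $mset->get_matches_estimated();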
Test: http://www.pidgintech.com/other/fts/test/gle/test.out
Summary:
All searches took very close to 0.015sec. Estimates vary in the range
of 59-62 results.
Oprofile output:
Simple: http://www.pidgintech.com/other/fts/test/gle/oprofile.txt
Detailed: http://www.pidgintech.com/other/fts/test/gle/oprofile-d.txt
CallGraph=12:
http://www.pidgintech.com/other/fts/test/gle/oprofile-callgraph.txt
Following this, we ran a test printing the docIDs to see why they vary.
Here are the docIDs from a few of the variations:
[56297643;103475983;2238694;83603672;82696667;2041212;96212193;105779847;77279273;96496372;108118072;2403755;18899917;84542318;61840853;27258501;65134562;84167102;6266426;96834248;30386133;85952982;81182092;5224656;81838668;]
[2238694;83603672;56134222;82696667;2041212;105779847;96496372;108118072;2403755;18899917;27258501;65134562;84167102;6266426;85952982;81182092;5224656;2283529;10805044;83578837;9103611;81702717;99243007;55053292;103318422;]
[56297643;103475983;83603672;82696667;2041212;96212193;105779847;77279273;96496372;108118072;18899917;84542318;61840853;27258501;65134562;84167102;6266426;96834248;30386133;85952982;81182092;5224656;81838668;9041043;83578837;]
[56297643;103475983;2238694;83603672;82696667;2041212;96212193;105779847;77279273;96496372;108118072;18899917;84542318;61840853;27258501;65134562;84167102;6266426;96834248;30386133;85952982;81182092;5224656;81838668;2283529;]
...
The above are just 4 random samples.
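The docids above are simply dumped from each MSet in rank order; a minimal
sketch of that (not the exact script):

# Dump the docids of the top 25 matches for one search, in rank order,
# in the same [id;id;...;] format as above.
my @matches = $enquire->matches(0, 25);
print '[', join(';', map { $_->get_docid() } @matches), ";]\n";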
OK, now this looked a little disturbing.
It's not only that speed is an issue or that estimates vary: the search
results themselves vary! Here, again, are the above 4 searches side by
side, ordered by DocID, with [MISSING] wherever a result is absent that,
according to the other searches, should have been there:
Search 1   # Search 2   # Search 3   # Search 4
---------- # ---------- # ---------- # ----------
2041212 # 2041212 # 2041212 # 2041212
2238694 # 2238694 # [MISSING] # 2238694
[MISSING] # 2283529 # [MISSING] # 2283529
2403755 # 2403755 # [MISSING] # [MISSING]
5224656 # 5224656 # 5224656 # 5224656
6266426 # 6266426 # 6266426 # 6266426
[MISSING] # [MISSING] # 9041043 # [MISSING]
[MISSING] # 9103611 # [MISSING] # [MISSING]
[MISSING] # 10805044 # [MISSING] # [MISSING]
18899917 # 18899917 # 18899917 # 18899917
27258501 # 27258501 # 27258501 # 27258501
30386133 # [MISSING] # 30386133 # 30386133
[MISSING] # 55053292 # [MISSING] # [MISSING]
[MISSING] # 56134222 # [MISSING] # [MISSING]
56297643 # [MISSING] # 56297643 # 56297643
61840853 # [MISSING] # 61840853 # 61840853
65134562 # 65134562 # 65134562 # 65134562
77279273 # [MISSING] # 77279273 # 77279273
81182092 # 81182092 # 81182092 # 81182092
[MISSING] # 81702717 # [MISSING] # [MISSING]
81838668 # [MISSING] # 81838668 # 81838668
82696667 # 82696667 # 82696667 # 82696667
[MISSING] # 83578837 # 83578837 # 83603672
83603672 # 83603672 # 83603672 # 84167102
84167102 # 84167102 # 84167102 # [MISSING]
84542318 # 85952982 # 84542318 # 84542318
85952982 # [MISSING] # 85952982 # 85952982
96212193 # [MISSING] # 96212193 # 96212193
96496372 # 96496372 # 96496372 # 96496372
96834248 # 99243007 # 96834248 # 96834248
103475983 # 103318422 # 103475983 # 103475983
105779847 # 105779847 # 105779847 # 105779847
108118072 # 108118072 # 108118072 # 108118072
Remember, this is only the first 25 results.
In fact, counting uniques, in the first 100 searches we see 50 unique
DocIDs overall for 25 asked-for results.
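Counting uniques here means something along these lines (again a sketch,
not the exact script):

# Collect the docids returned across 100 identical searches and count
# how many distinct ones turn up; ideally this would be exactly 25.
my %seen;
for my $run (1 .. 100) {
    $seen{ $_->get_docid() }++ for $enquire->matches(0, 25);
}
printf "unique docids across all runs: %d\n", scalar keys %seen;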
So we thought: maybe it's the ordering?
We then tried getting rid of the weighting scheme we had set.
Here are the results:
Total unique results in 100 searches for the top 25 results: 25 (as it
should be).
Search times: 0.013sec for all.
Estimates: 62 for all.
We THEN ran it on the original 'no recip' search, and here are the results:
Total unique results in 100 searches for the top 25 results: 25 (as it
should be).
Search times: the first took 3.5sec; most others were around 0.19-0.20sec
(note that, while now very consistent, this is still about 15 times slower
than the other search, 'gle suspension').
Estimates: 7571 for all.
By the way, this is the weighting scheme we used:
# Governs the importance of within-document frequency.
# Must be >= 0; 0 means ignore wdf. Default is 1.
my $k1 = 1;
# Compensation factor for the high wdf values in large documents.
# Must be >= 0; 0 means no compensation. Default is 0.
my $k2 = 25;
# Governs the importance of within-query frequency.
# Must be >= 0; 0 means ignore wqf. Default is 1.
my $k3 = 1;
# Relative importance of within-document frequency and document length.
# Must be >= 0 and <= 1. Default is 0.5.
my $b = 0.01;
# Cutoff on the minimum value that can be used for a normalised document
# length; smaller values are forced up to this cutoff. This prevents very
# small documents getting a huge bonus weight. Default is 0.5.
my $min_normlen = 0.5;

$XAP_Enquire->set_weighting_scheme(
    Search::Xapian::BM25Weight->new($k1, $k2, $k3, $b, $min_normlen));
Maybe it's important to mention here that we index the title of the
document with a higher weight (x5).
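For context, that boost is applied at index time by giving title terms a
larger wdf increment, along these lines (a sketch assuming
Search::Xapian::TermGenerator is used; the field contents are placeholders):

my ($title, $body) = ('Example document title', 'Example body text ...');

my $doc = Search::Xapian::Document->new();
my $tg  = Search::Xapian::TermGenerator->new();
$tg->set_stemmer(Search::Xapian::Stem->new('english'));
$tg->set_document($doc);

# Title terms get a wdf increment of 5, body terms the default of 1,
# so a title match carries roughly five times the within-document weight.
$tg->index_text($title, 5);
$tg->increase_termpos();
$tg->index_text($body);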
Then we went back to the old server: same speed as before (0.9-1.0sec per
search), but this time the estimates are stable. So the weighting scheme
is the cause of the inaccurate estimates. Why?
Overall this is very interesting:
1. The slow search time for 'no recip' is still an issue. I am hoping the
OProfile output will shed some light on this; looking forward to hearing
from the experts here.
2. I looked again at the docs on weighting schemes and didn't see anything
about the effects of changing one, or about inaccuracy of results and
estimates in such cases. Can anyone clarify?
3. Compared to the other server (16GB RAM, single SATA2 disks, five of
them, each holding 2 databases, and a single quad-core CPU), would you
expect such a big performance difference: 1sec vs 0.2sec for repeated
searches? Also, why is it still slow? 0.2sec for exactly the same search
that was performed a second ago, when it is all the server is doing and
everything should be cached, sounds slow, doesn't it?
Anyway, your thoughts?
Best regards,
Ron