[Xapian-discuss] xapian performance
Olly Betts
olly at survex.com
Wed Nov 22 23:31:35 GMT 2006
On Wed, Nov 22, 2006 at 06:55:21PM -0200, Fernando Nemec wrote:
> Do you think it's better to have a large set of queries or will this do
> fine?
The effects will depend on the queries, but Arjen has already tested a
larger set so I was mostly hoping you could confirm there was no
regression for the two term case.
> This was made *without* experimental phrase optimization patch:
>
> <!--Xapian::Query(lula)-->
> 0m0.412s
> <!--Xapian::Query((presidente PHRASE 2 lula))-->
> 1m5.062s
> <!--Xapian::Query((governo PHRASE 6 do PHRASE 6 estado PHRASE 6 de PHRASE 6 sao PHRASE 6 paulo))-->
> 1m14.193s
>
> That was made *with* phrase optimization patch:
>
> <!--Xapian::Query(lula)-->
> 0m0.379s
> <!--Xapian::Query((presidente PHRASE 2 lula))-->
> 0m58.514s
> <!--Xapian::Query((governo PHRASE 6 do PHRASE 6 estado PHRASE 6 de PHRASE 6 sao PHRASE 6 paulo))-->
> 1m2.503s
It's interesting that the first case is sped up (by 8%, which is a little
high to be noise) - the patch shouldn't change non-phrase queries at
all. Is this SVN HEAD with and without this patch?
http://www.oligarchy.co.uk/xapian/patches/xapian-experimental-phrase-optimisation-v2.patch
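For reference, the percentages can be worked out from the quoted timings with a few lines of Python. This is only a sketch: the helper names `to_seconds` and `speedup_percent` are made up for illustration, and it assumes the timings are in the usual "XmY.YYYs" form printed by "time".

```python
import re

def to_seconds(stamp):
    """Convert a time-style stamp such as '1m5.062s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

def speedup_percent(before, after):
    """Percentage reduction in elapsed time going from before to after."""
    b = to_seconds(before)
    a = to_seconds(after)
    return (b - a) / b * 100.0

# Timings quoted above: (query, without patch, with patch).
TIMINGS = [
    ("lula", "0m0.412s", "0m0.379s"),
    ("presidente PHRASE 2 lula", "1m5.062s", "0m58.514s"),
    ("governo PHRASE 6 do PHRASE 6 estado PHRASE 6 de PHRASE 6 sao PHRASE 6 paulo",
     "1m14.193s", "1m2.503s"),
]

for query, before, after in TIMINGS:
    print(f"{speedup_percent(before, after):5.1f}% faster: {query}")
```

A single run of each query says nothing about run-to-run variance, so these figures are only as good as the measurements behind them.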
Are you timing Omega? If so, did you try removing $topterms from your
query template?
And how are you timing?
If this is "wall-clock" time from the "time" utility/built-in, what are
the user and system times?
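If it's easier to script than to read off the "time" output, something like the following would collect the same three numbers. It is a rough sketch, not how Omega or Xapian measure anything: `time_command` is a hypothetical helper, the command run at the bottom is just a placeholder for the real search, and the child CPU times from `os.times()` are only meaningful on Unix-like systems.

```python
import os
import subprocess
import sys
import time

def time_command(argv):
    """Run argv and return (wall, user, system) seconds for the child."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    wall = time.perf_counter() - start
    after = os.times()
    # children_* accumulate CPU time of child processes that have been waited for.
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return wall, user, system

# Placeholder command: substitute the real search invocation here.
wall, user, system = time_command([sys.executable, "-c", "sum(range(10**6))"])
print(f"wall {wall:.3f}s  user {user:.3f}s  sys {system:.3f}s")
```

If wall-clock time is far larger than user plus system time, the process spent most of its time waiting (typically on disk I/O) rather than computing.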
> I don't know if this is relevant, but maybe it is. For this query
>
> <!--Xapian::Query((presidente PHRASE 2 lula))-->
>
> the cache does not seem to have any effect at all. Even if I run the
> exact same query seconds later, the search time is high and almost the
> same.
I think this must mean that we need to read so many disk blocks for
this query that not many end up cached. I think you said you had 1GB
of RAM, so there might not be all that much left for caching. What
does the "free" command report?
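On Linux the same figures can also be read from /proc/meminfo. A minimal sketch, assuming a Linux system and the standard MemTotal/MemFree/Buffers/Cached fields; `parse_meminfo` and `cache_summary` are hypothetical helper names, not part of any tool mentioned here:

```python
import os

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    The memory fields used below are reported by the kernel in kB.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields

def cache_summary(fields):
    """Return total, free, buffer and page-cache memory in whole MB."""
    keys = ("MemTotal", "MemFree", "Buffers", "Cached")
    return {key: fields.get(key, 0) // 1024 for key in keys}

# Only attempt the read where the file exists (i.e. on Linux).
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as handle:
        for key, megabytes in cache_summary(parse_meminfo(handle.read())).items():
            print(f"{key:>8}: {megabytes} MB")
```

The "Cached" figure is the one of interest here: it shows roughly how much RAM the kernel is currently using to cache file data, which has to hold the blocks the query touches if a repeat run is to be any faster.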
> If there's anything else I can do to help to fix this issue, please
> let me know.
It would be interesting to try measuring just how many blocks we
actually read - this will be a repeatable measure, whereas timings
from cold disk cache are much harder to exactly repeat. Try applying
this patch:
http://www.oligarchy.co.uk/xapian/patches/flint-count-read-blocks.patch
This reports the number of blocks read from each table of each flint
database to stderr (the report happens whenever a database is closed).
Cheers,
Olly