[Xapian-discuss] xapian performance

Fernando Nemec fernando.nemec at folha.com.br
Wed Nov 22 21:49:59 GMT 2006


Hi Olly,

I don't know if this is relevant, but maybe it is. For this query:

<!--Xapian::Query((presidente PHRASE 2 lula))-->

the cache does not seem to have any effect at all. Even if I run exactly
the same query a few seconds later, the search time is still high and
almost unchanged.

This doesn't happen with this query:

<!--Xapian::Query((governo PHRASE 6 do PHRASE 6 estado PHRASE 6 de PHRASE 6 sao PHRASE 6 paulo))-->

When I run the same query a few seconds later, the search time drops
dramatically (to just a few seconds).

Both queries return almost 190,000 documents, in a database of
1,050,000 documents.

Thanks again,

Nemec




Wednesday, November 22, 2006, 6:55:21 PM, you wrote:

> Hi Olly,

>> Could you compare the speed of phrase searches with this patch:

> Certainly. Below, each query is shown as its Query::get_description()
> output, followed by the time taken to get the result set. I ran just
> three different queries: a single term, a 2-word phrase and a 6-word
> phrase.

> Do you think it's better to use a larger set of queries, or will this
> do?

> These were run *without* the experimental phrase optimization patch:

> <!--Xapian::Query(lula)-->
> 0m0.412s
> <!--Xapian::Query((presidente PHRASE 2 lula))-->
> 1m5.062s
> <!--Xapian::Query((governo PHRASE 6 do PHRASE 6 estado PHRASE 6 de PHRASE 6 sao PHRASE 6 paulo))-->
> 1m14.193s

> These were run *with* the phrase optimization patch:

> <!--Xapian::Query(lula)-->
> 0m0.379s
> <!--Xapian::Query((presidente PHRASE 2 lula))-->
> 0m58.514s
> <!--Xapian::Query((governo PHRASE 6 do PHRASE 6 estado PHRASE 6 de PHRASE 6 sao PHRASE 6 paulo))-->
> 1m2.503s
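To put the two sets of timings above in proportion, a small helper can convert each figure to seconds and compute the relative saving. This is only a sketch: the names to_seconds and percent_saved are hypothetical, and the input is assumed to be in the "<minutes>m<seconds>s" shape printed above.

```cpp
#include <string>

// Parse a duration such as "1m5.062s" into seconds.
// Assumes the "<minutes>m<seconds>s" format used in the timings above.
double to_seconds(const std::string& t) {
    const std::string::size_type m = t.find('m');
    const double minutes = std::stod(t.substr(0, m));
    // Skip the 'm' and drop the trailing 's'.
    const double seconds = std::stod(t.substr(m + 1, t.size() - m - 2));
    return minutes * 60.0 + seconds;
}

// Percentage of elapsed time saved going from `before` to `after`.
double percent_saved(const std::string& before, const std::string& after) {
    const double b = to_seconds(before);
    return 100.0 * (b - to_seconds(after)) / b;
}
```

Applied to the figures above, the patch saves roughly 8% on the single-term query, about 10% on the 2-word phrase and about 16% on the 6-word phrase, so both phrase searches still take about a minute.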

> Thanks for your help, Olly. If there's anything else I can do to help
> fix this issue, please let me know.

> Nemec





> Wednesday, November 22, 2006, 5:19:45 PM, you wrote:

>> On Tue, Nov 21, 2006 at 07:16:52PM -0200, Fernando Nemec wrote:
>>> After so many patches I opted to check out a fresh copy of the source
>>> from SVN. As far as I can see, you have committed almost all the
>>> patches you produced over the last few days.

>> So far I've only committed the changes to use "my_fls" instead of the
>> floating point log calculation.  The changes to open positionlists
>> lazily aren't in yet (I was waiting to check that the latest patch
>> fixed the slowdown for 2 term phrases).
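The thread doesn't spell out how the lazy-positionlist change works. As a rough illustration of the general idea only (not Xapian's code), a phrase matcher could defer reading a document's position list until that document has already matched every term of the phrase, so documents rejected earlier cost no positional I/O. A minimal sketch, assuming a hypothetical LazyPositionList class whose loader callback stands in for the disk read:

```cpp
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical illustration of opening a position list lazily: the loader
// (standing in for the disk read) only runs the first time positions are needed.
class LazyPositionList {
  public:
    using Loader = std::function<std::vector<unsigned>()>;

    explicit LazyPositionList(Loader loader) : loader_(std::move(loader)) {}

    // Reads on the first call; later calls reuse the cached positions.
    const std::vector<unsigned>& positions() {
        if (!positions_) positions_ = loader_();
        return *positions_;
    }

    // True once the (simulated) read has happened.
    bool loaded() const { return positions_.has_value(); }

  private:
    Loader loader_;
    std::optional<std::vector<unsigned>> positions_;
};
```

Constructing the object costs nothing; only a call to positions() triggers the read, and it happens at most once per list.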

>>> Sadly, I didn't see any improvement. I made a simple list with a
>>> variety of queries, and all of them return in more or less the same
>>> time (a few tens of seconds).

>> The "my_fls" changes should reduce CPU use, so you won't see much
>> improvement if you're heavily I/O bound (which you must be if a search
>> takes tens of seconds).
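The thread doesn't show my_fls itself. Assuming it follows the BSD fls() ("find last set") convention, it returns the 1-based position of the highest set bit, which gives floor(log2(x)) + 1 for x > 0 using integer operations only. That is why swapping it in for a floating-point log saves CPU but not I/O. A minimal sketch (the loop body and floor_log2_fp are illustrative, not Xapian's code):

```cpp
#include <cmath>

// Hypothetical sketch of a "find last set" helper in the style of my_fls:
// returns the 1-based position of the highest set bit, or 0 when x == 0.
// For x > 0, my_fls(x) - 1 == floor(log2(x)), computed with integer ops only.
inline int my_fls(unsigned long x) {
    int bits = 0;
    while (x != 0) {
        ++bits;
        x >>= 1;
    }
    return bits;
}

// The kind of floating-point calculation such a helper would replace.
inline int floor_log2_fp(unsigned long x) {
    return static_cast<int>(std::floor(std::log2(static_cast<double>(x))));
}
```

For example, 1,050,000 lies between 2^20 and 2^21, so my_fls returns 21 and the floating-point version returns 20.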

>>> Is there any information I can supply to help find out what's going
>>> on with phrase searches?

>> Could you compare the speed of phrase searches with this patch:

>>> > http://www.oligarchy.co.uk/xapian/patches/xapian-experimental-phrase-optimisation-v2.patch

>> with not using it (either on SVN trunk or 0.9.9).  Ideally it should
>> speed up phrases with 3 or more terms, but should be just as fast for
>> 2 term phrases.

>> I'm going to look at creating a simple patch to count the number of
>> blocks read from each table during the query, which should help to get a
>> handle on how much I/O we're actually doing in an easily repeatable way.

>> Cheers,
>>     Olly

> --
> []s
> Fernando Nemec
> fernando.nemec at folha.com.br
> http://www.folha.com.br/






