[Xapian-discuss] xapian performance

Olly Betts olly at survex.com
Fri Dec 1 00:02:51 GMT 2006


On Thu, Nov 23, 2006 at 12:25:27PM -0200, Fernando Nemec wrote:
> <!--Xapian::Query(lula)-->
> 1 blocks read from /local/xapian/newdb/record.
> 4369 blocks read from /local/xapian/newdb/value.

Hmm, I don't think you mentioned you were using values.  That adds to the
number of blocks which we need to look at, but also if you're sorting on
a value there are some matcher optimisations which can't be used so the
matcher will generally need to consider more documents anyway.

>              total       used       free     shared    buffers     cached
> Mem:       1034764    1019508      15256          0       3556     980372

So it looks like we're getting a lot of the 1GB being used as disk
cache, which is good.

> == CASE 2
> <!--Xapian::Query((presidente PHRASE 2 lula))-->
> 1 blocks read from /local/xapian/newdb/record.
> 3023 blocks read from /local/xapian/newdb/value.
> 3 blocks read from /local/xapian/newdb/termlist.
> 153036 blocks read from /local/xapian/newdb/position.
> 380 blocks read from /local/xapian/newdb/postlist.

But if you do the sums here: blocks are 8K by default, and we're reading
156443 of them, which is 1.19GB of data, or about 265MB more than we can
have cached (actually some blocks may be read more than once in the
above counts, so this is probably a slight over-estimate).

So depending on initial cache state, we need to read between 265MB and
1.19GB of data from disk, with some seeking around between reads.
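
For anyone wanting to redo the sums, here is a minimal sketch in Python. The block counts and the "cached" figure are copied from the quoted output above. The variable names are only illustrative, and it assumes "GB" and "MB" here mean binary units (2^30 and 2^20 bytes), which is what makes the figures line up.

```python
# Block counts per table, copied from the CASE 2 output quoted above.
blocks_read = {
    "record": 1,
    "value": 3023,
    "termlist": 3,
    "position": 153036,
    "postlist": 380,
}

BLOCK_SIZE = 8 * 1024   # default block size of 8K, in bytes
cached_kb = 980372      # "cached" column from the quoted `free` output, in KB

total_blocks = sum(blocks_read.values())
total_bytes = total_blocks * BLOCK_SIZE

# Assumption: binary units (GiB / MiB) are meant by "GB" / "MB".
total_gib = total_bytes / 2**30
total_mib = total_bytes / 2**20
cached_mib = cached_kb / 1024
shortfall_mib = total_mib - cached_mib

print(f"blocks read:  {total_blocks}")
print(f"data touched: {total_gib:.2f} GB")
print(f"disk cache:   {cached_mib:.0f} MB")
print(f"shortfall:    {shortfall_mib:.0f} MB")
```

Almost all of the total comes from the position table, so that is the one whose blocks matter for caching.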

A quick test shows my dev box can read a total of 1.4GB of data from 3
(probably mostly sequential) uncached files on a SATA2 disk in 37
seconds, so if the disk heads have to seek around a bit, I can see
why this query is slow.
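
As a rough back-of-envelope sketch of what that implies, the Python below turns the 1.4GB in 37 seconds figure into a throughput and applies it to the 265MB to 1.19GB range. This is only a lower bound on read time: it assumes the mostly sequential rate from my dev box carries over to the other machine, that binary units are meant, and it ignores seek time entirely. The function name is just for illustration.

```python
def read_time_seconds(data_mib: float, rate_mib_per_s: float) -> float:
    """Time to read data_mib at a steady rate, ignoring seeks."""
    return data_mib / rate_mib_per_s

# Measured on the dev box: 1.4GB read in 37 seconds (mostly sequential).
measured_mib = 1.4 * 1024
measured_seconds = 37.0
rate = measured_mib / measured_seconds   # roughly 39 MB/s

# Range of data to fetch from disk, depending on initial cache state.
best_case_mib = 265.0                    # cache already warm
worst_case_mib = 1.19 * 1024             # nothing useful cached

best = read_time_seconds(best_case_mib, rate)
worst = read_time_seconds(worst_case_mib, rate)

print(f"sequential rate: {rate:.1f} MB/s")
print(f"best case:  {best:.1f} s")
print(f"worst case: {worst:.1f} s")
```

Random access to B-tree blocks will be well below the sequential rate, so real times would be higher than these.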

Short term, more RAM will help a lot as then you'll be able to have most
of the skeleton of the position list Btree permanently cached.  And (if
you don't have one already) a fast RAID disk setup will help reduce the
cost of disk cache misses.

The new B-tree manager should also improve this once I have it ready to
merge in, as it will reduce the number of non-leaf blocks needed to
store a given table so there's more chance we'll have the required branch
blocks in cache.  The reduction should be particularly good for the
position list table.

There may also be other easy gains remaining.  I'll see if I can think
of anything.

Cheers,
    Olly


