[Xapian-discuss] Phrase search performance

James Aylett james-xapian at tartarus.org
Wed Feb 22 12:19:36 GMT 2006


On Wed, Feb 22, 2006 at 11:34:08AM +0000, Olly Betts wrote:

> > We've recently had huge performance issues with a NAS (not quite a
> > SAN, but it's SAN-capable) because of NFS protocol versions.
> 
> Do you remember which protocol version was best?  My hazy memory is
> that v2 was faster than v3, but that may be wrong, and it may depend
> on other factors.

If you can get v4 and your backend supports it, use it - it has so many
improvements it's not worth listing them. Linux 2.6 has NFS v4 support,
as will most modern NAS and SAN heads. It has all sorts of funky stuff
to reduce network overhead. In addition, because of the way it's built,
it's a lot easier to build in caching on the client (eg: my
understanding is that with nfsv3 on Solaris you had to run a separate
caching file system in front of it, whereas with nfsv4 it's built in).

v2 may well be faster than v3, although it lacks some features that
probably don't matter to xapian. v4 should be better than v2 in a data
centre environment, particularly in combining multiple operations into
one TCP roundtrip. Different operation types are now built into the
one communications stream (no more separate stat, mount, ACL, NLM and
NFS protocols).
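
As a back-of-envelope illustration of why combining operations helps,
here is a minimal sketch in Python. The operation names and the 200
microsecond round-trip time are assumptions picked for the example, not
measurements, and a real client decides for itself what it batches:

```python
def total_wait_us(operations, rtt_us, compound):
    """Time spent waiting on the network, in microseconds.

    Toy model: each separate request costs one round trip; a compound
    request carries every operation in a single round trip.
    """
    if not operations:
        return 0
    round_trips = 1 if compound else len(operations)
    return round_trips * rtt_us


# Illustrative sequence for fetching one block of a file.
ops = ["LOOKUP", "ACCESS", "READ"]

print(total_wait_us(ops, rtt_us=200, compound=False))  # 600
print(total_wait_us(ops, rtt_us=200, compound=True))   # 200
```

The model ignores server time and payload size, so treat it as a way
of seeing the shape of the saving rather than its size.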

One downside: it takes more effort to set up, because of the way
ownership and ACLs are implemented. You may have to set up mappings
for users and groups that you didn't have to with nfsv3 (where you
just synchronise the numeric ids).
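
To show what that mapping involves: nfsv4 passes owners around as
name@domain strings rather than bare numeric ids, so each end has to
translate. A minimal sketch, where the account tables, the domain and
the function names are all made up for illustration (the real work is
done by the id mapping service on each machine):

```python
# Hypothetical account tables; note the uids differ between machines.
CLIENT_UIDS = {"xapian": 1001, "www": 33}
SERVER_UIDS = {"xapian": 2005, "www": 33}
DOMAIN = "example.org"  # assumed; both ends must agree on it


def to_wire(uid, local_uids, domain=DOMAIN):
    """Turn a local numeric uid into the name@domain string that is sent."""
    for name, known_uid in local_uids.items():
        if known_uid == uid:
            return f"{name}@{domain}"
    return "nobody"  # no mapping configured for this uid


def from_wire(owner, local_uids, domain=DOMAIN):
    """Turn a received name@domain string back into a local uid, or None."""
    name, _, owner_domain = owner.partition("@")
    if owner_domain != domain or name not in local_uids:
        return None
    return local_uids[name]


owner = to_wire(1001, CLIENT_UIDS)
print(owner)                          # xapian@example.org
print(from_wire(owner, SERVER_UIDS))  # 2005
```

If the domains don't match, or the name is unknown on one side, the
lookup fails, which is the extra setup you have to get right.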

> Another thing to try is setting rsize=8192 in the mount options.  Since
> the Btree blocks are 8K (unless you override the default blocksize) and
> the default block size for NFS is often 4K, I'd expect that would help.

I'm guessing we're going to see pretty random access across blocks, so
you'll see direct seek-read for most blocks? In which case I'd say
this should make a fair difference, although you don't know what RAC
strategy the SAN head might choose. In general I'd say 4k is too small
for most NFS deployments these days anyway, but I'm biased towards
certain kinds of web serving. You can always mount the xapian stuff
separately to the rest of your data, of course.
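
To check what read size a mount actually ended up with, something like
the following Python sketch would do. It assumes the Linux
/proc/mounts layout (device, mount point, type, options) and the 8K
Btree block size mentioned above; the sample line, server name and
paths are invented:

```python
import math

XAPIAN_BLOCK_SIZE = 8192  # default Btree block size, per the quoted text


def nfs_rsizes(mounts_text):
    """Return {mount point: rsize} for NFS entries in /proc/mounts-style text."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        for opt in fields[3].split(","):
            if opt.startswith("rsize="):
                result[fields[1]] = int(opt[len("rsize="):])
    return result


def reads_per_block(rsize, block_size=XAPIAN_BLOCK_SIZE):
    """Number of read requests needed to fetch one block at this rsize."""
    return math.ceil(block_size / rsize)


# Invented sample; on Linux you would pass in the contents of /proc/mounts.
SAMPLE = (
    "/dev/sda1 / ext3 rw 0 0\n"
    "filer:/export/xapian /mnt/xapian nfs rw,vers=3,rsize=4096,wsize=4096 0 0\n"
)

for mount_point, rsize in nfs_rsizes(SAMPLE).items():
    print(mount_point, rsize, reads_per_block(rsize))  # /mnt/xapian 4096 2
```

With rsize=8192 the same block comes back in one request instead of
two, before any read-ahead or client caching is taken into account.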

You also want to look at how the SAN has been set up. You generally
choose different parts of the storage to be set up for different types
of task. It may be worth creating a separate area with a suitable
block size on the storage (or at least giving it hints about access
so the SAN head caches can do something useful), although this depends
a lot on the SAN being used.

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


