[Xapian-discuss] Improving indexing speed

Olly Betts olly at survex.com
Tue Jul 1 06:23:10 BST 2008


On Thu, Jun 26, 2008 at 04:33:25PM -0700, Robert Kaye wrote:
> I am going to take this route -- I can see the disk usage creeping up  
> once it gets past 20% of my index and the rows/second starts degrading  
> past this point. Besides, dividing this task into chunks lets me  
> offload the process to multiple cores in my machines and then glue  
> things together at the end.

If you're I/O limited (which is usually the case), then trying to split
the load over multiple cores by indexing in parallel probably won't
help.  It may make things slower overall, as it will tend to increase
the VM pressure, and also tend to mean disk writes will be split between
more files.
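
One rough way to check whether you're I/O limited before splitting the work is to compare the CPU time an indexing batch uses with the wall-clock time it takes.  The sketch below is only an illustration: `cpu_fraction` and `index_batch` are hypothetical names, and `index_batch` is a placeholder for whatever adds a batch of documents and flushes in your own indexer.  A fraction well below 1 suggests the process spends most of its time waiting (typically on disk) rather than computing, in which case more cores are unlikely to help.

```python
import time


def cpu_fraction(work):
    """Run work() once; return CPU time divided by wall-clock time."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    work()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


def index_batch():
    # Hypothetical placeholder: replace with the code that indexes one
    # batch of rows and flushes the database.  The sleep only stands in
    # for time spent blocked, so the script runs as written.
    time.sleep(0.2)


if __name__ == "__main__":
    print(f"CPU fraction: {cpu_fraction(index_batch):.2f}")
```

This only counts CPU time used by the calling process, so treat the number as a hint rather than a measurement; tools that report I/O wait for the whole system give a fuller picture.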

I'd also be a bit wary of the idea of trying to use a ram disk to hold
the index.  Depending on how your OS's VM system works, this might mean
you end up trying to hold two copies of the index in RAM - one in the RAM
disk, plus a cached copy in the file cache.  Or perhaps the VM system
knows about RAM disks and is smart enough not to try to cache blocks
from them, but it's something you ought to check.
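
As a rough sketch of how one might check this on Linux (an assumption; other systems expose these numbers differently), you can snapshot `/proc/meminfo` before and after reading the index files from the RAM disk and see how much the file cache grew.  The helper names here are hypothetical.  If `Cached` plus `Buffers` grows by roughly the size of the index while reading files that already live in RAM, that would suggest a second copy is being kept.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}.

    Most fields are reported in kB; a few are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def cache_growth_kb(before, after, keys=("Cached", "Buffers")):
    """Sum the growth of the given cache fields between two snapshots."""
    return sum(after.get(k, 0) - before.get(k, 0) for k in keys)


def read_meminfo(path="/proc/meminfo"):
    """Take a snapshot of the current memory counters (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Usage would be along the lines of `before = read_meminfo()`, then read the index files from the RAM disk, then `cache_growth_kb(before, read_meminfo())`.  Other activity on the machine will also move these counters, so this is best tried on an otherwise idle system.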

Cheers,
    Olly


