[Xapian-discuss] Xapian and 10M (small) documents. What to expect?

Olly Betts olly at survex.com
Mon Sep 12 02:20:50 BST 2005


On Fri, Sep 09, 2005 at 07:58:02AM +0200, Arjen van der Meijden wrote:
> Real-time indexing will not allow you to use the faster-to-search 
> compacted databases. Database compaction takes an hour or so with our 
> database, which shrinks from about 15G "working" to 11G "compacted" in 
> the Flint format.

You could keep the bulk of the documents in a compacted database and put
the last day or so in a second database, searching over both together
with Database::add_database().  Then, periodically, merge and compact the
two databases into a new compacted database, which replaces the old
compacted one, and start again with that plus an empty second database.
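A toy model of that two-database scheme, purely to illustrate the flow. It does not use Xapian at all: the `Shard` and `TwoTierIndex` classes are hypothetical stand-ins (a database is modelled as a dict of document id to term set), and "compaction" is represented only by rebuilding the main database from both parts.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical stand-in for one database: doc id -> set of terms."""
    docs: dict = field(default_factory=dict)
    compacted: bool = False

    def search(self, term):
        return {doc_id for doc_id, terms in self.docs.items() if term in terms}


class TwoTierIndex:
    """Hypothetical sketch: big compacted database plus small live one."""

    def __init__(self):
        self.main = Shard(compacted=True)  # bulk of documents, not written to
        self.recent = Shard()              # last day or so, indexed in real time

    def add_document(self, doc_id, terms):
        # New documents only ever go into the small, writable database.
        self.recent.docs[doc_id] = set(terms)

    def search(self, term):
        # Plays the role of Database::add_database(): one query is run
        # over both databases and the hits are combined.
        return sorted(self.main.search(term) | self.recent.search(term))

    def merge_and_compact(self):
        # Build a new "compacted" database from both parts (a re-indexed
        # document in the recent database overrides the older copy),
        # swap it in, and start again with an empty second database.
        merged = dict(self.main.docs)
        merged.update(self.recent.docs)
        self.main = Shard(docs=merged, compacted=True)
        self.recent = Shard()


idx = TwoTierIndex()
idx.add_document(1, ["xapian", "flint"])
idx.merge_and_compact()                    # doc 1 now lives in the compacted part
idx.add_document(2, ["xapian", "remote"])  # doc 2 is only in the live part
print(idx.search("xapian"))                # [1, 2]
```

Searches stay correct throughout because every query sees both parts; only the merge step touches the large database.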

> In my experience that easily beats the old "remote database" in terms of 
> performance, since that used to send all the result data over the wire, 
> expecting the client to sort the results.
> Whether it still beats the current remote-setup I don't know, but we're 
> not just going to change a working set-up to figure that out ;)

The remote setup hasn't changed much in ages (except that sorting is
now supported).

The reason for sending back more results is that we might need them
if we're searching several databases together.  I think the remote
backend could be improved appreciably for cases where this isn't
happening (databases could still be combined on the far side of the
remote link).
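One way to see why extra results are needed when several databases are searched together: the requested page of results may be drawn from any of them, so each one has to supply its own top first+maxitems hits, not just the page. This is a hypothetical sketch with made-up (score, doc_id) data, not the remote backend's actual protocol.

```python
import heapq


def top_n(scored_docs, n):
    """Highest-scoring n (score, doc_id) pairs from one database."""
    return heapq.nlargest(n, scored_docs)


def merged_page(databases, first, maxitems):
    # The page could come entirely from any one database, so each must
    # send its top first + maxitems hits before they can be merged.
    needed = first + maxitems
    candidates = []
    for db in databases:
        candidates.extend(top_n(db, needed))
    return heapq.nlargest(needed, candidates)[first:]


def single_page(database, first, maxitems):
    # With only one database, the far side can rank locally and
    # send back just the requested slice.
    return top_n(database, first + maxitems)[first:]


a = [(0.9, "a1"), (0.8, "a2"), (0.7, "a3")]
b = [(0.85, "b1"), (0.4, "b2")]
print(merged_page([a, b], 1, 2))  # [(0.85, 'b1'), (0.8, 'a2')]
print(single_page(a, 1, 2))       # [(0.8, 'a2'), (0.7, 'a3')]
```

If each database had sent only its own slice [1:3], the merge would have missed b1, which ranks second overall, and returned a2 and a3 instead.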

Cheers,
    Olly
