[Xapian-discuss] get_data not fast enough for query matches

Olly Betts olly at survex.com
Sun Feb 5 21:07:48 GMT 2006


On Sat, Feb 04, 2006 at 03:20:38PM +0000, Salem Berhanu wrote:
> Also I was wondering when I reindex if I should use a flint backend. I
> have read in the list that it's supposed to be faster to query. Is it
> slower to index but faster to query or fast in both cases?

It should be faster all round.

> Also is xapian-compact what I should use to merge/compact dbs 
> indexed with a flint backend.

Yes.

> I wasn't sure if it was fully implemented like quartz.

Flint isn't finished, but everything should work (currently much of
it just uses code from quartz).

The main reason it's not the default is that the database format may change
frequently and there'll be no migration path during development (except
for rebuilding your index from the source data, or alternatively using
copydatabase to copy the old-flint database to a quartz database,
upgrading, then using copydatabase to copy the quartz database to a
new-flint one).  With quartz we've generally avoided making changes
which stop a newer version reading databases created by an older
version.

The flint format will be the same in 0.9.3 as it was in 0.9.2 (I've a big
change which is mostly done which I'm holding off applying until after the
release).

> >You're forcing the matcher to avoid most of its possible optimisations
> >(which is probably why the search takes 7 seconds), and then you're
> >retrieving lots of entries from the record table, which has been
> >designed with the expectation that you'll want more like 10-1000
> >results.
> 
> I wasn't aware I was forcing the matcher to avoid its possible 
> optimisations. What am I doing that's forcing this?

Asking for all the results.  As a simple example, if you want the first
10 results in docid order, the matcher can stop after it has found 10
results.  This is also possible for multiple term queries sorted by
relevance in many cases (it's just harder to explain why).
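
To make the docid-order case concrete, here is a toy sketch in plain
Python. It is not Xapian's matcher: the posting list is just a range of
docids, and the helper names (matching_docids, first_n) are made up for
illustration. It only shows that asking for 10 results lets the loop
stop after examining 10 entries, while asking for everything forces a
walk over the whole list.

```python
from itertools import islice

def matching_docids(postings, counter):
    """Yield docids in order, counting how many entries get examined."""
    for docid in postings:
        counter["examined"] += 1
        yield docid

def first_n(postings, n, counter):
    """Return at most n matches; islice stops pulling once it has n."""
    return list(islice(matching_docids(postings, counter), n))

# Hypothetical posting list: 137480 matching docids, as in this thread.
postings = range(1, 137481)

counter = {"examined": 0}
top10 = first_n(postings, 10, counter)
examined_top10 = counter["examined"]

counter_all = {"examined": 0}
everything = first_n(postings, len(postings), counter_all)
examined_all = counter_all["examined"]

print(examined_top10, examined_all)
```

The real matcher has more ways to cut work short than this, but the
principle is the same: a small result window gives it room to stop
early.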

> >I'm guessing you're only trying to get all the results so you can merge
> >the results from searching two fields in different databases, in which
> >case this ceases to be an issue if you use term prefixes instead.  If
> >I'm wrong, please explain *WHY* you want all 137480 matches.
> 
> Yeap, that's the main reason. I think also we wanted to offer users the 
> option of saving their search results but I guess we can save the matches 
> and display the data in small ranges, per page.

Saving the search has a similar effect and is perhaps more useful in
some ways - for example, saving the actual result set for a changing
document collection like the web just means that it'll contain a growing
number of dead links.  Saving the search avoids this and also shows
newer entries which match.
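
A small sketch of the difference, again in plain Python rather than
Xapian: the collection is a dict of docid to text and search() is a
hypothetical stand-in for running a query. The saved result set ends up
pointing at a document which no longer exists, while re-running the
saved query drops it and picks up the new match.

```python
def search(collection, term):
    """Toy search: docids whose text contains the term as a whole word."""
    return sorted(d for d, text in collection.items() if term in text.split())

collection = {1: "flint backend", 2: "quartz backend", 3: "flint format"}

saved_results = search(collection, "flint")  # snapshot of docids
saved_query = "flint"                        # the search itself

# The collection changes: one document disappears, a new one arrives.
del collection[1]
collection[4] = "new flint release"

# Docids in the snapshot which are now dead links.
stale = [d for d in saved_results if d not in collection]
# Re-running the saved query reflects the current collection.
fresh = search(collection, saved_query)

print(stale, fresh)
```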

Cheers,
    Olly


