[Xapian-discuss] Optimization and Load balancing with Xapian

David Levy dvid.levy at gmail.com
Mon Feb 20 10:05:23 GMT 2006


Hi Olly,

On 2/16/06, Olly Betts <olly at survex.com> wrote:
>
> On Thu, Feb 16, 2006 at 05:38:38PM +0200, David Levy wrote:
> > Also I ask the 5 first hits in the omega request  (HITSPERPAGE
> > parameter, is it the better way ?)
>
> No, that's the way to specify that.
>
> > > It's not the actual sorting which takes the extra time - the issue is
> > > that for a multi-term query, relevance ranking can terminate early in
> > > many cases (often when we reach the end of the matches for any of the
> > > terms).  But if results are sorted on a value, we need to consider
> > > every result which matches the query.
> >
> > so you are telling me I won't be able to improve my calculation time if
> > I still use sorting ...?
>
> You can try all the usual things to speed up searches - lots of RAM,
> fast disks, compact the database, etc.  Using flint instead of quartz
> may help too.  Some of the changes I have planned for flint will
> hopefully make a significant difference too - the way values are
> currently stored doesn't lend itself to fast access in this case.
>
> But sorting as currently designed does need to process every matching
> document, which is going to be slow for a large database if the query
> matches a lot of documents.
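
To make sure I follow the point above, here is a minimal sketch in plain Python. It is not the Xapian API: `top_k_by_value`, `value_of` and the match list are hypothetical stand-ins. It only illustrates that keeping the best 5 hits by a stored value still costs one value lookup per matching document, because the best value could belong to the last match.

```python
import heapq

def top_k_by_value(matches, value_of, k):
    """Return (best k docids by stored value, number of value lookups).

    Hypothetical helper: `matches` is any iterable of docids and
    `value_of` maps a docid to its sort value.
    """
    lookups = 0

    def keyed():
        nonlocal lookups
        for docid in matches:
            lookups += 1  # one value fetch per matching document
            yield value_of(docid), docid

    # nsmallest keeps only k candidates in memory, but it still has to
    # consume the whole match stream before it can answer.
    best = heapq.nsmallest(k, keyed())
    return [docid for _, docid in best], lookups

# Made-up data: 100000 matching docids with an arbitrary sort value.
matches = range(1, 100001)
value_of = lambda docid: (docid * 7919) % 100003

hits, lookups = top_k_by_value(matches, value_of, 5)
print(len(hits), lookups)  # 5 100000
```

So asking for only 5 hits per page bounds the size of the result, not the number of values that have to be read.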



Will this mechanism change in future releases ?

I have compacted the database and removed large fields from the index, so it
is now half the size ... but performance is still slow.
I am thinking about using "ramdisks" maybe, and I am checking my hard disks
too.
Have you used ramdisks with Xapian yet ? Does it help ?


> > Is there any other way to get results sorted by another criteria than
> > relevance ?
>
> If you have only one sort order, and can arrange to add documents in
> that order, then you can just use the raw document order for your
> sorted search.  This works particularly well for date ordering, since
> newly arrived documents end up in the right place.  That's how the
> Gmane search implements sort-by-date.


That would be a good idea, but I don't think I can because those values are
dynamic.

> Actually, an interesting thing to note is that "sort by reverse date"
> can terminate early, while "sort by date" has to scan the whole docid
> range (I plan to allow running postlists backwards which will make
> "sort by date" as fast as "sort by reverse date" but I've not
> implemented that yet).
>
> But even now, "sort by date" is still acceptably fast on 30 million
> documents, which points the finger strongly towards accessing the values
> as taking most of the time.
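
A small sketch of how I read the forward-only postlist point, again in plain Python and not Xapian code. `forward_only_matches`, `first_k` and `last_k` are hypothetical names, and I am assuming documents were indexed in the desired order, so docid order is the sort order. Taking the first 5 matches can stop after 5 reads, while taking the last 5 from a stream that only runs forwards has to read all of it.

```python
from collections import deque

def forward_only_matches(n, examined):
    """Yield docids 1..n in ascending order, counting each one read."""
    for docid in range(1, n + 1):
        examined[0] += 1
        yield docid

def first_k(stream, k):
    """Lowest k docids: stop as soon as k hits have been collected."""
    hits = []
    for docid in stream:
        hits.append(docid)
        if len(hits) == k:
            break
    return hits

def last_k(stream, k):
    """Highest k docids: a bounded deque keeps only the k most recent,
    but the whole stream must still be consumed."""
    return list(reversed(deque(stream, maxlen=k)))

read_first = [0]
read_last = [0]
lowest = first_k(forward_only_matches(100000, read_first), 5)
highest = last_k(forward_only_matches(100000, read_last), 5)

print(lowest, read_first[0])   # [1, 2, 3, 4, 5] 5
print(highest, read_last[0])   # [100000, 99999, 99998, 99997, 99996] 100000
```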


How fast do you mean ?
I get bad results with < 1M documents :
Ending search for term in 0.199603 s with 271 matches : show
Ending search for term in 0.153882 s with 1241 matches : human

without sorting.
I would really like results << 0.1 seconds for *every* query.

However, I used the "collapse" parameter. Is it time-consuming even if
there are no records to collapse in the results ?

Regards

> Cheers,
>     Olly
>



--
David LEVY {selenium}
Website ~ http://www.davidlevy.org
Wishlist Zlio ~ http://david.zlio.com/wishlist
Blog ~ http://selenium.blogspot.com