[Xapian-discuss] PHP Fatal error while indexing Wikipedia

Olly Betts olly at survex.com
Fri Jan 4 01:44:06 GMT 2008


On Thu, Jan 03, 2008 at 01:32:16AM +0000, Robert Young wrote:
> On Jan 3, 2008 1:08 AM, Olly Betts <olly at survex.com> wrote:
> > If you're indexing from scratch and don't have duplicate UID terms in
> > the data being indexed (which I assume is true for wikipedia dumps),
> > then your replace_document() calls are equivalent to just appending
> > with add_document() except that you keep looking up UID terms, which
> > means a std::map look-up and then a B-tree lookup.  I don't know the
> > overhead of this, but it could be fairly hefty even if the B-tree
> > blocks required are all cached.  You could try having a "rebuild" mode
> > where add_document() is called.  I'd be interested to hear how much of
> > a difference this makes.
> Well, it certainly makes a pretty big difference. It's pushed docs /
> sec up to just under 30 (about 28 - 30) from fluctuating between 15
> and 21. That puts it just a hair's breadth ahead of the same run with
> Solr (running at about 28-29/sec). If you're interested this is all
> working towards developing a search abstraction layer for PHP. I'm not
> quite sure how best to expose that in the interface but it definitely
> seems worth it, thanks.

Hmm, that's quite a big difference.  Maybe there's a way to improve this
within Xapian, although I don't immediately see one.
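
For reference, the "rebuild" mode suggested in the quoted text could be
sketched roughly as below. This is only an illustration in Python, not
the code from this thread: index_document() and its rebuild flag are
made-up names, and db is assumed to be a xapian.WritableDatabase (or
anything else exposing add_document() and replace_document()).

```python
def index_document(db, uid_term, doc, rebuild=False):
    """Store doc in db and return the document id reported by db.

    Hypothetical helper for illustration only.

    rebuild=True  assumes the index is being built from scratch and the
                  input has no duplicate UID terms, so the document is
                  simply appended with add_document() and no UID term
                  look-up is requested.
    rebuild=False replaces any existing document indexed by uid_term,
                  or adds the document if no such term exists yet.
    """
    if rebuild:
        return db.add_document(doc)
    return db.replace_document(uid_term, doc)
```

A from-scratch run over a dump would pass rebuild=True for every
document, and later incremental updates would leave it at the default.
Note that rebuild=True is only safe when the UIDs really are unique,
otherwise duplicate documents end up in the index.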

Cheers,
    Olly


