[Xapian-discuss] PHP indexing, what's the PHP method for indexscript

Olly Betts olly at survex.com
Thu Jan 17 02:14:15 GMT 2008

On Wed, Jan 16, 2008 at 09:58:11AM -0800, athlon athlonf wrote:
> >Load 5 suggests something's wrong, because dbi2omega and scriptindex
> >are both linear processes. Are you running several instances in
> >parallel in some way?
> it usually starts off fairly low, but then after half an hour or so,
> it will reach load 5 constantly.

As James says, the scriptindex process itself shouldn't raise the load
average by more than 1, since it's essentially a single process plus one
/bin/cat child process, and that child will always be blocked on read
except very briefly when the database is opened or closed.

I suspect what's happening is that scriptindex is causing the machine
to swap, so webserver requests take much longer and start to overlap.
Hence about 4 of the load of 5 is actually due to the webserver
(although caused by scriptindex).  I can't think of another plausible
explanation, anyway.

> tid : boolean=Q field=id
> pid : unique=Q boolean=Q field=pid

It doesn't make much sense to have two fields both mapping to the "Q"
prefix like this: each document ends up with one Q term from id and
another from pid.

FWIW, I think this may explain why your PHP script is so much faster:
"unique" is quite a slow operation (even if no duplicate documents
exist, just checking for them slows indexing significantly).  Does your
PHP indexer contain code like this:

    $db->replace_document($qterm, $doc);

If not, does it handle enforcing unique documents another way?  If it
doesn't, then you aren't comparing like with like.
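To make the "like with like" point concrete, here is a toy in-memory model in PHP. It is not Xapian's API: `ToyIndex`, `addDocument` and `replaceDocument` are hypothetical names, and Xapian looks the unique term up in its index rather than scanning every document as this sketch does. It only shows what each strategy produces when the same record is indexed twice.

```php
<?php
// Toy in-memory model, NOT the Xapian API: contrasts an indexer that
// always appends with one that replaces by a unique "Q" id term.
// ToyIndex and its methods are hypothetical names for illustration.

final class ToyIndex
{
    /** @var array<int, array{terms: list<string>, data: string}> docid => doc */
    private array $docs = [];
    private int $nextId = 1;

    // Analogous to add_document(): always appends, never checks for duplicates.
    public function addDocument(array $terms, string $data): int
    {
        $id = $this->nextId++;
        $this->docs[$id] = ['terms' => $terms, 'data' => $data];
        return $id;
    }

    // Analogous to replace_document($uniqueTerm, $doc): look for documents
    // indexed by $uniqueTerm, reuse the lowest docid and drop any others,
    // or append if none match. This lookup happens for every document
    // indexed, which is extra work a plain append never does.
    public function replaceDocument(string $uniqueTerm, array $terms, string $data): int
    {
        $matches = [];
        foreach ($this->docs as $id => $doc) {
            if (in_array($uniqueTerm, $doc['terms'], true)) {
                $matches[] = $id;
            }
        }
        if ($matches === []) {
            return $this->addDocument($terms, $data);
        }
        sort($matches);
        $keep = array_shift($matches);
        $this->docs[$keep] = ['terms' => $terms, 'data' => $data];
        foreach ($matches as $id) {
            unset($this->docs[$id]);
        }
        return $keep;
    }

    public function count(): int
    {
        return count($this->docs);
    }
}

// Index the same record (pid 42) twice, e.g. after re-running the indexer.
$plain = new ToyIndex();
$unique = new ToyIndex();
foreach ([1, 2] as $run) {
    $plain->addDocument(['Q42', 'hello'], "run $run");
    $unique->replaceDocument('Q42', ['Q42', 'hello'], "run $run");
}
echo $plain->count(), "\n";  // 2: a duplicate document
echo $unique->count(), "\n"; // 1: the second run replaced the first
```

With the real bindings, the unique-term version corresponds to the `replace_document($qterm, $doc)` call quoted above; an indexer that only calls `add_document()` skips that check, so it can be faster but may leave duplicates behind.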

If this isn't the explanation, it would be interesting to work out why
there's such a difference.

