[Xapian-discuss] minor problem
Kevin Duraj
kevin.softdev at gmail.com
Tue Dec 25 23:18:53 GMT 2007
Michael,
I have been tracking indexing performance for a second year now, and it
has deteriorated dramatically. Only users who, like Michael in this
example, have not re-indexed their data for a long time will notice how
much slower Xapian indexing has become. As you can see from my postings
of about a year ago, I used to index 20 million documents within an
hour. Now I am indexing 50 million documents in about 29 hours.
The biggest drop in Xapian indexing performance came with the
introduction of the Flint database with compression. The second drop
came with the introduction of locking for Flint databases. However,
Xapian indexing is still the fastest compared to other technologies;
otherwise we wouldn't be here ...
Cheers
__________________________________
Kevin Duraj
http://UncensoredWebSearch.com
On Dec 23, 2007 8:31 PM, Michael A. Lewis <MAL at icginc.com> wrote:
> Thanks for the response Olly. My indexing code appears below. A note about the speed. It was this slow (at least to the naked eye) even when there were only a couple of hundred documents. After this code, the child process which contains this code just exits.
>
> try {
>     Xapian::WritableDatabase database(dbname, DB_CREATE_OR_OPEN);
>     Xapian::TermGenerator indexer;
>     Xapian::Stem stemmer("english");
>     indexer.set_stemmer(stemmer);
>     Xapian::Document doc;
>     doc.set_data(line);
>     indexer.set_document(doc);
>     indexer.index_text(line);
>     if (meta) {
>         doc.set_data(metatext);
>     }
>     docid=database.add_document(doc);
>     sprintf( tmp1, "%lu", docid );
>     x = write( c_id, tmp1, strlen(tmp1) );
>     if ( x != strlen(tmp1) ) {
>         log_it( "ERROR: insert could not write to socket" );
>     }
> }
>
>
> ________________________________
>
> From: Olly Betts [mailto:olly at survex.com]
> Sent: Sun 12/23/2007 10:58 PM
> To: Michael A. Lewis
> Cc: xapian-discuss at lists.xapian.org
> Subject: Re: [Xapian-discuss] minor problem
>
> On Sun, Dec 23, 2007 at 02:38:14PM -0500, Michael A. Lewis wrote:
> > When I do a "ps -ef" command from the command line I see a task
> > belonging to my daemon that shows the command being run as "/bin/cat".
> > Looking in the xapian source code I have found that to be in the flint
> > backend locking code.
>
> The semantics of fcntl() locking within a process are rather unhelpful,
> so we fork a child process to take and hold the lock for us. To
> minimise VM use, we just exec /bin/cat once the lock is obtained.
>
> > Since I am serializing my updates (one after another) and only from a
> > single process, why am I seeing what appears to be long-term locks?
>
> The lock is held (and so the /bin/cat child process exists) for as long
> as you have the WritableDatabase open. So unless you're closing and
> reopening the database for each addition (which generally is probably
> not a good idea) then this sounds like what I'd expect.
>
> > This index code ran very fast in pre-1.0 versions of the indexer. I
> > upgraded to 1.0.0, then 1.0.1, etc. But I didn't need to index until
> > recently.
>
> It's hard to know what's going on from the information given. You said
> you're using TermGenerator, which is new in 1.0.0, so that may be
> indexing significantly differently to whatever you were using before.
> Though several seconds per document for a 10,000 document database
> really is excessively slow anyway.
>
> Could you show us what the indexing code looks like?
>
> Cheers,
> Olly
>
>
> _______________________________________________
> Xapian-discuss mailing list
> Xapian-discuss at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-discuss
>
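
For readers following the thread: the locking scheme Olly describes above (fork a child, have it take the fcntl() lock, then exec /bin/cat so the lock holder uses little memory) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Xapian's actual code: the names ChildLock, acquire and release are hypothetical, and using a socketpair to signal success and to tell the child when to exit is an assumed mechanism that the thread does not spell out.

```cpp
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical handle for a lock held by a child process.
struct ChildLock {
    pid_t pid = -1;  // child that owns the fcntl() lock
    int fd = -1;     // parent's end of the socketpair
};

// Try to take an exclusive lock on `path` via a forked child.
// Returns true if the child obtained the lock.
bool acquire(const char* path, ChildLock& lock) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return false;
    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return false;
    }
    if (pid == 0) {
        // Child: take the lock, report success, then become /bin/cat.
        close(sv[0]);
        int lf = open(path, O_WRONLY | O_CREAT, 0666);
        struct flock fl = {};
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;  // l_start = l_len = 0 locks the whole file
        if (lf < 0 || fcntl(lf, F_SETLK, &fl) < 0) _exit(1);
        char ok = 'L';
        if (write(sv[1], &ok, 1) != 1) _exit(1);
        // cat will block reading the socket until the parent closes its end.
        dup2(sv[1], 0);
        dup2(sv[1], 1);
        // lf is not close-on-exec, so the fcntl() lock survives the exec.
        execl("/bin/cat", "/bin/cat", (char*)nullptr);
        _exit(1);
    }
    // Parent: wait for the child's one-byte success message.
    close(sv[1]);
    char ch;
    if (read(sv[0], &ch, 1) != 1) {
        // EOF or error: the child exited without getting the lock.
        close(sv[0]);
        waitpid(pid, nullptr, 0);
        return false;
    }
    lock.pid = pid;
    lock.fd = sv[0];
    return true;
}

// Release the lock: closing our end gives cat EOF, so it exits and the
// kernel drops the lock along with the child's file descriptors.
void release(ChildLock& lock) {
    if (lock.pid < 0) return;
    close(lock.fd);
    waitpid(lock.pid, nullptr, 0);
    lock.pid = -1;
    lock.fd = -1;
}
```

In this sketch the /bin/cat process seen in "ps -ef" lives exactly as long as the lock is held, which matches the behaviour described in the thread: it appears when the lock is taken and goes away once the parent releases it.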