[Xapian-discuss] xapian vs lucene.net

Olly Betts olly at survex.com
Fri Sep 1 02:39:31 BST 2006


On Thu, Aug 31, 2006 at 03:20:16PM -0400, jarrod roberson wrote:
> On 8/31/06, James Aylett <james-xapian at tartarus.org> wrote:
> >On Wed, Aug 30, 2006 at 10:59:07PM +0100, Olly Betts wrote:
> >
> >> The report also claims Xapian leaks memory while indexing, which just
> >> isn't the case.  We've run the testsuite under valgrind for years and
> >> there are no memory leaks reported.  I also don't see unbounded growth
> >> in memory usage when indexing gmane.  We actually do relatively little
> >> explicit allocation and deallocation of memory.
> >
> >We used to leak. Can't remember when, but I believe back in 2001
> >Richard and I spent some time trying to figure out why I was getting
> >enormous memory usage in some cases. No longer the case, as far as I'm
> >aware.

Prior to 0.8.0 we used enormous amounts of memory when updating a
database: we inefficiently buffered everything, when we only actually
need to buffer updates to the postlist table.  Because we flushed automatically
every N documents, whenever a batch included a larger-than-seen-before
document the peak memory usage would go up.  But no memory was actually
lost - it was all referenced and eventually released.

Incidentally, I believe we could use quite a bit less memory than we
currently do, at least for the common (and performance sensitive) case
of appending new documents to a database.  This is something I'm
planning to look at.

It might also be worthwhile investigating using anonymous mmapped blocks
to buffer changes in - then we can release the memory back to the OS
once we're done with it which is hard to do with memory allocated
through the C++ heap.  Where anon mmap (or mmap of /dev/zero) isn't
available, we can just allocate from the heap as a fallback (and on MS
Windows VirtualAlloc and VirtualFree provide what we want).
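A minimal sketch of that idea in C++, assuming a POSIX system with mmap() (or, failing MAP_ANONYMOUS, a private mapping of /dev/zero), with VirtualAlloc/VirtualFree on Windows and the ordinary heap as the last resort. The class name ChangeBuffer and its methods are hypothetical and for illustration only; this is not Xapian's API or its actual implementation.

```cpp
// Hypothetical change buffer: try an anonymous OS mapping first so the
// memory can be handed straight back to the OS, else fall back to the heap.
#include <cstddef>
#include <cstdlib>
#include <new>

#if defined(_WIN32)
#  include <windows.h>
#else
#  include <sys/mman.h>
#  include <fcntl.h>
#  include <unistd.h>
#endif

class ChangeBuffer {
  public:
    enum class Source { Anon, Heap };

    explicit ChangeBuffer(std::size_t size) : size_(size) {
        data_ = os_alloc(size);
        if (data_) {
            source_ = Source::Anon;
        } else {
            // Fallback: plain heap allocation (memory may not return to the OS).
            data_ = std::malloc(size);
            if (!data_) throw std::bad_alloc();
            source_ = Source::Heap;
        }
    }

    ~ChangeBuffer() {
        if (source_ == Source::Anon) os_free(data_, size_);  // pages go back to the OS
        else std::free(data_);                               // back to the allocator
    }

    ChangeBuffer(const ChangeBuffer&) = delete;
    ChangeBuffer& operator=(const ChangeBuffer&) = delete;

    void* data() const { return data_; }
    std::size_t size() const { return size_; }
    Source source() const { return source_; }

  private:
    static void* os_alloc(std::size_t size) {
#if defined(_WIN32)
        return VirtualAlloc(nullptr, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
#elif defined(MAP_ANONYMOUS)
        void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return p == MAP_FAILED ? nullptr : p;
#else
        // No anonymous mmap: map /dev/zero privately instead.
        int fd = open("/dev/zero", O_RDWR);
        if (fd < 0) return nullptr;
        void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        close(fd);  // the mapping stays valid after the descriptor is closed
        return p == MAP_FAILED ? nullptr : p;
#endif
    }

    static void os_free(void* p, std::size_t size) {
#if defined(_WIN32)
        (void)size;
        VirtualFree(p, 0, MEM_RELEASE);
#else
        munmap(p, size);
#endif
    }

    std::size_t size_;
    void* data_ = nullptr;
    Source source_ = Source::Heap;
};
```

The point of the design is in the destructor: unmapping (or MEM_RELEASE) returns the pages to the OS as soon as the buffered changes have been flushed, whereas memory released with free() generally goes back to the allocator for reuse rather than to the OS.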

> >In terms of visibility, we're not in dmoz.org (at least, in one of the
> >places Lucene is).

Feel free to submit something...

> >Lucene scores a *lot* better for Google "search
> >engine library"; we're top for "information retrieval library". That's
> >fixable by frobbing the front page in the way we've talked about, and
> >being very, very careful about phrasing :-)

Oh yes, I meant to tweak that.  I'll do it shortly.

> you ought to consider a freshmeat.net entry as well; I check that at
> least once a week for new and updated things.

We've had one for just over 3 years!

http://freshmeat.net/projects/xapian/

> Get the Python bindings into the cheeseshop as well.

I'm happy for people to advertise new versions in various places, but
the more things that get added to the release checklist, the less time
I have left to actually work on releases.  So unless I can *completely*
automate updating somewhere for a new release, I'd rather stick to
announcing releases here and on freshmeat myself and let others take
care of announcing elsewhere.

Cheers,
    Olly
