[Xapian-discuss] Flint, UTF-8 and "large" documents

John Wang johncwang at gmail.com
Tue Oct 3 16:49:55 BST 2006


I'm currently using the Xapian 0.9.6 UTF-8 snapshot tarball (svn 7230) with
the Flint backend and the Perl bindings.

When I load a certain collection, this configuration creates an index that
appears to be corrupted. Nothing indicates a problem while the index is being
built, but when I open it for reading immediately afterwards, I get the
following:

  *** glibc detected *** free(): invalid pointer: 0x0acb6ab0 ***
  Aborted

This happens when I flush the db once after loading all the documents in the
collection. If I periodically flush while I'm loading, everything works
fine. The collection I'm loading has the following statistics:

Number of documents: 412
Average terms per document: 233
Maximum terms per document: 1557
Total terms in collection: 96290

In this particular case, the index gets corrupted when I flush every 23
documents, but is fine if I flush every 22 documents.
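
For reference, the periodic-flush loading pattern described above can be
sketched roughly as follows. This is only an illustration, not my actual
loader: it is written in Python rather than Perl, and CountingDb is a
hypothetical stand-in class that just records calls, so the sketch runs
without Xapian installed. The function name load_with_periodic_flush and the
flush_every parameter are likewise made up for this example. A real writable
database object with add_document() and flush() methods would take the
stand-in's place.

```python
class CountingDb:
    """Hypothetical stand-in for a writable database; it only records calls."""

    def __init__(self):
        self.pending = 0       # documents added since the last flush
        self.flush_sizes = []  # number of documents covered by each flush

    def add_document(self, doc):
        self.pending += 1

    def flush(self):
        self.flush_sizes.append(self.pending)
        self.pending = 0


def load_with_periodic_flush(db, docs, flush_every):
    """Add each document to db, flushing after every `flush_every` documents."""
    if flush_every < 1:
        raise ValueError("flush_every must be at least 1")
    unflushed = 0
    for doc in docs:
        db.add_document(doc)
        unflushed += 1
        if unflushed == flush_every:
            db.flush()
            unflushed = 0
    # Flush whatever is left over from a final, partial batch.
    if unflushed:
        db.flush()


# Usage with the numbers from this post: 412 documents, flushing every 22.
db = CountingDb()
load_with_periodic_flush(db, range(412), flush_every=22)
print(len(db.flush_sizes))  # 19 flushes: 18 full batches of 22, then one of 16
```

Setting flush_every to the collection size or larger gives the single flush
at the end, which is the case that fails for me.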

The same document collection loads fine with the standard 0.9.6 release
(no UTF-8, Flint backend) and no periodic flushing. I've also loaded other
collections with more documents but of a smaller size, flushing only at the
end, and those have been fine.

Anyone know why this is happening and what to do about it?

Thanks.

-- 
John Wang
http://www.dev411.com/blog/

