[Xapian-discuss] postlist-errors due to already fixed bugs or not?

Arjen van der Meijden arjen@glas.its.tudelft.nl
Sat, 29 May 2004 12:42:41 +0200


On 14-5-2004 20:20, Arjen van der Meijden wrote:
> Hi list,
> 
> I discovered that our current postlist database contains, according to 
> quartzcheck, a lot of small errors. The termlist doesn't contain errors 
> and I haven't checked the positionlist.
> 
[snip]
> 
> The "extra bytes after "... error appears only once in the output, but 
> the other lines are each repeated a few hundred or thousand times.
> 
> This database was created with a CVS version of Apr 7 2004, so it 
> may have been corrupted by flaws in that version which have already 
> been fixed in version 0.8.0.
> The database has been created using a cvs omega/scriptindex version of 
> the same date as the xapian-library.
> 
> But my question is: can anyone confirm that this has been fixed, or that 
> it is a result of our setup rather than a current bug in Xapian?

I've downloaded 0.8.0 and reindexed our database with it. Because of two 
system crashes, the indexing process had to be restarted twice, and the 
postlist contains errors again...

This time, though, only the termfreq and collfreq values are incorrect:
termfreq 169578 != # of entries 169576
collfreq 1478897 != sum wdf 1478893
termfreq 108299 != # of entries 108297
collfreq 504301 != sum wdf 504297
termfreq 26206 != # of entries 26205
collfreq 152732 != sum wdf 152731
termfreq 2994 != # of entries 2993
(and a few hundred more)
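
To get a feel for how large these mismatches are, the lines can be tallied 
with a small script. This is only a rough sketch: it assumes the checker 
output has been saved as plain text and that every mismatch line has 
exactly one of the two shapes shown above. The names MISMATCH and 
summarize are made up for this example and are not part of Xapian.

```python
import re

# Assumption: mismatch lines look exactly like the two shapes quoted above.
MISMATCH = re.compile(
    r"^(termfreq|collfreq) (\d+) != (?:# of entries|sum wdf) (\d+)$"
)


def summarize(lines):
    """Return {kind: (number_of_mismatches, total_excess)}.

    total_excess is the sum of (stored value - recounted value), so a
    positive number means the stored statistic is too high.
    """
    summary = {}
    for line in lines:
        m = MISMATCH.match(line.strip())
        if m is None:
            continue  # ignore anything that is not a mismatch line
        kind = m.group(1)
        stored, counted = int(m.group(2)), int(m.group(3))
        count, excess = summary.get(kind, (0, 0))
        summary[kind] = (count + 1, excess + stored - counted)
    return summary


# The seven lines quoted in this message.
sample = [
    "termfreq 169578 != # of entries 169576",
    "collfreq 1478897 != sum wdf 1478893",
    "termfreq 108299 != # of entries 108297",
    "collfreq 504301 != sum wdf 504297",
    "termfreq 26206 != # of entries 26205",
    "collfreq 152732 != sum wdf 152731",
    "termfreq 2994 != # of entries 2993",
]
result = summarize(sample)
print(result)
```

In all seven quoted lines the stored statistic is slightly higher than 
the recounted one, by 1 or 2 entries for termfreq and by 1 or 4 for 
collfreq.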

But the search engine now fails to find at least one word that is known 
to be in a few thousand documents: it is found in only 3, while a word 
that is very commonly used alongside it is found in over 2000.
I'm not sure whether this is related to the two system crashes or to 
some other problem. How atomic are the write batches of scriptindex?

Best regards,

Arjen van der Meijden