[Xapian-discuss] Size of spelling database?

Fabrice Colin fabrice.colin at gmail.com
Sun Nov 4 10:32:28 GMT 2007


On 11/4/07, Olly Betts <olly at survex.com> wrote:
> On Sun, Nov 04, 2007 at 02:24:52PM +0800, Fabrice Colin wrote:
> > Do prefixed terms contribute to the spelling database too? For instance,
> > terms like Tmime_type, Uuri and XDIR:/directory/name, etc.
>
> Only if you pass them to WritableDatabase::add_spelling().
>
> And the TermGenerator class only adds spelling entries for unprefixed
> terms (at least at the moment).
>
Okay. I use the TermGenerator.
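To make Olly's point concrete, here is a toy Python model of the behaviour he describes. It is not Xapian code: the class `ToyIndex` and its methods are hypothetical stand-ins. It assumes TermGenerator is used with spelling generation enabled, and it simplifies tokenization to whitespace splitting. Words indexed with a prefix such as `T` or `XDIR:` still become terms, but only unprefixed words reach the spelling table unless they are added explicitly.

```python
from collections import Counter


class ToyIndex:
    """Hypothetical toy model, NOT the Xapian API: it only mimics how
    spelling entries are (or are not) recorded for prefixed terms."""

    def __init__(self):
        self.terms = Counter()     # term (including any prefix) -> occurrences
        self.spelling = Counter()  # word -> spelling frequency

    def add_spelling(self, word, freqinc=1):
        # Plays the role of WritableDatabase::add_spelling(): an explicit
        # entry is always recorded, whatever the word looks like.
        self.spelling[word] += freqinc

    def index_text(self, text, prefix=""):
        # Plays the role of TermGenerator with spelling enabled: every word
        # becomes a term, but only unprefixed words feed the spelling table.
        # Whitespace splitting is a simplification of the real tokenizer.
        for word in text.lower().split():
            self.terms[prefix + word] += 1
            if not prefix:
                self.add_spelling(word)


idx = ToyIndex()
idx.index_text("Size of spelling database")  # unprefixed: feeds spelling
idx.index_text("pdf", prefix="T")            # prefixed: term "Tpdf" only
print(sorted(idx.spelling))  # ['database', 'of', 'size', 'spelling']
print("Tpdf" in idx.terms)   # True
```

Under this model, prefixed terms like Tmime_type or XDIR:/directory/name add nothing to the spelling table, so they would not explain a large spelling.DB.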

> > What should I try in order to diagnose this?
>
> If you run xapian-check on just the spelling table in question, it'll
> tell you some statistics about the table:
>
> xapian-check /path/to/spelling.DB
>
> It would also be interesting to see how much smaller it gets when run
> through xapian-compact.
>
Hmm, xapian-compact complains with:
postlist ...xapian-compact: DatabaseCorruptError: Bad postlist key
I ran xapian-check on postlist.DB and got this:
baseB blocksize=8K items=925148 lastblock=24529 revision=109 levels=2 root=6
B-tree checked okay
Extra bytes after key for first chunk of posting list for term `'

The last line is printed in a seemingly infinite loop.
Running xapian-check on spelling.DB prints this:
baseA blocksize=8K items=603348 lastblock=30861 revision=109 levels=2 root=437
B-tree checked okay
spelling table: Don't know how to check structure

On a more positive note, it looks like the 1.3GB index I mentioned previously
was built with 1.0.3, and that the compression bug was responsible.
After rebuilding from scratch with 1.0.4, it shrank to 464MB. The
spelling and postlist tables are 168MB and 155MB respectively.

Fabrice


