[Xapian-discuss] Getting spelling to work

James Aylett james-xapian at tartarus.org
Tue Jan 8 21:15:05 GMT 2008

On Tue, Jan 08, 2008 at 03:55:07PM -0500, Deron Meranda wrote:

>   "The suggestions are generated dynamically from the
>      data that has been indexed,"

If you generate terms using the TermGenerator, it can add them to the
spelling dictionary automatically.
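To show the separation, here is a toy pure-Python model. It is not Xapian code. Each word is always indexed as a term, and it is counted in a separate spelling table only when a flag is set. In real Xapian, my understanding is that the TermGenerator does the spelling part only when it has been given the database via `set_database()` and has `TermGenerator.FLAG_SPELLING` set via `set_flags()`. The names `ToyIndex` and `spelling_enabled` are hypothetical.

```python
from collections import Counter, defaultdict
import re

class ToyIndex:
    """Toy model: postings and spelling data live in separate tables."""

    def __init__(self, spelling=False):
        self.postings = defaultdict(set)   # term -> set of document ids
        self.spelling = Counter()          # word -> spelling frequency
        self.spelling_enabled = spelling   # stands in for FLAG_SPELLING

    def index_text(self, docid, text):
        for word in re.findall(r"\w+", text.lower()):
            self.postings[word].add(docid)
            if self.spelling_enabled:
                # In this model every occurrence bumps the frequency by one,
                # as add_spelling() does with its default increment.
                self.spelling[word] += 1

idx = ToyIndex(spelling=True)
idx.index_text(1, "Xapian spelling correction")
idx.index_text(2, "spelling suggestions")
print(sorted(idx.postings["spelling"]))  # [1, 2]
print(idx.spelling["spelling"])          # 2
```

With the flag off, the postings still fill up but the spelling table stays empty. That is the point of keeping the two apart: you choose what feeds the spelling dictionary.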

> This seems to imply that the term/postings are used as the
> basis for spelling, but in reality it looks like the spelling "index"
> is actually quite separate from the term/posting index.
> Is that true?  And why?

Yes, it's separate; you might not want it to be automatically filled
with every word generated from your corpus (for instance if your
corpus has lots of spelling mistakes in it).

> So assume I want the spelling dictionary to be based upon all the
> terms in the documents (and not some predefined dictionary).

That will depend on your application, but that's a reasonable approach
to take.

> How does the spelling word "frequency" affect things?  I would
> assume that if there are multiple spelling suggestions, that the
> one with the highest frequency would be returned (as the most
> likely spelling).  This is sort of implied but not actually stated
> anyplace I can find.

Pass. Richard?
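For what it's worth, here is one plausible ranking rule. It is only an assumption, because this thread does not settle how Xapian actually does it. The rule prefers the candidate with the smallest edit distance from the query word and uses spelling frequency only to break ties. The sketch below is pure Python. `suggest` and `edit_distance` are hypothetical helpers, not Xapian functions. Returning `None` for a word already in the dictionary is a further assumption.

```python
def edit_distance(a, b):
    """Levenshtein distance using a rolling single-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete from a
                           cur[j - 1] + 1,            # insert into a
                           prev[j - 1] + (ca != cb))) # substitute
        prev = cur
    return prev[-1]

def suggest(word, spelling_freqs, max_distance=2):
    """Closest known word within max_distance; higher frequency breaks ties."""
    if word in spelling_freqs:
        return None  # assumed: a known word needs no correction
    best, best_key = None, None
    for candidate, freq in spelling_freqs.items():
        d = edit_distance(word, candidate)
        if d > max_distance:
            continue
        key = (d, -freq)  # smaller distance first, then larger frequency
        if best_key is None or key < best_key:
            best, best_key = candidate, key
    return best

freqs = {"spelling": 10, "spewing": 1, "smelling": 3}
print(suggest("speling", freqs))  # spelling
```

Here "spelling" and "spewing" are both one edit away from "speling", so frequency decides between them. Under this rule, frequency never overrides a closer match.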

> Then, most importantly, how does one then populate the spelling
> dictionary when indexing documents?  Since every time you do
> add_spelling() the frequency is incremented, what happens if I
> want to re-index some document (or remove a document)?  For
> the terms and postings, this is a valid thing to do.  Re-indexing
> a document as many times as you want doesn't change things.
> But if you're also adding its terms to the spellings, then re-indexing
> can seriously skew the frequencies it would seem.

Umm, no idea. Richard?
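One way to avoid the skew is to undo a document's old contribution before adding the new one. This sketch assumes the application can recover the words it previously added for that document, for example by keeping the old text and re-tokenizing it. Xapian's `WritableDatabase` has a `remove_spelling()` method that decrements a word's frequency, and the undo step would map onto it. The class below is a pure-Python stand-in, and its names (`ToySpelling`, `doc_text`, `replace_document`, `delete_document`) are hypothetical.

```python
from collections import Counter
import re

def words(text):
    return re.findall(r"\w+", text.lower())

class ToySpelling:
    """Toy model: spelling frequencies kept consistent across re-indexing."""

    def __init__(self):
        self.freqs = Counter()  # word -> spelling frequency
        self.doc_text = {}      # docid -> text last indexed (hypothetical store)

    def add_spelling(self, word, freqinc=1):
        self.freqs[word] += freqinc

    def remove_spelling(self, word, freqdec=1):
        self.freqs[word] -= freqdec
        if self.freqs[word] <= 0:
            del self.freqs[word]  # drop words no document contributes any more

    def delete_document(self, docid):
        # Undo whatever this document added last time (nothing if unseen).
        for w in words(self.doc_text.pop(docid, "")):
            self.remove_spelling(w)

    def replace_document(self, docid, text):
        self.delete_document(docid)  # remove the old contribution first
        for w in words(text):
            self.add_spelling(w)
        self.doc_text[docid] = text

sp = ToySpelling()
sp.replace_document(1, "spelling spelling")
sp.replace_document(1, "spelling spelling")  # re-index the same document
print(sp.freqs["spelling"])  # 2, not 4
```

The catch is that this needs the old text, or its word list, at re-index time. Frequencies stay proportional to the current corpus only if every add is paired with a matching remove.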


  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org
