[Xapian-discuss] Getting spelling to work

Deron Meranda deron.meranda at gmail.com
Tue Jan 8 20:55:07 GMT 2008


On Jan 8, 2008 2:16 PM, James Aylett <james-xapian at tartarus.org> wrote:
> Your other problem, where qp.get_corrected_query_string()
> is returning '' instead of 'database' I can't reproduce. I've attached
> a small python script which will run without assertion failures on my
> install (admittedly with Xapian HEAD not 1.0.5). If you still get an
> assertion failure, there's a problem with 1.0.5 or possibly your
> setup; if not, it's a problem with your code in some way.

Thanks James.  Your test worked, and now so does my code
as well.  I'm not sure why I was getting back '' for a while there.
Maybe something got cached someplace along the line and
restarting from a fresh python interpreter cleared it up?  Probably
my problem.

* * *

I have some other questions about spelling in Xapian though.

Is it true that words must be added to the spelling dictionary
(via add_spelling) separately from adding terms/postings?  The
spelling.html documentation says:

  "The suggestions are generated dynamically from the
     data that has been indexed,"

This seems to imply that the terms/postings are used as the
basis for spelling, but in reality it looks like the spelling "index"
is actually quite separate from the term/posting index.
Is that true?  And why?

So assume I want the spelling dictionary to be based upon all the
terms in the documents (and not some predefined dictionary).
How does the spelling word "frequency" affect things?  I would
assume that if there are multiple spelling suggestions, the
one with the highest frequency would be returned (as the most
likely spelling).  This is sort of implied but not actually stated
anyplace I can find.

Then, most importantly, how does one populate the spelling
dictionary when indexing documents?  Every time you call
add_spelling() the frequency is incremented, so what happens if I
want to re-index some document (or remove a document)?  For
the terms and postings, this is a valid thing to do.  Re-indexing
a document as many times as you want doesn't change things.
But if you're also adding its terms to the spellings, then re-indexing
can seriously skew the frequencies, it would seem.
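
To make the concern concrete, here is a toy model in plain Python.
It is not Xapian and says nothing about how Xapian stores spellings
internally; ToySpellingDict, naive_index and reindex are hypothetical
names, and the method names are only chosen to resemble add_spelling.
It just shows how frequencies double if a document is re-indexed
without first backing out its old contribution, and what I would
expect if the old terms were subtracted first:

```python
from collections import Counter


class ToySpellingDict:
    """Toy stand-in for a spelling dictionary; NOT the Xapian API."""

    def __init__(self):
        self.freq = Counter()

    def add_spelling(self, word, freqinc=1):
        # Every call bumps the frequency, like the behaviour described above.
        self.freq[word] += freqinc

    def remove_spelling(self, word, freqdec=1):
        # Hypothetical inverse operation: decrement, dropping the word at zero.
        remaining = self.freq[word] - freqdec
        if remaining > 0:
            self.freq[word] = remaining
        else:
            self.freq.pop(word, None)


def naive_index(spell, terms):
    """Add every term of a document to the spelling dictionary."""
    for term in terms:
        spell.add_spelling(term)


def reindex(spell, old_terms, new_terms):
    """Back out the old terms first, then add the new ones."""
    for term in old_terms:
        spell.remove_spelling(term)
    for term in new_terms:
        spell.add_spelling(term)


doc = ["database", "search", "database"]

# Re-indexing the same document naively counts its terms twice.
naive = ToySpellingDict()
naive_index(naive, doc)
naive_index(naive, doc)

# Subtracting the old terms first leaves the frequencies unchanged.
careful = ToySpellingDict()
naive_index(careful, doc)
reindex(careful, doc, doc)

print(dict(naive.freq))
print(dict(careful.freq))
```

The catch with the second approach is that the indexer has to know
the document's old terms at re-index time, which is exactly the
bookkeeping I'm asking about.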

-- 
Deron Meranda


