[Xapian-discuss] Spelling based on frequency and not just distance

Philip Neustrom philipn at gmail.com
Tue Jan 15 09:24:33 GMT 2008


Hey all,

After enabling the new spelling functionality on http://wikispot.org I
noticed that terms like "wikipeda" weren't yielding spelling suggestions,
presumably because the misspelling itself occurs in the indexed text.
From a quick look at the code, it seems that when we find an exact match
we don't suggest anything, even if that match has a lower frequency than
another candidate within the provided edit-distance delta.  This is
probably fine for sites where you can be sure the documents are properly
spelled, but it isn't suitable for something like a wiki or the web in
general.

I made a simple change, attached as a patch.  Maybe someone has a better
idea of how to weigh the different options, but my quick fix seemed to give
much better results than the existing code, which gives up on an exact
match and otherwise settles for the edit-distance-closest match.
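
To illustrate the idea (this is not the attached patch, just a minimal
standalone sketch): among all known terms within the edit-distance delta,
pick the most frequent one, and only return no suggestion when the typed
word is itself the most frequent candidate.  The names suggest,
edit_distance and freqs are made up for this example, the frequencies are
invented, and the distance is plain Levenshtein, which may differ from
what Xapian computes internally.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def suggest(word, freqs, max_edit_distance=2):
    """Return the most frequent term within max_edit_distance of word.

    Returns None when no candidate is strictly more frequent than the
    word itself, so a rare exact match no longer blocks a suggestion.
    """
    best, best_freq = word, freqs.get(word, 0)
    for term, freq in freqs.items():
        if term == word:
            continue
        if edit_distance(word, term) <= max_edit_distance and freq > best_freq:
            best, best_freq = term, freq
    return None if best == word else best


# Hypothetical term frequencies: the misspelling occurs once in the data.
freqs = {"wikipedia": 120, "wikipeda": 1, "wiki": 300}
print(suggest("wikipeda", freqs))   # prints "wikipedia"
print(suggest("wikipedia", freqs))  # prints "None"
```

Requiring a strictly higher frequency is the simplest possible weighting;
a real fix might want the candidate to be more frequent by some factor, or
trade frequency off against edit distance.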

--Philip Neustrom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: spelling_frequency.diff
Type: text/x-diff
Size: 1622 bytes
Desc: not available
Url : http://lists.tartarus.org/pipermail/xapian-discuss/attachments/20080115/892dd069/spelling_frequency.bin

