[Xapian-discuss] best method for stemming
Dobrica Pavlinusic
dpavlin at rot13.org
Wed Feb 8 18:38:34 GMT 2006
On Wed, Feb 08, 2006 at 12:18:28PM +0000, Olly Betts wrote:
> Alternatively, you could stem nothing at index time and then for search
> terms which you want to stem, stem them, and then run them through an
> "unstemming" algorithm to produce a list of terms they could have come
> from. Then OR this list together. Unfortunately nobody has written
> the "unstemmer" yet. Also this means more work at search time than
> the first approach, but that may not really matter. I've not tried
> the idea, so I can't say for sure.
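The "unstemmer" idea quoted above can be approximated without a true inverse of the stemming algorithm. At index time you record the unstemmed vocabulary, group it by stem, and at search time look up every indexed word that shares the query term's stem. This is a minimal sketch, not Xapian's API, and it swaps in a vocabulary-based reverse map for an algorithmic unstemmer. `toy_stem` is a hypothetical stand-in for a real stemmer such as Snowball, and all function names are illustrative.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set


def toy_stem(word: str) -> str:
    # Hypothetical stand-in for a real stemmer: strips a few English suffixes.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build_unstemmer(vocabulary: Iterable[str],
                    stem: Callable[[str], str]) -> Dict[str, Set[str]]:
    # Map each stem to the set of unstemmed index terms that reduce to it.
    reverse: Dict[str, Set[str]] = defaultdict(set)
    for word in vocabulary:
        reverse[stem(word)].add(word)
    return dict(reverse)


def expand_term(term: str, stem: Callable[[str], str],
                reverse: Dict[str, Set[str]]) -> List[str]:
    # Every indexed word sharing the term's stem, sorted for a stable OR list.
    # Unknown stems fall back to the term itself.
    return sorted(reverse.get(stem(term), {term}))


if __name__ == "__main__":
    vocab = ["connect", "connected", "connecting", "connects", "connection"]
    reverse = build_unstemmer(vocab, toy_stem)
    print(" OR ".join(expand_term("connecting", toy_stem, reverse)))
```

The resulting terms would then be ORed together, as suggested. Because the map only contains words that were actually indexed, it never produces variants that cannot match anything.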
I have written a Perl module that produces alternative spellings from ispell
data files. It's available at
http://search.cpan.org/~dpavlin/Lingua-Spelling-Alternative/
I mainly use it to index Croatian, for which we don't have a stemmer. I
store all words unstemmed and then expand each query term to all of its
variants to catch them. Croatian is very irregular, but this works very
well for me.
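The query-expansion approach described above can be sketched as follows. A query word is matched against ispell-style suffix rules (strip, add), the candidate dictionary form is recovered, and every rule in the group is re-applied to generate the alternatives, which are then ORed together. This is a hypothetical illustration in Python, not the Lingua::Spelling::Alternative API. The single rule group `SUFFIX_RULES` is an assumption covering only a few singular case endings of feminine nouns like "kuća". Real ispell affix files are much richer, and this naive matching will overgenerate for words outside the group.

```python
from typing import List, Set, Tuple

# Hypothetical ispell-like suffix group: (strip, add) pairs applied to a root form.
Rule = Tuple[str, str]
SUFFIX_RULES: List[Rule] = [("a", "a"), ("a", "e"), ("a", "i"), ("a", "u"), ("a", "om")]


def alternatives(word: str, rules: List[Rule]) -> Set[str]:
    variants = {word}
    for strip, add in rules:
        if word.endswith(add):
            # Undo this rule to recover a candidate dictionary form...
            base = word[: len(word) - len(add)] + strip
            # ...then re-apply every rule in the group to generate all variants.
            for s2, a2 in rules:
                if base.endswith(s2):
                    variants.add(base[: len(base) - len(s2)] + a2)
    return variants


def or_query(words: Set[str]) -> str:
    # Boolean OR group suitable for a query parser that understands "OR".
    return "(" + " OR ".join(sorted(words)) + ")"


if __name__ == "__main__":
    print(or_query(alternatives("kuću", SUFFIX_RULES)))
```

Since all surface forms are stored unstemmed at index time, the expanded OR group matches a document containing any of the generated forms.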
--
Dobrica Pavlinusic 2share!2flame dpavlin at rot13.org
Unix addict. Internet consultant. http://www.rot13.org/~dpavlin