[Snowball-discuss] New member questions
Harri Pasanen
harri at mpaja.com
Sat Jan 25 09:27:07 GMT 2025
Hi Martin,
Thanks a lot for your reply. I will certainly give Snowball a try. My
use case is a simple one: I'm the developer of MBraille, which is a
braille text editor app for smartphones. MBraille has quite a lot of
functionality; the help text is getting long and is translated into many
languages. So I'm thinking of providing a full text search for the help in
all languages and using Snowball in that context.
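
A minimal sketch of what such a search could look like, assuming one index per help language. Everything here is hypothetical and for illustration only: `toy_stem` is a stand-in suffix stripper, NOT a Snowball algorithm, and `build_index` and `search` are made-up names. In practice the `stem` argument would be a real Snowball stemmer for the language in question.

```python
import re
from collections import defaultdict


def toy_stem(word):
    """Placeholder stemmer (NOT Snowball): strips one of a few English suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(help_pages, stem=toy_stem):
    """Map each stem to the set of help page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in help_pages.items():
        for token in re.findall(r"\w+", text.lower()):
            index[stem(token)].add(page_id)
    return index


def search(index, query, stem=toy_stem):
    """Return the page ids that contain every (stemmed) query term."""
    terms = [stem(t) for t in re.findall(r"\w+", query.lower())]
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical help pages for one language.
pages = {
    "gestures": "Swiping deletes words",
    "settings": "Change the keyboard layout",
}
index = build_index(pages)
hits = search(index, "deleting word")
```

Because the same function stems both the help text and the query, the stems only need to match each other; they do not need to be real words.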
Btw, I ran into something I think is a bug affecting one kind of word in
English: knotty gets stemmed to knotti, and spotty to spotti. But knot is a
knot and spot is a spot. I'm not sure whether playing whack-a-mole by
filing an issue ticket on GitHub for each word is useful, though. I will
download the source and compile it myself to see if there is a test suite
or something where that could be added.
Thanks again for your help and original work on snowball.
Best wishes
Harri
On Fri, Jan 24, 2025 at 11:05 PM Martin Porter <martin.f.porter2 at gmail.com>
wrote:
> Harri,
>
> You will have to excuse me for not being completely on top of the subject
> of snowball these days, but the main work was done a quarter of a century
> ago, and I am now 80 years of age. Papers evaluating snowball were often a
> bit negative: that is because they were testing some new system of the
> authors with a base system, and the base system would often use snowball.
> If they outperformed the base system their paper would be published, if not
> they would hold it back and try again. There was a general paper with
> comparisons of IR with/without snowball for several languages, which put
> snowball in a much better light, but I can't quite find it at the moment.
> If I do I'll let you know. But for Finnish itself this is encouraging,
>
> http://clef.isti.cnr.it/2004/working_notes/WorkingNotes2004/16.pdf
>
> with their conclusion,
>
> "For Finnish, the performance is quite high. The snowball stemmer works
> very well."
>
> (The stemmers do not reduce words to real vocabulary words incidentally,
> just to a character string that collects variant forms together.)
>
> I think the final statement on these stemming algorithms must be that they
> are a simple and inexpensive way of conflating variant forms of a word
> together, and that this can be useful in certain circumstances.
>
> Martin
>
>
>