[Snowball-discuss] New member questions

Harri Pasanen harri at mpaja.com
Sat Jan 25 09:27:07 GMT 2025


Hi Martin,

Thanks a lot for your reply.  I will certainly give Snowball a try.  My
use case is a simple one:  I'm the developer of MBraille, a braille text
editor app for smartphones.  MBraille has quite a lot of functionality, so
the help text is getting long, and it is translated into many languages.
I'm thinking of providing full-text search over the help in all of those
languages, using Snowball in that context.

Btw, I ran into something I would consider a bug in the English stemmer:
knotty gets stemmed to knotti and spotty to spotti, while knot stays knot
and spot stays spot, so the pairs are not conflated.  I'm not sure whether
playing whack-a-mole word by word and filing a GitHub issue ticket for each
one is useful, though.  I will download the source and compile it myself to
see if there is a test suite or something where such cases could be added.

Thanks again for your help and original work on snowball.

Best wishes
Harri

On Fri, Jan 24, 2025 at 11:05 PM Martin Porter <martin.f.porter2 at gmail.com>
wrote:

> Harri,
>
> You will have to excuse me for not being completely on top of the subject
> of snowball these days, but the main work was done a quarter of a century
> ago, and I am now 80 years of age. Papers evaluating snowball were often a
> bit negative: that is because they were comparing some new system of the
> authors' against a base system, and the base system would often use snowball.
> If they outperformed the base system their paper would be published; if not,
> they would hold it back and try again. There was a general paper with
> comparisons of IR with/without snowball for several languages, which put
> snowball in a much better light, but I can't quite find it at the moment.
> If I do I'll let you know. But for Finnish itself this is encouraging,
>
> http://clef.isti.cnr.it/2004/working_notes/WorkingNotes2004/16.pdf
>
> with their conclusion,
>
> "For Finnish, the performance is quite high. The snowball stemmer works
> very well."
>
> (The stemmers do not reduce words to real vocabulary words incidentally,
> just to a character string that collects variant forms together.)
>
> I think the final statement on these stemming algorithms must be that they
> are a simple and inexpensive way of conflating variant forms of a word
> together, and that this can be useful in certain circumstances.
>
> Martin