[Snowball-discuss] Quasi-infinite recursion in Turkish stemmer

Martin Porter martin.f.porter at gmail.com
Thu Aug 25 11:29:25 BST 2022


Tom,

It may be useful to look at this problem a bit more in the
historical context of how the various stemmers came about. In snowball
there were the internally developed stemmers (actually developed by
me), written and tested according to the standards suggested on the
snowball site. In addition there were also contributed stemmers, put
on the site out of interest to others, which we were not in a position
to evaluate through lack of knowledge of the languages they covered.
We expected queries about and maintenance of this secondary group to
be directed to the originators of the work, but inevitably links to
the originators decayed over time.

Turkish was a contributed stemmer, and one about which I had
misgivings, never resolved. If you look at the 2-column result of a
stemmer, for example

http://snowball.tartarus.org/algorithms/french/diffs.txt

(nowadays, you may need to use Firefox's "View -> Repair Text Encoding"
or equivalent) you see the first column with the usual ragged right
edge and the second column where the words repeat in neat blocks,
because related forms reduce to the same stem. This is what you want,
but the Turkish stemmer did not seem to produce it. The stemmer also
seemed to be very long. I did once look at it with
an IR specialist whose first language was Turkish, but the meeting was
not fruitful.
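
As a rough illustration of the "neat blocks" check described above, the
sketch below sorts (word, stem) pairs by word and measures how long each
run of identical stems is. The function names and the toy pairs are
made up for illustration and are not actual Snowball output. A stemmer
that conflates well should tend to give longer runs than one that
leaves most words with distinct stems.

```python
from itertools import groupby


def stem_runs(pairs):
    """Return (stem, run_length) for each run of equal stems.

    `pairs` is an iterable of (word, stem) tuples. They are sorted by
    word first, mimicking the alphabetical 2-column listing.
    """
    ordered = sorted(pairs)
    return [
        (stem, len(list(group)))
        for stem, group in groupby(ordered, key=lambda pair: pair[1])
    ]


def mean_run_length(pairs):
    """Average number of consecutive words sharing one stem."""
    runs = stem_runs(pairs)
    if not runs:
        return 0.0
    return sum(length for _, length in runs) / len(runs)


# Hypothetical sample data, for illustration only.
toy_pairs = [
    ("connect", "connect"),
    ("connected", "connect"),
    ("connecting", "connect"),
    ("connection", "connect"),
    ("console", "consol"),
    ("consoled", "consol"),
]

if __name__ == "__main__":
    for stem, length in stem_runs(toy_pairs):
        print(stem, length)
    print(mean_run_length(toy_pairs))
```

A mean run length close to 1 over a large vocabulary would suggest that
the second column rarely repeats, which is the symptom noted for the
Turkish stemmer. This is only a crude indicator and says nothing about
whether the conflations are linguistically correct.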

I hope we can get Olly Betts' opinion on all this.

Anyway, the question arises, quite apart from the stack-overflow
issue, are you finding any benefits in using the Turkish stemmer?

Martin


