[Snowball-discuss] Recognise/ize stemming inconsistency?
Piers Taylor
piers-taylor at 2vu.com
Fri Nov 14 14:04:21 GMT 2008
Dear Martin,
I am working on a PHP version of your Porter2 Stemmer.
I came across the following results, then checked them against
the Diffs file and found that they agree with your code:
recognise    -> recognis
recognised   -> recognis
recognising  -> recognis
recognition  -> recognit
recognize    -> recogn
recognized   -> recogn
recognizes   -> recogn
recognizing  -> recogn
Since recognise and recognize are UK and American spellings of
the same word (and likewise for the inflected forms), I would
expect them to stem to the same thing, because a collection of
mixed UK/American documents might well contain either spelling.
Likewise, I would expect:
recognition -> recogn OR recognis
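To make the split concrete, here is a minimal Python sketch (not the PHP port itself). It runs no stemmer: it assumes only the word/stem pairs reported above, and the names REPORTED_STEMS, group_by_stem and same_stem are invented for this illustration.

```python
from collections import defaultdict

# Word -> stem pairs copied from the results listed above.
# These are the reported outputs, not the result of running a stemmer here.
REPORTED_STEMS = {
    "recognise": "recognis",
    "recognised": "recognis",
    "recognising": "recognis",
    "recognition": "recognit",
    "recognize": "recogn",
    "recognized": "recogn",
    "recognizes": "recogn",
    "recognizing": "recogn",
}


def group_by_stem(stems):
    """Return a dict mapping each stem to the list of words that produce it."""
    groups = defaultdict(list)
    for word, stem in stems.items():
        groups[stem].append(word)
    return dict(groups)


def same_stem(stems, first, second):
    """Return True if both words map to the same stem (hypothetical helper)."""
    return stems[first] == stems[second]


groups = group_by_stem(REPORTED_STEMS)

# Print one line per stem so the separate index terms are visible.
for stem, words in sorted(groups.items()):
    print(stem, words)

# A search index keyed on stems would treat these two spellings as unrelated.
print(same_stem(REPORTED_STEMS, "recognise", "recognize"))
```

Grouping by stem shows three separate index terms for what a reader would regard as one word family, which is the behaviour a unit test comparing UK/US spelling pairs would flag.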
I have similar queries about the following words:
apologise, criticise, organise, patronise, sympathise,
scrutinising, tantalising
There may be others, but these were highlighted by
some unit tests I am working on.
I understand that stemming is not an exact
science, and would welcome your comments
on the above observations.
With best regards,
Piers
Piers Taylor
01752 822572
07815 155301
piers-taylor at 2vu.com