[Snowball-discuss] Recognise/ize stemming inconsistency?

Piers Taylor piers-taylor at 2vu.com
Fri Nov 14 14:04:21 GMT 2008


Dear Martin,

I am working on a PHP version of your Porter2 Stemmer.

I came across the following results, checked them against
the Diffs file, and found that they match the output of your code:

	recognise	recognis
	recognised	recognis
	recognising	recognis
	recognition	recognit
	recognize	recogn
	recognized	recogn
	recognizes	recogn
	recognizing	recogn
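
To make the split easier to see, here is a minimal sketch in Python
(chosen only for brevity, not the PHP port itself). The dictionary
simply restates the word/stem pairs listed above, and the helper name
group_by_stem is hypothetical:

```python
from collections import defaultdict

# Word -> stem pairs, copied from the table above.
OBSERVED = {
    "recognise": "recognis",
    "recognised": "recognis",
    "recognising": "recognis",
    "recognition": "recognit",
    "recognize": "recogn",
    "recognized": "recogn",
    "recognizes": "recogn",
    "recognizing": "recogn",
}


def group_by_stem(pairs):
    """Return a dict mapping each stem to the list of words that produced it."""
    groups = defaultdict(list)
    for word, stem in pairs.items():
        groups[stem].append(word)
    return dict(groups)


groups = group_by_stem(OBSERVED)
for stem, words in sorted(groups.items()):
    print(stem, words)
```

Run against the table, this gives three separate groups (recogn,
recognis and recognit) for what is arguably one word family.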

Since

	recognise == recognize

and so on, I would expect both spellings to stem
to the same thing, because a collection of mixed
UK/American documents might well contain either.

Likewise, I would expect:

	recognition -> recogn OR recognis

I have similar queries about the following words:

	apologise, criticise, organise, patronise, sympathise,
	scrutinising, tantalising

There may be others, but these were highlighted by
some unit tests I am working on.
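
One possible workaround, sketched in Python for brevity, would be to
fold known -ise spellings onto their -ize forms before stemming. This
is only an assumption about how a caller might pre-process input; it
is not part of Snowball or Porter2. The root list is a hypothetical
allow-list built from the words mentioned in this message, and the
name normalise_ise is invented for illustration:

```python
import re

# Hypothetical allow-list of roots, taken from the words in this message.
ISE_ROOTS = (
    "recogn", "apolog", "critic", "organ",
    "patron", "sympath", "scrutin", "tantal",
)

# Match root + "is" + a small set of endings, so that words such as
# "organism" or "criticism" are left alone.
_ISE_RE = re.compile(
    r"^(" + "|".join(ISE_ROOTS) + r")is(e|ed|es|ing|er|ers|ation|ations)$"
)


def normalise_ise(word):
    """Lower-case the word and rewrite a known -ise spelling to -ize.

    Words that do not match the allow-list are returned lower-cased
    but otherwise unchanged.
    """
    return _ISE_RE.sub(r"\1iz\2", word.lower())
```

An explicit allow-list is used rather than a blanket "ise" -> "ize"
rule, since the latter would also rewrite words such as "wise" or
"promise".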

I understand that stemming is not an exact
science, and would welcome your comments
on the above observations.

With best regards,
		   Piers

Piers Taylor
01752 822572
07815 155301
piers-taylor at 2vu.com




