[Snowball-discuss] two results

Richard Boulton richard@tartarus.org
Mon Oct 6 12:50:02 2003


Boštjan Jerko wrote:
> Is it possible to get two stems for one word?
> In Slovene there is a possibility to stem word (e.g. "zelodec) in two ways ("zelodc, "zelod"c).

No, all of the current Snowball stemming algorithms return exactly one 
stemmed version of each input word.

I'm not sure what it would mean to have two possible stemmed forms - 
stemming is a normalising process, used to determine if differing 
versions of a word share a common root.

Why do you think having multiple stemmed forms might be necessary?

My thought is that a possible situation where this might be useful would 
be as follows:

We have two stemmed words, "A" and "B", with quite distinct meanings.
There exists at least one word "A_" which should stem to "A".
There exists at least one word "B_" which should stem to "B".

However, there is a word "X" which can have two different meanings.  One 
of those meanings is a form of "A", the other is a form of "B".  In 
order to reflect this, without tying "A" and "B" together, "X" should 
stem to both "A" and "B".
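
As a rough sketch of what I mean (in Python, purely illustrative: the
STEMS table and the helpers build_index and matches are hypothetical,
and "A", "B", "A_", "B_" and "X" are the placeholders from above, not
real words), an index that allowed several stems per word might behave
like this:

```python
from collections import defaultdict

# Hypothetical stem table: placeholder names only, not real data.
STEMS = {
    "A_": {"A"},
    "B_": {"B"},
    "X": {"A", "B"},  # the ambiguous word stems to both
}


def build_index(words):
    """Map each stem to the set of words that stem to it."""
    index = defaultdict(set)
    for word in words:
        # Words missing from the table are treated as their own stem.
        for stem in STEMS.get(word, {word}):
            index[stem].add(word)
    return index


def matches(index, query):
    """Return every indexed word sharing at least one stem with the query."""
    found = set()
    for stem in STEMS.get(query, {query}):
        found |= index.get(stem, set())
    return found


index = build_index(["A_", "B_", "X"])
print(sorted(matches(index, "A_")))
print(sorted(matches(index, "B_")))
```

Here a search for "A_" finds "X" but not "B_", and a search for "B_"
finds "X" but not "A_", so "A" and "B" stay separate.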

Is this the situation you have?

I don't know of any concrete examples of this situation (Martin may be 
able to give one), but the way I would expect it to be solved is to 
choose which stemmed form is more frequently the correct stemmed form of 
"X", and to use that always.  Alternatively, if neither form is 
significantly more frequent, "X" could be left in an unstemmed form.
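
A minimal sketch of that policy, under some assumptions of mine: the
function choose_stem, the counts passed to it and the min_ratio
threshold for "significantly more frequent" are all made up for
illustration, and are not part of Snowball.

```python
def choose_stem(word, candidate_counts, min_ratio=2.0):
    """Pick a single stem for an ambiguous word, or leave it unstemmed.

    candidate_counts maps each candidate stem to how often it is the
    correct stem for `word` (hypothetical counts, e.g. from a corpus).
    min_ratio is an assumed threshold for "significantly more frequent".
    """
    if not candidate_counts:
        return word

    # Rank candidates from most to least frequent.
    ranked = sorted(candidate_counts.items(), key=lambda kv: kv[1], reverse=True)
    best_stem, best_count = ranked[0]
    if len(ranked) == 1:
        return best_stem

    runner_up_count = ranked[1][1]
    # Use the dominant stem only if it clearly wins; otherwise do not stem.
    if best_count > 0 and best_count >= min_ratio * runner_up_count:
        return best_stem
    return word


print(choose_stem("X", {"A": 90, "B": 10}))
print(choose_stem("X", {"A": 55, "B": 45}))
```

With these invented counts the first call picks "A", while the second
leaves "X" unstemmed because neither form dominates.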

It would require a good deal of work to allow most search engines to 
deal with a stemming algorithm that returned multiple possibilities.

-- 
Richard