[Snowball-discuss] sb_symbol

Martin Porter martin at porterloo.wanadoo.co.uk
Sun Jul 12 10:37:09 BST 2009


David,

Improving the stemmers by incorporating a dictionary of exceptions is of
course possible, but you are eventually led towards a solution in which the
entire stemming process is done by dictionary lookup. As it is, the English
stemmer has various slots where exception lists can be built in, with
examples of how these lists can be built up. I'm not aware of any work in
which the Snowball stemmers were modified by a large dictionary of
exceptions, but some people might have tried this.
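
To make the idea of an exception list concrete, here is a minimal sketch in C of a lookup that runs before the algorithmic stemmer. Everything in it is an assumption made for illustration: the table entries, the names stem_with_exceptions and toy_stem, and above all toy_stem itself, which only strips a final "s" and is not the Snowball English algorithm. It is not how the generated stemmer code is organised.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical exception table: each entry maps a word directly to its stem.
   The entries are illustrative only. */
struct exception {
    const char *word;
    const char *stem;
};

static const struct exception exceptions[] = {
    { "news",  "news" },  /* must be left unchanged */
    { "skies", "sky"  },  /* irregular form */
};

/* Stand-in for the algorithmic stemmer: strips one trailing 's'.
   This is NOT the Snowball algorithm, just a placeholder. */
static void toy_stem(const char *word, char *out, size_t out_size)
{
    size_t n;

    if (out_size == 0)
        return;
    n = strlen(word);
    if (n >= out_size)
        n = out_size - 1;          /* truncate rather than overflow */
    memcpy(out, word, n);
    out[n] = '\0';
    if (n > 1 && out[n - 1] == 's')
        out[n - 1] = '\0';
}

/* Consult the exception table first; fall back to the algorithm otherwise. */
static void stem_with_exceptions(const char *word, char *out, size_t out_size)
{
    size_t i;

    if (out_size == 0)
        return;
    for (i = 0; i < sizeof exceptions / sizeof exceptions[0]; i++) {
        if (strcmp(word, exceptions[i].word) == 0) {
            snprintf(out, out_size, "%s", exceptions[i].stem);
            return;
        }
    }
    toy_stem(word, out, out_size);
}
```

The linear scan is fine for a handful of entries. With a large dictionary you would want a hash table or a sorted array with binary search, and at that point the table is doing most of the work, which is the drift towards pure dictionary lookup described above.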

Actually, I thought "symbol" had always been "unsigned char". In any case, I
think "unsigned char" is the correct type for 1-byte characters, and that
the casts needed when interfacing with the C libraries are inevitable.
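
As a rough illustration of where those casts turn up, here is a small C sketch. The typedef matches the type discussed above, but the helper names (symbol_length, symbol_lowercase, symbol_from_cstr) are made up for this example and are not part of the Snowball runtime.

```c
#include <ctype.h>
#include <string.h>

/* Assumed for this sketch, following the discussion above. */
typedef unsigned char symbol;

/* strlen() takes const char *, so a symbol string needs a cast. */
static size_t symbol_length(const symbol *s)
{
    return strlen((const char *)s);
}

/* The <ctype.h> functions expect a value representable as unsigned char
   (or EOF), so a symbol can be passed straight in; only the int result
   needs converting back. */
static void symbol_lowercase(symbol *s)
{
    for (; *s != 0; s++)
        *s = (symbol)tolower(*s);
}

/* Copy a plain C string into a symbol buffer, truncating if necessary. */
static void symbol_from_cstr(symbol *dst, size_t dst_size, const char *src)
{
    size_t n;

    if (dst_size == 0)
        return;
    n = strlen(src);
    if (n >= dst_size)
        n = dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = 0;
}
```

The advantage of "unsigned char" shows up with bytes of 0x80 and above: on platforms where plain "char" is signed, such a byte would be negative, which breaks range comparisons and table indexing. The price is a cast at each boundary with the string functions.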


Martin






