[Snowball-discuss] sb_symbol

Nemeskey Dávid nemeskey.david at sztaki.hu
Wed Jul 8 14:39:21 BST 2009


Hi,

I am David Nemeskey, and I've just subscribed to the list. Our company
uses the Snowball stemmer in a search engine. I have two questions about
the library; I hope this is the right list for them.

Firstly, we use the Porter stemmer for English. However, it sometimes
gives a poor result, such as "stemming" bus to bu + PL. Is it possible
to use a dictionary with any of the Snowball stemmers to avoid this
problem?
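
To make the dictionary question concrete, here is a minimal sketch of
one way such an override could work: look the word up in an exception
table first and call the stemmer only on a miss. Everything named here
(ExceptionMap, stem_with_exceptions, toy_stem) is hypothetical and not
part of Snowball; toy_stem is a stand-in for the real stemmer so that
the sketch compiles on its own.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical exception table: surface form -> stem to use instead of
// whatever the algorithmic stemmer would produce.
using ExceptionMap = std::unordered_map<std::string, std::string>;

// Hypothetical wrapper: consult the table first and fall back to the
// supplied stemmer (for example a thin wrapper around a Snowball
// stemmer) only when the word has no entry.
std::string stem_with_exceptions(
    const std::string& word,
    const ExceptionMap& exceptions,
    const std::function<std::string(const std::string&)>& fallback) {
    assert(fallback && "a fallback stemmer must be supplied");
    const auto hit = exceptions.find(word);
    if (hit != exceptions.end()) {
        return hit->second;
    }
    return fallback(word);
}

// Toy stand-in for the real stemmer, used only to make the sketch
// runnable: it strips a single trailing 's'.
std::string toy_stem(const std::string& word) {
    if (!word.empty() && word.back() == 's') {
        return word.substr(0, word.size() - 1);
    }
    return word;
}
```

With an entry mapping "bus" to "bus", the wrapper returns "bus" without
ever calling the stemmer, while words that are not in the table still
go through it unchanged.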

Secondly, I have recently compiled the newest version to include the
"other" stemmers. When integrating it with our codebase, I noticed that
sb_symbol had changed from char to unsigned char (I know it is an old
change, but until now we used an older version).

I would be really interested in the reasons for this change. Since the C
convention is to use char* for text, and std::string::c_str() also
returns a const char*, this modification introduces ugly
reinterpret_casts throughout our code. I am wondering whether changing
it back to char would break anything.
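
For illustration, one way to keep the casts out of the calling code
without touching the library is to confine them to two small helpers.
This is only a sketch: as_symbols and from_symbols are hypothetical
names, and the sb_symbol typedef is repeated locally (as unsigned char,
per the change described above) so that the example compiles without
the library headers.

```cpp
#include <cassert>
#include <string>

// Mirrors the unsigned char definition described above; declared here
// only so the sketch compiles without the library headers.
typedef unsigned char sb_symbol;

// Hypothetical helper: view a std::string's bytes as sb_symbol.
// Reading char data through unsigned char is permitted by the language.
inline const sb_symbol* as_symbols(const std::string& text) {
    return reinterpret_cast<const sb_symbol*>(text.data());
}

// Hypothetical helper: copy a stemmer result back into a std::string.
inline std::string from_symbols(const sb_symbol* symbols, int length) {
    assert(symbols != nullptr && length >= 0);
    return std::string(reinterpret_cast<const char*>(symbols),
                       static_cast<std::string::size_type>(length));
}
```

The rest of the code can then pass std::string values around and call
these helpers only at the boundary with the stemmer, so the
reinterpret_casts appear in exactly two places.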

Thank you very much,
David Nemeskey



