[Snowball-discuss] sb_symbol
Nemeskey Dávid
nemeskey.david at sztaki.hu
Wed Jul 8 14:39:21 BST 2009
Hi,
I am David Nemeskey, and I've just subscribed to the list. Our company
uses the Snowball stemmer in a search engine. I have two questions about
the library; I hope this is the right list for them.
Firstly, we use the Porter stemmer for English. However, it sometimes
gives a poor result, for example stemming "bus" to "bu" as if the final
"s" were a plural suffix (bu + PL). Is it possible to use a dictionary
with any of the Snowball stemmers to avoid this problem?
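To make the question concrete, here is a minimal sketch of the kind of
dictionary lookup I have in mind. It is not part of Snowball: the names
stem_with_exceptions and toy_strip_s are hypothetical, and toy_strip_s is
only a toy stand-in for the real stemmer call so that the example is
self-contained.

```cpp
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical helper: consult an exception dictionary first and fall back
// to the algorithmic stemmer only when the word is not listed there.
std::string stem_with_exceptions(
    const std::string& word,
    const std::unordered_map<std::string, std::string>& exceptions,
    const std::function<std::string(const std::string&)>& fallback_stemmer) {
  auto it = exceptions.find(word);
  if (it != exceptions.end()) {
    return it->second;  // dictionary entry wins over the algorithm
  }
  return fallback_stemmer(word);
}

// Toy stand-in for the real stemmer (assumption, for illustration only):
// it strips a single trailing 's', which reproduces the "bus" problem.
std::string toy_strip_s(const std::string& word) {
  if (!word.empty() && word.back() == 's') {
    return word.substr(0, word.size() - 1);
  }
  return word;
}
```

With an exception entry mapping "bus" to "bus", the lookup would return the
word unchanged, while unlisted words would still go through the stemmer.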
Secondly, I have recently compiled the newest version in order to include
the "other" stemmers. While integrating it with our codebase, I noticed
that sb_symbol has changed from char to unsigned char (I know this is an
old change, but until now we had been using an older version).
I would be very interested in the reasons for this change. Since the C
convention is to use char* for text, and std::string::c_str() also
returns a const char*, this modification forces ugly reinterpret_casts
throughout our code. Would changing it back to char break anything?
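For reference, this is roughly the workaround we are considering in the
meantime: keeping the casts in two small helpers instead of scattering
them. It is only a sketch. The names symbol_t, as_symbols and from_symbols
are hypothetical, and symbol_t merely stands in for sb_symbol as it is
defined in the newer version (unsigned char), so that the snippet compiles
without the library headers.

```cpp
#include <string>

// Stand-in for sb_symbol in the newer version (assumption: unsigned char).
typedef unsigned char symbol_t;

// View the bytes of a std::string as symbols for passing to the stemmer.
// The pointer is only valid while the string is alive and unmodified.
inline const symbol_t* as_symbols(const std::string& text) {
  return reinterpret_cast<const symbol_t*>(text.data());
}

// Copy a symbol buffer of the given length back into a std::string.
inline std::string from_symbols(const symbol_t* symbols, int length) {
  return std::string(reinterpret_cast<const char*>(symbols),
                     static_cast<std::string::size_type>(length));
}
```

This keeps the reinterpret_casts in one place, but it still feels like a
workaround rather than a proper solution.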
Thank you very much,
David Nemeskey