[Snowball-discuss] More patches
Tolkin, Steve
Steve.Tolkin at FMR.COM
Fri Feb 16 13:05:18 GMT 2007
As you point out, some calling code will already have converted its input to lower case. Lowercasing should therefore not be a standard, unconditional part of the stemmer, primarily for performance reasons: doing it a second time is wasted work. It would be "nice to have", though, if Snowball provided a lowercasing feature that could be invoked optionally.
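To make the "optionally invoked" idea concrete, here is a minimal sketch in Python. Nothing in it is Snowball's API: `make_stemmer`, its `lowercase` flag, and `toy_stem` are hypothetical names used only for illustration.

```python
from typing import Callable


def make_stemmer(stem: Callable[[str], str],
                 lowercase: bool = False) -> Callable[[str], str]:
    """Wrap a stemming function, lowercasing the input only on request.

    Hypothetical wrapper, not part of Snowball.
    """
    if not lowercase:
        # Callers whose input is already lowercase pay no extra cost.
        return stem

    def stem_lowercased(word: str) -> str:
        return stem(word.lower())

    return stem_lowercased


def toy_stem(word: str) -> str:
    # Stand-in for a real stemmer: strips one trailing lowercase "s".
    return word[:-1] if word.endswith("s") else word


plain = make_stemmer(toy_stem)
folding = make_stemmer(toy_stem, lowercase=True)
print(plain("CATS"), folding("CATS"))
```

With the flag off, the uppercase input passes through the toy stemmer unchanged, which is the failure mode a caller would see if they skipped lowercasing themselves.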
Hopefully helpfully yours,
Steve
--
Steve Tolkin Steve . Tolkin at FMR dot COM 508-787-9006
Fidelity Investments 82 Devonshire St. M3L Boston MA 02109
There is nothing so practical as a good theory. Comments are by me,
not Fidelity Investments, its subsidiaries or affiliates.
-----Original Message-----
From: snowball-discuss-bounces at lists.tartarus.org [mailto:snowball-discuss-bounces at lists.tartarus.org] On Behalf Of Olly Betts
Sent: Friday, February 16, 2007 7:06 AM
To: Richard Boulton
Cc: snowball-discuss at lists.tartarus.org
Subject: Re: [Snowball-discuss] More patches
[some snipped]
I wonder if the algorithms should perform lowercasing for you. In
general it's a required preprocessing step for the stemmers to work
correctly, so most users will need to implement the lowercasing for
themselves (except perhaps for applications where the input is always
lowercase already).
The problem I can see is that to do it correctly for all non-ASCII
characters requires fairly large tables, and doing it just for ASCII
letters probably isn't really sufficient. Perhaps it's only necessary
for characters the stemmers check for though. Thoughts?
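The trade-off described above can be sketched in Python as follows. The function names and the `EXAMPLE_UPPER` letter set are assumptions made for illustration; the set is not the actual list of characters that any Snowball stemmer checks.

```python
import string


def ascii_lower(word: str) -> str:
    """Lowercase only A-Z; every other character is left untouched."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in word
    )


def lower_only(word: str, upper_letters: str) -> str:
    """Lowercase only the characters listed in upper_letters.

    This needs a small table covering just those letters, rather than
    case mappings for all of Unicode.
    """
    table = {ord(ch): ch.lower() for ch in upper_letters}
    return word.translate(table)


# Illustrative set only: ASCII capitals plus a few accented capitals.
EXAMPLE_UPPER = string.ascii_uppercase + "ÀÂÇÉÈÊËÎÏÔÙÛÜ"

word = "ÉCOLE"
print(ascii_lower(word))                # accented capital is not folded
print(lower_only(word, EXAMPLE_UPPER))  # small, stemmer-specific table
print(word.lower())                     # full Unicode lowercasing
```

The ASCII-only version leaves the accented capital alone, while the restricted-table version matches full Unicode lowercasing for this word at the cost of a table holding only a few dozen entries.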
Cheers,
Olly