[Snowball-discuss] Stemming 'communing' and 'communed'
Michael Edwards
mbedwards at gmail.com
Tue Apr 3 11:06:16 BST 2007
Thanks. My implementation has been working for about a week and I
should be ready to upload it soon. One thing I noticed in the spec as
it now stands: at the bottom, where it lists the exceptional prefixes
('gener', 'commun', 'arsen'), arsen is not in bold and its 'a' has
become a '(':
"If the words begins gener, commun or (rsen, set R1 to be the
remainder of the word."
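For reference, here is a minimal Python sketch of how the exceptional-prefix rule interacts with the usual R1 definition (the region after the first non-vowel following a vowel). It assumes lowercase input, treats a e i o u y as vowels, and skips the spec's step of marking consonantal y as Y; `r1_start` is a hypothetical helper name, not part of any Snowball API. With the exception in place, 'communing' and 'communed' get R1 = 'ing' and 'ed' respectively.

```python
import re

# The three exceptional prefixes named in the spec.
EXCEPTIONAL_PREFIXES = ("gener", "commun", "arsen")

# R1 normally starts after the first non-vowel that follows a vowel.
# Assumption: vowels are a e i o u y, and Y-marking is ignored here.
_R1_RE = re.compile(r"[aeiouy][^aeiouy]")

def r1_start(word: str) -> int:
    """Return the index where R1 begins (len(word) if R1 is empty)."""
    # Exceptional prefixes: R1 is whatever follows the prefix.
    for prefix in EXCEPTIONAL_PREFIXES:
        if word.startswith(prefix):
            return len(prefix)
    # Otherwise use the general vowel/non-vowel rule.
    m = _R1_RE.search(word)
    return m.end() if m else len(word)

for w in ("communing", "communed", "generous", "running"):
    print(w, "->", repr(w[r1_start(w):]))
```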
Incidentally, a colleague wrote another PHP implementation that relies
heavily on the PCRE (Perl Compatible Regular Expressions) library, and
it was twice as fast as mine. My implementation still has room for
optimization, but at first glance it seems that regular expressions
may be the way to go in many scripting languages, both for speed and
for brevity of implementation. They also offer an easy porting path,
because many languages implement PCRE-style regular expressions. Just
some thoughts that might interest anyone thinking about implementing
this or similar algorithms.
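To illustrate the regular-expression style, here is a minimal Python sketch of Step 1a of the English (Porter2) stemmer with its suffix conditions written as Perl-style regexes. Python's `re` module is not PCRE itself but uses very similar syntax. This is only an illustration under stated assumptions, not the colleague's PHP code or a complete stemmer: it covers Step 1a alone and assumes a lowercase word that has already been through the earlier preprocessing; `step1a` is a hypothetical name.

```python
import re

# Assumption: input is lowercase and already preprocessed (Y-marking etc.).
# "The preceding part contains a vowel not immediately before the s".
_S_DELETE = re.compile(r"[aeiouy].+s$")
_IED_IES = re.compile(r"(.*)(ied|ies)")

def step1a(word: str) -> str:
    # Checks are ordered so the longest matching suffix wins.
    # sses -> ss
    if word.endswith("sses"):
        return word[:-2]
    # ied / ies -> i if preceded by more than one letter, else ie
    m = _IED_IES.fullmatch(word)
    if m:
        stem = m.group(1)
        return stem + ("i" if len(stem) > 1 else "ie")
    # us / ss -> leave unchanged
    if word.endswith(("us", "ss")):
        return word
    # s -> delete only if the vowel condition holds
    if _S_DELETE.search(word):
        return word[:-1]
    return word

for w in ("caresses", "ties", "cries", "gaps", "gas", "this", "kiwis"):
    print(w, "->", step1a(w))
```

Each condition becomes a one-line pattern rather than a hand-written scan over the characters, which is the brevity argument in miniature.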
Best regards,
Michael
On 4/3/07, Martin Porter <martin.porter at grapeshot.co.uk> wrote:
>
> Michael,
>
> I've corrected the definition of the English stemmer in line with your comments,
>
> Martin