[Snowball-discuss] Can regular expressions be used to implement Porter Stemmer

Martin Porter martin_porter@softhome.net
Fri Jun 6 13:39:02 2003


Of course I'm not anti-regex or anything, and the more the stemmers are used
- in whatever form - the better really. I like to think the Snowball site
can be used as a starting point for recoding the stemmers into other languages.

By PS1 do you mean the stemmer at
http://www.tartarus.org/~martin/PorterStemmer ? If so, a Pascal version
would make a nice addition to the many different encodings there. Have you
tested it against the sample vocabulary?
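
A minimal sketch of what such a check could look like, written in Python for illustration. The helper names (`read_lines`, `find_mismatches`) and the idea of two parallel files, one word per line and one expected stem per line, are assumptions made for this example, not a description of any official test harness; `stem` stands for whatever stemmer implementation is being tested.

```python
from typing import Callable, List, Tuple


def read_lines(path: str) -> List[str]:
    """Read a text file into a list of lines with line endings removed."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\r\n") for line in handle]


def find_mismatches(
    stem: Callable[[str], str],
    words: List[str],
    expected: List[str],
) -> List[Tuple[str, str, str]]:
    """Return (word, expected_stem, actual_stem) for every disagreement.

    `words` and `expected` are assumed to be parallel lists: the stem
    expected for words[i] is expected[i].
    """
    if len(words) != len(expected):
        raise ValueError("word list and expected-stem list differ in length")
    mismatches = []
    for word, want in zip(words, expected):
        got = stem(word)
        if got != want:
            mismatches.append((word, want, got))
    return mismatches
```

With a real stemmer, the two lists would come from `read_lines` on the vocabulary file and on the expected-output file (the file names are up to whoever runs the check), and an empty result would mean the port agrees with the reference on every word.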

Martin


At 11:29 06/06/2003 +0200, Sven Neumann wrote:
>I see the sense in that. I suppose my thoughts came from a misguided
>attempt at implementing the stemmer in my own language (HTAG), which
>supports regexes but is very bad at single-character and number handling.
>
>BTW, I translated the C version of PS1 one-to-one to Pascal (Delphi), as
>that is the interpreter language of HTAG. Also, given how well the Delphi
>compiler usually compiles and optimizes, I don't think performance
>would lag behind the C version. If it is desirable, I'd
>gladly submit it.
>
>Sven
>