[Snowball-discuss] Croatian stemmer help

Tomislav Bišćan tbiscan at gmail.com
Fri Aug 1 12:46:34 BST 2008


Thanks for the reply, Martin.

I will try to write something in Snowball, because the Sphinx search engine
(http://www.sphinxsearch.com/) uses it to build stemmed indexes.

If I run into problems I may need some more tips like this one, if that is OK:
['eta' non-vowel] <- 'something'

My regex means: if a word ends in 'eta' and the letter before it is a
non-vowel, replace the ending. Keep everything before 'eta' (including the
preceding non-vowel, which must not be cut) and add 'e' to the end.

For example: magareta -> magare
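To pin down the intended behaviour (not Snowball code, just a sketch of the
rule as a regex), here is a minimal Python version. It assumes that only
a, e, i, o, u count as vowels (syllabic 'r' is ignored); the function name
replace_eta is made up for illustration.

```python
import re

# Assumption: only a, e, i, o, u are treated as vowels (syllabic 'r' ignored).
VOWELS = "aeiou"

# Word-final 'eta' preceded by a non-vowel; the non-vowel is captured so it is kept.
ETA_AFTER_NON_VOWEL = re.compile(rf"([^{VOWELS}])eta$")


def replace_eta(word: str) -> str:
    """Hypothetical helper: turn word-final 'eta' into 'e' after a non-vowel."""
    return ETA_AFTER_NON_VOWEL.sub(r"\1e", word)


if __name__ == "__main__":
    for w in ["magareta", "teleta", "pieta"]:
        print(w, "->", replace_eta(w))
```

With this rule "magareta" becomes "magare", while "pieta" is left alone
because the letter before 'eta' is a vowel.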

So, if I understand correctly, this would be something like:
non-vowel ['eta'] <- 'e'
or
['eta'] non-vowel <- 'e'

Is this right, or am I wrong?

Thanks,
Tomislav

2008/8/1 Martin Porter <martin at porterloo.wanadoo.co.uk>

> it would be along the lines of,
>
> ... ['eta' non-vowel] <- 'something'
>
> to replace 'eta' and the preceding non-vowel by 'something'. (I'm sorry to
> be vague ... I've mislaid my Perl book and don't know PHP.)
>
> Snowball is unlike Perl, and you can't really do an expression-to-expression
> translation. What I suggest is that you look at one of the stemmer
> definitions and see how it's coded up in Snowball. You'll soon understand.
>
> Incidentally, if your PHP works well, there may be no advantage in
> translating into Snowball. If you wanted to submit the PHP stemmer to the
> snowball site, we'd be happy to put it up at
>
> http://snowball.tartarus.org/otherlangs/index.html
>
> (assuming BSD licensing),
>
> Martin

