[Snowball-discuss] Re: SnowBall German stemming
Martin Porter
martin_porter@softhome.net
Wed, 13 Mar 2002 04:04:19 -0700
Marcus,
Thank you for your encouragement. You are the first German user from whom we
have had any feedback!
(It is useful to post to "Snowball discuss".)
There was some confusion over codes a while ago, since I was using MS-DOS
Latin 1, but describing it as ISO-Latin 1 on the website (pure ignorance on
my part). But I think everything is in parallel now. The Snowball scripts
and the sample data sets use MS-DOS Latin 1. The documentation on the
website refers to MS-DOS Latin 1 where relevant. From 'Character codes' on
the main page you can get to the header files for ISO-Latin 1 and
instructions on how to adjust the Snowball scripts to use them. The code
values you quote are ISO-Latin 1.
- But perhaps I haven't understood your email, since 'ß' is E1=225 in MS-DOS
Latin 1, 223 in ISO-Latin 1.
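
The code values discussed above can be checked with a short Python sketch. It assumes that "MS-DOS Latin 1" corresponds to code page 850 (Python codec "cp850") and "ISO-Latin 1" to ISO-8859-1 (codec "latin-1"). The helper names code_values and iso_to_dos are hypothetical and are not part of the Snowball distribution.

```python
# Assumption: "MS-DOS Latin 1" = code page 850 ("cp850"),
#             "ISO-Latin 1"    = ISO-8859-1   ("latin-1").

def code_values(ch):
    """Return (ISO-Latin 1 value, MS-DOS Latin 1 value) for one character."""
    return ch.encode("latin-1")[0], ch.encode("cp850")[0]

def iso_to_dos(data):
    """Re-encode ISO-Latin 1 bytes as MS-DOS Latin 1 bytes (hypothetical helper)."""
    return data.decode("latin-1").encode("cp850")

# a-umlaut, o-umlaut, u-umlaut, sharp s (written as escapes to keep this file ASCII)
for ch in "\u00e4\u00f6\u00fc\u00df":
    iso, dos = code_values(ch)
    print(f"U+{ord(ch):04X}: ISO-Latin 1 = {iso}, MS-DOS Latin 1 = {dos}")
```

If the input text is in ISO-Latin 1 while the scripts expect MS-DOS Latin 1, something like iso_to_dos could be applied to the input before stemming. The alternative is to use the ISO-Latin 1 header files mentioned above.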
Martin
At 02:24 AM 3/13/02 -0800, Marcus Hassler wrote:
>Hello!
>
>First of all: you did a great job! I am using the Snowball
>concept for developing a natural language Information
>Retrieval system for German. I downloaded everything and
>everything is working properly. There is just one
>problem:
>
>The special characters 'ü' (decimal 252), 'ö' (decimal
>246) and 'ä' (decimal 228) are not handled as they should
>(as the input-output sample says!). The special character
>'ß' (hex E1) is handled correctly! I am not sure if there
>is a problem in the snowball file with the algorithm or
>anything else. I would be glad if you can help me with
>this!
>
>Best regards,
> Marcus
_______________________________________________
Snowball-discuss mailing list
Snowball-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snowball-discuss