[Snowball-discuss] Re: SnowBall German stemming

Martin Porter martin_porter@softhome.net
Wed, 13 Mar 2002 04:04:19 -0700


Marcus,

Thank you for your encouragement. You are the first German user from whom we
have had any feedback!

(It is useful to post to "Snowball discuss".)

There was some confusion over codes a while ago, since I was using MS-DOS
Latin 1, but describing it as ISO-Latin 1 on the website (pure ignorance on
my part). But I think everything is in parallel now. The Snowball scripts
and the sample data sets use MS-DOS Latin 1. The documentation on the
website refers to MS-DOS Latin 1 where relevant. From 'Character codes' on
the main page you can get to the header files for ISO-Latin 1 and
instructions on how to adjust the Snowball scripts to use them. The code
values you quote are ISO-Latin 1.
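If your data is stored as ISO-Latin 1 but you are running the MS-DOS Latin 1 version of the scripts, one option is to re-encode the input before stemming. The sketch below is not the procedure described on the website. It assumes that Python's standard `cp850` codec corresponds to what I am calling MS-DOS Latin 1, and the function name `iso_latin1_to_msdos_latin1` is purely illustrative:

```python
def iso_latin1_to_msdos_latin1(data: bytes) -> bytes:
    """Re-encode ISO-Latin 1 (ISO-8859-1) bytes as MS-DOS Latin 1 (assumed: code page 850).

    Hypothetical helper for illustration only.
    """
    # Decoding as latin-1 never fails: every byte value maps to a character.
    # Encoding as cp850 raises UnicodeEncodeError for characters CP850 lacks
    # (e.g. the C1 control characters U+0080..U+009F).
    return data.decode("latin-1").encode("cp850")


# 'Grüße' in ISO-Latin 1 is b"Gr\xfc\xdfe"; after conversion it is b"Gr\x81\xe1e".
print(iso_latin1_to_msdos_latin1(b"Gr\xfc\xdfe"))
```

Plain ASCII passes through unchanged, so only the accented letters and 'ß' are affected.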

- But perhaps I haven't understood your email, since 'ß' is hex E1 (decimal 225)
in MS-DOS Latin 1, but 223 in ISO-Latin 1.
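To see the mismatch side by side, here is a small sketch that prints the code value of each German special character in both encodings. It assumes that Python's `cp850` codec matches MS-DOS Latin 1. The helper name `code_values` is made up for this example:

```python
CHARS = "ßüöä"


def code_values(ch: str) -> tuple[int, int]:
    """Return (MS-DOS Latin 1 value, ISO-Latin 1 value) for a single character.

    MS-DOS Latin 1 is assumed to be code page 850 (Python codec "cp850").
    """
    return ch.encode("cp850")[0], ch.encode("latin-1")[0]


for ch in CHARS:
    dos, iso = code_values(ch)
    print(f"{ch}: MS-DOS Latin 1 {dos} (hex {dos:02X}), ISO-Latin 1 {iso} (hex {iso:02X})")
```

This gives 225/223 for 'ß', 129/252 for 'ü', 148/246 for 'ö' and 132/228 for 'ä'. The values 252, 246 and 228 are therefore the ISO-Latin 1 ones, while hex E1 for 'ß' is the MS-DOS Latin 1 value.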

Martin

At 02:24 AM 3/13/02 -0800, Marcus Hassler wrote:
>Hello!
>
>First of all: you did a great job! I am using the Snowball
>concept for developing a natural language Information
>Retrieval system for German. I downloaded everything and
>everything is working properly. There is just one
>problem:
>
>The special characters 'ü' (decimal 252), 'ö' (decimal
>246) and 'ä' (decimal 228) are not handled as they should be
>(according to the input-output sample!). The special character
>'ß' (hex E1) is handled correctly! I am not sure if there
>is a problem in the snowball file with the algorithm or
>something else. I would be glad if you could help me with
>this!
>
>Best regards,
>  Marcus



_______________________________________________
Snowball-discuss mailing list
Snowball-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snowball-discuss