[Snowball-discuss] Hungarian characters in hungarian/stop.txt
Olly Betts
olly at survex.com
Wed Jun 11 02:09:59 BST 2014
On Tue, Jun 10, 2014 at 08:57:40PM -0400, Tom Lane wrote:
> Olly Betts <olly at survex.com> writes:
> > I've submitted a fix for the algorithm here:
> > https://github.com/snowballstem/snowball/pull/4
>
> Thanks for the quick response! But I think you need this in
> the new hungarian/stem_Unicode.sbl file:
>
> -stringdef uq hex 'FB' //u-double acute
> +stringdef uq hex '171' //u-double acute

Aha, thanks. Not sure how I missed that - I did try to check that the
other characters were the same in Latin 1 and Latin 2.
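
That check can be sketched in a few lines of Python. This is only an
illustration: the letter list and the `differing_letters` name are
assumptions, not anything taken from the Snowball sources. It relies on
Latin 1 byte values coinciding with Unicode code points, so any letter
whose Latin 2 byte differs from its code point needs a different hex
value in a Unicode variant of the algorithm.

```python
# Hungarian accented lowercase letters (an assumed list, for illustration).
HUNGARIAN = "\u00e1\u00e9\u00ed\u00f3\u00f6\u0151\u00fa\u00fc\u0171"


def differing_letters(letters=HUNGARIAN):
    """Map each letter whose ISO-8859-2 (Latin 2) byte value differs from
    its Unicode code point to a (latin2_byte, code_point) pair."""
    result = {}
    for ch in letters:
        # Each of these letters encodes to exactly one byte in Latin 2.
        byte = ch.encode("iso8859_2")[0]
        if byte != ord(ch):
            result[ch] = (byte, ord(ch))
    return result


if __name__ == "__main__":
    for byte, code_point in differing_letters().values():
        print(f"Latin 2 0x{byte:02X} is U+{code_point:04X}")
```

With that letter list, only o-double acute and u-double acute should be
reported; the second is the 'FB' versus '171' case in the diff above.
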
This explains why my test data updates didn't work. I've updated the
first PR and opened one for the testdata (which is in a separate repo):
https://github.com/snowballstem/snowball-data/pull/2
Cheers,
Olly