[Snowball-discuss] Hungarian characters in hungarian/stop.txt

Olly Betts olly at survex.com
Wed Jun 11 02:09:59 BST 2014


On Tue, Jun 10, 2014 at 08:57:40PM -0400, Tom Lane wrote:
> Olly Betts <olly at survex.com> writes:
> > I've submitted a fix for the algorithm here:
> > https://github.com/snowballstem/snowball/pull/4
> 
> Thanks for the quick response!  But I think you need this in
> the new hungarian/stem_Unicode.sbl file:
> 
> -stringdef uq  hex 'FB'  //u-double acute
> +stringdef uq  hex '171' //u-double acute

Aha, thanks.  Not sure how I missed that - I did try to check that the
other characters were the same in Latin 1 and Latin 2.
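
For anyone wanting to repeat that check, here is a rough sketch in Python
(not part of the Snowball sources; the helper name and the list of
letters are my own assumptions).  It relies on Latin 1 byte values
matching the first 256 Unicode code points, so any Hungarian letter
whose Latin 2 byte differs from its Unicode code point needs a different
hex value in the Unicode version of the stemmer:

```python
# Hypothetical helper: list the letters whose ISO-8859-2 (Latin 2) byte
# value is not the same as their Unicode code point.
HUNGARIAN_ACCENTED = "áéíóöőúüű"  # assumed set of lowercase accented letters


def latin2_mismatches(letters):
    """Return (letter, latin2_byte, unicode_code_point) for each mismatch."""
    mismatches = []
    for ch in letters:
        latin2_byte = ch.encode("iso8859_2")[0]
        if latin2_byte != ord(ch):
            mismatches.append((ch, latin2_byte, ord(ch)))
    return mismatches


if __name__ == "__main__":
    for ch, byte_value, code_point in latin2_mismatches(HUNGARIAN_ACCENTED):
        print(f"{ch}: Latin 2 hex {byte_value:X}, Unicode hex {code_point:X}")
```

Under those assumptions it should flag only o-double acute and
u-double acute, the latter with the FB versus 171 difference from
the stringdef change quoted above.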

This explains why my test data updates didn't work.  I've updated the
first PR and opened one for the testdata (which is in a separate repo):

https://github.com/snowballstem/snowball-data/pull/2

Cheers,
    Olly


