[Snowball-discuss] A problem with replacing letters

A. Tordai atordai at science.uva.nl
Thu Jan 20 16:40:46 GMT 2005


Hello,

I'm working on a Hungarian stemmer and have run into a problem I haven't 
been able to solve. The code is included below. I have a routine called 
v_ending which replaces "a acute" and "e acute" with "a" and "e". If I 
simply delete them it works, but when I try replacing them I get an 
"a acute" instead of an "a".
For instance, if I test it on the word "hagyásában" I ought to get 
"hagyása" (with "ban" removed and the a acute replaced), but I get 
"hagyásá". Similar things happen with a word like "kimenetelében". I 
suspect I am missing something simple, but I just can't figure out what 
goes wrong.
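
The intended behaviour described above can be modelled outside Snowball. The following is only a rough plain-Python sketch of the expected result, not the Snowball implementation and not a diagnosis of the problem; the name expected_stem and the two tables are hypothetical and exist only for illustration. Like the routine called case in the listing, it ignores the R1 and R2 regions.

```python
# Illustration only: a plain-Python model of the *expected* output,
# not the Snowball code. All names here are hypothetical.

CASE_SUFFIXES = ("ban", "ben")          # inessive endings
FINAL_VOWEL_MAP = {"á": "a", "é": "e"}  # a-acute -> a, e-acute -> e


def expected_stem(word: str) -> str:
    """Strip an inessive suffix, then shorten a final a-acute or e-acute."""
    for suffix in CASE_SUFFIXES:
        if word.endswith(suffix):
            word = word[: -len(suffix)]
            # Mirrors v_ending: it only runs after a suffix was removed.
            if word and word[-1] in FINAL_VOWEL_MAP:
                word = word[:-1] + FINAL_VOWEL_MAP[word[-1]]
            break
    return word


print(expected_stem("hagyásában"))     # hagyása
print(expected_stem("kimenetelében"))  # kimenetele
```

A model like this can serve as a reference when comparing against the stemmer's actual output for a list of test words.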

Thank you

Anna Tordai

**************************

// Hungarian stemmer.

routines (
    mark_regions
    v_ending
    R1
    R2
    case
)

externals ( stem )

integers ( p1 p2 )
groupings ( v )

stringescapes {}

/* special characters (in ISO Latin I) */

stringdef a'    hex 'E1'    // a-acute
stringdef e'    hex 'E9'    // e-acute
stringdef i'    hex 'ED'    // i-acute
stringdef o'    hex 'F3'    // o-acute
stringdef o"    hex 'F6'    // o-umlaut
stringdef oq    hex 'F5'    // o-double acute
stringdef u'    hex 'FA'    // u-acute
stringdef u"    hex 'FC'    // u-umlaut
stringdef uq    hex 'FB'    // u-double acute


//vowels
define v 'aeiou{a'}{e'}{i'}{o'}{o"}{oq}{u'}{u"}{uq}'


define mark_regions as (

    $p1 = limit
    $p2 = limit

    (gopast v  (test substring among('cs' 'gy' 'sz' 'ty') setmark p1)) or
    (goto v gopast non-v setmark p1)
    goto v  gopast non-v  setmark p2
)

backwardmode (

    define R1 as $p1 <= cursor   
    define R2 as $p2 <= cursor
   
    define v_ending as (
        [substring] among(
            '{a'}' (<- 'a')
            '{e'}' (<- 'e')
        )
    )

    define case as (
        [substring] among(
            'ban'    // inessive
            'ben'    // inessive
        )
        delete
        v_ending
    )
)

define stem as (
    do mark_regions
    backwards (
        do case
    )
)



