[Snowball-discuss] A problem with replacing letters
A. Tordai
atordai at science.uva.nl
Thu Jan 20 16:40:46 GMT 2005
Hello,
I'm working on a Hungarian stemmer and I have a problem I haven't been
able to solve. The code is below. I have a routine called v_ending
which replaces "a acute" and "e acute" with "a" and "e". If I simply
delete them, it works, but when I try to replace them, I get an
"a acute" instead of an "a".
For instance, if I test it on the word "hagyásában", I ought to get
"hagyása" (with "ban" removed and the a acute replaced), but I get
"hagyásá". Similar things happen with a word like "kimenetelében". I
suspect I am missing something simple, but I just can't figure out what
goes wrong.
Thank you
Anna Tordai
**************************
// Hungarian stemmer.
routines (
mark_regions
v_ending
R1
R2
case
)
externals ( stem )
integers ( p1 p2 )
groupings ( v )
stringescapes {}
/* special characters (in ISO Latin I) */
stringdef a' hex 'E1' // a-acute
stringdef e' hex 'E9' // e-acute
stringdef i' hex 'ED' // i-acute
stringdef o' hex 'F3' // o-acute
stringdef o" hex 'F6' // o-umlaut
stringdef oq hex 'F5' // o-double acute
stringdef u' hex 'FA' // u-acute
stringdef u" hex 'FC' // u-umlaut
stringdef uq hex 'FB' // u-double acute
// vowels
define v 'aeiou{a'}{e'}{i'}{o'}{o"}{oq}{u'}{u"}{uq}'
define mark_regions as (
$p1 = limit
$p2 = limit
(gopast v (test substring among('cs' 'gy' 'sz' 'ty') setmark p1)) or
(goto v gopast non-v setmark p1)
goto v gopast non-v setmark p2
)
backwardmode (
define R1 as $p1 <= cursor
define R2 as $p2 <= cursor
define v_ending as (
[substring] among(
'{a'}' (<- 'a')
'{e'}' (<- 'e')
)
)
define case as (
[substring] among(
'ban' // inessive
'ben' // inessive
)
delete
v_ending
)
)
define stem as (
do mark_regions
backwards (
do case
)
)
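For comparison, here is a small Python model of what case followed by v_ending is intended to do to a word. It is only a sketch for checking expected outputs, not what Snowball generates. It assumes the accented letters are the single code points U+00E1 (a acute) and U+00E9 (e acute), matching the Latin-1 values E1 and E9 in the stringdefs above. The name stem_expected is made up for illustration.

```python
# Hypothetical reference model (not Snowball output): what `case` followed
# by `v_ending` is meant to do to a word.

CASE_SUFFIXES = ("ban", "ben")                   # inessive endings, as in `case`
LONG_TO_SHORT = {"\u00e1": "a", "\u00e9": "e"}   # a-acute -> a, e-acute -> e


def stem_expected(word: str) -> str:
    # `case`: remove an inessive ending; if there is none, nothing changes.
    for suffix in CASE_SUFFIXES:
        if word.endswith(suffix):
            word = word[: -len(suffix)]
            break
    else:
        return word
    # `v_ending`: the deletion above stays even if no long vowel follows;
    # otherwise the final a-acute/e-acute is replaced by a/e.
    if word and word[-1] in LONG_TO_SHORT:
        word = word[:-1] + LONG_TO_SHORT[word[-1]]
    return word
```

With this model, stem_expected("hagyásában") gives "hagyása" and stem_expected("kimenetelében") gives "kimenetele", which are the results the Snowball version is expected to produce.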