[Snowball-discuss] Digraphs

Martin Porter martin.f.porter at gmail.com
Mon Jul 16 18:12:39 BST 2012


On 7/16/12, Grushevskiy Dmitry <dgr at jooble.com> wrote:
> Please help me
>
> In Polish, digraphs are used like letters, but the Snowball compiler ignores them
>
> stringdef ia   hex '69 61'
. . . .
>
> define v 'a{a"}e{e"}o{o"}uy{ia}{ia"}{ie}{ie"}{io}{io"}{iu}i'
>

Dmitry,

It's because a stringdef, as you suppose, gives a name to a sequence
of characters, but a 'define' for a grouping defines a name which
stands for a set of single characters. So if {io} is a stringdef of a
pair of characters, define v '...{io}...' puts each member of the pair
into the grouping separately, not the digraph {io} as a unit. In other
words,

stringdef io 'io'
define i_or_o 'io{io}'

puts i into i_or_o, then o, then {io}, which is just i and o again,
and so is the same as

define i_or_o 'io'
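
To make the point concrete, here is a rough model of that behaviour in
Python. It is not Snowball and not how the compiler is implemented; the
helper names expand and define_grouping are made up for illustration,
and the only assumption is the one stated above: a grouping is a set of
single characters, built after the {name} escapes have been substituted.

```python
def expand(text, stringdefs):
    """Replace each {name} escape in text with its stringdef value."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "{":
            end = text.index("}", i)          # closing brace of the escape
            out.append(stringdefs[text[i + 1:end]])
            i = end + 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def define_grouping(text, stringdefs):
    """Model a grouping as a set of single characters (assumption)."""
    return set(expand(text, stringdefs))


stringdefs = {"io": "io"}                      # stringdef io 'io'
i_or_o = define_grouping("io{io}", stringdefs)  # define i_or_o 'io{io}'
plain = define_grouping("io", stringdefs)       # define i_or_o 'io'

print(sorted(i_or_o))    # the digraph has dissolved into its letters
print(i_or_o == plain)   # both definitions give the same grouping
```

In this model the two-character string 'io' is never a member of the
set, which is why the digraph seems to be ignored.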

It may be you don't need to worry too much about digraphs -- in
Spanish, ch and ll are digraphs, but that doesn't really affect
writing a stemmer. It might matter if you were counting letters, e.g.
if two vowels separated by a single consonant, or by a single digraph
that stands for a consonant, required special handling. But you can
get around that by using an among(...) for the digraphs, and then
or-ing it with a group name for the single-letter consonants.
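
The "digraphs first, then single letters" idea can be sketched in
Python as below. This only models the logic and is not Snowball code.
The digraph list, the consonant and vowel sets, and the function names
are assumptions chosen for illustration, not a complete description of
Polish spelling.

```python
# Illustrative data only (assumption): a few consonant digraphs and
# plain-letter sets, not a full Polish inventory.
DIGRAPHS = ("ch", "cz", "dz", "rz", "sz")
CONSONANTS = set("bcdfghjklmnprstwz")
VOWELS = set("aeiouy")


def match_consonant(word, pos):
    """Return the index just past one consonant unit at pos, or None.

    Digraphs are tried first (the among(...) part); if none matches,
    fall back to a single consonant letter (the grouping part).
    """
    for d in DIGRAPHS:
        if word.startswith(d, pos):
            return pos + len(d)
    if pos < len(word) and word[pos] in CONSONANTS:
        return pos + 1
    return None


def vowel_consonant_vowel(word, pos):
    """True if word has vowel, one consonant unit, vowel starting at pos."""
    if pos >= len(word) or word[pos] not in VOWELS:
        return False
    after = match_consonant(word, pos + 1)
    return after is not None and after < len(word) and word[after] in VOWELS


print(vowel_consonant_vowel("kasa", 1))    # single letter s between vowels
print(vowel_consonant_vowel("kasza", 1))   # digraph sz counts as one unit
print(vowel_consonant_vowel("karta", 1))   # r then t: two consonants
```

Trying the digraphs before the single letters matters here, because
each digraph starts with a letter that is also a consonant on its own.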

I hope the above notes make sense.

Martin


