[Snowball-discuss] A stemmer for latvian

Richard Boulton richard at tartarus.org
Sun Jun 2 08:36:58 BST 2013


Vitālijs,

It sounds like you're doing all the right things; I can't see anything
missing from your description.

You might like to take a look at this pull request on GitHub, which adds a
Tamil algorithm to libstemmer:
https://github.com/snowballstem/snowball/pull/3/files
You should only need to make the same kind of changes to the build system
to add Latvian.

If that doesn't help, you could send a patch (or, even better if you can
manage it, a pull request or branch on GitHub), and I'll be happy to take a
look to see what's wrong.
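To rule out a typo in the module list (step 4 of the original message), here is a minimal sketch. It is hypothetical and not part of Snowball or libstemmer. It parses lines in the layout that libstemmer/modules.txt is assumed to use: a stemmer name, then comma-separated encodings, then comma-separated aliases, then an optional directory, all separated by whitespace. For each alias and encoding pair, it reports which stemmer that pair would map to. The names parse_modules and resolves are invented for illustration. The script only checks the text of the list, so a successful lookup does not prove that the generated C sources were rebuilt or linked into stemwords.

```python
"""Hypothetical sanity check for a libstemmer module-list entry (illustrative only)."""


def parse_modules(text):
    """Map (alias, encoding) -> stemmer name for each non-comment line.

    Assumed layout per line (whitespace-separated): name, comma-separated
    encodings, comma-separated aliases, optional directory (ignored here).
    """
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"malformed entry: {raw!r}")
        name, encodings, aliases = fields[0], fields[1], fields[2]
        for enc in encodings.split(","):
            for alias in aliases.split(","):
                table[(alias, enc)] = name
    return table


def resolves(text, alias, encoding="UTF_8"):
    """Return the stemmer name that an alias/encoding pair maps to, or None."""
    return parse_modules(text).get((alias, encoding))


if __name__ == "__main__":
    # The Latvian entry from the message, with whitespace instead of arrows.
    sample = """
# name      encodings   aliases
latvian     UTF_8       latvian,lv
"""
    for alias in ("latvian", "lv", "lav"):
        print(alias, "->", resolves(sample, alias))
```

On that entry, both latvian and lv should map to the latvian stemmer. The unlisted alias lav should come back as None.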



On 1 June 2013 08:31, Martin Porter <martin.f.porter at gmail.com> wrote:

> Can someone else help with or comment on this? The libstemmer driver
> was put together by Richard Boulton, and I myself have never actually
> used it, so I'm not best placed to assist.
>
> Vitālijs, it does sound like a problem in modifying the make script,
> or something similar, and it might be easier to get help locally in
> the Department in Latvia. -- Martin
>
>
>
> On Fri, May 31, 2013 at 10:32 AM, Vitālijs Mikeļevičs
> <vitalijs.mikelevics at gmail.com> wrote:
> > Hello,
> >
> > I'm currently studying Computer Science at the University of Latvia, and as
> > part of my bachelor's thesis I'm recreating Karlis Kreslin's stemmer in
> > Snowball for later use in Sphinx SE and (possibly) for adding to the
> > Snowball project.
> >
> > Alas, I'm having problems running it:
> > 1. I've downloaded "Snowball, algorithms, and libstemmer library." from
> > http://snowball.tartarus.org/download.php
> > 2. I've written stem_UTF_8.sbl and put it into algorithms/latvian
> > 3. I've added Latvian to the list of languages in GNUmakefile and also
> > added it to "other_languages", because it requires UTF-8
> > 4. I've added Latvian to libstemmer/modules.txt and modules_utf8.txt as
> > latvian --> UTF_8 --> latvian,lv
> > 5. I run "make", everything compiles, headers are created, modules are
> > updated, libstemmer/mkinc.mak is updated, yet... when I run "stemwords -l
> > latvian" it tells me that "language 'latvian' is not available for
> > stemming". I've tried what seems to be everything, yet it still somehow
> > doesn't work.
> >
> > Have I missed a step somewhere, or did I do something wrong? Any advice
> > would be appreciated.
> >
> > Best regards
> > Vitālijs Mikeļevičs // Vitaly Mikelevich
>

