[Snowball-discuss] Download tarball inconsistencies

richard at lemurconsulting.com
Sun Sep 10 10:26:36 BST 2006


On Sun, Sep 10, 2006 at 04:59:24AM +0100, Olly Betts wrote:
> There are inconsistencies in which .sbl files are included in the
> different downloads available.  Here's a list (the first number is the
> file size):

> I find it somewhat surprising that they don't contain exactly the same
> set of .sbl files!

They should do now.  (And the timestamps should be the same, too, not that
they're particularly meaningful.)

The stem.sbl files assume Latin-1 encoding, but since the characters they
accept have the same code points in Latin-1 and in Unicode, they can be
compiled as Unicode algorithms by passing the appropriate switch to the
snowball compiler (IIRC, -u).  The encodings expected by the other
stem-*.sbl files should be obvious.
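
As a small illustration of the Latin-1 point (a sketch in Python, not
anything taken from the Snowball sources; the function names are made up
for the example): every Latin-1 byte value decodes to the Unicode code
point with the same number, so the character values are the same either
way.  Only the byte encoding of the text differs, for instance when the
input is UTF-8.

```python
def latin1_matches_unicode():
    """Return True if every Latin-1 byte decodes to the same-valued code point."""
    return all(bytes([b]).decode("latin-1") == chr(b) for b in range(256))


def encodings(ch):
    """Return the Latin-1 and UTF-8 byte encodings of a single character."""
    return ch.encode("latin-1"), ch.encode("utf-8")


if __name__ == "__main__":
    e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
    print(latin1_matches_unicode())
    print(hex(ord(e_acute)), encodings(e_acute))
```

The code point of e-acute is 0xE9 in both cases; it is a single byte in
Latin-1 and two bytes in UTF-8, which is why the compiler needs to be
told which form the generated stemmer will be fed.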

> I'm also somewhat confused since when I look at CVS, I only see stem.sbl
> in any language directory (and there are no directories for the romanian
> stemmers).  So where are these other versions of the .sbl files coming
> from?  And how are the new Romanian stemmers getting in there?

I've fixed the link to the CVS repository.  We switched to this repository
ages ago, and I should have noticed the stale link long before now.  Sorry.

-- 
Richard


