[Snowball-discuss] Download tarball inconsistencies
richard at lemurconsulting.com
Sun Sep 10 10:26:36 BST 2006
On Sun, Sep 10, 2006 at 04:59:24AM +0100, Olly Betts wrote:
> There are inconsistencies in which .sbl files are included in the
> different downloads available. Here's a list (the first number is the
> file size):
> I find it somewhat surprising that they don't contain exactly the same
> set of .sbl files!
They should do now. (And the timestamps should be the same, too, not that
they're particularly meaningful.)
The stem.sbl files assume Latin-1 encoding, but since the character codes
they use are the same in Latin-1 and in Unicode, they can be compiled as
Unicode algorithms using the appropriate switch to the snowball compiler
(IIRC, -u). The encodings expected by the other stem-*.sbl files should be
obvious.
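As a rough illustration of the Latin-1/Unicode point (a Python sketch, not
anything taken from the Snowball sources; the function name and the sample
word are made up for the example): every Latin-1 byte value decodes to the
Unicode code point with the same number, so character codes written for
Latin-1 remain valid as Unicode code points. The byte sequences do differ
once the text is stored as UTF-8, which is presumably why a compiler switch
is needed at all.

```python
# Latin-1 (ISO-8859-1) byte values coincide with Unicode code points
# U+0000..U+00FF. This checks that claim for all 256 byte values.

def latin1_matches_unicode() -> bool:
    """Return True if every Latin-1 byte decodes to the same-numbered code point."""
    for byte_value in range(256):
        char = bytes([byte_value]).decode("latin-1")
        if ord(char) != byte_value:
            return False
    return True


# Hypothetical sample word, chosen only for illustration.
sample = "caf\u00e9"
latin1_bytes = sample.encode("latin-1")
utf8_bytes = sample.encode("utf-8")

print(latin1_matches_unicode())
# Character codes agree between Latin-1 bytes and Unicode code points...
print(list(latin1_bytes) == [ord(c) for c in sample])
# ...but the UTF-8 byte sequence is longer for the non-ASCII character.
print(len(latin1_bytes), len(utf8_bytes))
```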
> I'm also somewhat confused since when I look at CVS, I only see stem.sbl
> in any language directory (and there are no directories for the romanian
> stemmers). So where are these other versions of the .sbl files coming
> from? And how are the new Romanian stemmers getting in there?
I've fixed the link to the CVS repository. We switched to this one ages
ago, and I should have noticed the stale link long before now, sorry.
--
Richard