[Snowball-discuss] Mismatch between vocab.txt and output.txt

Olly Betts olly@survex.com
Mon Oct 14 14:27:01 2002


On Mon, Oct 14, 2002 at 04:18:27AM -0600, Martin Porter wrote:
> 
> >The first disagreement for finnish is that the stemmer produces
> >"aachenin" but output.txt contains "aachen".
> 
> I think I don't understand. Are you saying the stemmer actually stems
> aachenin to aachenin while the file output.txt implies that it stems
> aachenin to aachen?

Exactly.

> My Finnish stem.c stems aachenin to aachen, and it is the same as the one on
> the website, which is the same as the one in the tarball on the website (I
> downloaded both to check.)

I generated my stemmers from the ".sbl" sources, but the differences from
the Finnish stem.c on the website are just in the function names.  Most
odd - I'll see if I can work out what's going on.

Cheers,
    Olly
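The check being discussed runs each word of vocab.txt through the stemmer and compares the result with the corresponding line of output.txt. Below is a minimal sketch of that comparison. It assumes the two files are parallel, with one word per line. The stemmer is passed in as a hypothetical `stem` callable, since the thread does not show how the generated stemmers were invoked. The function `first_mismatch` and the stand-in `identity_stem` are illustrative names, not part of Snowball.

```python
from typing import Callable, Iterable, Optional, Tuple


def first_mismatch(
    vocab: Iterable[str],
    expected: Iterable[str],
    stem: Callable[[str], str],
) -> Optional[Tuple[int, str, str, str]]:
    """Return (line_number, word, produced, expected) for the first word whose
    stem differs from the matching line of output.txt, or None if all agree.

    Note: zip() stops at the shorter input, so files of different lengths
    are not reported as a mismatch by this sketch.
    """
    for lineno, (word, want) in enumerate(zip(vocab, expected), start=1):
        word, want = word.strip(), want.strip()
        got = stem(word)
        if got != want:
            return lineno, word, got, want
    return None


def identity_stem(word: str) -> str:
    # Hypothetical stand-in for a stemmer that leaves every word unchanged,
    # reproducing the reported symptom (aachenin -> aachenin).
    return word


print(first_mismatch(["aachenin"], ["aachen"], identity_stem))
# → (1, 'aachenin', 'aachenin', 'aachen')
```

With real data the inputs would be the opened files, e.g. `first_mismatch(open("vocab.txt"), open("output.txt"), stem)`, where `stem` wraps whichever generated stemmer is under test.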