[Snowball-discuss] Mismatch between vocab.txt and output.txt
Olly Betts
olly@survex.com
Mon Oct 14 14:27:01 2002
On Mon, Oct 14, 2002 at 04:18:27AM -0600, Martin Porter wrote:
>
> >The first disagreement for finnish is that the stemmer produces
> >"aachenin" but output.txt contains "aachen".
>
> I think I don't understand. Are you saying the stemmer actually stems
> aachenin to aachenin while the file output.txt implies that it stems
> aachenin to aachen?
Exactly.
> My Finnish stem.c stems aachenin to aachen, and it is the same as the one on
> the website, which is the same as the one in the tarball on the website (I
> downloaded both to check.)
I generated my stemmers from the ".sbl" sources, but the only differences
from the finnish stem.c on the website are in the function names. Most
odd - I'll see if I can work out what's going on.
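
For anyone wanting to reproduce this kind of check, here is a minimal sketch in Python. It assumes vocab.txt holds one word per line and output.txt holds the expected stem on the matching line. The names find_mismatches and identity_stem are hypothetical, and identity_stem is only a stand-in for a real stemmer (it returns each word unchanged), so this is an illustration rather than the actual test harness:

```python
from itertools import zip_longest


def find_mismatches(vocab_lines, expected_lines, stem):
    """Yield (line_no, word, expected, actual) wherever stem(word) != expected.

    vocab_lines and expected_lines are iterables of strings, paired by
    position; stem is any callable mapping a word to its stem.
    """
    pairs = zip_longest(vocab_lines, expected_lines)
    for line_no, (word, expected) in enumerate(pairs, start=1):
        if word is None or expected is None:
            raise ValueError(f"inputs differ in length at line {line_no}")
        word, expected = word.strip(), expected.strip()
        actual = stem(word)
        if actual != expected:
            yield line_no, word, expected, actual


def identity_stem(word):
    # Hypothetical stand-in for a real stemmer: returns the word unchanged.
    return word


# Inline sample data instead of the real files, using the word from this thread.
mismatches = list(
    find_mismatches(["aachen", "aachenin"], ["aachen", "aachen"], identity_stem)
)
print(mismatches)
```

With real files, the two lists could be replaced by open("vocab.txt") and open("output.txt"), and identity_stem by a call into whichever stemmer build is being tested.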
Cheers,
Olly