[Snowball-discuss] More patches

Richard Boulton richard at lemurconsulting.com
Thu Feb 15 14:16:18 GMT 2007


Olly Betts wrote:
> On Mon, Feb 12, 2007 at 08:30:28AM +0000, Olly Betts wrote:
>> This adds a "make check" rule which verifies that the UTF-8 and
>> ISO-8859-1 versions of the stemmers actually produce the expected
>> output on the test vocabulary.
> 
> This patch extends the rules so that "make check" will print a warning
> for algorithm/encoding combinations for which there's no test data.
> This isn't used by the sources as shipped, but if you enable other
> algorithms, it's useful:
> 
> http://oligarchy.co.uk/xapian/patches/snowball-default-make-check-rule.patch
> 
> Alternatively, perhaps we should just generate test data by running a
> suitable vocabulary through the stemming algorithm - that will at least
> allow checking that no regressions are introduced by changes to the
> snowball compiler and runtime.  The missing data is for lovins, german2,
> and romanian2, and we have english, german, and romanian vocabulary for
> other stemmers.  If that seems a better approach, I'm happy to provide
> a patch to do that instead.

Since we have suitable sample vocabulary for each of these, perhaps we
should just add the current output of these stemmers to SVN in some
appropriate place and test against it.  For example,
"data/english/output-lovins.txt" for the lovins stemmer.
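
A minimal Python sketch of the record-then-compare idea follows. It makes several assumptions. It uses a hypothetical layout: data/<language>/voc.txt holds the vocabulary and data/<language>/output-<algorithm>.txt holds the expected stems, one word per line. It treats the stemmer as a plain function. The helper names are illustrative only. This is not how the snowball makefile actually implements "make check".

```python
from pathlib import Path
from typing import Callable, List, Tuple

# Hypothetical layout (an assumption, not the actual snowball tree):
#   data/<language>/voc.txt                 -- input vocabulary, one word per line
#   data/<language>/output-<algorithm>.txt  -- expected stems, one per line


def expected_output_path(data_dir: Path, language: str, algorithm: str) -> Path:
    """Where the checked-in expected output for one algorithm would live."""
    return data_dir / language / f"output-{algorithm}.txt"


def generate_expected(stem: Callable[[str], str], vocabulary: List[str]) -> List[str]:
    """Capture the stemmer's current output, one stem per vocabulary word."""
    return [stem(word) for word in vocabulary]


def check_against_expected(
    stem: Callable[[str], str], vocabulary: List[str], expected: List[str]
) -> List[Tuple[int, str, str, str]]:
    """Return (line number, word, expected stem, actual stem) for each mismatch."""
    if len(vocabulary) != len(expected):
        raise ValueError("vocabulary and expected output differ in length")
    mismatches = []
    for lineno, (word, want) in enumerate(zip(vocabulary, expected), start=1):
        got = stem(word)
        if got != want:
            mismatches.append((lineno, word, want, got))
    return mismatches


def read_lines(path: Path) -> List[str]:
    """Read a UTF-8 word list, one entry per line."""
    return path.read_text(encoding="utf-8").splitlines()
```

The expected file would be generated once from the current stemmer and committed. After that, any change to the snowball compiler or runtime that alters a stem shows up as a mismatch, which is the regression check described in the quoted message.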

I believe there's a great deal of convenience in having the expected
output checked into SVN.

A warning for any stemmers for which we haven't supplied an expected
output file would be a good thing, so your patch is certainly on the
right lines.
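
The warning side can be just as small. The Python sketch below rests on three assumptions. First, it uses the same hypothetical data/<language>/output-<algorithm>.txt naming. Second, it expects one expected-output file per algorithm rather than one per encoding. Third, the algorithm-to-language mapping is illustrative. Only the algorithm names lovins, german2 and romanian2 come from the thread.

```python
import sys
from pathlib import Path
from typing import Dict, List

# Hypothetical mapping from algorithm to the language whose vocabulary it
# would be checked against; the algorithm names are taken from the thread.
ALGORITHM_LANGUAGE: Dict[str, str] = {
    "lovins": "english",
    "german2": "german",
    "romanian2": "romanian",
}


def missing_expected_output(data_dir: Path, algorithms: Dict[str, str]) -> List[str]:
    """Return one warning per algorithm that has no expected-output file.

    Assumes a single file per algorithm, named
    <data_dir>/<language>/output-<algorithm>.txt.
    """
    warnings = []
    for algorithm, language in sorted(algorithms.items()):
        path = data_dir / language / f"output-{algorithm}.txt"
        if not path.is_file():
            warnings.append(f"warning: no expected output for {algorithm}: {path} not found")
    return warnings


if __name__ == "__main__":
    # Warn on stderr but exit with status 0, so a missing file does not fail the check.
    for message in missing_expected_output(Path("data"), ALGORITHM_LANGUAGE):
        print(message, file=sys.stderr)
```

Printing a warning instead of failing keeps the check passing when someone enables an extra algorithm, while still pointing out which combinations are untested.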

-- 
Richard
