[Xapian-devel] stemtest failing with romanian

Richard Boulton richard at lemurconsulting.com
Thu Mar 29 17:18:11 BST 2007


On Tuesday, I replaced the romanian1 and romanian2 stemmers in 
Xapian-core with Martin's new Romanian stemmer.  At the same time, I 
updated the stemming test data (by regenerating the output file with 
Snowball's "stemwords" utility), and I clearly remember re-running the 
test suite afterwards and checking that all tests passed.

Now, when I run "make check", stemtest fails with the Romanian stemmer 
on the word "acelaşI".  According to the output file generated by 
Snowball's stemwords utility, this should stem to "acel", but Xapian 
stems it to "acelaşi".  The only change to the stemming algorithms I 
can see since then is a change to the Snowball code generator that Olly 
made on Wednesday morning, but reverting that change doesn't fix the 
problem.  So, does anyone else see this failure, or has something 
changed on my local machine (possibly a character-set issue, or 
something like that)?

For reference, the output from stemtest is as follows:

$ ./tests/runtest tests/stemtest stemdict -l romanian -v
Running test 'tests/stemtest stemdict -l romanian -v' under valgrind
The random seed is 42
Please report the seed when reporting a test failure.
Running tests with romanian stemmer...
Running test: stemdict...
Testing romanian with fixed dictionary...
/home/richard/private/Working/xapian/xapian-core/tests/stemtest.cc:146: ((stem) == (expect))
Expected `stem' and `expect' to be equal: were acelaşi and acel

  FAILED
/home/richard/private/Working/xapian/build/xapian-core/tests/.libs/lt-stemtest completed test run: 0 tests passed, 1 failed.


-- 
Richard


