[Snowball-discuss] Unused code in Snowball (C#/Java)
bgrainger at logos.com
Fri Apr 27 17:11:49 BST 2007
The test program (sent as part of my previous email) ran the sample vocabulary through all nineteen stemmers in the distribution. I ran the program under NCover (http://ncover.org/site/) and NCoverExplorer (http://www.kiwidude.com/dotnet/DownloadPage.html) to check code coverage. The comments here apply directly only to the C# version (since that's what I tested), but they should also hold for the Java version, given how similar the two are.
The following methods in SnowballProgram are never called: assign_to, copy_from, eq_v, in_range, in_range_b, out_range, out_range_b, slice_from(StringBuilder).
It looks like the "range" functions were removed from snowball/runtime/utilities.c in revision 311; that makes me suspect that these methods could also be removed from the C# and Java versions of SnowballProgram. Perhaps someone could confirm this suspicion? As for the other methods, I don't know if it's just that those features of Snowball are not yet being used by any of the stemmers, or if they are also unnecessary.
The other finding from the coverage results is that, in some cases, the vocabulary files do not exhaustively test the stemmers. For example, both "ski" and "skis" are missing from english/voc.txt (though they are explicit exceptions in the Snowball program). italian/voc.txt does not contain á, í, ó, or ú (i.e., a, i, o, or u with acute accents), which are tested for in the "prelude" routine. lovins/voc.txt does not exercise rule 'J' (i.e., the stripping of "inism"). The "turkish" stemmer also has some unexecuted code, but it wasn't easy for me to figure out exactly which parts of the original Snowball program that code corresponds to.
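Gaps like these could also be caught directly, without a coverage tool, by checking a vocabulary file for the words and characters a stemmer treats specially. The following is only a minimal sketch in Java; the class and method names are hypothetical (nothing like this ships with Snowball), and the in-memory list stands in for the lines of a real voc.txt.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of the Snowball distribution.
class VocabularyCoverage {

    // Returns the required words that do not appear in the vocabulary,
    // in the order in which they were given.
    static List<String> missingWords(Collection<String> vocabulary, List<String> required) {
        Set<String> seen = new HashSet<>(vocabulary);
        List<String> missing = new ArrayList<>();
        for (String word : required) {
            if (!seen.contains(word)) {
                missing.add(word);
            }
        }
        return missing;
    }

    // Returns the characters of 'required' that occur in no vocabulary word.
    static String missingChars(Collection<String> vocabulary, String required) {
        StringBuilder missing = new StringBuilder();
        for (int i = 0; i < required.length(); i++) {
            char c = required.charAt(i);
            boolean found = false;
            for (String word : vocabulary) {
                if (word.indexOf(c) >= 0) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                missing.append(c);
            }
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        // Stand-in data; a real check would read the lines of voc.txt as UTF-8.
        List<String> vocabulary = List.of("skate", "skies", "citt\u00e0");
        // Explicit exceptions from the English stemmer mentioned above.
        System.out.println(missingWords(vocabulary, List.of("ski", "skis")));
        // a, i, o, u with acute accents, written as Unicode escapes.
        System.out.println(missingChars(vocabulary, "\u00e1\u00ed\u00f3\u00fa"));
    }
}
```

The list of words and characters to look for still has to be taken by hand from each Snowball program, so this complements the coverage run rather than replacing it.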
I also noticed one minor problem in generator_csharp.c: on line 969, the code produces "current.length()", which isn't legal C#. It should be "current.Length". Obviously this routine isn't actually getting called; is it vestigial, or is it also for a Snowball feature that's not currently being used?
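To spell out the difference: Java exposes the length of a StringBuilder through the method length(), while C# exposes it through the property Length, so the text a generator emits has to differ per target language. The sketch below is in Java and uses hypothetical names throughout; the real generators are written in C and are structured differently, so this only illustrates the distinction, not the actual code around line 969.

```java
// Hypothetical illustration; not taken from generator_csharp.c or generator_java.c.
class LengthExpression {

    enum Target { JAVA, CSHARP }

    // Returns the source text that reads the length of the named
    // StringBuilder variable in the given target language.
    static String lengthOf(Target target, String variable) {
        switch (target) {
            case JAVA:
                return variable + ".length()"; // method call in Java
            case CSHARP:
                return variable + ".Length";   // property access in C#
            default:
                throw new IllegalArgumentException("unknown target: " + target);
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthOf(Target.JAVA, "current"));
        System.out.println(lengthOf(Target.CSHARP, "current"));
    }
}
```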