[Snowball-discuss] More patches

Richard Boulton richard at lemurconsulting.com
Mon Feb 12 01:38:26 GMT 2007


Olly Betts wrote:
> OK, here are the patches:

> Fix a typo of a function name in a comment:
> 
> http://oligarchy.co.uk/xapian/patches/snowball-add_to_b-comment-correction.patch

Applied, thanks (in snowball revision 427).

> This improves the shortcutting of backwards among - if there are fewer
> characters available than the shortest string in the among, there's
> no way it can match.  It also includes a cosmetic tweak (avoiding
> generating "z->c - 0" in the output) which makes the generated source
> a little more readable (of course the C compiler will optimise the "- 0"
> away anyway):
> 
> http://oligarchy.co.uk/xapian/patches/snowball-min-length-shortcut-backwards-among.patch

This is the most interesting of the bunch, but I'm not awake enough to 
consider it in detail.  I'll take a closer look at this patch tomorrow.

> ISO C doesn't require that pointers to different types have the same
> representation, except that void* and pointers to character types must
> have the same representation, as must pointers to qualified and
> unqualified versions of compatible types.
> 
> So, for example, if sizeof(int) is 4, it's possible that int* pointers
> could contain the same bit pattern as a void* pointer to the same
> object, but shifted right by 2 bits.
> 
> Therefore it's not portable to just cast the type of a function pointer
> if the argument types are pointers with potentially different
> represenations as the arguments won't get converted.  I'm not sure if
> any current platforms actually use different pointer representations, so
> the portability improvement may be a somewhat theoretical one, but it's
> easy enough to make the code standard conforming:
> 
> http://oligarchy.co.uk/xapian/patches/snowball-sort-function-casting.patch

I've applied this (in snowball revision 428), but I also needed to 
change the signature of the sort function (to accept "const void *" 
pointers in the comparison function instead of "void *" pointers) to 
avoid a compiler warning.  Of course, you won't see this in Xapian 
because of the change to qsort.

> This patch changes snowball to use the ISO C qsort() function instead
> of the custom sort routine in sort.c.  My motivation is to keep
> the number of source files down, but it seems very similar in
> performance (cachegrind suggests sort.c is a few cycles faster on
> x86-64 linux, but the difference is so small you couldn't measure it
> by normal means) so maybe it's worth considering the change for the
> mainstream snowball compiler too:
> 
> http://oligarchy.co.uk/xapian/patches/snowball-use-qsort.patch

This seems a reasonable patch, but I'm very slightly concerned about 
whether qsort really is ubiquitous.  The man page indicates that it 
conforms to SVr4, 4.3BSD, C99, which is pretty good.  If it had C89 as 
well, I'd have no qualms.  Performance isn't particularly critical in 
the compiler, anyway, for present usage.

Reducing the amount of code needed to be maintained is the main benefit 
of this, and that's a worthy goal given the amount of time we have to 
maintain snowball.  I'm going to apply this patch for now - Martin, or 
anyone else, if you can think of any reason not to apply it, I can 
easily revert the change.

(Olly - applied in snowball revision 429 - I removed the prototype of 
"sort" entirely from snowball/compiler/header.h, rather than commenting 
it out, so I've also applied this change to Xapian to avoid confusion.)

> This patch disables java support, which we don't use and so can save a
> source file.  Really just included for completeness, though perhaps it
> would be useful to apply with a "#define JAVA_SUPPORT" added to make it
> easy for others to disable the java support if they don't need it:
> 
> http://oligarchy.co.uk/xapian/patches/snowball-disable-java-support.patch

In principle, I'm happy with the idea of having the Java support 
conditionally compiled.  I have a half completed patch to snowball lying 
around which adds a python output module, which I would also 
conditionally compile.  However, I think it would be better to "default 
to enabled", so that a straightforward compilation will enable all 
output modes, but certain backends could be explicitly turned off with a 
compiler option.  That way, we're unlikely to get support requests from 
people who've not managed to enable the backend they're trying to use.

I suggest the following patch instead:
http://www.tartarus.org/~richard/xapian-patches/snowball-disable-java-support2.patch
which I've applied to snowball.

Xapian's "languages/Makefile.mk" would then be modified to include 
"-DDISABLE_JAVA" on the compile line for the snowball compiler.

-- 
Richard



More information about the Snowball-discuss mailing list