[Snowball-discuss] Local variables for snowball

Olly Betts olly at survex.com
Mon Sep 11 13:50:26 BST 2006


On Mon, Sep 11, 2006 at 12:53:54PM +0100, Richard Boulton wrote:
> On Mon, Sep 11, 2006 at 12:11:50PM +0100, Olly Betts wrote:
> > So far I've done integers and booleans, but not strings as they're a
> > little more work.
> 
> > Does this language extension seem suitable for inclusion?  If so, I'll
> > add support for strings and see if I can get the Java code generator to
> > implement it too.
> 
> I have no particular objection to this patch.  On the other hand, I can't
> see any performance difference so far between the patched version and the
> non-patched version.  I agree with your logic as to why it might improve
> performance, but evaluations trump logic!  Of course, it's quite possible
> that a difference would be seen if you extended this to strings, or it's
> possible that my sample vocabulary (voc.txt repeated 10 times) isn't
> sufficiently representative, etc...

I did try some performance measurements, but I seem to get a lot of
variation in timings on this box, even between runs of the same code
(I suspect because it's an Athlon 64, which underclocks when idle, and
the daemon that varies the clock speed doesn't respond to the start of
the test program in exactly the same way each time).
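
One common way to reduce this kind of run-to-run noise is to repeat the
measurement and keep the fastest run.  The following is only a sketch
under assumptions: workload() is a hypothetical stand-in for "stem every
word in the sample vocabulary", and best_of() is an illustrative helper,
not part of Snowball.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical workload standing in for a stemming run over a word list. */
static unsigned long workload(void)
{
    unsigned long h = 0;
    for (unsigned long i = 0; i < 200000UL; i++)
        h = h * 31UL + i;
    return h;
}

/* Run fn `reps` times and return the smallest CPU time seen, in seconds.
 * The results are accumulated into *sink so the calls stay observable. */
static double best_of(unsigned long (*fn)(void), int reps, unsigned long *sink)
{
    assert(reps > 0);
    double best = -1.0;
    for (int r = 0; r < reps; r++) {
        clock_t start = clock();
        *sink += fn();
        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || secs < best)
            best = secs;
    }
    return best;
}
```

Taking the minimum rather than the mean discards runs that started
while the CPU was still clocked down, though it cannot remove the
effect entirely.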

It's unclear to me whether local strings will be good for performance
or not.  The issue that concerns me is that string creation and
destruction involve malloc and free, so the obvious implementation
could well be measurably slower!
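
To make the concern concrete, here is a minimal C sketch.  All names
are hypothetical and this is not how the Snowball runtime manages
strings: the first function models the "obvious" local string, which
pays for one malloc and one free on every call, and the second reuses a
caller-owned scratch buffer that only grows when needed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* "Obvious" local string: allocate on entry, free on exit. */
static size_t suffix_len_malloc(const char *word, size_t from)
{
    size_t len = strlen(word);
    assert(from <= len);
    size_t n = len - from;
    char *local = malloc(n + 1);        /* one malloc per call */
    if (local == NULL)
        return 0;
    memcpy(local, word + from, n + 1);  /* copy suffix plus terminator */
    size_t result = strlen(local);
    free(local);                        /* one free per call */
    return result;
}

/* Alternative: a scratch buffer owned by the caller and reused. */
struct scratch {
    char *buf;
    size_t cap;
};

static size_t suffix_len_scratch(struct scratch *s, const char *word,
                                 size_t from)
{
    size_t len = strlen(word);
    assert(from <= len);
    size_t n = len - from;
    if (s->cap < n + 1) {               /* grow only when too small */
        char *p = realloc(s->buf, n + 1);
        if (p == NULL)
            return 0;
        s->buf = p;
        s->cap = n + 1;
    }
    memcpy(s->buf, word + from, n + 1);
    return strlen(s->buf);
}
```

The second form avoids the per-call allocation, but the buffer then has
to live somewhere outside the routine, which works against making the
string genuinely local.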

> If you want to do the work, and Martin has no objection, I'd be happy to
> include this, since local variables could make the snowball code neater in
> some cases, anyway.

I've now gone through all the stemmers and patched them to use local
variables.  It's a definite improvement to code clarity for most.

> Now, if _everything_ could be made into a local variable, so that the
> routines could be called in a multithreaded environment without needing
> locking or per-thread stemmer objects, I'd be _very_ interested. ;-)

We can't literally make everything a local variable, unless we get the
snowball compiler to inline everything into one C function.  That would
probably work OK for the current uses of snowball, but at some point
it will become problematic for a larger snowball program, and it doesn't
naturally allow multiple "externals".

But we can pass global variables as parameters to functions instead of
using the SN_env struct, which is the direction I'd like to try to move
in (and allowing local variables helps keep parameter lists a manageable
size).  If we build a simple flow-graph-like structure, we should be
able to avoid passing all the globals to many functions.
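
The difference between the two styles can be sketched in C as follows.
This is an illustration under assumptions, not generated Snowball
output: env_sketch is a simplified stand-in and does not reproduce the
real SN_env layout, and the routine names are hypothetical.  The
integer p1 marks the start of the R1 region, as in the existing
stemmers.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for a shared environment; NOT the real SN_env. */
struct env_sketch {
    const char *p;  /* word being stemmed */
    int l;          /* its length */
    int I[1];       /* integer "globals"; I[0] plays the role of p1 */
};

static int is_vowel(char ch)
{
    return ch != '\0' && strchr("aeiouy", ch) != NULL;
}

/* Start of R1: just after the first non-vowel that follows a vowel,
 * or the end of the word if there is no such position. */
static int compute_p1(const char *word, int len)
{
    assert(word != NULL && len >= 0);
    for (int i = 1; i < len; i++)
        if (is_vowel(word[i - 1]) && !is_vowel(word[i]))
            return i + 1;
    return len;
}

/* Style 1: the routine reads p1 out of the shared struct. */
static int suffix_in_r1_env(const struct env_sketch *z, int suffix_len)
{
    return z->l - suffix_len >= z->I[0];
}

/* Style 2: p1 arrives as a parameter, so the routine touches no
 * shared state at all. */
static int suffix_in_r1_param(int p1, int len, int suffix_len)
{
    return len - suffix_len >= p1;
}
```

In the second style a caller can hold p1 in a local variable and hand
it only to the routines that actually read it, which is the kind of
information a flow graph of the program would provide.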

> (Incidentally, note that I've just committed a patch to allow all the
> snowball code to compile with a C++ compiler as well as a C compiler.  Your
> patch still applies, just about.)

Cool - that'll be useful for Xapian.

Cheers,
    Olly
