[Snowball-discuss] Local variables for snowball

Olly Betts olly at survex.com
Mon Sep 11 12:11:50 BST 2006


I was looking at the generated C code and thinking it would be nice to
be able to make some variables local variables rather than putting them
all in SN_env.  Accessing them through the SN_env pointer must add some
overhead - not much per invocation, but over a lot of text it will add up.
Allocating and deallocating the SN_env has a cost too, but that matters
less as people tend to create a stemmer once and stem a lot of words with
it.  It may also be useful to
be able to write recursive routines where the local variable is
different for each nested invocation.
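
To make both points concrete, here's a minimal sketch in plain C.  It is
not the real SN_env or anything the Snowball compiler emits: toy_env,
has_y_env, has_y_local and nesting_depth are made-up names for
illustration only.  The first pair of functions contrasts a flag kept in
the environment with the same flag as a C local; the last function shows
a recursive routine where each nested invocation needs its own copy of a
variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SN_env; the real struct has different fields. */
struct toy_env {
    const char *p;  /* text being processed */
    int l;          /* length of the text */
    int flag;       /* a boolean kept in the environment */
};

/* Flag stored in the environment: every read and write goes through z. */
int has_y_env(struct toy_env *z) {
    assert(z != NULL);
    z->flag = 0;
    for (int i = 0; i < z->l; i++) {
        if (z->p[i] == 'y') z->flag = 1;
    }
    return z->flag;
}

/* Same logic with the flag as a C local, which the compiler is free to
   keep in a register. */
int has_y_local(const struct toy_env *z) {
    assert(z != NULL);
    int v_flag = 0;
    for (int i = 0; i < z->l; i++) {
        if (z->p[i] == 'y') v_flag = 1;
    }
    return v_flag;
}

/* Recursive routine: returns the deepest nesting of '(' ... ')' starting
   at s[*pos].  `deepest` is local, so each nested call has its own copy;
   a single environment field would need manual save and restore. */
int nesting_depth(const char *s, int *pos) {
    assert(s != NULL && pos != NULL);
    int deepest = 0;
    while (s[*pos] == '(') {
        (*pos)++;                          /* consume '(' */
        int inner = 1 + nesting_depth(s, pos);
        if (inner > deepest) deepest = inner;
        if (s[*pos] == ')') (*pos)++;      /* consume ')' */
    }
    return deepest;
}
```

Whether the local version is measurably faster will depend on the
compiler and on how often the variable is touched, so treat this as a
picture of the idea rather than a benchmark.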

The first cut of a patch to implement this is here (including an update
for the Snowball manual):

http://oligarchy.co.uk/xapian/patches/snowball-local-variables.patch

So far I've done integers and booleans, but not strings as they're a
little more work.

And here's an example of how it can be used in the English stemmer
(also included in the patch).  By hand-inlining "prelude" and "postlude"
into "stem", Y_found can be made a local variable:

    define stem as (
	booleans ( Y_found )

	exception1 or
	not hop 3 or (
	    ( // prelude
		do ( ['{'}'] delete)
		do ( ['y'] <-'Y' set Y_found)
		do repeat(goto (v ['y']) <-'Y' set Y_found)
	    )
	    do mark_regions
	    backwards (

		do Step_1a

		exception2 or (

		    do Step_1b
		    do Step_1c

		    do Step_2
		    do Step_3
		    do Step_4

		    do Step_5
		)
	    )
	    ( // postlude
		Y_found  repeat(goto (['Y']) <-'y')
	    )
	)
    )

And in the generated code, we now have:

    extern int english_UTF_8_stem(struct SN_env * z) {
	{
	    int v_Y_found = 0;
	    {   int c = z->c; /* or, line 196 */
    [...]

I've verified this modified English stemmer still gives the same results
on the sample vocabulary.

Does this language extension seem suitable for inclusion?  If so, I'll
add support for strings and see if I can get the Java code generator to
implement it too.

Cheers,
    Olly


