[Snowball-discuss] Build system, and misc changes.

Richard Boulton richard@tartarus.org
16 Nov 2001 16:42:18 +0000


On Fri, 2001-11-16 at 15:59, Martin Porter wrote:
> 
> >* fixed a bug in snowball itself: in the generated header file: if an
> >  external prefix was specified, close_env wasn't being prefixed.
> 
> Well, that was deliberate and conforms to the documentation. 'close_env' is
> the same for all the stemmers. But yes it is still a bug, since it would
> lead to 'close_env' being multiply defined if you used a number of stemmers.
> I will have to alter the documentation since you don't have the marked-up
> documentation sources.

There was an inconsistency anyway, since the prefix was being added to
close_env in stem.c already.  e.g., for a prefix of english_, we had
english_close_env in stem.c and close_env in stem.h.

I'm not sure which pieces of documentation need to be fixed; q/use.html
is the only place I can see where close_env is referred to, and I've
fixed that.

> >I also began work on making the build system build a single library,
> >which will be able to perform stemming for any of the supported
> >libraries (and will be a lot more convenient for IR people such as
> >myself to use than including scripts individually).  As part of this
> >work, I changed the generated stem.[ch] files to use prefixes, to avoid
> >symbol conflicts when I link them together.  The prefix used is
> >"language_" for each language.
> >
> >This means that, for example, the english stemmer now has
> >english_create_env() english_close_env() and english_stem().
> 
> (Groans slightly anticipating work.) Yes I guess I saw that coming. But we
> need a system where people can pick and mix among the stemmers. The various
> .c or .o files should be distinct.

I propose leaving the existing files as they are (so the language/stem.c
and language/stem.h files are directly available), but also having a
library available which contains the code for all the stemmers.  The
interface to the library would be something like:

/* Get a stemmer for the named language. */
struct stem * stemmer_init(const char * name);
/* Stem a given word. */
int stemmer_stem(struct stem * stemmer, char * word);
/* Close stemmer. */
void stemmer_close(struct stem * stemmer);

I'm not proposing changes to any of the existing system, just making a
more convenient form for developers to use; so I don't think there
should be more work for you...

-- 
Richard

_______________________________________________
Snowball-discuss mailing list
Snowball-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snowball-discuss
