[Snowball-discuss] what about Perl PorterStemmer

Olly Betts olly@survex.com
Wed Jun 18 12:34:01 2003


On Wed, Jun 18, 2003 at 01:11:45PM +0200, cyrille wrote:
> Olly Betts wrote:
> >I would say the most useful approach would be to write a Perl backend
> >for the Snowball compiler.  Then *any* Snowball stemmer (existing or yet
> >to be written) would be automatically available in Perl as well.
> 
> There is already one:
>  http://www.snowball.tartarus.org/wrappers/guide.html
> Perhaps I can use it, but my needs are a bit special.

That's different to what I'm suggesting.  The Perl wrappers call the C
versions via XS glue.  I'm suggesting a compiler backend which would
write out native Perl code.  Both ways have their merits, but it's nice
to have the flexibility.

> In a Perl application I have to create keywords for indexing,
> but the text to be indexed arrives as many little pieces of string by way
> of the application's get.
> So I'm afraid that calling an external application for each string will
> hurt performance...

It's not an external application; it's a call from the Perl interpreter
into a shared library.  There's some overhead from the XS glue, but I
wouldn't expect it to be huge.

I saw some benchmarks for stemmers in Perl (on this list, I think) which
suggested that the way to get good performance was to cache the results of each
stemming operation (so if the same word is stemmed twice, the second
time uses the cached result).  You could easily wrap a hash cache around
the call to Snowball.
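Here is a minimal sketch of that caching idea. It is written in Python only for illustration, and it assumes a stem function that takes a word and returns its stem. `toy_stem` is a hypothetical placeholder that just strips a few English suffixes; it is not Snowball. `CachedStemmer` is likewise a hypothetical name. Each distinct word reaches the underlying stemmer once, and every repeat is answered from a hash (dict) lookup.

```python
from typing import Callable, Dict


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real Snowball stemmer (NOT Snowball):
    strips a few English suffixes so the example runs on its own."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


class CachedStemmer:
    """Wraps any stem function with a dict cache, so a word that has
    already been stemmed is looked up instead of stemmed again."""

    def __init__(self, stem: Callable[[str], str]) -> None:
        self._stem = stem
        self._cache: Dict[str, str] = {}
        self.calls = 0  # how many times the wrapped stemmer actually ran

    def __call__(self, word: str) -> str:
        cached = self._cache.get(word)
        if cached is None:
            # First time we see this word: call the real stemmer and remember it.
            self.calls += 1
            cached = self._stem(word)
            self._cache[word] = cached
        return cached


if __name__ == "__main__":
    stemmer = CachedStemmer(toy_stem)
    words = "indexing indexed indexing words words".split()
    print([stemmer(w) for w in words])    # ['index', 'index', 'index', 'word', 'word']
    print("underlying calls:", stemmer.calls)  # underlying calls: 3
```

The cache is unbounded. For a long-running indexer you might cap its size or clear it periodically. In Perl the same idea is a hash keyed on the word, wrapped around the call to the stemmer.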

> But I haven't tried the Perl wrappers; perhaps I'm missing some knowledge.
> Perhaps the stemmer stays in memory?

It will (to the extent that any application code will on a system with
virtual memory).

Cheers,
    Olly