[Snowball-discuss] Benchmarking (... On a lovely Sunday morning)

Allan Fields afieldsml@idirect.ca
Tue, 23 Apr 2002 12:47:31 -0400


On April 23, 2002 12:23 pm, Teodor Sigaev wrote:
> Hello.

Hi

> > Perl's features.  I'll also try to run it against the
> > Lingua::Stem::Snowball module recently submitted to this list, to see if
> > that isn't the best solution for speed in Perl.  I think it makes more
> > sense to interface, but it's not uncommon to have both an interface and
> > native implementation(s).

I haven't got to this yet.  I'm planning to do it soon, before I submit the
corrections to my script.  Unfortunately, there is still one bug I'm working
out.

> [...]
>
> > -- Lingua::Stem --
> > 1       : candidatus -> candidatu       24 wallclock secs (22.45 usr + 0.00 sys = 22.45 CPU) @ 2227.64/s (n=50000)
> > -- Lingua::Stem --
> > ...
> > -- bench-lingua-stem.pl --
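> > #!/usr/bin/perl
> > use strict;
> > use warnings;
> > use Lingua::Stem qw(:all);
> > use Benchmark;
> >
> > # Read one word per line from the files named on the command line (or STDIN).
> > chomp(my @word = <>);
> > my ($n,$pu,$ps) = (0,0,0); my $s = 10;
> > for (1..$s) {
> >   my $result;
> >   my $w = $word[rand @word];   # one random element (a scalar, not a slice)
> >   # stem() returns an array ref; time 100000 calls on the same word.
> >   my $t = timeit(100000, sub { ($result) = @{stem($w)} } );
> >   print "$_\t: $w -> $result\t",timestr($t),"\n";
> >   # Benchmark object fields: [1] user CPU, [2] system CPU, [5] iterations.
> >   $pu+=$t->[1]; $ps+=$t->[2]; $n+=$t->[5];
> > }
> > printf "Average random cross-sectional stem rate for $s words: %5.2f Hz (n=%d).\n", $n/($pu+$ps), $n;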
>
> Is this the script for Lingua::Stem::Snowball? If it is, what is the function
> stem()? Lingua::Stem::Snowball doesn't provide a function stem(); stem is a
> method of the Lingua::Stem::Snowball object. The function is named
> snowball(). BTW, with the function snowball() you will get significant
> performance degradation, because it constructs a Lingua::Stem::Snowball
> object internally :). Use the stem() method instead.

No, this is calling the 'Lingua::Stem' module, which is different from
Lingua::Stem::Snowball.  It doesn't interface with Snowball at all, although
it uses the Porter 1 algorithm.  The two share a branch of the CPAN tree, but
they are separate modules: Lingua::Stem consists of the Stem.pm and En.pm
files, while Lingua::Stem::Snowball is made up of entirely different files.

The problem with Lingua::Stem, compared to other stemmers, is that it uses
some rather exotic sub calls to implement the stemming rules, which bogs it
down considerably.  This shouldn't be an issue with Lingua::Stem::Snowball,
as it interfaces directly with the Snowball-generated C code.
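
To give a feel for why per-rule sub calls are costly, here is a rough sketch comparing a rule list applied through one closure call per rule against the same suffixes folded into a single precompiled regex. The rules and the names stem_via_subs() and stem_via_regex() are hypothetical toys, not Lingua::Stem's actual internals; they only illustrate Perl's subroutine-call overhead, and the timings cmpthese() prints will vary by machine.

```perl
#!/usr/bin/perl
# Sketch: per-rule subroutine dispatch versus a single combined substitution.
# The rule set is a hypothetical toy; it is NOT Lingua::Stem's actual code.
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Each rule is its own closure; stemming a word walks the list and calls
# closures until one reports that it rewrote the word.
my @rules = map {
    my $suffix = $_;
    sub { $_[0] =~ s/\Q$suffix\E$// };
} qw(ational tional ness ing ed s);

sub stem_via_subs {
    my ($word) = @_;
    for my $rule (@rules) {
        last if $rule->($word);    # one sub call per rule tried
    }
    return $word;
}

# The same suffixes folded into one precompiled regex: one substitution per word.
my $combined = qr/(?:ational|tional|ness|ing|ed|s)$/;

sub stem_via_regex {
    my ($word) = @_;
    $word =~ s/$combined//;
    return $word;
}

my @words = qw(relational conditional running jumped cats kindness);

cmpthese(-1, {
    'closure per rule' => sub { stem_via_subs($_)  for @words },
    'single regex'     => sub { stem_via_regex($_) for @words },
});
```

For the six sample words both versions produce the same stems. In general they can disagree, because the rule list stops at the first rule in list order, while the regex takes the match that starts earliest in the word.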

I'll make sure to use the stem() method of Lingua::Stem::Snowball when I do 
that benchmark.
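
Teodor's point about snowball() versus the stem() method comes down to paying the constructor cost on every call instead of once. A minimal sketch of that effect in core Perl follows; ToyStemmer, its suffix list and toy_stem() are hypothetical stand-ins, not the Lingua::Stem::Snowball API, and they only mimic the shape he describes (a function that builds an object internally versus a reused object's method).

```perl
#!/usr/bin/perl
# Sketch: building a stemmer object on every call versus reusing one.
# ToyStemmer and toy_stem() are hypothetical; NOT the Lingua::Stem::Snowball API.
use strict;
use warnings;
use Benchmark qw(cmpthese);

package ToyStemmer;

# Constructor does one-off setup work (compiling a suffix regex),
# standing in for whatever a real stemmer initialises.
sub new {
    my ($class) = @_;
    my $alt = join '|', qw(ational tional ization ness ing ed s);
    return bless { re => qr/(?:$alt)$/ }, $class;
}

# Method interface: the prepared object is reused for every word.
sub stem {
    my ($self, $word) = @_;
    my $re = $self->{re};
    (my $copy = $word) =~ s/$re//;
    return $copy;
}

package main;

# Function interface that hides a constructor call, as described for snowball().
sub toy_stem {
    my ($word) = @_;
    return ToyStemmer->new->stem($word);
}

my @words   = qw(relational conditional running jumped cats kindness);
my $stemmer = ToyStemmer->new;    # built once, outside the timed code

cmpthese(-1, {
    'new object per call' => sub { toy_stem($_) for @words },
    'reused object'       => sub { $stemmer->stem($_) for @words },
});
```

Both paths return the same stems; only the per-call setup differs, so the gap cmpthese() reports is roughly the constructor's cost multiplied by the number of words.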

Thanks for the info.

-- Allan Fields

_______________________________________________
Snowball-discuss mailing list
Snowball-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snowball-discuss