[Snowball-discuss] Benchmarking (... On a lovely Sunday morning)

Allan Fields afieldsml@idirect.ca
Sun, 21 Apr 2002 08:36:30 -0400


Hi,

Here are still more details about the various Perl solutions.  Surprisingly,
I didn't find Daniel van Balen's algorithm in porter.pm any faster than the
perl.txt algorithm you've implemented, Martin.  My benchmarks were quick
tests, so I'm not 100% confident these numbers are authoritative.  Any
suggestions on benchmarking would be welcome.  However, Martin, your
implementation comes out on top for speed, as far as I can tell.
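
On the benchmarking question, one option is the `cmpthese` function from the
Benchmark module, which times several subs on the same input and prints a
relative-speed table.  What follows is only a minimal sketch: the two subs are
hypothetical stand-ins with crude suffix stripping, not the real stem() or
porter() routines.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in 1: strip one of three suffixes with a regex.
sub strip_regex {
    my $w = shift;
    $w =~ s/(?:ing|ed|s)$//;
    return $w;
}

# Hypothetical stand-in 2: the same three suffixes, checked with substr.
sub strip_substr {
    my $w = shift;
    for my $suf (qw(ing ed s)) {
        my $n = length $suf;
        return substr($w, 0, length($w) - $n)
            if length($w) > $n && substr($w, -$n) eq $suf;
    }
    return $w;
}

my @words = qw(devising yoked psalms gallop);

# A negative count asks Benchmark to run each sub for at least that
# many CPU seconds instead of a fixed number of iterations.
cmpthese(-1, {
    regex  => sub { strip_regex($_)  for @words },
    substr => sub { strip_substr($_) for @words },
});
```

Feeding every candidate the same word list in the same process removes one
source of variation that separate runs on different random words would have.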

I thought I would do this at least once as an experiment (more for my own
curiosity). =)  I've also fixed some problems in the recently submitted
script, and I'll resubmit it along with its new benchmark data.  There were a
few programming gaffes on my part, as well as some performance issues.

I would be curious to see how the overall Perl performance compares with
the C versions of the snowball output.  I think Perl's strength in this area is
its full-featured regular expression engine, and as I'll try to demonstrate in
my next submission, things can be optimized somewhat by fully exploiting
Perl's features.  I'll also try to run it against the Lingua::Stem::Snowball
module recently submitted to this list, to see whether that is the best
solution for speed in Perl.  I think it makes more sense to interface, but it's
not uncommon to have both an interface and native implementation(s).

These tests were performed on a modest (ancient) system: a PII/233
(66 MHz FSB) with 256MB SDRAM running Perl 5.6 under FreeBSD 4-STABLE in a
multiuser environment.  The system status around the time of the tests is
shown below; it suggests a typical multiuser system (if a little loaded
down):

last pid: 15293;  load averages:  0.11,  0.30,  0.41   up 31+11:31:54  05:19:12
180 processes: 2 running, 177 sleeping, 1 stopped
CPU states:  0.5% user,  0.0% nice,  0.5% system,  0.0% interrupt, 99.0% idle
Mem: 117M Active, 96M Inact, 26M Wired, 7116K Cache, 35M Buf, 2800K Free
Swap: 1152M Total, 225M Used, 926M Free, 19% Inuse

Since FreeBSD is a very efficient platform, there isn't much chance the
results as recorded are skewed by other processes.  Processor utilization
for the perl process housing the stemmer was close to 97% for the full test
series.  (Larger numbers in Hz are better.  Scroll to the bottom of each section
for a summary.  All of the runs stem random cross-sections of voc.txt, available
on the website.)
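
For reference, the Hz figures are iterations divided by CPU seconds (user +
system), not wallclock time.  A minimal sketch of that calculation follows; the
sub `fake_stem` is a hypothetical stand-in for the stemmer and `rate_hz` is a
helper name made up for this example, so that it runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Hypothetical stand-in for stem(); it only strips a trailing "s".
sub fake_stem { my $w = shift; $w =~ s/s$//; return $w }

# Rate in Hz from raw counters: iterations / (user + system CPU seconds).
# Returns undef when no CPU time was recorded, to avoid dividing by zero.
sub rate_hz {
    my ($iters, $usr, $sys) = @_;
    my $cpu = $usr + $sys;
    return $cpu > 0 ? $iters / $cpu : undef;
}

my $result;
my $t = timeit(20000, sub { $result = fake_stem('psalms') });

# A Benchmark object is an array ref: [real, user, sys, cuser, csys, iters].
my $hz = rate_hz($t->[5], $t->[1], $t->[2]);

print "psalms -> $result\t", timestr($t), "\n";
printf "rate: %s\n",
    defined $hz ? sprintf('%.2f Hz', $hz) : 'n/a (too fast to time)';
```

Because the rate is based on CPU time, a line can show a large or even odd
wallclock value and still report a sensible rate.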

	Allan


-- perl-bench.txt (unmodified perl.txt + benchmarking code) --
1       : cade -> cade   2 wallclock secs ( 1.04 usr +  0.00 sys =  1.04 CPU) @ 4812.03/s (n=5000)
2       : psalms -> psalm        1 wallclock secs ( 0.95 usr +  0.00 sys =  0.95 CPU) @ 5289.26/s (n=5000)
3       : devising -> devis      1 wallclock secs ( 1.14 usr +  0.00 sys =  1.14 CPU) @ 4383.56/s (n=5000)
4       : residue -> residu      2 wallclock secs ( 1.20 usr +  0.00 sys =  1.20 CPU) @ 4155.84/s (n=5000)
5       : yoked -> yoke  1 wallclock secs ( 1.45 usr +  0.00 sys =  1.45 CPU) @ 3459.46/s (n=5000)
6       : blessing -> bless      2 wallclock secs ( 1.13 usr +  0.00 sys =  1.13 CPU) @ 4413.79/s (n=5000)
7       : gallop -> gallop       1 wallclock secs ( 0.90 usr +  0.00 sys =  0.90 CPU) @ 5565.22/s (n=5000)
8       : holborn -> holborn     1 wallclock secs ( 0.95 usr +  0.00 sys =  0.95 CPU) @ 5289.26/s (n=5000)
9       : edg -> edg     1 wallclock secs ( 0.65 usr +  0.00 sys =  0.65 CPU) @ 7710.84/s (n=5000)
10      : mobled -> mobl         2 wallclock secs ( 1.41 usr +  0.00 sys =  1.41 CPU) @ 3555.56/s (n=5000)
11      : incertain -> incertain         2 wallclock secs ( 1.16 usr +  0.01 sys =  1.16 CPU) @ 4295.30/s (n=5000)
12      : collect -> collect     1 wallclock secs ( 0.96 usr +  0.00 sys =  0.96 CPU) @ 5203.25/s (n=5000)
13      : meditating -> medit    2 wallclock secs ( 1.44 usr +  0.00 sys =  1.44 CPU) @ 3478.26/s (n=5000)
14      : udders -> udder        1 wallclock secs ( 0.95 usr +  0.00 sys =  0.95 CPU) @ 5245.90/s (n=5000)
15      : latest -> latest       1 wallclock secs ( 0.91 usr +  0.00 sys =  0.91 CPU) @ 5470.09/s (n=5000)
16      : ephesians -> ephesian  2 wallclock secs ( 1.22 usr +  0.00 sys =  1.22 CPU) @ 4102.56/s (n=5000)
17      : misinterpret -> misinterpret   1 wallclock secs ( 1.40 usr +  0.01 sys =  1.41 CPU) @ 3555.56/s (n=5000)
18      : reckoned -> reckon     2 wallclock secs ( 1.23 usr +  0.00 sys =  1.23 CPU) @ 4050.63/s (n=5000)
19      : beggared -> beggar     1 wallclock secs ( 1.25 usr +  0.00 sys =  1.25 CPU) @ 4000.00/s (n=5000)
20      : dip -> dip     1 wallclock secs ( 0.63 usr +  0.00 sys =  0.63 CPU) @ 7901.23/s (n=5000)
21      : dies -> di     1 wallclock secs ( 0.62 usr +  0.00 sys =  0.62 CPU) @ 8101.27/s (n=5000)
22      : track -> track         1 wallclock secs ( 0.80 usr +  0.00 sys =  0.80 CPU) @ 6274.51/s (n=5000)
23      : somewhat -> somewhat   1 wallclock secs ( 1.05 usr +  0.00 sys =  1.05 CPU) @ 4776.12/s (n=5000)
24      : havings -> have        3 wallclock secs ( 1.47 usr +  0.00 sys =  1.47 CPU) @ 3404.26/s (n=5000)
25      : bustle -> bustl        2 wallclock secs ( 1.24 usr +  0.00 sys =  1.24 CPU) @ 4025.16/s (n=5000)
26      : princess -> princess   1 wallclock secs ( 1.06 usr +  0.00 sys =  1.06 CPU) @ 4705.88/s (n=5000)
27      : vaux -> vaux   1 wallclock secs ( 0.72 usr +  0.00 sys =  0.72 CPU) @ 6956.52/s (n=5000)
28      : beating -> beat        1 wallclock secs ( 1.40 usr +  0.00 sys =  1.40 CPU) @ 3575.42/s (n=5000)
29      : eats -> eat   -1 wallclock secs ( 0.80 usr +  0.00 sys =  0.80 CPU) @ 6274.51/s (n=5000)
30      : blanket -> blanket     0 wallclock secs ( 0.98 usr +  0.00 sys =  0.98 CPU) @ 5079.37/s (n=5000)
31      : mortis -> morti        1 wallclock secs ( 0.95 usr +  0.00 sys =  0.95 CPU) @ 5289.26/s (n=5000)
32      : accites -> accit       1 wallclock secs ( 1.30 usr +  0.00 sys =  1.30 CPU) @ 3855.42/s (n=5000)
33      : bedchamber -> bedchamb         2 wallclock secs ( 1.38 usr +  0.00 sys =  1.38 CPU) @ 3636.36/s (n=5000)
34      : belt -> belt   1 wallclock secs ( 0.73 usr +  0.00 sys =  0.73 CPU) @ 6881.72/s (n=5000)
35      : enfeebles -> enfeebl   2 wallclock secs ( 1.46 usr +  0.00 sys =  1.46 CPU) @ 3422.46/s (n=5000)
36      : caesarion -> caesarion         1 wallclock secs ( 1.21 usr +  0.00 sys =  1.21 CPU) @ 4129.03/s (n=5000)
37      : strangle -> strangl    2 wallclock secs ( 1.44 usr +  0.00 sys =  1.44 CPU) @ 3478.26/s (n=5000)
38      : keiser -> keiser       1 wallclock secs ( 1.02 usr +  0.00 sys =  1.02 CPU) @ 4885.50/s (n=5000)
39      : wands -> wand  1 wallclock secs ( 0.87 usr +  0.00 sys =  0.87 CPU) @ 5765.77/s (n=5000)
40      : strikers -> striker    2 wallclock secs ( 1.23 usr +  0.01 sys =  1.23 CPU) @ 4050.63/s (n=5000)
41      : birthday -> birthdai   1 wallclock secs ( 1.29 usr +  0.00 sys =  1.29 CPU) @ 3878.79/s (n=5000)
42      : potting -> pot         1 wallclock secs ( 0.95 usr +  0.00 sys =  0.95 CPU) @ 5245.90/s (n=5000)
43      : successively -> success        2 wallclock secs ( 1.77 usr +  0.00 sys =  1.77 CPU) @ 2819.38/s (n=5000)
44      : awhile -> awhil        1 wallclock secs ( 1.13 usr +  0.00 sys =  1.13 CPU) @ 4413.79/s (n=5000)
45      : esteemed -> esteem     2 wallclock secs ( 1.21 usr +  0.00 sys =  1.21 CPU) @ 4129.03/s (n=5000)
46      : nephews -> nephew      1 wallclock secs ( 1.02 usr +  0.00 sys =  1.02 CPU) @ 4885.50/s (n=5000)
47      : weather -> weather     1 wallclock secs ( 1.12 usr +  0.00 sys =  1.12 CPU) @ 4475.52/s (n=5000)
48      : errate -> errat        2 wallclock secs ( 1.12 usr +  0.00 sys =  1.12 CPU) @ 4444.44/s (n=5000)
49      : unbridled -> unbridl   2 wallclock secs ( 1.28 usr +  0.00 sys =  1.28 CPU) @ 3902.44/s (n=5000)
50      : chins -> chin  2 wallclock secs ( 0.88 usr +  0.00 sys =  0.88 CPU) @ 5714.29/s (n=5000)
51      : heavily -> heavili     5 wallclock secs ( 1.25 usr +  0.00 sys =  1.25 CPU) @ 4000.00/s (n=5000)
52      : horn -> horn   2 wallclock secs ( 0.72 usr +  0.00 sys =  0.72 CPU) @ 6956.52/s (n=5000)
53      : justices -> justic     4 wallclock secs ( 1.38 usr +  0.00 sys =  1.38 CPU) @ 3636.36/s (n=5000)
54      : obstruct -> obstruct   2 wallclock secs ( 1.03 usr +  0.00 sys =  1.03 CPU) @ 4848.48/s (n=5000)
55      : afore -> afor  2 wallclock secs ( 1.06 usr +  0.00 sys =  1.06 CPU) @ 4705.88/s (n=5000)
56      : befriended -> befriend         4 wallclock secs ( 1.44 usr +  0.00 sys =  1.44 CPU) @ 3478.26/s (n=5000)
57      : slops -> slop  1 wallclock secs ( 0.87 usr +  0.00 sys =  0.87 CPU) @ 5765.77/s (n=5000)
58      : walks -> walk  2 wallclock secs ( 0.89 usr +  0.00 sys =  0.89 CPU) @ 5614.04/s (n=5000)
59      : samson -> samson       2 wallclock secs ( 0.88 usr +  0.00 sys =  0.88 CPU) @ 5714.29/s (n=5000)
60      : dries -> dri   1 wallclock secs ( 0.78 usr +  0.00 sys =  0.78 CPU) @ 6400.00/s (n=5000)
61      : seeming -> seem        3 wallclock secs ( 1.06 usr +  0.00 sys =  1.06 CPU) @ 4705.88/s (n=5000)
62      : these -> these         2 wallclock secs ( 1.17 usr +  0.00 sys =  1.17 CPU) @ 4266.67/s (n=5000)
63      : answer -> answer       1 wallclock secs ( 1.02 usr +  0.00 sys =  1.02 CPU) @ 4923.08/s (n=5000)
64      : corruptibly -> corrupt         2 wallclock secs ( 1.71 usr +  0.01 sys =  1.72 CPU) @ 2909.09/s (n=5000)
65      : abysm -> abysm         1 wallclock secs ( 0.81 usr +  0.00 sys =  0.81 CPU) @ 6153.85/s (n=5000)
66      : inclips -> inclip      0 wallclock secs ( 1.05 usr +  0.00 sys =  1.05 CPU) @ 4740.74/s (n=5000)
67      : whirling -> whirl      2 wallclock secs ( 1.15 usr +  0.00 sys =  1.15 CPU) @ 4353.74/s (n=5000)
68      : compile -> compil      3 wallclock secs ( 1.22 usr +  0.00 sys =  1.22 CPU) @ 4102.56/s (n=5000)
69      : whom -> whom   1 wallclock secs ( 0.71 usr +  0.00 sys =  0.71 CPU) @ 7032.97/s (n=5000)
70      : offert -> offert       3 wallclock secs ( 0.90 usr +  0.00 sys =  0.90 CPU) @ 5565.22/s (n=5000)
71      : bottomless -> bottomless       2 wallclock secs ( 1.23 usr +  0.00 sys =  1.23 CPU) @ 4076.43/s (n=5000)
72      : pudder -> pudder       1 wallclock secs ( 0.98 usr +  0.01 sys =  0.99 CPU) @ 5039.37/s (n=5000)
73      : summers -> summer      2 wallclock secs ( 1.12 usr +  0.00 sys =  1.12 CPU) @ 4444.44/s (n=5000)
74      : footboys -> footboi    2 wallclock secs ( 1.32 usr +  0.00 sys =  1.32 CPU) @ 3786.98/s (n=5000)
75      : mellowing -> mellow    2 wallclock secs ( 1.23 usr +  0.00 sys =  1.23 CPU) @ 4076.43/s (n=5000)
76      : spinners -> spinner    2 wallclock secs ( 1.23 usr +  0.00 sys =  1.23 CPU) @ 4050.63/s (n=5000)
77      : trinculo -> trinculo   1 wallclock secs ( 1.04 usr +  0.00 sys =  1.04 CPU) @ 4812.03/s (n=5000)
78      : scissors -> scissor    1 wallclock secs ( 1.12 usr +  0.00 sys =  1.12 CPU) @ 4475.52/s (n=5000)
79      : broking -> broke       0 wallclock secs ( 1.49 usr +  0.00 sys =  1.49 CPU) @ 3350.79/s (n=5000)
80      : erfraught -> erfraught         2 wallclock secs ( 1.14 usr +  0.00 sys =  1.14 CPU) @ 4383.56/s (n=5000)
81      : quire -> quir  1 wallclock secs ( 1.16 usr +  0.00 sys =  1.16 CPU) @ 4295.30/s (n=5000)
82      : massacres -> massacr   2 wallclock secs ( 1.46 usr +  0.01 sys =  1.47 CPU) @ 3404.26/s (n=5000)
83      : declin -> declin       1 wallclock secs ( 0.89 usr +  0.00 sys =  0.89 CPU) @ 5614.04/s (n=5000)
84      : mowing -> mow  1 wallclock secs ( 0.96 usr +  0.00 sys =  0.96 CPU) @ 5203.25/s (n=5000)
85      : thrower -> thrower     1 wallclock secs ( 1.09 usr +  0.00 sys =  1.09 CPU) @ 4604.32/s (n=5000)
86      : doubled -> doubl       2 wallclock secs ( 1.49 usr +  0.00 sys =  1.49 CPU) @ 3350.79/s (n=5000)
87      : tertio -> tertio       3 wallclock secs ( 0.90 usr +  0.01 sys =  0.91 CPU) @ 5517.24/s (n=5000)
88      : deliv -> deliv         1 wallclock secs ( 0.82 usr +  0.00 sys =  0.82 CPU) @ 6095.24/s (n=5000)
89      : misery -> miseri       1 wallclock secs ( 1.14 usr +  0.00 sys =  1.14 CPU) @ 4383.56/s (n=5000)
90      : ns -> ns       1 wallclock secs ( 0.12 usr +  0.00 sys =  0.12 CPU) @ 42666.67/s (n=5000)
91      : peopled -> peopl       1 wallclock secs ( 1.14 usr +  0.00 sys =  1.14 CPU) @ 4383.56/s (n=5000)
92      : codpiece -> codpiec    2 wallclock secs ( 1.30 usr +  0.00 sys =  1.30 CPU) @ 3855.42/s (n=5000)
93      : palating -> palat      2 wallclock secs ( 1.38 usr +  0.00 sys =  1.38 CPU) @ 3636.36/s (n=5000)
94      : naples -> napl         2 wallclock secs ( 1.33 usr +  0.00 sys =  1.33 CPU) @ 3764.71/s (n=5000)
95      : liege -> lieg  1 wallclock secs ( 1.19 usr +  0.00 sys =  1.19 CPU) @ 4210.53/s (n=5000)
96      : everything -> everyth  2 wallclock secs ( 1.29 usr +  0.00 sys =  1.29 CPU) @ 3878.79/s (n=5000)
97      : goot -> goot   0 wallclock secs ( 0.71 usr +  0.00 sys =  0.71 CPU) @ 7032.97/s (n=5000)
98      : redeem -> redeem       1 wallclock secs ( 0.91 usr +  0.00 sys =  0.91 CPU) @ 5517.24/s (n=5000)
99      : restraint -> restraint         1 wallclock secs ( 1.15 usr +  0.00 sys =  1.15 CPU) @ 4353.74/s (n=5000)
100     : dolphin -> dolphin     1 wallclock secs ( 0.95 usr +  0.00 sys =  0.95 CPU) @ 5245.90/s (n=5000)
Average random cross-sectional stem rate for 100 words: 4550.95 Hz (n=500000).
-- perl-bench.txt --
1       : wiser -> wiser        14 wallclock secs ( 8.47 usr +  0.01 sys =  8.48 CPU) @ 5898.62/s (n=50000)
2       : colors -> color       11 wallclock secs ( 9.35 usr +  0.01 sys =  9.36 CPU) @ 5342.24/s (n=50000)
3       : doating -> doat       22 wallclock secs (13.92 usr +  0.00 sys = 13.92 CPU) @ 3591.47/s (n=50000)
4       : sweets -> sweet       10 wallclock secs ( 9.61 usr + -0.01 sys =  9.60 CPU) @ 5207.49/s (n=50000)
5       : annals -> annal       11 wallclock secs ( 9.88 usr +  0.00 sys =  9.88 CPU) @ 5063.29/s (n=50000)
6       : rushes -> rush        14 wallclock secs (12.98 usr +  0.01 sys = 12.99 CPU) @ 3848.47/s (n=50000)
7       : alarums -> alarum     12 wallclock secs (10.66 usr +  0.00 sys = 10.66 CPU) @ 4692.08/s (n=50000)
8       : humbler -> humbler    11 wallclock secs (10.79 usr +  0.00 sys = 10.79 CPU) @ 4634.32/s (n=50000)
9       : clepeth -> clepeth    12 wallclock secs ( 9.61 usr +  0.00 sys =  9.61 CPU) @ 5203.25/s (n=50000)
10      : tumbled -> tumbl      18 wallclock secs (14.77 usr +  0.00 sys = 14.77 CPU) @ 3386.24/s (n=50000)
Average random cross-sectional stem rate for 10 words: 4543.52 Hz (n=500000).
-- perl-bench.txt --
1       : desires -> desir      29 wallclock secs (24.45 usr +  0.00 sys = 24.45 CPU) @ 4090.76/s (n=100000)
2       : fretten -> fretten    20 wallclock secs (16.92 usr +  0.00 sys = 16.92 CPU) @ 5909.51/s (n=100000)
3       : call -> call  19 wallclock secs (17.03 usr +  0.00 sys = 17.03 CPU) @ 5871.56/s (n=100000)
4       : reasonless -> reasonless      24 wallclock secs (19.45 usr +  0.00 sys = 19.45 CPU) @ 5140.56/s (n=100000)
5       : shaping -> shape      32 wallclock secs (29.19 usr +  0.02 sys = 29.20 CPU) @ 3424.29/s (n=100000)
6       : monsieur -> monsieur  19 wallclock secs (17.70 usr +  0.00 sys = 17.70 CPU) @ 5651.21/s (n=100000)
7       : clouded -> cloud      27 wallclock secs (22.16 usr +  0.00 sys = 22.16 CPU) @ 4513.40/s (n=100000)
8       : gun -> gun    14 wallclock secs (11.68 usr +  0.00 sys = 11.68 CPU) @ 8561.87/s (n=100000)
9       : gloriously -> glorious        31 wallclock secs (28.62 usr +  0.01 sys = 28.63 CPU) @ 3492.50/s (n=100000)
10      : lustily -> lustili    26 wallclock secs (21.65 usr +  0.00 sys = 21.65 CPU) @ 4619.27/s (n=100000)
Average random cross-sectional stem rate for 10 words: 4787.73 Hz (n=1000000).
-- perl-bench.txt --
1       : barnacles -> barnacl   1 wallclock secs ( 0.49 usr +  0.00 sys =  0.49 CPU) @ 4063.49/s (n=2000)
2       : marchpane -> marchpan  0 wallclock secs ( 0.41 usr +  0.00 sys =  0.41 CPU) @ 4830.19/s (n=2000)
3       : between -> between     1 wallclock secs ( 0.33 usr +  0.00 sys =  0.33 CPU) @ 6095.24/s (n=2000)
4       : embassies -> embassi   0 wallclock secs ( 0.38 usr +  0.00 sys =  0.38 CPU) @ 5224.49/s (n=2000)
5       : admonition -> admonit  1 wallclock secs ( 0.45 usr +  0.00 sys =  0.45 CPU) @ 4413.79/s (n=2000)
6       : bangor -> bangor       0 wallclock secs ( 0.31 usr +  0.00 sys =  0.31 CPU) @ 6400.00/s (n=2000)
7       : couched -> couch       1 wallclock secs ( 0.45 usr +  0.00 sys =  0.45 CPU) @ 4491.23/s (n=2000)
8       : stare -> stare         0 wallclock secs ( 0.45 usr +  0.00 sys =  0.45 CPU) @ 4491.23/s (n=2000)
9       : voutsafe -> voutsaf    1 wallclock secs ( 0.45 usr +  0.00 sys =  0.45 CPU) @ 4491.23/s (n=2000)
10      : disease -> diseas      1 wallclock secs ( 0.44 usr +  0.00 sys =  0.44 CPU) @ 4571.43/s (n=2000)
...
997     : names -> name  0 wallclock secs ( 0.46 usr +  0.00 sys =  0.46 CPU) @ 4338.98/s (n=2000)
998     : prescriptions -> prescript     1 wallclock secs ( 0.57 usr +  0.00 sys =  0.57 CPU) @ 3506.85/s (n=2000)
999     : eyestrings -> eyestr   1 wallclock secs ( 0.49 usr +  0.00 sys =  0.49 CPU) @ 4063.49/s (n=2000)
1000    : separates -> separ     0 wallclock secs ( 0.49 usr +  0.00 sys =  0.49 CPU) @ 4063.49/s (n=2000)
Average random cross-sectional stem rate for 1000 words: 4985.59 Hz (n=2000000).
-- perl-bench.txt --
$ diff perl.txt perl-bench.txt
12a13,14
> use Benchmark;
> 
108,120c110,119
< while (<>)
< {
<    {  /^([^a-zA-Z]*)(.*)/ ;
<       print $1;
<       $_ = $2;
<       unless ( /^([a-zA-Z]+)(.*)/ ) { last; }
<       $word = lc $1; # turn to lower case before calling:
<       $_ = $2;
<       $word = stem($word);
<       print $word;
<       redo;
<    }
<    print "\n";
---
> 
> chomp(my @word = <>);
> @word = map lc, @word;
> my ($n,$pu,$ps) = (0,0,0); my $s = 1000;
> for (1..$s) {
>   my $result;
>   my $w = $word[rand(scalar(@word))];
>   my $t = timeit(2000, sub { $result = stem($w) } );
>   print "$_\t: $w -> $result\t",timestr($t),"\n";
>   $pu+=$t->[1]; $ps+=$t->[2]; $n+=$t->[5];
121a121,122
> printf  "Average random cross-sectional stem rate for $s words: %5.2f Hz (n=%d).\n", $n/($pu+$ps), $n;
> 
-----------
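
A note on loading the word list: an idiom such as `grep chomp, <FH>` keeps
only the lines from which chomp actually removed something, so a final line
with no trailing newline is silently dropped, and `grep lc, LIST` filters on
truthiness rather than lowercasing anything.  The sketch below shows the same
steps with `chomp` on a list assignment and `map`.  It reads from an in-memory
string instead of `<>` so that it runs standalone; the sample words are taken
from the output above, with capitals added purely for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input; note that the last line has no trailing newline.
my $data = "Cade\npsalms\nDevising\nresidue";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

# chomp on a list assignment strips the newline from every element
# and keeps all of them, including an unterminated last line.
chomp(my @word = <$fh>);
close $fh;

# map transforms each element; grep would only filter the list.
@word = map { lc $_ } @word;

# $word[...] selects a single element; a fractional index is
# truncated to an integer, so rand @word picks a valid position.
my $w = $word[ rand @word ];

print scalar(@word), " words loaded: @word\n";
print "random pick: $w\n";
```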


-- porter.pm --
1       : whistles -> whistl     2 wallclock secs ( 1.59 usr +  0.00 sys =  1.59 CPU) @ 3137.25/s (n=5000)
2       : sear -> sear   1 wallclock secs ( 0.80 usr +  0.00 sys =  0.80 CPU) @ 6213.59/s (n=5000)
3       : riseth -> riseth       1 wallclock secs ( 1.04 usr +  0.00 sys =  1.04 CPU) @ 4812.03/s (n=5000)
4       : equal -> equal         1 wallclock secs ( 1.05 usr +  0.00 sys =  1.05 CPU) @ 4776.12/s (n=5000)
5       : venue -> venu  2 wallclock secs ( 1.45 usr +  0.00 sys =  1.45 CPU) @ 3459.46/s (n=5000)
6       : cracked -> crack       2 wallclock secs ( 1.61 usr +  0.00 sys =  1.61 CPU) @ 3106.80/s (n=5000)
7       : marigold -> marigold   1 wallclock secs ( 1.10 usr +  0.00 sys =  1.10 CPU) @ 4539.01/s (n=5000)
8       : pimpernell -> pimpernel        2 wallclock secs ( 1.62 usr +  0.00 sys =  1.62 CPU) @ 3076.92/s (n=5000)
9       : respite -> respit      2 wallclock secs ( 1.40 usr +  0.00 sys =  1.40 CPU) @ 3575.42/s (n=5000)
10      : dispos -> dispo        1 wallclock secs ( 1.00 usr +  0.00 sys =  1.00 CPU) @ 5000.00/s (n=5000)
11      : nak -> nak     1 wallclock secs ( 0.76 usr +  0.00 sys =  0.76 CPU) @ 6597.94/s (n=5000)
12      : file -> file   1 wallclock secs ( 1.23 usr +  0.00 sys =  1.23 CPU) @ 4050.63/s (n=5000)
13      : pageants -> pageant    2 wallclock secs ( 1.35 usr +  0.00 sys =  1.35 CPU) @ 3699.42/s (n=5000)
14      : regards -> regard      1 wallclock secs ( 1.02 usr +  0.00 sys =  1.02 CPU) @ 4885.50/s (n=5000)
15      : exceed -> exce         2 wallclock secs ( 1.68 usr +  0.00 sys =  1.68 CPU) @ 2976.74/s (n=5000)
16      : spiritual -> spiritu   2 wallclock secs ( 1.49 usr +  0.00 sys =  1.49 CPU) @ 3350.79/s (n=5000)
17      : nothing -> noth        2 wallclock secs ( 1.66 usr +  0.00 sys =  1.66 CPU) @ 3018.87/s (n=5000)
18      : wake -> wake   1 wallclock secs ( 1.23 usr +  0.00 sys =  1.23 CPU) @ 4076.43/s (n=5000)
19      : shrewishly -> shrewishli       2 wallclock secs ( 1.49 usr +  0.00 sys =  1.49 CPU) @ 3350.79/s (n=5000)
20      : neglected -> neglect   2 wallclock secs ( 1.72 usr +  0.00 sys =  1.72 CPU) @ 2909.09/s (n=5000)
21      : untun -> untun         1 wallclock secs ( 0.94 usr +  0.00 sys =  0.94 CPU) @ 5333.33/s (n=5000)
22      : jaundice -> jaundic    2 wallclock secs ( 1.31 usr +  0.00 sys =  1.31 CPU) @ 3809.52/s (n=5000)
23      : pilfering -> pilfer    2 wallclock secs ( 1.98 usr +  0.00 sys =  1.98 CPU) @ 2519.69/s (n=5000)
24      : remark -> remark       1 wallclock secs ( 0.98 usr +  0.00 sys =  0.98 CPU) @ 5120.00/s (n=5000)
25      : palsies -> palsi       2 wallclock secs ( 1.05 usr +  0.00 sys =  1.05 CPU) @ 4776.12/s (n=5000)
26      : tributary -> tributari         1 wallclock secs ( 1.38 usr +  0.00 sys =  1.38 CPU) @ 3615.82/s (n=5000)
27      : spare -> spare         2 wallclock secs ( 1.41 usr +  0.00 sys =  1.41 CPU) @ 3535.91/s (n=5000)
28      : prologue -> prologu    2 wallclock secs ( 1.41 usr +  0.00 sys =  1.41 CPU) @ 3555.56/s (n=5000)
29      : inheritance -> inherit         1 wallclock secs ( 1.58 usr +  0.00 sys =  1.58 CPU) @ 3168.32/s (n=5000)
30      : permit -> permit       1 wallclock secs ( 0.95 usr +  0.00 sys =  0.95 CPU) @ 5289.26/s (n=5000)
31      : exorciser -> exorcis   1 wallclock secs ( 1.49 usr +  0.00 sys =  1.49 CPU) @ 3350.79/s (n=5000)
32      : spitting -> spit       0 wallclock secs ( 1.41 usr +  0.00 sys =  1.41 CPU) @ 3555.56/s (n=5000)
33      : lofty -> lofti         2 wallclock secs ( 1.19 usr +  0.00 sys =  1.19 CPU) @ 4210.53/s (n=5000)
34      : name -> name   1 wallclock secs ( 1.23 usr +  0.00 sys =  1.23 CPU) @ 4076.43/s (n=5000)
35      : lavender -> lavend     2 wallclock secs ( 1.41 usr +  0.00 sys =  1.41 CPU) @ 3535.91/s (n=5000)
36      : juliet -> juliet       1 wallclock secs ( 0.93 usr +  0.00 sys =  0.93 CPU) @ 5378.15/s (n=5000)
37      : allied -> alli         2 wallclock secs ( 1.57 usr +  0.00 sys =  1.57 CPU) @ 3184.08/s (n=5000)
38      : suppose -> suppos      1 wallclock secs ( 1.35 usr +  0.00 sys =  1.35 CPU) @ 3699.42/s (n=5000)
39      : variations -> variat   3 wallclock secs ( 1.97 usr +  0.00 sys =  1.97 CPU) @ 2539.68/s (n=5000)
40      : carelessness -> careless       2 wallclock secs ( 1.73 usr +  0.00 sys =  1.73 CPU) @ 2882.88/s (n=5000)
41      : mockery -> mockeri     1 wallclock secs ( 1.27 usr +  0.00 sys =  1.27 CPU) @ 3950.62/s (n=5000)
42      : actual -> actual       2 wallclock secs ( 1.29 usr +  0.00 sys =  1.29 CPU) @ 3878.79/s (n=5000)
43      : beldams -> beldam      1 wallclock secs ( 0.94 usr +  0.00 sys =  0.94 CPU) @ 5333.33/s (n=5000)
44      : tired -> tire  2 wallclock secs ( 1.75 usr +  0.00 sys =  1.75 CPU) @ 2857.14/s (n=5000)
45      : lym -> lym     1 wallclock secs ( 0.84 usr +  0.00 sys =  0.84 CPU) @ 5981.31/s (n=5000)
46      : bravely -> brave       2 wallclock secs ( 2.02 usr +  0.00 sys =  2.02 CPU) @ 2471.04/s (n=5000)
47      : unwish -> unwish       2 wallclock secs ( 1.01 usr +  0.00 sys =  1.01 CPU) @ 4961.24/s (n=5000)
48      : prizes -> prize        1 wallclock secs ( 1.55 usr +  0.00 sys =  1.55 CPU) @ 3232.32/s (n=5000)
49      : tackled -> tackl       0 wallclock secs ( 1.69 usr +  0.00 sys =  1.69 CPU) @ 2962.96/s (n=5000)
50      : antidote -> antidot    1 wallclock secs ( 1.45 usr +  0.00 sys =  1.45 CPU) @ 3440.86/s (n=5000)
51      : coarse -> coars        2 wallclock secs ( 1.53 usr +  0.00 sys =  1.53 CPU) @ 3265.31/s (n=5000)
52      : celebrates -> celebr   2 wallclock secs ( 1.48 usr +  0.00 sys =  1.48 CPU) @ 3368.42/s (n=5000)
53      : archbishop -> archbishop       1 wallclock secs ( 1.39 usr +  0.00 sys =  1.39 CPU) @ 3595.51/s (n=5000)
54      : oaten -> oaten         1 wallclock secs ( 0.88 usr +  0.00 sys =  0.88 CPU) @ 5663.72/s (n=5000)
55      : straiter -> straiter   1 wallclock secs ( 1.47 usr +  0.00 sys =  1.47 CPU) @ 3404.26/s (n=5000)
56      : unconfirmed -> unconfirm       2 wallclock secs ( 1.87 usr +  0.00 sys =  1.87 CPU) @ 2677.82/s (n=5000)
57      : banditto -> banditto   0 wallclock secs ( 1.13 usr +  0.00 sys =  1.13 CPU) @ 4413.79/s (n=5000)
58      : visited -> visit       2 wallclock secs ( 1.57 usr +  0.00 sys =  1.57 CPU) @ 3184.08/s (n=5000)
59      : imperfections -> imperfect     2 wallclock secs ( 1.81 usr +  0.00 sys =  1.81 CPU) @ 2758.62/s (n=5000)
60      : lieutenants -> lieuten         1 wallclock secs ( 1.40 usr +  0.00 sys =  1.40 CPU) @ 3575.42/s (n=5000)
61      : ponton -> ponton       1 wallclock secs ( 1.05 usr +  0.00 sys =  1.05 CPU) @ 4776.12/s (n=5000)
62      : express -> express     1 wallclock secs ( 1.16 usr +  0.00 sys =  1.16 CPU) @ 4324.32/s (n=5000)
63      : intruding -> intrud    2 wallclock secs ( 1.73 usr +  0.00 sys =  1.73 CPU) @ 2882.88/s (n=5000)
64      : cures -> cure  2 wallclock secs ( 1.27 usr +  0.00 sys =  1.27 CPU) @ 3926.38/s (n=5000)
65      : oxford -> oxford       1 wallclock secs ( 0.97 usr +  0.00 sys =  0.97 CPU) @ 5161.29/s (n=5000)
66      : disclosed -> disclos   2 wallclock secs ( 1.98 usr +  0.00 sys =  1.98 CPU) @ 2529.64/s (n=5000)
67      : noise -> nois  2 wallclock secs ( 1.45 usr +  0.00 sys =  1.45 CPU) @ 3459.46/s (n=5000)
68      : ushering -> usher      2 wallclock secs ( 1.70 usr +  0.00 sys =  1.70 CPU) @ 2935.78/s (n=5000)
69      : cudgeled -> cudgel     2 wallclock secs ( 1.66 usr +  0.00 sys =  1.66 CPU) @ 3004.69/s (n=5000)
70      : medal -> medal         1 wallclock secs ( 1.04 usr +  0.00 sys =  1.04 CPU) @ 4812.03/s (n=5000)
71      : enacts -> enact        1 wallclock secs ( 0.95 usr +  0.00 sys =  0.95 CPU) @ 5245.90/s (n=5000)
72      : discoursed -> discours         3 wallclock secs ( 1.97 usr +  0.00 sys =  1.97 CPU) @ 2539.68/s (n=5000)
73      : barely -> bare         2 wallclock secs ( 1.80 usr +  0.00 sys =  1.80 CPU) @ 2782.61/s (n=5000)
74      : warning -> warn        2 wallclock secs ( 1.61 usr +  0.00 sys =  1.61 CPU) @ 3106.80/s (n=5000)
75      : yorick -> yorick       1 wallclock secs ( 0.93 usr +  0.00 sys =  0.93 CPU) @ 5378.15/s (n=5000)
76      : pregnancy -> pregnanc  2 wallclock secs ( 2.16 usr +  0.00 sys =  2.16 CPU) @ 2310.47/s (n=5000)
77      : gleams -> gleam        1 wallclock secs ( 0.93 usr +  0.00 sys =  0.93 CPU) @ 5378.15/s (n=5000)
78      : unkindly -> unkindli   0 wallclock secs ( 1.30 usr +  0.00 sys =  1.30 CPU) @ 3832.34/s (n=5000)
79      : capels -> capel        1 wallclock secs ( 0.98 usr +  0.00 sys =  0.98 CPU) @ 5079.37/s (n=5000)
80      : broken -> broken       1 wallclock secs ( 0.93 usr +  0.00 sys =  0.93 CPU) @ 5378.15/s (n=5000)
81      : tenour -> tenour       2 wallclock secs ( 0.99 usr +  0.00 sys =  0.99 CPU) @ 5039.37/s (n=5000)
82      : untimely -> untim      2 wallclock secs ( 1.90 usr +  0.00 sys =  1.90 CPU) @ 2633.74/s (n=5000)
83      : endurance -> endur     2 wallclock secs ( 1.41 usr +  0.00 sys =  1.41 CPU) @ 3535.91/s (n=5000)
84      : furr -> furr   1 wallclock secs ( 0.88 usr +  0.00 sys =  0.88 CPU) @ 5663.72/s (n=5000)
85      : few -> few     1 wallclock secs ( 0.78 usr +  0.00 sys =  0.78 CPU) @ 6400.00/s (n=5000)
86      : flying -> fly  2 wallclock secs ( 1.66 usr +  0.00 sys =  1.66 CPU) @ 3018.87/s (n=5000)
87      : violent -> violent     1 wallclock secs ( 1.49 usr +  0.00 sys =  1.49 CPU) @ 3350.79/s (n=5000)
88      : somewhither -> somewhith       2 wallclock secs ( 1.64 usr +  0.00 sys =  1.64 CPU) @ 3047.62/s (n=5000)
89      : condemned -> condemn   2 wallclock secs ( 1.77 usr +  0.00 sys =  1.77 CPU) @ 2831.86/s (n=5000)
90      : enjoying -> enjoi      3 wallclock secs ( 1.80 usr +  0.00 sys =  1.80 CPU) @ 2770.56/s (n=5000)
91      : patches -> patch       1 wallclock secs ( 1.53 usr +  0.00 sys =  1.53 CPU) @ 3265.31/s (n=5000)
92      : vestal -> vestal       0 wallclock secs ( 1.31 usr +  0.00 sys =  1.31 CPU) @ 3809.52/s (n=5000)
93      : weeds -> weed  1 wallclock secs ( 1.03 usr +  0.00 sys =  1.03 CPU) @ 4848.48/s (n=5000)
94      : sensibly -> sensibl    3 wallclock secs ( 2.14 usr +  0.02 sys =  2.16 CPU) @ 2318.84/s (n=5000)
95      : incertainties -> incertainti   2 wallclock secs ( 1.26 usr +  0.00 sys =  1.26 CPU) @ 3975.16/s (n=5000)
96      : strangle -> strangl    2 wallclock secs ( 1.69 usr +  0.00 sys =  1.69 CPU) @ 2962.96/s (n=5000)
97      : whirlwinds -> whirlwind        2 wallclock secs ( 1.39 usr +  0.00 sys =  1.39 CPU) @ 3595.51/s (n=5000)
98      : surpassing -> surpass  2 wallclock secs ( 1.96 usr +  0.00 sys =  1.96 CPU) @ 2549.80/s (n=5000)
99      : picklock -> picklock   1 wallclock secs ( 1.03 usr +  0.00 sys =  1.03 CPU) @ 4848.48/s (n=5000)
100     : pretext -> pretext     1 wallclock secs ( 1.03 usr +  0.00 sys =  1.03 CPU) @ 4848.48/s (n=5000)
Average random cross-sectional stem rate for 100 words: 3618.07 Hz (n=500000).
-- porter.pm --
1       : baggage -> baggag     12 wallclock secs (11.95 usr +  0.00 sys = 11.95 CPU) @ 4183.01/s (n=50000)
2       : remarkable -> remark  15 wallclock secs (13.28 usr +  0.00 sys = 13.28 CPU) @ 3764.71/s (n=50000)
3       : boson -> boson        10 wallclock secs ( 8.77 usr +  0.00 sys =  8.77 CPU) @ 5699.02/s (n=50000)
4       : reynaldo -> reynaldo  10 wallclock secs ( 9.34 usr +  0.00 sys =  9.34 CPU) @ 5355.65/s (n=50000)
5       : warlike -> warlik     14 wallclock secs (12.09 usr +  0.01 sys = 12.10 CPU) @ 4131.70/s (n=50000)
6       : ambuscadoes -> ambuscado      17 wallclock secs (14.21 usr +  0.00 sys = 14.21 CPU) @ 3518.42/s (n=50000)
7       : title -> titl 16 wallclock secs (14.10 usr +  0.00 sys = 14.10 CPU) @ 3545.71/s (n=50000)
8       : doctors -> doctor     13 wallclock secs (10.71 usr +  0.00 sys = 10.71 CPU) @ 4668.13/s (n=50000)
9       : witches -> witch      18 wallclock secs (15.42 usr +  0.00 sys = 15.42 CPU) @ 3242.15/s (n=50000)
10      : weighed -> weigh      17 wallclock secs (15.34 usr +  0.00 sys = 15.34 CPU) @ 3258.66/s (n=50000)
Average random cross-sectional stem rate for 10 words: 3992.51 Hz (n=500000)
-- porter.pm --
1       : inlaid -> inlaid      16 wallclock secs (13.85 usr +  0.00 sys = 13.85 CPU) @ 7219.40/s (n=100000)
2       : contrite -> contrit   28 wallclock secs (25.52 usr +  0.01 sys = 25.53 CPU) @ 3916.77/s (n=100000)
3       : embrac -> embrac      14 wallclock secs (12.86 usr +  0.00 sys = 12.86 CPU) @ 7776.43/s (n=100000)
4       : emhracing -> emhrac   32 wallclock secs (27.19 usr +  0.00 sys = 27.19 CPU) @ 3678.16/s (n=100000)
5       : servanted -> servant  40 wallclock secs (34.54 usr +  0.00 sys = 34.54 CPU) @ 2895.27/s (n=100000)
6       : cataplasm -> cataplasm        20 wallclock secs (15.44 usr +  0.02 sys = 15.46 CPU) @ 6467.91/s (n=100000)
7       : uncoined -> uncoin    34 wallclock secs (27.62 usr +  0.03 sys = 27.66 CPU) @ 3615.82/s (n=100000)
8       : crowkeeper -> crowkeep        27 wallclock secs (22.81 usr +  0.02 sys = 22.83 CPU) @ 4380.56/s (n=100000)
9       : leaven -> leaven      19 wallclock secs (16.48 usr +  0.00 sys = 16.48 CPU) @ 6066.35/s (n=100000)
10      : speech -> speech      15 wallclock secs (13.70 usr +  0.00 sys = 13.70 CPU) @ 7297.61/s (n=100000)
Average random cross-sectional stem rate for 10 words: 4759.60 Hz (n=1000000).
-- porter.pm --
1       : appeared -> appear     0 wallclock secs ( 0.52 usr +  0.00 sys =  0.52 CPU) @ 3820.90/s (n=2000)
2       : andirons -> andiron    1 wallclock secs ( 0.43 usr +  0.00 sys =  0.43 CPU) @ 4654.55/s (n=2000)
3       : art -> art     0 wallclock secs ( 0.31 usr +  0.00 sys =  0.31 CPU) @ 6400.00/s (n=2000)
4       : greeks -> greek        1 wallclock secs ( 0.39 usr +  0.00 sys =  0.39 CPU) @ 5120.00/s (n=2000)
5       : unmusical -> unmus     0 wallclock secs ( 0.66 usr +  0.00 sys =  0.66 CPU) @ 3047.62/s (n=2000)
6       : executor -> executor   1 wallclock secs ( 0.38 usr +  0.00 sys =  0.38 CPU) @ 5224.49/s (n=2000)
7       : cetera -> cetera       0 wallclock secs ( 0.35 usr +  0.00 sys =  0.35 CPU) @ 5688.89/s (n=2000)
8       : depositaries -> depositari     1 wallclock secs ( 0.46 usr +  0.00 sys =  0.46 CPU) @ 4338.98/s (n=2000)
9       : intellectual -> intellectu     1 wallclock secs ( 0.70 usr +  0.00 sys =  0.70 CPU) @ 2876.40/s (n=2000)
10      : road -> road   0 wallclock secs ( 0.30 usr +  0.00 sys =  0.30 CPU) @ 6736.84/s (n=2000)
...
983     : flaring -> flare       1 wallclock secs ( 0.80 usr +  0.00 sys =  0.80 CPU) @ 2485.44/s (n=2000)  !!!
994     : barnacles -> barnacl   1 wallclock secs ( 0.60 usr +  0.00 sys =  0.60 CPU) @ 3324.68/s (n=2000)
996     : swag -> swag   0 wallclock secs ( 0.30 usr +  0.00 sys =  0.30 CPU) @ 6564.10/s (n=2000)
997     : film -> film   0 wallclock secs ( 0.33 usr +  0.00 sys =  0.33 CPU) @ 6095.24/s (n=2000)
998     : quests -> quest        1 wallclock secs ( 0.42 usr +  0.00 sys =  0.42 CPU) @ 4740.74/s (n=2000)
999     : crests -> crest        0 wallclock secs ( 0.41 usr +  0.00 sys =  0.41 CPU) @ 4830.19/s (n=2000)
1000    : audre -> audr  1 wallclock secs ( 0.57 usr +  0.00 sys =  0.57 CPU) @ 3506.85/s (n=2000)
Average random cross-sectional stem rate for 1000 words: 4082.41 Hz (n=2000000).
-- bench-porter.pm.pl --
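#!/usr/bin/perl
use strict;
use warnings;
use Benchmark;
require "./porter.pm";

# Read the word list, one word per line, from STDIN or the files in @ARGV.
chomp(my @word = <>);

my ($n, $pu, $ps) = (0, 0, 0);  # total iterations, user CPU secs, system CPU secs
my $s = 100;                    # number of random words to sample
for (1..$s) {
  my $result;
  my $w = $word[rand @word];    # one random word (element access, not a slice)
  my $t = timeit(5000, sub { $result = porter($w) } );
  print "$_\t: $w -> $result\t", timestr($t), "\n";
  $pu += $t->[1]; $ps += $t->[2]; $n += $t->[5];
}
# Aggregate rate: total iterations divided by total CPU time.
printf "Average random cross-sectional stem rate for $s words: %5.2f Hz (n=%d).\n", $n/($pu+$ps), $n;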
-----------
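
One possible refinement, offered only as a suggestion: each script here draws its own random words, so no two implementations are timed on the same sample. A minimal sketch of drawing a repeatable sample with a fixed seed follows. The helper name pick_sample, the seed value 42 and the short word list are all hypothetical and not part of any of the scripts in this message.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: draw $count words at random (with replacement)
# from @$words. Seeding with a fixed value makes the draw repeatable,
# so every benchmark script could be fed the same sample.
sub pick_sample {
    my ($words, $count, $seed) = @_;
    srand($seed);
    return map { $words->[ int(rand(scalar @$words)) ] } 1 .. $count;
}

# Stand-in word list; a real run would read the list from a file instead.
my @word = qw(inlaid contrite embrac servanted cataplasm uncoined);

my @a = pick_sample(\@word, 10, 42);
my @b = pick_sample(\@word, 10, 42);
print "same sample both times\n" if "@a" eq "@b";
```

The sample could then be passed to each stemmer in turn, so that differences in the averages reflect the implementations rather than the luck of the draw.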


-- stem.pl/Text::English --
1       : helena -> helena       1 wallclock secs ( 1.01 usr +  0.00 sys =  1.01 CPU) @ 4961.24/s (n=5000)
2       : sallies -> sally       2 wallclock secs ( 1.41 usr +  0.00 sys =  1.41 CPU) @ 3555.56/s (n=5000)
3       : conducting -> conduct  1 wallclock secs ( 1.42 usr +  0.00 sys =  1.42 CPU) @ 3516.48/s (n=5000)
4       : turpitude -> turpitud  2 wallclock secs ( 1.09 usr +  0.00 sys =  1.09 CPU) @ 4604.32/s (n=5000)
5       : velutus -> velutu      1 wallclock secs ( 1.12 usr +  0.00 sys =  1.12 CPU) @ 4444.44/s (n=5000)
6       : incestuous -> incestu  2 wallclock secs ( 1.32 usr +  0.00 sys =  1.32 CPU) @ 3786.98/s (n=5000)
7       : rivers -> river        1 wallclock secs ( 1.21 usr +  0.00 sys =  1.21 CPU) @ 4129.03/s (n=5000)
8       : ear -> ear     1 wallclock secs ( 0.82 usr +  0.00 sys =  0.82 CPU) @ 6095.24/s (n=5000)
9       : cowslips -> cowslip    1 wallclock secs ( 1.12 usr +  0.00 sys =  1.12 CPU) @ 4444.44/s (n=5000)
10      : mir -> mir     1 wallclock secs ( 0.80 usr +  0.00 sys =  0.80 CPU) @ 6274.51/s (n=5000)
11      : robas -> roba  1 wallclock secs ( 1.04 usr +  0.00 sys =  1.04 CPU) @ 4812.03/s (n=5000)
12      : student -> student     1 wallclock secs ( 1.20 usr +  0.00 sys =  1.20 CPU) @ 4155.84/s (n=5000)
13      : religiously -> religy  2 wallclock secs ( 1.75 usr +  0.00 sys =  1.75 CPU) @ 2857.14/s (n=5000)
14      : sty -> sty     1 wallclock secs ( 0.82 usr +  0.00 sys =  0.82 CPU) @ 6095.24/s (n=5000)
15      : epistrophus -> epistrophu      2 wallclock secs ( 1.15 usr +  0.00 sys =  1.15 CPU) @ 4353.74/s (n=5000)
16      : defunct -> defunct     1 wallclock secs ( 0.98 usr +  0.00 sys =  0.98 CPU) @ 5079.37/s (n=5000)
17      : compell -> compel      1 wallclock secs ( 1.14 usr +  0.00 sys =  1.14 CPU) @ 4383.56/s (n=5000)
18      : lovely -> love         2 wallclock secs ( 1.42 usr +  0.00 sys =  1.42 CPU) @ 3516.48/s (n=5000)
19      : sycorax -> sycorax     1 wallclock secs ( 0.98 usr +  0.00 sys =  0.98 CPU) @ 5120.00/s (n=5000)
20      : jewel -> jewel         1 wallclock secs ( 0.98 usr +  0.00 sys =  0.98 CPU) @ 5079.37/s (n=5000)
21      : patient -> paty        2 wallclock secs ( 1.36 usr +  0.00 sys =  1.36 CPU) @ 3678.16/s (n=5000)
22      : wish -> wish   1 wallclock secs ( 0.90 usr +  0.00 sys =  0.90 CPU) @ 5565.22/s (n=5000)
23      : tarquins -> tarquin    1 wallclock secs ( 1.12 usr +  0.00 sys =  1.12 CPU) @ 4444.44/s (n=5000)
24      : sharded -> shard       2 wallclock secs ( 1.38 usr +  0.00 sys =  1.38 CPU) @ 3636.36/s (n=5000)
25      : compelled -> compel    2 wallclock secs ( 1.58 usr +  0.00 sys =  1.58 CPU) @ 3168.32/s (n=5000)
26      : starved -> starv       2 wallclock secs ( 1.38 usr +  0.00 sys =  1.38 CPU) @ 3636.36/s (n=5000)
27      : starveth -> starveth   1 wallclock secs ( 1.02 usr +  0.00 sys =  1.02 CPU) @ 4923.08/s (n=5000)
28      : shapen -> shapen       1 wallclock secs ( 1.01 usr +  0.00 sys =  1.01 CPU) @ 4961.24/s (n=5000)
29      : unforc -> unforc       1 wallclock secs ( 0.96 usr +  0.00 sys =  0.96 CPU) @ 5203.25/s (n=5000)
30      : tart -> tart   1 wallclock secs ( 0.89 usr +  0.00 sys =  0.89 CPU) @ 5614.04/s (n=5000)
...
93      : cashier -> cashy       1 wallclock secs ( 1.30 usr +  0.00 sys =  1.30 CPU) @ 3855.42/s (n=5000)
94      : reacheth -> reacheth   1 wallclock secs ( 1.02 usr +  0.00 sys =  1.02 CPU) @ 4885.50/s (n=5000)
95      : prosecution -> prosecut        2 wallclock secs ( 1.23 usr +  0.00 sys =  1.23 CPU) @ 4076.43/s (n=5000)
96      : engross -> engross     1 wallclock secs ( 1.00 usr +  0.00 sys =  1.00 CPU) @ 5000.00/s (n=5000)
97      : anthems -> anthem      1 wallclock secs ( 1.15 usr +  0.00 sys =  1.15 CPU) @ 4353.74/s (n=5000)
98      : could -> could         0 wallclock secs ( 0.95 usr +  0.00 sys =  0.95 CPU) @ 5289.26/s (n=5000)
99      : undeck -> undeck       1 wallclock secs ( 1.00 usr +  0.00 sys =  1.00 CPU) @ 5000.00/s (n=5000)
100     : across -> across       1 wallclock secs ( 0.98 usr +  0.00 sys =  0.98 CPU) @ 5079.37/s (n=5000)
Average random cross-sectional stem rate for 100 words: 4254.75 Hz (n=500000).
-- stem.pl/Text::English --
1       : mother -> mother      12 wallclock secs (11.31 usr +  0.00 sys = 11.31 CPU) @ 4419.89/s (n=50000)
2       : violet -> violet      18 wallclock secs (10.07 usr +  0.01 sys = 10.08 CPU) @ 4961.24/s (n=50000)
3       : adriano -> adriano    18 wallclock secs (10.10 usr +  0.02 sys = 10.12 CPU) @ 4942.08/s (n=50000)
4       : grizzle -> grizzl     18 wallclock secs (11.30 usr +  0.00 sys = 11.30 CPU) @ 4426.00/s (n=50000)
5       : eel -> eel    15 wallclock secs ( 8.34 usr +  0.00 sys =  8.34 CPU) @ 5998.13/s (n=50000)
6       : felonious -> felony   17 wallclock secs (14.95 usr +  0.01 sys = 14.95 CPU) @ 3343.78/s (n=50000)
7       : goldsmith -> goldsmith        11 wallclock secs ( 9.98 usr +  0.00 sys =  9.98 CPU) @ 5007.82/s (n=50000)
8       : sepulchring -> sepulchr       15 wallclock secs (14.25 usr +  0.00 sys = 14.25 CPU) @ 3508.77/s (n=50000)
9       : justle -> justl       13 wallclock secs (10.95 usr +  0.00 sys = 10.95 CPU) @ 4564.91/s (n=50000)
10      : suggested -> suggest  17 wallclock secs (14.45 usr +  0.00 sys = 14.45 CPU) @ 3461.33/s (n=50000)
Average random cross-sectional stem rate for 10 words: 4320.53 Hz (n=500000).
-- stem.pl/Text::English --
1       : limander -> limand    33 wallclock secs (22.02 usr +  0.01 sys = 22.03 CPU) @ 4539.01/s (n=100000)
2       : sale -> sale  28 wallclock secs (19.95 usr +  0.00 sys = 19.95 CPU) @ 5011.75/s (n=100000)
3       : blackberries -> blackberry    52 wallclock secs (28.19 usr +  0.04 sys = 28.23 CPU) @ 3542.76/s (n=100000)
4       : tarquin -> tarquin    26 wallclock secs (19.05 usr +  0.02 sys = 19.07 CPU) @ 5243.75/s (n=100000)
5       : unless -> unless      25 wallclock secs (20.72 usr +  0.01 sys = 20.73 CPU) @ 4824.73/s (n=100000)
6       : rascally -> rascal    45 wallclock secs (27.67 usr +  0.02 sys = 27.70 CPU) @ 3610.72/s (n=100000)
7       : carelessness -> careless      29 wallclock secs (26.04 usr +  0.00 sys = 26.04 CPU) @ 3840.38/s (n=100000)
8       : assubjugate -> assubjug       26 wallclock secs (23.68 usr +  0.00 sys = 23.68 CPU) @ 4223.03/s (n=100000)
9       : thorny -> thorny      31 wallclock secs (27.62 usr +  0.01 sys = 27.62 CPU) @ 3619.91/s (n=100000)
10      : trespasses -> trespass        26 wallclock secs (22.01 usr +  0.00 sys = 22.01 CPU) @ 4543.84/s (n=100000)
Average random cross-sectional stem rate for 10 words: 4218.44 Hz (n=1000000).
-- stem.pl/Text::English --
1       : misbhav -> misbhav     0 wallclock secs ( 0.38 usr +  0.00 sys =  0.38 CPU) @ 5333.33/s (n=2000)
2       : insinuateth -> insinuateth     1 wallclock secs ( 0.42 usr +  0.00 sys =  0.42 CPU) @ 4740.74/s (n=2000)
3       : never -> never         1 wallclock secs ( 0.43 usr +  0.01 sys =  0.44 CPU) @ 4571.43/s (n=2000)
4       : surveyor -> surveyor   1 wallclock secs ( 0.42 usr +  0.00 sys =  0.42 CPU) @ 4740.74/s (n=2000)
5       : tir -> tir     0 wallclock secs ( 0.32 usr +  0.00 sys =  0.32 CPU) @ 6243.90/s (n=2000)
6       : slumbers -> slumber    1 wallclock secs ( 0.52 usr +  0.00 sys =  0.52 CPU) @ 3878.79/s (n=2000)
7       : gallus -> gallu        1 wallclock secs ( 0.44 usr +  0.00 sys =  0.44 CPU) @ 4571.43/s (n=2000)
...
993     : contemptible -> contempt       1 wallclock secs ( 0.48 usr +  0.00 sys =  0.48 CPU) @ 4196.72/s (n=2000)
994     : sensual -> sensu       0 wallclock secs ( 0.41 usr +  0.00 sys =  0.41 CPU) @ 4830.19/s (n=2000)
995     : jeer -> jeer   1 wallclock secs ( 0.39 usr +  0.00 sys =  0.39 CPU) @ 5120.00/s (n=2000)
996     : holden -> holden       0 wallclock secs ( 0.41 usr +  0.00 sys =  0.41 CPU) @ 4923.08/s (n=2000)
997     : weakling -> weakl      1 wallclock secs ( 0.55 usr +  0.00 sys =  0.55 CPU) @ 3657.14/s (n=2000)
998     : cormorant -> cormor    1 wallclock secs ( 0.45 usr +  0.00 sys =  0.45 CPU) @ 4491.23/s (n=2000)
999     : affianc -> affianc     0 wallclock secs ( 0.38 usr +  0.00 sys =  0.38 CPU) @ 5224.49/s (n=2000)
1000    : fastolfe -> fastolf    1 wallclock secs ( 0.44 usr +  0.00 sys =  0.44 CPU) @ 4571.43/s (n=2000)
Average random cross-sectional stem rate for 1000 words: 4384.46 Hz (n=2000000).
-- bench-stem.pl.pl --
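#!/usr/bin/perl
use strict;
use warnings;
use Benchmark;
require "./stem.pl";
#use Text::English;  # The same thing

# Read the word list, one word per line, from STDIN or the files in @ARGV.
chomp(my @word = <>);

my ($n, $pu, $ps) = (0, 0, 0);  # total iterations, user CPU secs, system CPU secs
my $s = 100;                    # number of random words to sample
for (1..$s) {
  my $result;
  my $w = $word[rand @word];    # one random word (element access, not a slice)
  my $t = timeit(2000, sub { ($result) = stem($w) } );  # stem() returns a list
  print "$_\t: $w -> $result\t", timestr($t), "\n";
  $pu += $t->[1]; $ps += $t->[2]; $n += $t->[5];
}
# Aggregate rate: total iterations divided by total CPU time.
printf "Average random cross-sectional stem rate for $s words: %5.2f Hz (n=%d).\n", $n/($pu+$ps), $n;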
-----------
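
A note on what the "Average" line measures: the scripts divide total iterations by total CPU time, which is not the same as averaging the per-word rates. With equal iteration counts per word it works out to the harmonic mean of the rates, so it can never exceed their plain average. The sketch below recomputes both figures from the ten n=50000 stem.pl lines earlier in this message. The numbers are copied from that output; the variable names are my own. Because the printed times are rounded to two decimals, the recomputed aggregate should only land close to the reported 4320.53 Hz, not match it exactly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tail of each timestr() line from the ten n=50000 stem.pl results,
# copied from the output earlier in this message (no new measurements).
my @lines = (
  '(11.31 usr +  0.00 sys = 11.31 CPU) @ 4419.89/s (n=50000)',
  '(10.07 usr +  0.01 sys = 10.08 CPU) @ 4961.24/s (n=50000)',
  '(10.10 usr +  0.02 sys = 10.12 CPU) @ 4942.08/s (n=50000)',
  '(11.30 usr +  0.00 sys = 11.30 CPU) @ 4426.00/s (n=50000)',
  '( 8.34 usr +  0.00 sys =  8.34 CPU) @ 5998.13/s (n=50000)',
  '(14.95 usr +  0.01 sys = 14.95 CPU) @ 3343.78/s (n=50000)',
  '( 9.98 usr +  0.00 sys =  9.98 CPU) @ 5007.82/s (n=50000)',
  '(14.25 usr +  0.00 sys = 14.25 CPU) @ 3508.77/s (n=50000)',
  '(10.95 usr +  0.00 sys = 10.95 CPU) @ 4564.91/s (n=50000)',
  '(14.45 usr +  0.00 sys = 14.45 CPU) @ 3461.33/s (n=50000)',
);

my ($cpu, $n, $rate_sum) = (0, 0, 0);
for my $line (@lines) {
  # Pull out total CPU seconds, the printed rate, and the iteration count.
  my ($c, $r, $i) = $line =~ m{=\s*([\d.]+) CPU\) \@\s*([\d.]+)/s \(n=(\d+)\)}
    or next;
  $cpu += $c; $rate_sum += $r; $n += $i;
}

my $aggregate = $n / $cpu;                  # iterations / CPU secs, as the scripts do
my $mean      = $rate_sum / scalar @lines;  # plain average of the per-word rates

printf "aggregate rate: %.2f Hz\n", $aggregate;
printf "mean of rates:  %.2f Hz\n", $mean;
```

Slow words take up more of the total CPU time, so they pull the aggregate figure down further than they pull down a plain average of the rates.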

-- Lingua::Stem --
(As I suspected, all that subroutine and reference overhead really bogs this one down.
Sorry to the authors, but I could see that coming performance-wise :-/  Feature-wise, yours is the best!)

1       : rowel -> rowel         3 wallclock secs ( 2.01 usr +  0.00 sys =  2.01 CPU) @ 2490.27/s (n=5000)
2       : fantasies -> fantasi   2 wallclock secs ( 2.24 usr +  0.00 sys =  2.24 CPU) @ 2229.97/s (n=5000)
3       : bud -> bud     2 wallclock secs ( 1.97 usr +  0.00 sys =  1.97 CPU) @ 2539.68/s (n=5000)
4       : ages -> ag     2 wallclock secs ( 2.48 usr +  0.00 sys =  2.48 CPU) @ 2012.58/s (n=5000)
5       : locking -> lock        3 wallclock secs ( 2.28 usr +  0.00 sys =  2.28 CPU) @ 2191.78/s (n=5000)
...
97      : conveniently -> conveni        2 wallclock secs ( 2.93 usr +  0.00 sys =  2.93 CPU) @ 1706.67/s (n=5000)
98      : spiders -> spider      3 wallclock secs ( 2.38 usr +  0.00 sys =  2.38 CPU) @ 2098.36/s (n=5000)
99      : tu -> tu       2 wallclock secs ( 1.92 usr +  0.00 sys =  1.92 CPU) @ 2601.63/s (n=5000)
100     : beadle -> beadl        4 wallclock secs ( 2.40 usr +  0.00 sys =  2.40 CPU) @ 2084.69/s (n=5000)
Average random cross-sectional stem rate for 100 words: 2233.86 Hz (n=500000).

-- Lingua::Stem --
1       : candidatus -> candidatu       24 wallclock secs (22.45 usr +  0.00 sys = 22.45 CPU) @ 2227.64/s (n=50000)
2       : desiring -> desir     29 wallclock secs (23.02 usr +  0.02 sys = 23.04 CPU) @ 2170.23/s (n=50000)
3       : sore -> sore  30 wallclock secs (20.47 usr +  0.00 sys = 20.47 CPU) @ 2442.75/s (n=50000)
4       : nuncio -> nuncio      28 wallclock secs (19.89 usr +  0.02 sys = 19.91 CPU) @ 2511.77/s (n=50000)
5       : wreaks -> wreak       29 wallclock secs (22.45 usr +  0.06 sys = 22.52 CPU) @ 2220.68/s (n=50000)
6       : fans -> fan   32 wallclock secs (22.41 usr +  0.02 sys = 22.44 CPU) @ 2228.41/s (n=50000)
7       : deem -> deem  26 wallclock secs (19.77 usr +  0.00 sys = 19.77 CPU) @ 2529.64/s (n=50000)
8       : paphos -> papho       30 wallclock secs (22.51 usr +  0.00 sys = 22.51 CPU) @ 2221.45/s (n=50000)
9       : promis -> promi       29 wallclock secs (22.59 usr +  0.01 sys = 22.59 CPU) @ 2213.00/s (n=50000)
10      : smoky -> smoki        29 wallclock secs (22.24 usr +  0.00 sys = 22.24 CPU) @ 2247.98/s (n=50000)
Average random cross-sectional stem rate for 10 words: 2294.40 Hz (n=500000).
-- Lingua::Stem --
1       : asher -> asher        62 wallclock secs (41.34 usr +  0.05 sys = 41.38 CPU) @ 2416.46/s (n=100000)
2       : learns -> learn       61 wallclock secs (44.96 usr +  0.02 sys = 44.98 CPU) @ 2222.99/s (n=100000)
3       : forswearing -> forswear       65 wallclock secs (46.13 usr +  0.02 sys = 46.15 CPU) @ 2166.92/s (n=100000)
4       : theatre -> theatr     72 wallclock secs (46.54 usr +  0.01 sys = 46.55 CPU) @ 2148.37/s (n=100000)
5       : corpse -> corps       69 wallclock secs (48.18 usr +  0.02 sys = 48.20 CPU) @ 2074.89/s (n=100000)
6       : copied -> copi        53 wallclock secs (45.23 usr +  0.02 sys = 45.25 CPU) @ 2209.94/s (n=100000)
7       : cogging -> cog        67 wallclock secs (47.83 usr +  0.02 sys = 47.85 CPU) @ 2089.80/s (n=100000)
8       : absolute -> absolut   57 wallclock secs (46.70 usr +  0.00 sys = 46.70 CPU) @ 2141.54/s (n=100000)
9       : forswearing -> forswear       56 wallclock secs (45.98 usr +  0.00 sys = 45.98 CPU) @ 2175.02/s (n=100000)
10      : withering -> wither   62 wallclock secs (48.95 usr +  0.02 sys = 48.96 CPU) @ 2042.44/s (n=100000)
Average random cross-sectional stem rate for 10 words: 2164.54 Hz (n=1000000).
-- Lingua::Stem --
... 
-- bench-lingua-stem.pl --
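#!/usr/bin/perl
use strict;
use warnings;
use Lingua::Stem qw(:all);
use Benchmark;

# Read the word list, one word per line, from STDIN or the files in @ARGV.
chomp(my @word = <>);

my ($n, $pu, $ps) = (0, 0, 0);  # total iterations, user CPU secs, system CPU secs
my $s = 10;                     # number of random words to sample
for (1..$s) {
  my $result;
  my $w = $word[rand @word];    # one random word (element access, not a slice)
  # stem() returns an array reference; take its first element.
  my $t = timeit(100000, sub { ($result) = @{stem($w)} } );
  print "$_\t: $w -> $result\t", timestr($t), "\n";
  $pu += $t->[1]; $ps += $t->[2]; $n += $t->[5];
}
# Aggregate rate: total iterations divided by total CPU time.
printf "Average random cross-sectional stem rate for $s words: %5.2f Hz (n=%d).\n", $n/($pu+$ps), $n;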
-----------
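
On the subroutine and reference overhead mentioned above: the script calls stem() once per word and dereferences a fresh array reference each time, so the per-call cost is paid on every iteration. A rough way to see how much that calling shape matters is to time one call per word against one call for the whole list. The sketch below does this with a toy stand-in rather than Lingua::Stem itself. toy_stem and toy_stem_list are hypothetical names, the "stemmer" only strips a trailing "s", and whether Lingua::Stem shows a similar gap is an assumption that would need measuring.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Hypothetical stand-in for a stemmer: strips a trailing "s" only.
# It is NOT Porter; it just gives the wrappers something cheap to call.
sub toy_stem { my ($w) = @_; $w =~ s/s$//; return $w }

# List-in, arrayref-out wrapper: the same calling shape as the
# @{stem($w)} call in the script above.
sub toy_stem_list { return [ map { toy_stem($_) } @_ ] }

my @word = qw(quests crests greeks rivers anthems road film swag);

my (@one_by_one, @batched);

# One wrapper call (and one array reference) per word.
my $t1 = timeit(2000, sub {
  @one_by_one = map { @{ toy_stem_list($_) } } @word;
});

# One wrapper call for the whole list.
my $t2 = timeit(2000, sub {
  @batched = @{ toy_stem_list(@word) };
});

print "one call per word: ", timestr($t1), "\n";
print "one call per list: ", timestr($t2), "\n";
```

Both variants produce the same stems, so any difference between the two timings comes from the calling convention rather than the stemming work.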


	-- Allan Fields
