[Snowball-discuss] Optimising among
Richard Boulton
richard at lemurconsulting.com
Fri Sep 15 10:55:16 BST 2006
On Fri, Sep 15, 2006 at 05:00:47AM +0100, Olly Betts wrote:
> Implementing all the above takes the reduction in running time for
> the English stemmer to 46%!
...
> I'd be interested to hear how this benchmarks when measuring real time
> rather than running under cachegrind.
I get less than a 46% change, but still a good improvement:
The figures below are from running the stemmers over the voc.txt files (which
had first been converted to UTF-8), with each word stemmed 10 times (by
modifying stemwords.c to call the stemmer 10 times for each word). First, both
builds (patched and original) were run through the file untimed, and then the
runs were repeated using "time" to measure their speeds. The times appear to
be stable to within about 10ms.
This indicates that some algorithms (such as finnish and german) get little
improvement, whereas others (such as dutch, hungarian, norwegian and
portuguese) get an improvement of between 40 and 50% in elapsed time. All
algorithms got faster, and english got about a 25% improvement.
(For reference, the command I used to get these was:

echo "lang,patched,real,user,sys"
for lang in danish dutch english finnish french german hungarian italian \
            norwegian porter portuguese russian spanish swedish; do
  lang=`basename $lang`
  ./stemwords.olly -l ${lang} -i ../data/$lang/voc.txt -c UTF_8 -o tmp
  ./stemwords.orig -l ${lang} -i ../data/$lang/voc.txt -c UTF_8 -o tmp
  echo -n "$lang,y,"
  /usr/bin/time -f "%e,%U,%S" ./stemwords.olly -l ${lang} -i ../data/$lang/voc.txt -c UTF_8 -o tmp
  echo -n "$lang,n,"
  /usr/bin/time -f "%e,%U,%S" ./stemwords.orig -l ${lang} -i ../data/$lang/voc.txt -c UTF_8 -o tmp
done

and my machine is a single processor, AMD Athlon 1200MHz
)
lang,patched,real,user,sys
danish,y,0.26,0.25,0.00
danish,n,0.32,0.30,0.00
dutch,y,1.39,1.36,0.02
dutch,n,2.61,2.58,0.01
english,y,0.73,0.72,0.00
english,n,0.96,0.95,0.00
finnish,y,1.15,1.15,0.00
finnish,n,1.20,1.17,0.02
french,y,0.88,0.88,0.00
french,n,1.02,1.02,0.00
german,y,1.45,1.42,0.02
german,n,1.51,1.48,0.01
hungarian,y,0.38,0.37,0.00
hungarian,n,0.65,0.64,0.00
italian,y,1.18,1.16,0.01
italian,n,1.90,1.81,0.00
norwegian,y,0.17,0.18,0.00
norwegian,n,0.31,0.23,0.00
porter,y,0.65,0.63,0.01
porter,n,0.84,0.75,0.00
portuguese,y,0.74,0.74,0.00
portuguese,n,1.27,1.19,0.00
russian,y,0.78,0.77,0.01
russian,n,0.97,0.88,0.01
spanish,y,0.59,0.58,0.01
spanish,n,0.90,0.82,0.00
swedish,y,0.29,0.28,0.01
swedish,n,0.42,0.34,0.00
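
To turn the table above into per-language percentages, something like the
following could be used. This is only a rough sketch: the function name
"reduction" is made up for illustration, it assumes the CSV layout shown above
(with "y" rows for the patched build and "n" rows for the original), and it
uses the "real" column rather than "user".

```shell
# Hypothetical helper (not part of the original post): reads the
# lang,patched,real,user,sys CSV on stdin and prints, for each language,
# the percentage reduction in real time of the patched build.
reduction() {
    awk -F, '
        $1 == "lang" { next }                  # skip the header row
        $2 == "y"    { patched[$1] = $3 }      # real time, patched build
        $2 == "n"    { orig[$1] = $3; order[++n] = $1 }  # original build
        END {
            # Report languages in the order their "n" rows appeared.
            for (i = 1; i <= n; i++) {
                lang = order[i]
                printf "%s,%.0f\n", lang,
                    100 * (orig[lang] - patched[lang]) / orig[lang]
            }
        }'
}
```

With the table saved to a file (say results.csv, a name assumed here), this
would be run as "reduction < results.csv". The timings are only stable to
about 10ms, so percentages for the faster languages are correspondingly rough.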
--
Richard