[Snowball-discuss] Porter 1 perl.txt rehash

Allan Fields afieldscom@idirect.ca
Fri, 19 Apr 2002 06:32:16 -0400


--------------Boundary-00=_S99T4OWJ901G6SPL3SMO
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 8bit

Hi,

Here are a few ideas I've come across for perl.txt (Porter 1 for Perl).  Let
me know what you think of them.  I might be interested in writing Porter 2
for Perl, or in helping with a snowball -> perl converter that uses an
optimal algorithm and regular expressions.  With the Perl interface to
snowball, I'm not certain this is strictly necessary...  However, it could
help to have a Porter 2 reference stemmer for Perl.  Hopefully the authors of
all the previously enumerated stemmers can fix their code so that it works as
the algorithm dictates, though some might have other aims, such as producing
stems that look more like dictionary words.  For example,
stem.pl/Text::English rewrites i$ -> y$ as a nicification.
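
To make the i$ -> y$ idea concrete, here is a minimal sketch of that kind of
post-processing step.  The helper name nicify is hypothetical, and this is not
the actual stem.pl/Text::English code; it only assumes the rewrite is applied
to a finished Porter 1 stem.  The sample stems happi, poni and sky are the
ones Porter's paper gives for happy, ponies and sky.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from perl.txt or Text::English):
# rewrite a trailing "i" to "y" so the stem looks more like a dictionary word.
sub nicify {
    my ($stem) = @_;
    (my $nice = $stem) =~ s/i$/y/;   # work on a copy, leave the argument alone
    return $nice;
}

# Porter 1 stems from the paper's examples: happy -> happi, ponies -> poni,
# sky -> sky (no vowel in the stem "sk", so step 1c leaves the y in place).
print "$_ -> ", nicify($_), "\n" for qw(happi poni sky);
```

Note that a stemmer doing this no longer produces the stems the algorithm
dictates, so its output will differ from the reference output for every stem
that ends in i.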

Allan Fields




--------------Boundary-00=_S99T4OWJ901G6SPL3SMO
Content-Type: application/x-perl; name="perl-mod.txt"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="perl-mod.txt"

IyEvdXNyL2Jpbi9wZXJsIC13CnVzZSBzdHJpY3Q7CnBhY2thZ2UgTGluZ3VhOjpTdGVtOjpSZWZl
cmVuY2U6OlBvcnRlcjE7CiN1c2UgQmVuY2htYXJrIHF3KHRpbWVpdCB0aW1lc3RyICk7Cgo9cG9k
Cgo9aGVhZDEgTkFNRQoKcGVybC50eHQgLSBNYXRpbiBQb3J0ZXIncyBTdGVtbWVyIFJlbGVhc2Ug
MSBmb3IgUGVybAoKPWhlYWQxIERFU0NSSVBUSU9OCgpQb3J0ZXIgc3RlbW1lciBpbiBQZXJsLiAg
RWFzeSB0byBmb2xsb3cgYWdhaW5zdCB0aGUgcnVsZXMgaW4gdGhlIG9yaWdpbmFsIHBhcGVyLCBh
bmQgc3Vic2VxdWVudCBjaGFuZ2VzIHRvIHJlbGVhc2UgMSBhdmFpbGFibGUgb24gdGhlIHdlYnNp
dGUgbGlzdGVkIGJlbG93LgoKSW5wdXRzIHRha2VuIGZyb20gdGhlIGZpbGVzIG9uIHRoZSBhcmcg
bGlzdCwgb3V0cHV0IHRvIHN0ZG91dC4KCkFzIGFuIGVhc3kgc3BlZWQtdXAsIG9uZSBtaWdodCBj
cmVhdGUgYSBoYXNoIG9mIHdvcmQ9PnN0ZW1tZWQgZm9ybSwgYW5kIGxvb2sgdXAgZWFjaCBuZXcg
d29yZCBpbiB0aGUgaGFzaCwgb25seSBjYWxsaW5nIHN0ZW0oKSBpZiB0aGUgd29yZCB3YXMgbm90
IGZvdW5kIHRoZXJlLgoKVGhlcmUgYXJlIG1hbnkgUGVybCBzdGVtbWVycyBhdmFpbGFibGUgYXQg
dGhpcyB0aW1lIGluIG1vZHVsZSBmb3JtLCBhbHRob3VnaCBzb21lIGhhdmUgdmFyeWluZyBkZWdy
ZWVzIG9mIGFjY3VyYWN5LiAgVGhpcyBzdGVtbWVyIGlzIG1lYW50IHRvIGJlIGEgcmVmZXJlbmNl
IGltcGxlbWVudGF0aW9uIG9mIHRoZSBQb3J0ZXIgU3RlbW1lciBSZWxlYXNlIDEuCgo9aGVhZDEg
U1lOT1BTSVMKCkhlcmUgYXJlIHNvbWUgZXhhbXBsZXM6CiAgJCAjIFRyeSB0ZXN0IGRpY3Rpb25h
cnkgKGF2YWlsYWJsZSBmcm9tIHdlYnNpdGUpOgogICQgcGVybCBwZXJsLnR4dCA8dm9jLnR4dCA+
dGVzdC50eHQKICAkIGRpZmYgb3V0cHV0LnR4dCB0ZXN0LnR4dAogICQKICAkICMgSW50ZXJhY3Rp
dmUgdGVzdDoKICAkIHBlcmwgcGVybC50eHQKICBiYWtpbmcKICBiYWtlXkQKICAkCgo9aGVhZDEg
QVVUSE9SCgpNYXJ0aW4gUG9ydGVyIDxtYXJ0aW5AdGFydHVzLm9yZz4uCk1vZGlmaWVkIFBlcmwg
c2NyaXB0IGJ5IEFsbGFuIEZpZWxkcyA8YWZpZWxkc2NvbUBpZGlyZWN0LmNhPi4KCj1oZWFkMSBD
T1BZUklHSFQKCkNvcHlyaWdodCAoQykgTWFydGluIFBvcnRlciwgMTk4MC4KCj1jdXQKCgoKIyBT
b21lIHN0ZW1tZXIgZGVmaW5pdGlvbiwgc2V0IHRoaW5ncyB1cDoKbXkgJFZBTElECT0gICdbOmFs
cGhhOl0nOwkJCSMgZ3JhYiBhbHBoYXMgZnJvbSBpbnB1dApteSAkYwkJPSAgJ1teYWVpb3VdJzsJ
CQkjIGNvbnNvbmFudApteSAkdgkJPSAgJ1thZWlvdXldJzsJCQkjIHZvd2VsCm15ICRDCQk9ICAk
YyAuICdbXmFlaW91eV0qJzsJCSMgY29uc29uYW50IHNlcXVlbmNlCm15ICRWCQk9ICAkdiAuICdb
YWVpb3VdKic7CQkjIHZvd2VsIHNlcXVlbmNlCm15ICRtZ3IwCT1xcnteICg/OiRDKT8gJFYgJEMJ
CX14OwkjIFtDXVZDLi4uIGlzIG0+MApteSAkbWVxMQk9cXJ7XiAoPzokQyk/ICRWICRDICgkVik/
ICAgICAgJH14OwkjIFtDXVZDW1ZdIGlzIG09MQpteSAkbWdyMQk9cXJ7XiAoPzokQyk/ICRWICRD
ICAkViAgICRDCX14OwkjIFtDXVZDVkMuLi4gaXMgbT4xCm15ICRfdgkJPXFye14gKD86JEMpPyAk
dgkJfXg7CSMgdm93ZWwgaW4gc3RlbQoKbXkgJXN0ZXAybGlzdCA9ICgKICAgYXRpb25hbAk9PiAn
YXRlJywKICAgdGlvbmFsCT0+ICd0aW9uJywKICAgZW5jaQkJPT4gJ2VuY2UnLAogICBhbmNpCQk9
PiAnYW5jZScsCiAgIGl6ZXIJCT0+ICdpemUnLAogICBibGkJCT0+ICdibGUnLAogICBhbGxpCQk9
PiAnYWwnLAogICBlbnRsaQk9PiAnZW50JywKICAgZWxpCQk9PiAnZScsCiAgIG91c2xpCT0+ICdv
dXMnLAogICBpemF0aW9uCT0+ICdpemUnLAogICBhdGlvbgk9PiAnYXRlJywKICAgYXRvcgkJPT4g
J2F0ZScsCiAgIGFsaXNtCT0+ICdhbCcsCiAgIGl2ZW5lc3MJPT4gJ2l2ZScsCiAgIGZ1bG5lc3MJ
PT4gJ2Z1bCcsCiAgIG91c25lc3MJPT4gJ291cycsCiAgIGFsaXRpCT0+ICdhbCcsCiAgIGl2aXRp
CT0+ICdpdmUnLAogICBiaWxpdGkJPT4gJ2JsZScsCiAgIGxvZ2kJCT0+ICdsb2cnCik7Cm15ICRT
VEVQMgk9ICAnKCcgLiAoam9pbiAnfCcsIHNvcnQga2V5cyAlc3RlcDJsaXN0KSAuICcpJzsKCm15
ICVzdGVwM2xpc3QgPSAoCiAgIGljYXRlCT0+ICdpYycsCiAgIGF0aXZlCT0+ICcnLAogICBhbGl6
ZQk9PiAnYWwnLAogICBpY2l0aQk9PiAnaWMnLAogICBpY2FsCQk9PiAnaWMnLAogICBmdWwJCT0+
ICcnLAogICBuZXNzCQk9PiAnJwopOwpteSAkU1RFUDMJPSAgJygnIC4gKGpvaW4gJ3wnLCBzb3J0
IGtleXMgJXN0ZXAzbGlzdCkgLiAnKSc7CgpteSBAc3RlcDRsaXN0CT1xdygKICAgYWwgYW5jZSBl
bmNlIGVyIGljCiAgIGFibGUgaWJsZQogICBhbnQgZW1lbnQgbWVudAogICBlbnQgb3UgaXNtCiAg
IGF0ZSBpdGkgb3VzCiAgIGl2ZSBpemUKKTsKbXkgJFNURVA0CT0gICcoJyAuIChqb2luICd8Jywg
c29ydCBAc3RlcDRsaXN0KSAuICcpJzsKI215ICRTVEVQNAk9ICAnKCcgLiAoam9pbiAnfCcsIEBz
dGVwNGxpc3QpIC4gJyknOwkJIyA8LS0gVXNlIHRoaXMgaW5zdGVhZCBpZiB0aGlzIGxpc3QgaXMg
cHJlb3JkZXIgZm9yIG9wdGltdW0gc2VhcmNoIHNwZWVkLCBzZWVtcyBsb2dpY2FsIHRvIG9yZGVy
IGFscGhhYmV0aWNhbGx5IG9yIGV2ZW4gcmVnZXggY29tcHJlc3MsIGJ1dCBJJ20gdG9vIGxhenkg
LS0gRWc6CiMgLS0+ICRTVEVQNCA9IHFyKCBhKD86YmxlfGx8bmNlfHRlKSB8IGkoPzpibGV8Y3xz
bXx0aXx2ZXx6ZSkgLi4uICl4OwoKCgojIFRoZW4gZGVmaW5lIHN0ZW0oJHdvcmQpIHRvIHN0ZW0g
JHdvcmQ6CnN1YiBzdGVtIHsKICAgbXkgKCR3KTsKCiAgICM9PSBIYW5kbGUgYXJyYXlzLCByZWZl
cmVuY2VzLCBldGMuOiA9PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT0jCiAg
ICMgLSBTaG91bGRuJ3QgYWRkIHNpZ25pZmljYW50IG92ZXJoZWFkCiAgICMgLSBPdmVyaGVhZCBz
aG91bGQgYmUgbW9yZSB0aGFuIG1hZGUgdXAgZm9yIHdoZW4gY2FsbGluZyB3aXRoIGxvbmcgbGlz
dHMKICAgIyAtIFBsZWFzZSBpbmZvcm0gb2YgaXNzdWVzIHdpdGggdGhpcyBoYW5kbGVyCiAgIG15
IEB3b3JkID0gKCk7CiAgIGZvcmVhY2ggKEBfKSB7CiAgICAgIGlmIChteSAkcmVmID0gcmVmKSB7
CiAgICAgICAgIGlmICgkcmVmIGVxICdBUlJBWScpIHsJCQkjIDI6IFNlY29uZCBtb3N0IGxpa2Vs
eSBjYXNlOiBBcnJheSBSZWYKICAgICAgICAgICAgZm9yZWFjaCAkdyhAeyRffSkgewogICAgICAg
ICAgICAgICBpZiAobm90IHJlZiAkdykgewogICAgICAgICAgICAgICAgICBwdXNoIEB3b3JkLCBs
YyAkdzsKICAgICAgICAgICAgICAgfSBlbHNlIHsKICAgICAgICAgICAgICAgICAgc3RlbSgkdyk7
CQkJIyA0OiBEZWVwIHBsYWNlZCByZWZlcmVuY2UsIGdvIGRlZXAKICAgICAgICAgICAgICAgICAg
d2FybiAiRGVlcCByZWZlcmVuY2UgaW4gc3RlbSwgZ29pbmcgZGVlcC4uLiI7CiAgICAgICAgICAg
ICAgIH0KICAgICAgICAgICAgfQogICAgICAgICB9IGVsc2lmICgkcmVmIGVxICdTQ0FMQVInKSB7
CiAgICAgICAgICAgIHB1c2ggQHdvcmQsIGxjICR7JF99OwkJIyAzOiBSZWZlcmVuY2UgdG8gc2Nh
bGFyCiAgICAgICAgIH0gZWxzZSB7CiAgICAgICAgICAgIGRpZSAiVW5zdXBwb3J0ZWQgcmVmZXJl
bmNlIG9mIHR5cGUgJyRyZWYnIHBhc3NlZCB0byBzdGVtLiI7CiAgICAgICAgIH0KICAgICAgfSBl
bHNlIHsKICAgICAgICAgcHVzaCBAd29yZCwgbGM7CQkJIyAxOiBNb3N0IGxpa2VseSBjYXNlOiBT
Y2FsYXIsIEFycmF5CiAgICAgIH0KICAgfSAjIE9yIHVzZTogIG1hcCB7IHB1c2ggQHdvcmQsIGxj
IH0gQF87CiAgIHJldHVybiB1bmRlZiBpZiBub3QgQHdvcmQ7CiAgICM9PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT0gLS0gQWxsYW4gRmllbGRzID09
PT0jCiAgIAogICBmb3JlYWNoIChAd29yZCkgewogICAgICBteSAoJHN0ZW0sICRzdWZmaXgpOwog
ICAgICBuZXh0IGlmIGxlbmd0aCgkXykgPCAzOwkJCSMgbGVuZ3RoIGF0IGxlYXN0IDMgZm9yIHN0
ZW1taW5nCgogICAgICAjIFByZWx1ZGUgLSBNYXAgaW5pdGlhbCB5IHRvIFkgc28gdGhhdCB0aGUg
cGF0dGVybnMgbmV2ZXIgdHJlYXQgaXQgYXMgdm93ZWw6CiAgICAgIHMvXnkvWS87CgoKICAgICAg
IyBTdGVwIDFhOgogICAgICBpZiAgICAoLyhzc3xpKWVzJC8pCQkJeyAkXyA9ICRgIC4gJDEgfQog
ICAgICBlbHNpZiAoLyhbXnNdKXMkLykJCQl7ICRfID0gJGAgLiAkMSB9CgoKICAgICAgIyBTdGVw
IDFiOgogICAgICBpZiAoL2VlZCQvKSB7CiAgICAgICAgIGlmICgkYCA9fiAvJG1ncjAvbykgeyBj
aG9wIH0KICAgICAgfSBlbHNpZiAoLyg/OmVkfGluZykkLykgewogICAgICAgICAkc3RlbSA9ICRg
OwogICAgICAgICBpZiAoJHN0ZW0gPX4gLyRfdi9vKSB7CiAgICAgICAgICAgICRfID0gJHN0ZW07
CiAgICAgICAgICAgIGlmICAgICgvKD86YXR8Ymx8aXopJC8pCQl7ICRfIC49ICdlJyB9CiAgICAg
ICAgICAgIGVsc2lmICgvKFteYWVpb3V5bHN6XSlcMSQvKQkJeyBjaG9wICAgICAgfQogICAgICAg
ICAgICBlbHNpZiAoL14ke0N9JHt2fVteYWVpb3V3eHldJC9vKQl7ICRfIC49ICdlJyB9CiAgICAg
ICAgIH0KICAgICAgfQoKICAgICAgIyBTdGVwIDFjOgogICAgICBpZiAoL3kkLykgewogICAgICAg
ICRzdGVtID0gJGA7CiAgICAgICAgJF8gPSAiJHtzdGVtfWkiCQkJCWlmICgkc3RlbSA9fiAvJF92
L28pOwogICAgICB9CgoKICAgICAgIyBTdGVwIDI6CiAgICAgIGlmICgvJFNURVAyJC9vKSB7CiAg
ICAgICAgJHN0ZW0gPSAkYDsgJHN1ZmZpeCA9ICQxOwogICAgICAgICRfID0gJHN0ZW0gLiAkc3Rl
cDJsaXN0eyRzdWZmaXh9CWlmICgkc3RlbSA9fiAvJG1ncjAvbyk7CiAgICAgIH0KCgogICAgICAj
IFN0ZXAgMzoKICAgICAgaWYgKC8kU1RFUDMkL28pIHsKICAgICAgICAkc3RlbSA9ICRgOyAkc3Vm
Zml4ID0gJDE7CiAgICAgICAgJF8gPSAkc3RlbSAuICRzdGVwM2xpc3R7JHN1ZmZpeH0JaWYgKCRz
dGVtID1+IC8kbWdyMC9vKTsKICAgICAgfQoKCiAgICAgICMgU3RlcCA0OgogICAgICBpZiAoLyRT
VEVQNCQvbykgewogICAgICAgICRzdGVtID0gJGA7CiAgICAgICAgJF8gPSAkc3RlbQkJCQlpZiAo
JHN0ZW0gPX4gLyRtZ3IxL28pOwogICAgICB9IGVsc2lmICgvKHN8dClpb24kLykgewogICAgICAg
ICRzdGVtID0gJGAgLiAkMTsKICAgICAgICAkXyA9ICRzdGVtCQkJCWlmICgkc3RlbSA9fiAvJG1n
cjEvbyk7CiAgICAgIH0KICAgCgogICAgICAjIFN0ZXAgNToKICAgICAgaWYgKC9lJC8pIHsKICAg
ICAgICAkc3RlbSA9ICRgOwogICAgICAgICRfID0gJHN0ZW0gaWYgKAogICAgICAgICAgICRzdGVt
ID1+IC8kbWdyMS9vIG9yCiAgICAgICAgKCAgJHN0ZW0gPX4gLyRtZXExL28gYW5kIG5vdAogICAg
ICAgICAgICRzdGVtID1+IC9eICRDICR2IFteYWVpb3V3eHldICQveG8gICkKICAgICAgICApOwog
ICAgICB9CiAgICAgIGNob3AJCQkJCWlmICgvbGwkLyBhbmQgLyRtZ3IxL28pOwoKICAgICAgIyBQ
b3N0bHVkZSAtIFR1cm4gaW5pdGlhbCBZIGJhY2sgdG8geToKICAgICAgcy9eWS95LzsKCiAgIH0K
CiAgIHJldHVybiB3YW50YXJyYXk/IEB3b3JkIDogJHdvcmRbMF07Cgp9CgoKCiMgUmVhZCBpbiB3
b3JkcyBhbmQgc3RlbSB0byBzdGRvdXQ6CndoaWxlICg8PikgewogICB3aGlsZSAoL1xHKC4qPyko
WyRWQUxJRF0rKS9jb2cpIHsKICAgICAgcHJpbnQgIiQxIiwgKGRlZmluZWQgJDI/IHN0ZW0oJDIp
OicnKTsKICAgfQogICBwcmludCAiXG4iOwp9CgoKX19FTkRfXwoKCiMgVGVzdCBhcnJheSwgcmVm
ZXJlbmNlIGhhbmRsaW5nIGNvZGU6CnByaW50ICJSZXN1bHRzOlxuIjsKcHJpbnQgIiAtICRfXG4i
IGZvciBzdGVtKCggJ2hpJywgWyAnZGViYXRlJywgJ2RlYnVua2VkJywgJ3RoZXJlJywgJ3dhbGtp
bmcnIF0sICAndGFsa2luZycsIFwnYmFraW5nJywgKCdoYXJkbHknLCAnbmVnb3RpYXRlJywgJ2Fu
dGljaXBhdGlvbicpLCAncmF0aW9uYWxpemUnICkgKTsKIAoKIyBTb21lIG90aGVyIGlucHV0IG1l
dGhvZHMgdG8gdHJ5OgoKcHJpbnQgIkVudGVyIHdvcmRzOiBcblxuIjsKd2hpbGUgKDw+KSB7Cgog
ICBpZiAobXkgQHdvcmQgPSAvWyRWQUxJRF0rL2NvZykgewoKICAgICAgIyBPbmUgcGVyIGxpbmUK
ICAgICAgI2ZvcmVhY2ggKEB3b3JkKSB7CiAgICAgICMgICBwcmludCAiICAtICRfID0+ICI7CiAg
ICAgICMgICBteSAkcmVzdWx0ID0gc3RlbSgkXyk7CiAgICAgICMgICBwcmludCAnJywoZGVmaW5l
ZCAkcmVzdWx0PyAiJHJlc3VsdCI6Jyh1bmRlZiknKSwiXG4iOwogICAgICAjfQoKICAgICAgIyBN
YW55IHBlciBsaW5lIHcvIGJlbmNobWFyawogICAgICBteSBAcmVzdWx0OwogICAgICBwcmludCAi
U3RlbW1pbmc6ICIsKGpvaW4gJywgJywgQHdvcmQpLCIuLi5cbiI7CiAgICAgIG15ICRjb3VudCA9
IDEwMDAwOwogICAgICBteSAkdCA9IHRpbWVpdCgkY291bnQsIHN1YiB7IEByZXN1bHQgPSBzdGVt
KEB3b3JkKTsgfSk7CiAgICAgIHByaW50ICJSZXN1bHQ6ICIsKGpvaW4gJywgJywgQHJlc3VsdCks
Ii5cbiI7CiAgICAgIHdhcm4gIiRjb3VudCBsb29wcyBvZiBzdGVtbWluZyBjb2RlIHRvb2s6Iix0
aW1lc3RyKCR0KSwiXG4iOwoKCiAgICAgIHByaW50ICJcbiI7CiAgIH0KfQoKCgo9aGVhZDEgSElT
VE9SWQoKcGVybC50eHQ6Cgo9b3ZlciA0Cgo9aXRlbSAtCgpSZWxlYXNlIDEgb2YgUG9ydGVyIFN0
ZW1tZXI6IGJ5IE1hcnRpbiBQb3J0ZXIuCgo9aXRlbSAtCgpDb2RlIGNoYW5nZXMsIGxpdHRlL25v
IGFsZ29yaXRobSBjaGFuZ2VzOiBBbGxhbiBGaWVsZHMsIDIwMDIuICBXaWxsIHN1Ym1pdCBhbnkg
YWxnb3JpdGhtIHN1Z2dlc3Rpb25zIGFzIHBhcnQgb2Ygc25vd2JhbGwgcHJvamVjdCBpbnN0ZWFk
LgoKPWJhY2sKCgo9aGVhZDEgU0VFIEFMU08KCj1vdmVyIDQKCj1pdGVtICoKClBvcnRlciwgTS5G
LiwgIkFuIGFsZ29yaXRobSBmb3Igc3VmZml4IHN0cmlwcGluZyIsIFByb2dyYW0sIFZvbC4gMTQg
KDMpLCBKdWx5IDE5ODAsIHBwIDEzMC0xMzcuCgo9aXRlbSAqCgpodHRwOi8vd3d3LnRhcnRhcnVz
Lm9yZy9+bWFydGluL1BvcnRlclN0ZW1tZXIgLSBNYXJ0aW4gUG9ydGVyJ3Mgb2ZmaWNpYWwgUG9y
dGVyIFN0ZW1tZXIgd2Vic2l0ZQoKPWl0ZW0gKgoKaHR0cDovL3Nub3diYWxsLnNvdXJjZWZvcmdl
Lm5ldC8gLSBTbm93YmFsbCBwcm9qZWN0LCBuZXdlc3Qgc3RlbW1lcnMuICBNdWx0aWxpbmd1YWwg
c3RlbW1lcnMgd3JpdHRlbiBpbiBzbm93YmFsbC4KCj1iYWNrCgoKPWN1dAoKCg==

--------------Boundary-00=_S99T4OWJ901G6SPL3SMO--

_______________________________________________
Snowball-discuss mailing list
Snowball-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/snowball-discuss