[Snowball-discuss] Update on regex approach

Oleg Bartunov oleg@sai.msu.su
Wed, 8 May 2002 20:33:38 +0300 (GMT)


Allan,

I don't understand what the problem is with using our Perl interface to
Snowball.  You'll never get better performance than the original C program.


	Oleg
On Wed, 8 May 2002, Allan Fields wrote:

> Hi,
>
> Sorry I haven't dropped by for a while, but I'm quite busy.  I'll try to get
> my updated Perl stemmer out within the next month.  More benchmarking to
> come.  =)  The biggest issue is the overhead when stemming multiple words;
> Perl can be a real beastie performance-wise, as I've witnessed.
>
> My other attempt to speed up the Perl stemmer, which I've also been working
> on, is stuck on a few technical details of the measure of words.  One idea
> I've had is to separate finding the measure from the main transform stage:
> use a reduced set representation to derive the measure, and a single regular
> expression substitution with supporting inline logic (s///e) for the
> transform.  The biggest issue with this approach is that at different points
> it is necessary to look behind to see whether the new measure has changed or
> is past a minimal boundary.  If there were a way to use integers to represent
> the logic of the {c, v, C, V} sequences, it might significantly speed up that
> stage by making the operations integer operations instead.  I would consider
> this more optimal in that, by forcing larger memory usage (still paltry on
> today's computers), it would be possible to conserve processor time.
>
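
A minimal sketch of the integer idea above, written in Python rather than Perl.
It assumes Porter's definition of the measure m of a word of the form
[C](VC)^m[V], and that "y" counts as a vowel only after a consonant.  The names
cv_mask and measure are hypothetical, chosen for illustration only.  Each word
is reduced to a bitmask (bit i set when letter i is a vowel), and the measure
is then the number of vowel-to-consonant transitions, found with shifts and
masks instead of string matching:

```python
VOWELS = "aeiou"


def cv_mask(word):
    """Return an int whose bit i is 1 when letter i of word is a vowel."""
    mask = 0
    prev_is_vowel = False
    for i, ch in enumerate(word.lower()):
        if ch in VOWELS:
            is_vowel = True
        elif ch == "y":
            # Assumption: 'y' is a vowel only when it follows a consonant.
            is_vowel = i > 0 and not prev_is_vowel
        else:
            is_vowel = False
        if is_vowel:
            mask |= 1 << i
        prev_is_vowel = is_vowel
    return mask


def measure(word):
    """Count vowel-to-consonant transitions using integer operations only."""
    n = len(word)
    if n < 2:
        return 0
    mask = cv_mask(word)
    # Bit i of vc is 1 when letter i is a vowel and letter i+1 is a consonant.
    # The final mask drops bit n-1, which has no following letter.
    vc = mask & ~(mask >> 1) & ((1 << (n - 1)) - 1)
    return bin(vc).count("1")
```

With these definitions, measure("tree") is 0, measure("trouble") is 1 and
measure("troubles") is 2.  The mask for a word can be cached, trading a little
memory for repeated measure checks.
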
> Also, by inlining all the logic into a single substitution, it could be said
> that Perl's larger overhead is reduced somewhat.  Now I'm not sure it would
> compare to the C version, but I'm postulating it will be significantly faster
> than most other approaches in Perl.  (Although it won't be as algorithmic,
> since it moves lots of the procedural elements into the regex itself.)
>
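
A rough sketch of the single-substitution idea, again in Python: re.sub with a
function as the replacement stands in for Perl's s///e.  Only three suffix rules
are shown (ational, tional, izer, each applied when the stem's measure is
greater than zero, as in step 2 of the Porter algorithm).  The names RULES,
SUFFIX_RE, measure and apply_rules are hypothetical; this is not the stemmer
discussed in this thread.

```python
import re

# Illustrative subset of suffix rules; a real stemmer has many more.
RULES = {"ational": "ate", "tional": "tion", "izer": "ize"}

# The lazy stem group makes the regex pick the longest matching suffix.
SUFFIX_RE = re.compile(r"^(.*?)(ational|tional|izer)$")


def measure(stem):
    """Porter-style measure: count vowel-run-then-consonant transitions."""
    pattern = []
    for i, ch in enumerate(stem):
        # Assumption: 'y' is a vowel only when it follows a consonant.
        is_vowel = ch in "aeiou" or (ch == "y" and i > 0 and pattern[-1] == "c")
        pattern.append("v" if is_vowel else "c")
    return len(re.findall(r"v+c", "".join(pattern)))


def apply_rules(word):
    """Apply the suffix rules in one substitution with inline logic."""
    def replace(match):
        stem, suffix = match.group(1), match.group(2)
        if measure(stem) > 0:
            return stem + RULES[suffix]
        return match.group(0)  # condition failed: leave the word unchanged
    return SUFFIX_RE.sub(replace, word)
```

Here apply_rules("relational") gives "relate", while apply_rules("rational")
leaves the word unchanged because the stem "r" has measure 0.  The condition
check lives inside the replacement function, so no separate pass over the word
is needed.
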
> This has led me to believe that it may be possible to create a Snowball
> compiler that creates stemmers using Perl regexes at most and, at the least,
> using sed, for instance.  There are lots of options for Snowball compilation
> currently, but it would have a special geek appeal to make this in sed.
> Someone, please do beat me to it! ;)
>
> Allan
>
>
> _______________________________________________________________
>
> Have big pipes? SourceForge.net is looking for download mirrors. We supply
> the hardware. You get the recognition. Email Us: bandwidth@sourceforge.net
> _______________________________________________
> Snowball-discuss mailing list
> Snowball-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/snowball-discuss
>

	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

