[Snowball-discuss] Chinese and Japanese stemming algorithm
Miguel Florido
miguel.florido at softonic.com
Thu Nov 3 10:31:38 GMT 2011
Thanks a lot, Olly. I'm going to check it out.
Miguel Florido
Web Developer Junior
miguel.florido at softonic.com
http://www.softonic.com
-----Original Message-----
From: Olly Betts [mailto:olly at survex.com]
Sent: Thursday, 3 November 2011 11:31
To: Richard Boulton
CC: Martin Porter; Snowball-discuss at lists.tartarus.org; Miguel Florido
Subject: Re: [Snowball-discuss] Chinese and Japanese stemming algorithm
On Wed, Nov 02, 2011 at 11:03:18AM +0000, Richard Boulton wrote:
> There are also more sophisticated approaches, generally involving some
> use of dictionaries. I don't know of standalone code for doing these,
> but we had a Google-Summer-of-Code student with Xapian this year who
> implemented quite a lot of stuff for Chinese word segmentation; his
> work hasn't been integrated into Xapian core yet, but the trac page
> describing it (with links to the code) is
> http://trac.xapian.org/wiki/GSoC2011/ChineseSegmentationAnalysis
That work is currently pretty much standalone (I think it uses Xapian's
UTF-8 support, but that wouldn't be hard to replace if you wanted to
use it in another context).
There's also scws which is standalone:
http://www.ftphp.com/scws/
I don't know a whole lot about it - I only know of it because there's a
patch for Xapian integration from its author:
http://article.gmane.org/gmane.comp.search.xapian.general/9052
> Olly Betts was his primary mentor, so may be able to give more detail.
I can certainly try to answer questions.
Cheers,
Olly