[Snowball-discuss] Chinese and Japanese stemming algorithm

Miguel Florido miguel.florido at softonic.com
Thu Nov 3 10:31:38 GMT 2011


Thanks a lot Olly, I'm going to check it.

Miguel Florido

-----Original Message-----
From: Olly Betts [mailto:olly at survex.com]
Sent: Thursday, 03 November 2011 11:31
To: Richard Boulton
CC: Martin Porter; Snowball-discuss at lists.tartarus.org; Miguel Florido
Subject: Re: [Snowball-discuss] Chinese and Japanese stemming algorithm

On Wed, Nov 02, 2011 at 11:03:18AM +0000, Richard Boulton wrote:
> There are also more sophisticated approaches, generally involving some
> use of dictionaries.  I don't know of standalone code for doing these,
> but we had a Google-Summer-of-Code student with Xapian this year who
> implemented quite a lot of stuff for Chinese word segmentation; his
> work hasn't been integrated into Xapian core yet, but the trac page
> describing it (with links to the code) is
> http://trac.xapian.org/wiki/GSoC2011/ChineseSegmentationAnalysis

That work is currently pretty much standalone (I think it uses Xapian's
UTF-8 support, but that wouldn't be hard to replace if you wanted to
use it in another context).

There's also scws, which is standalone:

http://www.ftphp.com/scws/

I don't know a whole lot about it - I only know of it because there's a
patch for Xapian integration from its author:

http://article.gmane.org/gmane.comp.search.xapian.general/9052

> Olly Betts was his primary mentor, so may be able to give more detail.

I can certainly try to answer questions.

Cheers,
    Olly


