[Snowball-discuss] Japanese stemmer?

Micah Bly micah.j.bly at medtronic.com
Fri Jan 26 16:15:16 GMT 2007


Martin,

I did that search a few months back, and having repeated it today, I
think I made a mistake. I found a page like this:
http://lists.tartarus.org/pipermail/xapian-discuss/2005-April/000832.html

That page is on tartarus, but it isn't necessarily Snowball-related,
at least not directly.

As far as Japanese stemming goes, I can contribute linguistic  
knowledge, and pseudo code, but I don't have any experience writing  
stemmers, and I don't 'speak' snowball. Would anyone else out there  
be interested in collaborating on a stemmer for Japanese?

In other words, I could probably brute-force one, but it would not be
rational or efficient.

Micah Bly

On Jan 26, 2007, at 4:04 AM, Martin Porter wrote:

>
> Micah,
>
> I don't know of particular work in this area, but am broadly aware of
> the problems, which are (a) segmentation of text into words and (b)
> word normalisation, of which something like stemming forms a part. The
> place to go for solutions is no doubt Japan itself. There are commercial
> solutions in the West though, with proprietary software from companies
> like Inxight and Teragram. Among all the major languages, Japanese
> presents the worst problems.
>
> I don't believe the Snowball site says anywhere that stemming doesn't
> matter for Japanese. Can you point to where you found this?
>
> Martin
>
>> Does anyone know of any work being done on a Japanese stemmer? I
>> searched around this site, found a reference that said stemming
>> didn't matter for Japanese (err, ah...), but that was about it.
>>
>> I'm not even sure where to go to look for rules on stemming Japanese.
>>
>> Micah Bly
>
>