[Snowball-discuss] Japanese stemmer?
Micah Bly
micah.j.bly at medtronic.com
Fri Jan 26 16:15:16 GMT 2007
Martin,
I did that search a few months back. Doing it again today, I think I
made a mistake. I found a page like this:
http://lists.tartarus.org/pipermail/xapian-discuss/2005-April/000832.html
It is on tartarus, but isn't necessarily Snowball-related, at least
not directly.
As far as Japanese stemming goes, I can contribute linguistic
knowledge and pseudocode, but I don't have any experience writing
stemmers, and I don't 'speak' Snowball. Would anyone else out there
be interested in collaborating on a stemmer for Japanese?
In other words, I could probably brute-force one, but it would not be
rational or efficient.
Micah Bly
On Jan 26, 2007, at 4:04 AM, Martin Porter wrote:
>
> Micah,
>
> I don't know of particular work in this area, but am broadly aware of
> the problems, which are (a) segmentation of text into words and (b) word
> normalisation, of which something like stemming forms a part. The place
> to go for solutions is no doubt Japan itself. There are commercial
> solutions in the West though, with proprietary software from companies
> like Inxight and Teragram. Among all the major languages, Japanese
> presents the worst problems.
>
> I don't believe the Snowball site says anywhere that stemming doesn't
> matter for Japanese. Can you point to where you found this?
>
> Martin
>
>> Does anyone know of any work being done on a Japanese stemmer? I
>> searched around this site, found a reference that said stemming
>> didn't matter for Japanese (err, ah...), but that was about it.
>>
>> I'm not even sure where to go to look for rules on stemming Japanese.
>>
>> Micah Bly
>
>