[Xapian-discuss] Japanese stemming

Seo Sanghyeon tinuviel at sparcs.kaist.ac.kr
Sun Apr 17 14:07:41 BST 2005


Hello, new to the list,

I am interested in Xapian. While reading the site, I found that
http://xapian.org/docs/stemming.html states:

"A stemming algorithm is a process of linguistic normalisation, in which
the variant forms of a word are reduced to a common form... For many of
the world's languages, Chinese and Japanese for example, this concept is
irrelevant,"

Which I found very strange. Stemming is certainly valuable for
Japanese. I think it makes an even better example than the English
one of connection/connective/connected/connecting. For example:

踊る          odoru        dance
踊らない      odoranai     doesn't dance
踊った        odotta       danced
踊らなかった  odoranakatta didn't dance
踊れる        odoreru      can dance
踊れない      odorenai     can't dance
踊れた        odoreta      could dance
踊れなかった  odorenakatta couldn't dance
踊っている    odotteiru    is dancing
踊っていない  odotteinai   isn't dancing

And so on. (Okay, this is rather obvious because only the stem is
written in kanji. In that case, you can replace 踊 with おど.)
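
To make the idea concrete, here is a minimal sketch in Python of longest-suffix stripping over the ten forms listed above. Everything in it is an assumption for illustration: the names SUFFIXES and toy_stem are hypothetical, the suffix list covers only this one verb, and none of it is Snowball or Xapian code. A real Japanese stemmer would also have to split running text into words first, since Japanese is written without spaces.

```python
# Toy suffix stripper for the ten example forms only (illustration,
# not a real Japanese stemmer). SUFFIXES and toy_stem are made-up names.
SUFFIXES = sorted(
    [
        "る", "らない", "った", "らなかった",
        "れる", "れない", "れた", "れなかった",
        "っている", "っていない",
    ],
    key=len,
    reverse=True,  # try the longest endings first
)


def toy_stem(word):
    """Strip the longest matching ending, keeping at least one character."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for form in ["踊る", "踊らなかった", "踊っていない", "おどれた"]:
        print(form, "->", toy_stem(form))
```

With this list, all ten forms reduce to 踊 (or to おど when written in hiragana). It would of course over-stem unrelated words that merely end in る, which is part of why a real stemmer needs more than a suffix table.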

Yes, as you can see, I started to learn Japanese recently. :-) I am
not sure whether I could write a Japanese stemmer myself... Can anyone
help?

I visited the Snowball site and read the manual there. It was an
interesting read.

Seo Sanghyeon


