[Xapian-discuss] Japanese stemming
Seo Sanghyeon
tinuviel at sparcs.kaist.ac.kr
Sun Apr 17 14:07:41 BST 2005
Hello, I'm new to the list.
I am interested in Xapian. While reading the site, I found that
http://xapian.org/docs/stemming.html states:
"A stemming algorithm is a process of linguistic normalisation, in which
the variant forms of a word are reduced to a common form... For many of
the world's languages, Chinese and Japanese for example, this concept is
irrelevant,"
I found this very strange. Stemming is very valuable in Japanese.
I think it is an even better example than the English
example of connection/connective/connected/connecting. For example:
踊る odoru dance
踊らない odoranai doesn't dance
踊った odotta danced
踊らなかった odoranakatta didn't dance
踊れる odoreru can dance
踊れない odorenai can't dance
踊れた odoreta could dance
踊れなかった odorenakatta couldn't dance
踊っている odotteiru is dancing
踊っていない odotteinai isn't dancing
And so on. (Okay, this is rather obvious because only the stem is written
in kanji. You can replace 踊 with おど, then.)
Yes, as you can see, I started learning Japanese recently. :-) I am
not sure, but I may try to write a Japanese stemmer myself... Can anyone
help?
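To make the idea concrete, here is a minimal Python sketch of longest-suffix stripping over the forms listed above. It is not Snowball and not Xapian's stemming API; the `SUFFIXES` table and the `stem` function are hypothetical, hand-written only for this one verb pattern. A real Japanese stemmer would also need word segmentation (Japanese text has no spaces) and rules for the other conjugation classes.

```python
# Toy suffix-stripping "stemmer" for the conjugated forms of 踊る (odoru).
# ASSUMPTION: the suffix table below is hypothetical and covers only the
# example forms; it is not taken from Snowball or Xapian.

# Inflectional endings that follow the stem 踊 (or おど) in the examples.
SUFFIXES = [
    "る",          # odoru: dance
    "らない",      # odoranai: doesn't dance
    "った",        # odotta: danced
    "らなかった",  # odoranakatta: didn't dance
    "れる",        # odoreru: can dance
    "れない",      # odorenai: can't dance
    "れた",        # odoreta: could dance
    "れなかった",  # odorenakatta: couldn't dance
    "っている",    # odotteiru: is dancing
    "っていない",  # odotteinai: isn't dancing
]

# Try longer endings first so that, e.g., れる wins over る.
_BY_LENGTH = sorted(SUFFIXES, key=len, reverse=True)


def stem(word: str) -> str:
    """Strip the longest matching ending, keeping at least one character."""
    for suffix in _BY_LENGTH:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    forms = ["踊る", "踊らない", "踊った", "踊らなかった", "踊れる",
             "踊れない", "踊れた", "踊れなかった", "踊っている", "踊っていない"]
    print({stem(f) for f in forms})              # {'踊'}
    print({stem("おど" + s) for s in SUFFIXES})  # {'おど'}
```

All ten forms collapse to the same stem, whether the stem is written in kanji or in kana, which is exactly the conflation a search engine wants. Checking the longest ending first matters because several endings overlap (る is a suffix of れる and っている).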
I visited the Snowball site and read the manual there. It was an
interesting read.
Seo Sanghyeon