[Xapian-discuss] Multilingual issues with Xapian
James Aylett
james-xapian at tartarus.org
Thu Oct 11 10:36:39 BST 2007
On Thu, Oct 11, 2007 at 02:09:10AM +0200, Ron Kass wrote:
> What if instead of stemming all the words in a document, even if they have
> no real stemmed form, the stemmer (during indexing) was to stem only words
> that it knows have a stemmed form?
Wouldn't you need a dictionary of stemmed forms for that? At which
point you might as well use a dictionary approach to stemming, which
can (with lots of work) give you better stemming anyway. The problem
is that with algorithmic stemming, *everything* has a stemmed form,
even if it isn't useful.
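
To make the difference concrete, here is a minimal sketch in Python. The suffix rules and the tiny dictionary are invented for illustration; they are not the Snowball algorithms Xapian uses, and the function names are hypothetical. The rule-based stemmer rewrites any word that happens to match a suffix, while the dictionary-based one only touches words it has been told about.

```python
# Toy illustration only: the rules and word lists below are made up and are
# NOT the Snowball algorithms that Xapian actually uses.

SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical English-like suffix rules


def algorithmic_stem(word):
    """Strip the first matching suffix; knows nothing about real words."""
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical dictionary mapping known inflected forms to their stems.
STEM_DICTIONARY = {"indexing": "index", "indexed": "index", "indexes": "index"}


def dictionary_stem(word):
    """Stem only words listed in the dictionary; leave everything else alone."""
    return STEM_DICTIONARY.get(word, word)


# "paris" and "kaktus" (German for cactus) have no useful English stem,
# but the rule-based stemmer strips their final "s" anyway.
words = ["indexing", "indexes", "paris", "kaktus"]
algorithmic = [algorithmic_stem(w) for w in words]
dictionary = [dictionary_stem(w) for w in words]

print(algorithmic)
print(dictionary)
```

The cost of the dictionary approach is visible even here: every inflected form has to be listed by hand, which is the "lots of work" part.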
This is quite a common problem, but I don't actually know what the
common solution is :)
(Unless you can mark up your languages properly. Then you just have to
worry about how you stem your query.)
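
A rough sketch of what that could look like, again in Python with invented toy rules rather than real Xapian or Snowball calls. At index time the markup picks the stemmer. At query time, one possible option (an assumption on my part, not an established solution) is to use the query's language if it is known and otherwise try every language's stemmer.

```python
# Toy per-language suffix rules, invented for illustration only.
RULES = {
    "en": ("ing", "ed", "s"),
    "de": ("en", "er", "e"),
}


def stem(word, lang):
    """Strip the first matching suffix for the given language."""
    for suffix in RULES[lang]:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text, lang):
    """Index time: the language markup tells us which rules to apply."""
    return {stem(w, lang) for w in text.lower().split()}


def query_terms(word, lang=None):
    """Query time: use the language if known, else try every language."""
    langs = [lang] if lang else sorted(RULES)
    return {stem(word.lower(), code) for code in langs}


# A document marked up as German is stemmed with the German rules only.
doc_terms = index_terms("Kinder spielen", "de")

# A query of unknown language expands to one candidate term per language.
candidates = query_terms("spielen")
print(doc_terms, candidates)
```

Trying every stemmer on the query widens the match at the cost of some false hits, so it only moves the problem rather than removing it.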
J
--
/--------------------------------------------------------------------------\
James Aylett xapian.org
james at tartarus.org uncertaintydivision.org