[Xapian-discuss] different stemming

James Aylett james-xapian at tartarus.org
Fri May 8 21:46:40 BST 2009


On Fri, May 08, 2009 at 11:43:16AM +0200, james cauwelier wrote:

> The site I am working on has products in different languages (dutch,
> english, french, italian, spanish).  I want to search these products, but
> while indexing I should use the correct stemmer.  No problem, because I know
> the language of a product description.
> 
> But when somebody queries the database I have no information about the
> language.  Thus, I am not able to select the correct stemmer for queries.
> How should I solve this?  Skip stemming altogether?  That's what I am doing
> now.

I know this isn't the most helpful answer, but "it depends". You could
disable stemming, but this may have unhelpful effects on the quality
of your results. That's almost certainly the simplest thing to do,
though.

If you can figure out which language the user cares about most, you can
stem the query for that language and restrict the search to documents
(products) that were written in that language in the first place. You may
be able to infer this from the same source you use to choose the site
localisation.
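
As a rough sketch of that second approach, assuming the site localisation
comes from the browser's Accept-Language header: the code below picks the
best supported language and derives a stemmer name plus a boolean filter
term from it. The STEMMERS table, the function names and the "L" term
prefix are illustrative assumptions, not something Xapian mandates.

```python
# Hypothetical mapping from ISO 639-1 codes to stemmer language names.
STEMMERS = {
    "nl": "dutch",
    "en": "english",
    "fr": "french",
    "it": "italian",
    "es": "spanish",
}


def pick_language(accept_language, default="en"):
    """Return the best supported language code from an Accept-Language header."""
    candidates = []
    for position, part in enumerate(accept_language.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not wanted"
        # Reduce "fr-BE" to "fr"; regional variants share one stemmer.
        code = tag.strip().lower().split("-")[0]
        if code in STEMMERS and quality > 0:
            # Sort by highest quality first, then by order in the header.
            candidates.append((-quality, position, code))
    return min(candidates)[2] if candidates else default


def search_settings(accept_language):
    """Return the stemmer name and the language filter term for a request."""
    code = pick_language(accept_language)
    # Assumption: each document was indexed with a boolean term "L" + code.
    return {"stemmer": STEMMERS[code], "filter_term": "L" + code}
```

With the Python bindings, the stemmer name would then go to xapian.Stem()
and QueryParser.set_stemmer(), and the filter term could be combined with
the parsed query using Query.OP_FILTER. That only works if each product
was indexed with a matching language term.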

J

-- 
  James Aylett

  talktorex.co.uk - xapian.org - uncertaintydivision.org


