[Xapian-discuss] UTF8 support plans (without stemming)
rm at fabula.de
Wed Apr 27 21:17:20 BST 2005
On Thu, Apr 28, 2005 at 12:09:26AM +0400, Alexandre wrote:
> On Apr 27, 2005, at 23:47, rm at fabula.de wrote:
> >On Wed, Apr 27, 2005 at 11:32:30PM +0400, Alexandre wrote:
> >>Good day,
> >>are there any plans to support UTF-8 (I mean the library
> >>core, not stemming)?
> >What exactly do you mean by UTF-8 support? You can pretty much stuff
> >anything into a Xapian database (see some recent posts on this list).
> >But -- without stemming, statistical information retrieval doesn't really
> >work as expected in most western languages :-/
> Ralf, do you mean this post
Yes, that's the last one. The bug report mentioned in this post gives
> If so, "query parser ... currently assume latin1" - that's not very
> good, is it?
Hmm. Depends on what you want/need to do. I personally can't see why there
even _is_ a query parser in Xapian core. After all the query language really
depends on the application ...
> Hm, and can you please tell me more about the influence of stemming on IR
> in western languages? Does it matter only for probabilistic IR, or for
> vector search too?
> And one more question (not exactly on topic): why does Xapian stick
> to the probabilistic approach? Perhaps there are some historical links/docs?
Well, these two questions relate to each other: Xapian is strong in
'probabilistic IR', and that approach kind of needs some sort of stemming.
I can't speak for the Xapian developers (nor for the library's ancestry
in the guts of Muscat). From your question I infer that you seem to think
that 'probabilistic IR' is kind of outdated?
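
To make the stemming point a bit more concrete, here is a minimal sketch
in Python. It is only an illustration under stated assumptions: toy_stem()
is a made-up, deliberately naive suffix stripper (not the stemmers that
come with Xapian), and the documents are invented. Statistical weighting
is driven by which documents contain a term, and without stemming the
variants of a word are counted as unrelated terms:

```python
def toy_stem(word):
    # Hypothetical, deliberately naive suffix stripper for illustration only;
    # this is NOT one of the stemmers shipped with Xapian.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matching_docs(docs, query, normalise):
    # Return the indices of documents that contain the normalised query term.
    term = normalise(query)
    return [
        i
        for i, doc in enumerate(docs)
        if term in {normalise(w) for w in doc.lower().split()}
    ]


# Invented sample documents.
docs = [
    "the index stores terms",
    "indexing a document",
    "indexes are rebuilt nightly",
    "stemming is optional",
]

raw = matching_docs(docs, "index", lambda w: w)   # terms used exactly as written
stemmed = matching_docs(docs, "index", toy_stem)  # variants conflated to one term

print("without stemming:", raw)
print("with stemming:   ", stemmed)
```

Without stemming only the first document matches "index"; with the toy
stemmer "indexing" and "indexes" collapse onto the same term, so the first
three documents match and the term statistics change accordingly.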
> Thank you in advance,