[Xapian-discuss] UTF8 support plans (without stemming)

Alexandre Xlex0x835 at rambler.ru
Thu Apr 28 08:08:28 BST 2005


On Apr 28, 2005, at 00:17, rm at fabula.de wrote:

>> If so, "query parser ... currently assume latin1" - that's not very
>> good, is it?
>
> Hmm. Depends on what you want/need to do. I personally can't see why there
> even _is_ a query parser in Xapian core. After all, the query language really
> depends on the application ...

To be honest, I didn't dig into the library; I just trusted the bug
report... =)
Anyway, usually when an application or library was developed to support
only one language (American English), it's very hard to make it work with
other languages (for example, Russian): there are lots of problems
inside...

>> Hm, and could you tell me more about the influence of stemming on IR in
>> Western languages? Is it only relevant to probabilistic IR, or to vector
>> search too?
>>
>> And one more question (not exactly on topic): why does Xapian stick
>> to the probabilistic approach? Perhaps there are some historical links/docs?
>
> Well, these two questions relate to each other: Xapian is strong in
> 'probabilistic IR', and that approach kind of needs some sort of stemming.
> I can't speak for the Xapian developers (nor for the library's ancestry
> in the guts of Muscat); from your question I infer that you seem to think
> that 'probabilistic IR' is kind of outdated?

I'm not enough of an expert to have any moral right to say that I strongly
believe 'probabilistic IR' is outdated.
I just suppose that a computer can work well with lots of data, while the
human brain is the one that should make certain kinds of decisions. No, I'm
not arguing for boolean search, but I just don't like the probabilistic
approach very much (when the machine tries to be smart)... I could be (and
probably am) absolutely wrong; that's why I'm interested in why people
choose such an approach.

Regards,
/Alexandre.
