[Xapian-discuss] Queryparser problem..
Jesper Krogh
jesper at krogh.cc
Sun Dec 9 18:06:36 GMT 2007
Olly Betts wrote:
> On Sun, Dec 09, 2007 at 08:16:17AM +0100, Jesper Krogh wrote:
>> The queryparser in my setup is using strategy STEM_SOME, which seems to
>> give the best handling of the data in our setup.
>>
>> But the queryparser doesn't really seem to be consistent.
>> doc:test
>> Running query 'Xapian::Query(ZDOCTYPEtest:(pos=1))'
>>
>> Here it applies stemming to the term before running the query (Z-prefix)
>>
>> doc:1234
>> Running query 'Xapian::Query(DOCTYPE1234:(pos=1))'
>>
>> There it skips the stemming.
>>
>> What is the reason for behaving differently based on user input?
>
> http://www.xapian.org/docs/termgenerator.html
>
> Now we index all terms lowercased with positional information, and
> also stemmed with a 'Z' prefix (unless they start with a digit) [...]
>
> Indexing terms which start with a digit twice just bloats the database.
> I'm not aware of a language where words can start with a digit, and it
> can actually harm retrieval if we attempt to stem part numbers and other
> codes.
OK, thanks.
I'm probably just (mis-)using Xapian anyway. The problem is that every
document should be traceable after retrieval. Thus I add:
doctype:<type> and id:<id>
The "viewer" application then knows what to do, and I can look the
document up and replace it by letting the indexer query for "doctype:<>
id:<>" before doing add_ or replace_.
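A minimal sketch of the lookup-then-replace step described above, in plain Python with a dict standing in for the index. This is not the Xapian API; the "Q" prefix, the "/" separator and the names unique_term and upsert are assumptions made up for illustration. The point is only that one exact, unstemmed key built from doctype and id is enough to decide between add and replace.

```python
# Toy stand-in for an index: maps one unique key term to the stored document.
# NOT the Xapian API; it only models the add-or-replace decision.

def unique_term(doctype, doc_id):
    """Build one exact-match key from doctype and id (hypothetical 'Q' prefix)."""
    return "Q" + doctype + "/" + str(doc_id)

def upsert(index, doctype, doc_id, data):
    """Replace the document stored under this key, or add it if absent.

    Returns True if an existing document was replaced, False if it was added.
    """
    key = unique_term(doctype, doc_id)
    replaced = key in index
    index[key] = data
    return replaced

index = {}
first = upsert(index, "test", 1234, "version 1")   # nothing there yet: added
second = upsert(index, "test", 1234, "version 2")  # same key: replaced
```

Xapian's WritableDatabase::replace_document() has an overload that takes a unique term and does the add-or-replace in one call; whether that fits this particular setup is an assumption.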
This worked flawlessly until my "doctype" actually was stemmable.
How do people generally solve this task? (Adding a 0 in front of my
doctype would solve the problem, but is that elegant?)
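To make the effect concrete, here is a toy model of the term mapping seen in the two queries quoted above: stemmed with a 'Z' prefix unless the word starts with a digit. It is a sketch of the observed behaviour only, not Xapian's code; toy_stem is a made-up stand-in for a real stemmer and query_term is a hypothetical name.

```python
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips one trailing 's'."""
    if len(word) > 1 and word.endswith("s"):
        return word[:-1]
    return word

def query_term(prefix, word):
    """Model the observed mapping from a prefixed query word to an index term.

    Words starting with a digit are left unstemmed; everything else is
    stemmed and gets a leading 'Z'.
    """
    word = word.lower()
    if word[:1].isdigit():
        return prefix + word
    return "Z" + prefix + toy_stem(word)

stemmed = query_term("DOCTYPE", "test")        # like doc:test above
unstemmed = query_term("DOCTYPE", "1234")      # like doc:1234 above
workaround = query_term("DOCTYPE", "0reports") # the leading-0 trick
```

Under this model the leading-0 trick works because the digit check comes before stemming, so the doctype term stays exactly as it was indexed.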
Jesper .. we're using a "homebrewed" termgenerator and try to play
nice with how the queryparser expects the dataset to be.
--
Jesper Krogh