[Xapian-discuss] Another query parser bug
Ron Kass
ron at pidgintech.com
Tue Oct 23 15:35:04 BST 2007
I wrote the following test script to demonstrate what looks like a
bug in the query parser:
#!/usr/bin/perl
use strict;
use Search::Xapian qw/:standard/;
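# Parser under test: AND as the default operator, English stemming.
my $QueryParser = Search::Xapian::QueryParser->new();
$QueryParser->set_default_op(OP_AND);
$QueryParser->set_stemmer(Search::Xapian::Stem->new("english"));
$QueryParser->set_stemming_strategy(STEM_SOME);
# "Title:" in the query string maps to the boolean term prefix "T".
$QueryParser->add_boolean_prefix("Title","T");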
print "this script is to test the LoveHate feature in conjunction
with a single boolean prefixes.\nNotice that when using boolean
prefixes, the -notallowed translates to a regular AND search rather
than a AND_NOT as it should be.\nAlso note, brackets, or order of
the terms does not make a difference.\n\nHowever,
it seems that if at least one of the terms is not a boolean prefix,
the parser parses the query correctly, regardless of order. Not 100%
verified this bit, but seems so.\n\n";
# Every test query is parsed with the same set of flags.
my $flags = FLAG_BOOLEAN | FLAG_PHRASE | FLAG_LOVEHATE | FLAG_WILDCARD;
print "right: ".$QueryParser->parse_query(qq{word -notallowed},$flags)."\n";
print "wrong: ".$QueryParser->parse_query(qq{(Title:word) -notallowed},$flags)."\n";
print "wrong: ".$QueryParser->parse_query(qq{Title:word -notallowed},$flags)."\n";
print "wrong: ".$QueryParser->parse_query(qq{-notallowed Title:word},$flags)."\n";
print "right: ".$QueryParser->parse_query(qq{term Title:word -notallowed},$flags)."\n";
print "right: ".$QueryParser->parse_query(qq{Title:first term Title:word -notallowed},$flags)."\n";
This is the output:
this script is to test the LoveHate feature in conjunction with a
single boolean prefixes.
Notice that when using boolean prefixes, the -notallowed translates
to a regular AND search rather than a AND_NOT as it should be.
Also note, brackets, or order of the terms does not make a difference.
However, it seems that if at least one of the terms is not a boolean
prefix, the parser parses the query correctly, regardless of order.
Not 100% verified this bit, but seems so.
right: Xapian::Query((Zword:(pos=1) AND_NOT Znotallow:(pos=2)))
wrong: Xapian::Query((Znotallow:(pos=1) FILTER Tword))
wrong: Xapian::Query((Znotallow:(pos=1) FILTER Tword))
wrong: Xapian::Query((Znotallow:(pos=1) FILTER Tword))
right: Xapian::Query(((Zterm:(pos=1) AND_NOT Znotallow:(pos=2))
FILTER Tword))
right: Xapian::Query(((Zterm:(pos=1) AND_NOT Znotallow:(pos=2))
FILTER (Tfirst OR Tword)))
Notice that the third search (like the other two marked "wrong") has a
plain [Znotallow:(pos=1)] rather than [AND_NOT Znotallow:(pos=1)], and
the negated term is not placed in the FILTER section either.
It seems that when the query contains at least one non-prefixed term,
the parser handles the negation correctly, regardless of where that
term appears.
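
To make the symptom easier to spot across many test queries, here is a
minimal sketch of a check that works purely on the description strings
printed above. The helper name lost_negation is hypothetical and not part
of Search::Xapian; it only assumes that a correctly handled "-term" shows
up as AND_NOT in the query description.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Search::Xapian): returns 1 when the
# query string contains a "-term" but the description has no AND_NOT.
sub lost_negation {
    my ($query_string, $description) = @_;
    my $has_hate_term = $query_string =~ /(?:^|[\s(])-\w/;
    my $has_and_not   = $description =~ /\bAND_NOT\b/;
    return ($has_hate_term && !$has_and_not) ? 1 : 0;
}

# Query strings and descriptions copied from the output above.
my @cases = (
    ['word -notallowed',
     'Xapian::Query((Zword:(pos=1) AND_NOT Znotallow:(pos=2)))'],
    ['Title:word -notallowed',
     'Xapian::Query((Znotallow:(pos=1) FILTER Tword))'],
    ['term Title:word -notallowed',
     'Xapian::Query(((Zterm:(pos=1) AND_NOT Znotallow:(pos=2)) FILTER Tword))'],
);

for my $case (@cases) {
    my ($query_string, $description) = @$case;
    printf "%-30s %s\n", $query_string,
        lost_negation($query_string, $description) ? "negation lost" : "ok";
}
```

With the three sample cases, only the query consisting of a boolean-prefixed
term plus the negated term is reported as "negation lost".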
Your thoughts?
And one last question regarding the parser in this case:
should (or could) there be any performance difference between the
following three parsed queries? (FILTER vs AND_NOT, and two AND_NOTs
vs a single AND_NOT over an OR)
1. Xapian::Query(((Zterm:(pos=1) AND_NOT Znotallow:(pos=2)) FILTER
(Tfirst OR Tword)))
2. Xapian::Query(((Zterm:(pos=1) AND_NOT Znotallow:(pos=2) AND_NOT
Tfirst:(pos=3)) FILTER Tword))
3. Xapian::Query(((Zterm:(pos=1) AND_NOT (Znotallow:(pos=2) OR
Tfirst:(pos=3))) FILTER Tword))
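
As a side note on queries 2 and 3, the sketch below uses toy posting
lists to show that the two forms select the same documents under plain
set semantics. It says nothing about performance inside Xapian. The
document IDs are made up, FILTER is modelled as a simple intersection,
and the chained AND_NOT in query 2 is assumed to group from the left.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy posting lists: term => made-up document IDs (illustrative only).
my %postings = (
    Zterm     => [1, 2, 3, 4, 5, 6],
    Znotallow => [2, 5],
    Tfirst    => [3, 5],
    Tword     => [1, 2, 3, 4],
);

# Return the set of documents for a term as a hash reference.
sub docs {
    my ($term) = @_;
    my %set = map { $_ => 1 } @{ $postings{$term} };
    return \%set;
}

# Documents in the left set that are not in the right set.
sub and_not {
    my ($left, $right) = @_;
    my %result = map { $_ => 1 } grep { !$right->{$_} } keys %$left;
    return \%result;
}

# Documents in both sets (FILTER modelled as an intersection).
sub filter {
    my ($left, $right) = @_;
    my %result = map { $_ => 1 } grep { $right->{$_} } keys %$left;
    return \%result;
}

# Documents in either set.
sub or_sets {
    my ($left, $right) = @_;
    my %result = (%$left, %$right);
    return \%result;
}

# Sorted, comma-separated document IDs of a set.
sub ids {
    my ($set) = @_;
    return join ",", sort { $a <=> $b } keys %$set;
}

# Query 2: ((Zterm AND_NOT Znotallow) AND_NOT Tfirst) FILTER Tword
my $query2 = filter(
    and_not(and_not(docs('Zterm'), docs('Znotallow')), docs('Tfirst')),
    docs('Tword'));

# Query 3: (Zterm AND_NOT (Znotallow OR Tfirst)) FILTER Tword
my $query3 = filter(
    and_not(docs('Zterm'), or_sets(docs('Znotallow'), docs('Tfirst'))),
    docs('Tword'));

print "query 2 matches: ", ids($query2), "\n";
print "query 3 matches: ", ids($query3), "\n";
```

Both lines print the ID list 1,4, so any difference between these two
forms would be in how they are evaluated, not in what they match.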
Ron