[Xapian-discuss] Another query parser bug

Ron Kass ron at pidgintech.com
Tue Oct 23 15:35:04 BST 2007


I wrote the following test script to demonstrate what looks like a bug in 
the query parser:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Search::Xapian qw/:standard/;

    # Parser setup: AND by default, English stemming, "Title:" as a
    # boolean prefix mapped to the term prefix "T".
    my $QueryParser = Search::Xapian::QueryParser->new();
    $QueryParser->set_default_op(OP_AND);
    $QueryParser->set_stemmer(Search::Xapian::Stem->new("english"));
    $QueryParser->set_stemming_strategy(STEM_SOME);
    $QueryParser->add_boolean_prefix("Title", "T");

    # The same flags are used for every test query.
    my $flags = FLAG_BOOLEAN | FLAG_PHRASE | FLAG_LOVEHATE | FLAG_WILDCARD;

    print "this script is to test the LoveHate feature in conjunction "
        . "with a single boolean prefixes.\n"
        . "Notice that when using boolean prefixes, the -notallowed "
        . "translates to a regular AND search rather than a AND_NOT as "
        . "it should be.\n"
        . "Also note, brackets, or order of the terms does not make a "
        . "difference.\n\n"
        . "However, it seems that if at least one of the terms is not a "
        . "boolean prefix, the parser parses the query correctly, "
        . "regardless of order. Not 100% verified this bit, but seems "
        . "so.\n\n";

    # Each line is labelled with whether the parsed result is as expected.
    print "right: " . $QueryParser->parse_query(qq{word -notallowed}, $flags) . "\n";
    print "wrong: " . $QueryParser->parse_query(qq{(Title:word) -notallowed}, $flags) . "\n";
    print "wrong: " . $QueryParser->parse_query(qq{Title:word -notallowed}, $flags) . "\n";
    print "wrong: " . $QueryParser->parse_query(qq{-notallowed Title:word}, $flags) . "\n";
    print "right: " . $QueryParser->parse_query(qq{term Title:word -notallowed}, $flags) . "\n";
    print "right: " . $QueryParser->parse_query(qq{Title:first term Title:word -notallowed}, $flags) . "\n";


This is the output:

    this script is to test the LoveHate feature in conjunction with a
    single boolean prefixes.
    Notice that when using boolean prefixes, the -notallowed translates
    to a regular AND search rather than a AND_NOT as it should be.
    Also note, brackets, or order of the terms does not make a difference.

    However, it seems that if at least one of the terms is not a boolean
    prefix, the parser parses the query correctly, regardless of order.
    Not 100% verified this bit, but seems so.

    right: Xapian::Query((Zword:(pos=1) AND_NOT Znotallow:(pos=2)))
    wrong: Xapian::Query((Znotallow:(pos=1) FILTER Tword))
    wrong: Xapian::Query((Znotallow:(pos=1) FILTER Tword))
    wrong: Xapian::Query((Znotallow:(pos=1) FILTER Tword))
    right: Xapian::Query(((Zterm:(pos=1) AND_NOT Znotallow:(pos=2)) FILTER Tword))
    right: Xapian::Query(((Zterm:(pos=1) AND_NOT Znotallow:(pos=2)) FILTER (Tfirst OR Tword)))

Notice that the third search (like the second and fourth) has 
[Znotallow:(pos=1)] rather than [AND_NOT Znotallow:(pos=1)], and the 
negated term is not placed in the FILTER section either.
It seems that as soon as the query contains at least one non-prefixed 
term, the parser handles the negation correctly, regardless of where that 
term appears.
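
To flag the affected cases without reading each description by eye, a plain string check on the parser's description output is enough. The sketch below is only an illustration: hate_term_excluded is a hypothetical helper, not part of Search::Xapian, and the descriptions are copied from the output above rather than produced by a live parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Search::Xapian): returns 1 if the
# description shows $term as the right-hand side of an AND_NOT, else 0.
sub hate_term_excluded {
    my ($description, $term) = @_;
    return $description =~ /AND_NOT\s+\(?\Q$term\E\b/ ? 1 : 0;
}

# Query string => description, copied from the output in this post.
my %cases = (
    'word -notallowed' =>
        'Xapian::Query((Zword:(pos=1) AND_NOT Znotallow:(pos=2)))',
    'Title:word -notallowed' =>
        'Xapian::Query((Znotallow:(pos=1) FILTER Tword))',
    'term Title:word -notallowed' =>
        'Xapian::Query(((Zterm:(pos=1) AND_NOT Znotallow:(pos=2)) FILTER Tword))',
);

for my $query (sort keys %cases) {
    my $status = hate_term_excluded($cases{$query}, 'Znotallow')
        ? 'ok'
        : 'NEGATION LOST';
    printf "%-30s %s\n", $query, $status;
}
```

With a live parser, the same check could be applied to the string form of whatever parse_query returns, which would make it easy to try more combinations of prefixed and non-prefixed terms.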

Your thoughts?

One last question about the parser in this case:
should (or could) there be any performance difference between the 
following three parsed queries? (FILTER vs AND_NOT, and two AND_NOTs vs 
one AND_NOT over an OR.)
1. Xapian::Query(((Zterm:(pos=1) Znotallow:(pos=2)) FILTER (Tfirst OR Tword)))
2. Xapian::Query(((Zterm:(pos=1) AND_NOT Znotallow:(pos=2) AND_NOT Tfirst:(pos=3)) FILTER Tword))
3. Xapian::Query(((Zterm:(pos=1) AND_NOT (Znotallow:(pos=2) OR Tfirst:(pos=3))) FILTER Tword))

Ron

