[Xapian-discuss] Order of NOT operand?

David Sauve dnsauve at gmail.com
Tue Sep 1 23:08:47 BST 2009


Scratch that Query output.  I was mixing query string values from a defect I
have open on Github with the actual test database I'm using for debugging.

When I construct a query using real field names and NOT operators I get the
expected query from the query parser, I think.  Here's an example:

self.sb.search('indexed NOT name:david1 NOT name:david2')
# Xapian::Query(((Zindex:(pos=1) AND_NOT ZXNAMEdavid1:(pos=2)) AND_NOT
#   (XNAMEdavid2:(pos=3) OR ZXNAMEdavid2:(pos=3))))
self.sb.search('NOT name:david1 NOT name:david2 indexed')
# Xapian::Query(((<alldocuments> AND_NOT ZXNAMEdavid1:(pos=1)) AND_NOT
#   (ZXNAMEdavid2:(pos=2) OR indexed:(pos=3) OR Zindex:(pos=3))))

That said, I'm going to take Richard's advice, and rather than use
QueryParser to parse a generated query string, I think I'll build the query
myself.
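
A rough sketch of what "building it myself" could look like, in plain Python
with no Xapian dependency.  The build_query() helper and the dict layout
below are hypothetical (they are not the actual xapian-haystack code); the
idea is only that the include/exclude filters are kept as a structure, so
the order in which they arrive cannot change the result.  The backend would
then presumably map each node onto a Xapian::Query with the matching
operator (e.g. OP_AND_NOT) instead of going through QueryParser.

```python
import json


def build_query(include_terms, exclude_terms):
    """Return a nested, order-independent description of a query.

    Hypothetical layout: {"op": ..., "terms": [...]} for leaf groups and
    {"op": "AND_NOT", "left": ..., "right": ...} for exclusions.
    """
    # With no positive terms, match everything (like <alldocuments>).
    if include_terms:
        left = {"op": "AND", "terms": sorted(include_terms)}
    else:
        left = {"op": "ALL"}

    if not exclude_terms:
        return left

    # (A AND_NOT B) AND_NOT C matches the same documents as
    # A AND_NOT (B OR C), so all exclusions go into one OR group.
    right = {"op": "OR", "terms": sorted(exclude_terms)}
    return {"op": "AND_NOT", "left": left, "right": right}


query = build_query(["indexed"], ["XNAMEdavid1", "XNAMEdavid2"])

# If the framework insists on passing a string around, the structure can be
# serialised to JSON and restored on the other side.
payload = json.dumps(query, sort_keys=True)
restored = json.loads(payload)
```

Since the terms are sorted and the exclusions are grouped explicitly, passing
the filters in a different order gives exactly the same structure.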

On Tue, Sep 1, 2009 at 9:31 AM, Richard Boulton <richard at tartarus.org> wrote:

> 2009/9/1 David Sauve <dnsauve at gmail.com>
>
>> To be more specific, the query string is a combination of user input
>> (what the user typed into the search box), and filters such as field
>> equals, exclude, etc.  These are all done by Django-Haystack itself in
>> order to make the backend (in this case Xapian) pluggable.
>>
>
> It's probably about time I checked out a copy of the django-haystack xapian
> backend.  This sounds very unpleasant.  As Olly said, it's almost certainly
> a mistake to be constructing something to pass to the query parser, rather
> than passing it user input directly.  To construct queries without running
> into unexpected problems with quoting, operator precedence, etc, is almost
> impossible, and is always going to be fragile with respect to changes in the
> query parser.  This is because the query parser is not parsing a formal
> grammar - it is trying to guess the user's intention to some extent, and is
> thus likely to get confused when presented with input which isn't actually
> user input, but is machine generated.
>
> Is this an unavoidable result of the way the rest of Django-Haystack
> works?  Is there no way that haystack can be persuaded to give the backend
> the raw input?  If not, it sounds like a bug in Django-Haystack's design, to
> me...
>
> Looking at
> http://github.com/notanumber/xapian-haystack/blob/d593924386cc050e3e97ce129ff71dad50e1139e/xapian_backend.py#L268
> however, it looks like the search() method is presented with the user's
> query string separately from the list of fields to filter on.  Maybe I'm
> misinterpreting.  Also, could the "build_query" function at line 879 in that
> file not return a structured representation of the query, rather than a
> single string?  (If there's some reason imposed by haystack that forces it
> to be a string, you could always serialise it to a pickle or a JSON value
> before passing it through.)
>
>> In practice, it is made up of two parts, a SearchBackend (the Xapian
>> interface), and a SearchQuery (the bit that "cleans" and assembles the
>> query string into a format that Xapian can recognise).
>>
>> What I get, after Django-Haystack is done, in the SearchQuery, is a series
>> of filters for fields.  From this, I need to "re-assemble" a query string
>> to be passed to the SearchBackend instance at a later time.
>>
>
>> > > I wouldn't think that would matter, but
>> > > the following two queries are generating different search results:
>> > >
>> > > java AND NOT id:1 NOT id:2
>> > > vs.
>> > > NOT id:1 NOT id:2 AND java
>> >
>> > What sort of prefix is "id"?
>> >
>> In this case, "id" is a field prefix.
>>
>
> Looking at the results of your parse below, it looks like this field prefix
> isn't being set on the query parser (with either add_prefix() or
> add_boolean_prefix()).  As a result, the ":" is being considered as a
> phrase-generating word separator, and Xapian is trying to look for
> occurrences of "id" followed by "1" or "2". It also looks like you might not
> be supplying the right flags to the query parser to allow it to recognise
> the "AND" in the second one.
>
>> Xapian::Query(((Zjava:(pos=1) AND_NOT (id:(pos=2) PHRASE 2 1:(pos=3)))
>> AND_NOT (id:(pos=4) PHRASE 2 2:(pos=5))))
>>
>> Xapian::Query(((<alldocuments> AND_NOT (id:(pos=1) PHRASE 2 1:(pos=2)))
>> AND_NOT ((id:(pos=3) PHRASE 2 2:(pos=4)) OR Zjava:(pos=5))))
>>
>
> --
> Richard
>

