[Xapian-discuss] Prefixes

Fabrice Colin fabrice.colin at gmail.com
Tue Jan 30 04:36:39 GMT 2007


On 1/30/07, Olly Betts <olly at survex.com> wrote:
> On 26/01/07, Fabrice Colin <fabrice.colin at gmail.com> wrote:
> > I am using QueryParser::add_boolean_prefix("url", "U") to restrict searches to
> > documents that have a specific URL.
> > When the input has a URL containing a space, how should it be quoted?
>
> There isn't currently a way to quote such a prefixed boolean term, but
> shouldn't spaces be quoted as %20 in a url anyway?
>
Yes, for a URL, escaping spaces as %20 makes sense, but for a file name filter, not so much.
For instance, entering something like 'file:"My CV.txt"' is not completely
unreasonable.
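Since the QueryParser can't currently quote a prefixed boolean term, an application could pre-process the input itself. The following is a minimal sketch, assuming the filter is written as field:"value" with no escaped quotes inside. `extract_quoted_filter` is a hypothetical helper, not a Xapian API:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical pre-processing helper (not part of Xapian): finds the first
// field:"value" in the raw query, removes it from `query`, and returns the
// unquoted value. Escaped quotes and word boundaries are not handled, so
// myfile:"x" would also match the field "file".
std::optional<std::string> extract_quoted_filter(std::string &query,
                                                 const std::string &field) {
    const std::string opener = field + ":\"";
    const std::string::size_type start = query.find(opener);
    if (start == std::string::npos) return std::nullopt;

    const std::string::size_type value_begin = start + opener.size();
    const std::string::size_type close = query.find('"', value_begin);
    if (close == std::string::npos) return std::nullopt;  // unterminated quote

    std::string value = query.substr(value_begin, close - value_begin);
    query.erase(start, close + 1 - start);  // drop field:"value" from the query
    return value;
}
```

For the input cv file:"My CV.txt" 2006, this returns My CV.txt and leaves "cv  2006" for the QueryParser. The application would then build a boolean term from the value (with whatever prefix it uses for file names) and combine it with the parsed remainder, for example using Xapian::Query::OP_FILTER.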

Actually, quoting would also be useful for searching indexes built by omindex.
As far as I can tell, omindex doesn't escape U-prefixed terms. So to find the
document that has the term 'Uhttp://localhost/some file.txt', a user would
have to enter 'url:http://localhost/some%20file.txt', and the application would
then have to unescape the U-prefixed term in the Query object generated by the
QueryParser.
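The unescaping step itself is simple. The sketch below assumes plain %XX percent-encoding (so '+' is left alone). `percent_decode` and `unescape_prefixed_term` are hypothetical helpers. Locating and rebuilding the term inside the Query object is not shown.

```cpp
#include <cassert>
#include <string>

// Returns the value of a hex digit, or -1 if c is not a hex digit.
static int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes %XX escapes (e.g. "%20" -> ' '); malformed escapes are copied as-is.
std::string percent_decode(const std::string &in) {
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const int high = hex_digit_value(in[i + 1]);
            const int low = hex_digit_value(in[i + 2]);
            if (high >= 0 && low >= 0) {
                out += static_cast<char>(high * 16 + low);
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out += in[i];
    }
    return out;
}

// Hypothetical helper: decodes only the part after the prefix, so
// "Uhttp://localhost/some%20file.txt" -> "Uhttp://localhost/some file.txt".
// Terms that don't start with `prefix` are returned unchanged.
std::string unescape_prefixed_term(const std::string &term, const std::string &prefix) {
    if (term.compare(0, prefix.size(), prefix) != 0) return term;
    return prefix + percent_decode(term.substr(prefix.size()));
}
```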

> > This leads me to a second question. At indexing time, long URLs are hashed, just as
> > omindex does with hash_long_term(). Because of this, the QueryParser
> > will always generate the wrong term when its input has a filter on one of these
> > long URLs. Would it be possible to have something like the following?
> >
> > void Xapian::QueryParser::add_boolean_prefix(
> > const std::string &field,
> > const std::string &prefix,
> > const TermTransformer *transform);
>
> Perhaps, though for this case it seems unlikely that a user would
> really type in a 240+ character URL...
>
A sane user would copy the URL from somewhere else, perhaps a
note-taking program or a browser window, and paste it into the input field :-)

Generally speaking, this would remove the need to pre-process user input or
post-process the Query for fields that need some kind of transformation.
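Here is one way the proposed hook could behave, as a self-contained sketch. `TermTransformer`, `LongTermHasher` and `ToyParser` are all hypothetical and are not Xapian classes. `ToyParser` only mimics the prefix table of add_boolean_prefix(). The transformer receives the whole term (prefix plus value), on the assumption that a length limit applies to the full term. The hashing uses std::hash and does NOT reproduce omindex's hash_long_term(). Its output is also not stable across implementations, so a real indexer and query side would need a shared, deterministic hash.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>

// Hypothetical interface matching the proposed add_boolean_prefix() overload.
class TermTransformer {
  public:
    virtual ~TermTransformer() = default;
    // Maps a complete boolean term to the term actually stored in the index.
    virtual std::string operator()(const std::string &term) const = 0;
};

// Shortens overlong terms: keeps a leading slice and appends a hex hash of
// the full term. Result is at most max_length chars if max_length > hash size.
class LongTermHasher : public TermTransformer {
  public:
    explicit LongTermHasher(std::string::size_type max_length) : max_length_(max_length) {}
    std::string operator()(const std::string &term) const override {
        if (term.size() <= max_length_) return term;
        std::ostringstream hex;
        hex << std::hex << std::hash<std::string>{}(term);
        const std::string suffix = hex.str();
        const std::string::size_type keep =
            max_length_ > suffix.size() ? max_length_ - suffix.size() : 0;
        return term.substr(0, keep) + suffix;
    }
  private:
    std::string::size_type max_length_;
};

// Toy stand-in for the QueryParser's boolean prefix table.
class ToyParser {
  public:
    void add_boolean_prefix(const std::string &field, const std::string &prefix,
                            const TermTransformer *transform = nullptr) {
        prefixes_[field] = Entry{prefix, transform};
    }
    // Builds the term for field:value; returns "" for an unknown field.
    std::string make_term(const std::string &field, const std::string &value) const {
        auto it = prefixes_.find(field);
        if (it == prefixes_.end()) return std::string();
        const std::string term = it->second.prefix + value;
        return it->second.transform ? (*it->second.transform)(term) : term;
    }
  private:
    struct Entry {
        std::string prefix;
        const TermTransformer *transform;  // null: use the term unchanged
    };
    std::map<std::string, Entry> prefixes_;
};
```

With this design, the same transformer object can be shared by the indexer and the parser, which is the point of the proposal: neither the user input nor the resulting Query needs separate fix-ups.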

Fabrice


