[Xapian-discuss] check for blacklisted words (and thanks)

James Aylett james-xapian at tartarus.org
Wed May 21 10:17:37 BST 2008


On Wed, May 21, 2008 at 09:23:28AM +0200, Alessandro Pasotti wrote:

> Now the question: I must check if a particular document contains
> blacklisted words (which are in a textfile, unstemmed one per line),
> is there a way to restrict a query to a single document and return a
> boolean value if one of the terms in the query is contained in the
> checked document?

If you want the blacklist to work unstemmed, and are using the
QueryParser, you can construct a new Query from the terms you get via
QueryParser::unstem_begin() and QueryParser::unstem_end(), OP_OR them
all together, and then OP_FILTER the result with a special (probably
prefixed) term that's only in the document you're checking. You'll get
back either nothing or that one document.
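
To make the shape of that query concrete, here is a minimal sketch in
plain Python. It is a toy model of the matching logic only, not the
Xapian API: the dictionary stands in for posting lists, and the names
op_or, op_filter, contains_blacklisted and the "Qdoc1"/"Qdoc2" terms
are all made up for illustration.

```python
# Toy model of the query shape described above; this is NOT the Xapian API.
# Each term maps to the set of document ids that contain it (a posting list).

def op_or(index, terms):
    """Union of the posting lists of all terms (models OP_OR)."""
    docs = set()
    for term in terms:
        docs |= index.get(term, set())
    return docs


def op_filter(left, right):
    """Keep only documents matched by both sides (models OP_FILTER)."""
    return left & right


def contains_blacklisted(index, id_term, blacklist):
    """True if the one document carrying id_term has any blacklisted term."""
    matches = op_filter(op_or(index, blacklist), index.get(id_term, set()))
    return bool(matches)


# Hypothetical data: "Qdoc1" and "Qdoc2" are invented unique prefixed terms,
# one per document, playing the role of the special filter term.
index = {
    "Qdoc1": {1},
    "Qdoc2": {2},
    "cheap": {1},
    "pills": {1},
    "hello": {1, 2},
}
blacklist = ["pills", "casino"]

hit_doc1 = contains_blacklisted(index, "Qdoc1", blacklist)
hit_doc2 = contains_blacklisted(index, "Qdoc2", blacklist)
```

The boolean answer is just "did the filtered match set come back
empty or not", which corresponds to checking whether the real query
returns any match at all.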

If you want to employ stemming, instead use Query::get_terms_begin()
to get out the stemmed terms.

There are other ways of doing this, possibly more efficient ones (for
instance, if you're not already using a stopper, you could write a
custom one and check whether it fired on any of your words). However, I
suspect the approach above will scale better to lots of blacklisted
words, if that's an issue for you.
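
The stopper idea can be sketched the same way. This is again plain
Python rather than Xapian code: RecordingStopper and index_text are
hypothetical names, and the only assumption carried over is that a
stopper is something called once per word that answers yes or no.

```python
class RecordingStopper:
    """Callable that flags blacklisted words and remembers which ones it saw."""

    def __init__(self, blacklist):
        self.blacklist = set(blacklist)
        self.fired = set()  # blacklisted words seen so far

    def __call__(self, term):
        if term in self.blacklist:
            self.fired.add(term)
            return True
        return False


def index_text(text, stopper):
    """Stand-in for an indexer: lowercases, splits on whitespace, and
    returns the words the stopper did not flag."""
    kept = []
    for word in text.lower().split():
        if not stopper(word):
            kept.append(word)
    return kept


stopper = RecordingStopper(["pills", "casino"])
kept = index_text("Cheap pills here", stopper)
document_is_blacklisted = bool(stopper.fired)
```

After the text has been processed, a non-empty stopper.fired means the
document contained at least one blacklisted word. Note that in this toy
the flagged words are also dropped from the kept terms.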

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


