[Xapian-discuss] check for blacklisted words (and thanks)

Olly Betts olly at survex.com
Wed May 21 14:46:46 BST 2008


On Wed, May 21, 2008 at 09:23:28AM +0200, Alessandro Pasotti wrote:
> Now the question: I must check if a particular document contains
> blacklisted words (which are in a text file, unstemmed, one per line).
> Is there a way to restrict a query to a single document and return a
> boolean value if any of the terms in the query is contained in the
> checked document?

Rather than running a query in this case, I'd suggest you just take the
Document object (before you've even added it to the database if you
like) and iterate its termlist.  If the blacklist is long, you could
stick its entries in a C++ std::set (or Perl hash, Python dict, etc) at
start-up and test each document term against it.  Or if the blacklist
is short, you can sort it and call skip_to() on the Document's termlist
for each blacklisted term in turn, checking whether you land on it.
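
As a rough illustration of the two approaches, here is a minimal Python
sketch.  It deliberately avoids the Xapian bindings so that it runs on
its own: the document's termlist is modelled as a plain sorted list of
strings, and bisect_left() stands in for skip_to().  The function names
are hypothetical, and it assumes the blacklist entries are in the same
form as the indexed terms (lowercased here, no stemming or prefixes),
which may not match how your documents are indexed.

```python
from bisect import bisect_left


def load_blacklist(lines):
    """Build a set from an iterable of lines, one word per line.

    Lowercasing is an assumption about how the terms were indexed.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def hits_by_set(doc_terms, blacklist):
    """Long blacklist: test every document term against the set."""
    return any(term in blacklist for term in doc_terms)


def hits_by_skip(sorted_doc_terms, blacklist):
    """Short blacklist: one forward jump per blacklisted word.

    sorted_doc_terms must be sorted, as a document's termlist is.
    """
    pos = 0
    for word in sorted(blacklist):
        # Stand-in for skip_to(): move to the first term >= word.
        pos = bisect_left(sorted_doc_terms, word, pos)
        if pos == len(sorted_doc_terms):
            return False  # ran off the end, nothing left to match
        if sorted_doc_terms[pos] == word:
            return True
    return False
```

With a real Document you would feed the first function the terms from
its termlist, and in the second replace the bisect_left() call with
skip_to() on the term iterator.  Both return as soon as one blacklisted
term is found.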

If the blacklist is to prevent indexing, this has the added benefit that
you don't need to delete the document from the database if it fails the
blacklist test.

Cheers,
    Olly


