[Xapian-discuss] Xapian as documentfilter?
Arjen van der Meijden
acmmailing at tweakers.net
Mon Oct 31 09:49:56 GMT 2005
Hi list,
Currently I'm working on an application that will need both searches
through a set of documents and alerts when a new document is added which
matches some predefined set of "rules". That set of rules may be just a
stored search query, but it can be anything I need.
For both the searches and the alerts I'd like to have the same set of
parameters, so it'd be nice to have the same engine handling everything.
Documents contain a few short indexable texts and a list of boolean
terms. In most cases those boolean terms are n:m, i.e. a document can
have multiple boolean terms with the same prefix.
Most searches will be conducted on those boolean terms, sometimes
expanded with keyword searches (and rarely with explicit operators).
For the searches through the set of documents Xapian/Omega work very
well. For the alerts on new documents, I'm wondering how to do it.
The naive approach is of course to just store a list of search queries
that users have asked to be alerted on.
But it will likely run into hundreds of such queries, maybe even a few
thousand. Each added set of documents would then be "searched" by each
stored query, and even though that can be done quite fast (prepend
B=Q$newId1 B=Q$newId2 etc. to the query) it may (will?) be too much
overhead nonetheless.
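
For illustration, the "prepend B=Q$newId..." step could be sketched as
below. This is only a sketch under assumptions: it builds an Omega-style
query string by hand, the helper name restrict_to_new_docs, the stored
parameters and the XCAT prefix are made up for the example, and it
assumes that filters sharing the Q prefix are combined so that any of
the new documents may match.

```python
from urllib.parse import urlencode


def restrict_to_new_docs(stored_params, new_ids):
    """Return a query string limiting a stored search to the new documents.

    stored_params: list of (name, value) pairs saved for one alert
                   (hypothetical storage format).
    new_ids:       unique ids of the documents that were just added.
    """
    # One B=Q<id> boolean filter per new document, placed before the
    # parameters of the stored query.
    filters = [("B", "Q" + str(doc_id)) for doc_id in new_ids]
    return urlencode(filters + list(stored_params))


# Hypothetical stored alert: a keyword query plus one boolean term.
stored = [("P", "xapian filter"), ("B", "XCATnews")]
query_string = restrict_to_new_docs(stored, [101, 102])
```

Each stored alert still costs one search per batch of new documents,
which is the overhead in question.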
Reversing the process might be quite nice, but how to do that? The
queries would be stored as documents and the new document would be "the
query". But then you lose the boolean logic and phrase operators from
the original query.
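
One way to reverse the process without losing the boolean logic is to
index each stored query only under a few "trigger" terms and to
evaluate the full query afterwards against the new document's terms.
The sketch below is plain Python, not Xapian API; the tuple-based query
format, the AlertIndex class and the example terms are all assumptions
made for the illustration, and phrase operators are left out because
they would need term positions.

```python
from collections import defaultdict

# Hypothetical query format: ("term", t), ("and", q1, ...),
# ("or", q1, ...), ("not", q).


def matches(query, terms):
    """Evaluate a boolean query against the set of terms of one document."""
    op = query[0]
    if op == "term":
        return query[1] in terms
    if op == "and":
        return all(matches(q, terms) for q in query[1:])
    if op == "or":
        return any(matches(q, terms) for q in query[1:])
    if op == "not":
        return not matches(query[1], terms)
    raise ValueError("unknown operator: %r" % (op,))


def trigger_terms(query):
    """Return a set of terms of which at least one must occur in any
    matching document, or None if no such set can be derived."""
    op = query[0]
    if op == "term":
        return {query[1]}
    if op == "and":
        # Any single branch is a necessary condition; pick the smallest.
        options = [t for t in map(trigger_terms, query[1:]) if t is not None]
        return min(options, key=len) if options else None
    if op == "or":
        parts = [trigger_terms(q) for q in query[1:]]
        if any(p is None for p in parts):
            return None
        return set().union(*parts)
    return None  # "not" can match documents that contain none of its terms


class AlertIndex:
    def __init__(self):
        self.queries = {}                # alert id -> stored query
        self.by_term = defaultdict(set)  # trigger term -> alert ids
        self.always = set()              # alerts without trigger terms

    def add(self, alert_id, query):
        self.queries[alert_id] = query
        triggers = trigger_terms(query)
        if triggers is None:
            self.always.add(alert_id)
        else:
            for term in triggers:
                self.by_term[term].add(alert_id)

    def alerts_for(self, doc_terms):
        """Return the ids of all stored alerts that match a new document."""
        candidates = set(self.always)
        for term in doc_terms:
            candidates |= self.by_term.get(term, set())
        # Only the candidates are evaluated in full.
        return sorted(a for a in candidates
                      if matches(self.queries[a], doc_terms))


index = AlertIndex()
index.add("a1", ("and", ("term", "XCATnews"),
                 ("or", ("term", "xapian"), ("term", "omega"))))
index.add("a2", ("and", ("term", "XCATsports"),
                 ("not", ("term", "football"))))
hits = index.alerts_for({"XCATnews", "omega", "search"})
```

With this layout a new document only touches the alerts that share a
trigger term with it, so the cost depends on the document's terms
rather than on the total number of stored alerts.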
Any ideas?
Best regards,
Arjen