[Xapian-discuss] Search and batch updates

Olly Betts olly at survex.com
Sat Aug 20 17:51:11 BST 2005

On Fri, Aug 12, 2005 at 01:28:58PM +0200, Sebastjan Trepca wrote:
> I will be using xapian to index mailboxes and the first problem is
> that I will have to index headers somehow. As I read from previous
> messages the best way is to create some unique terms like
> "from::hehe at hehe.net" and then index that. But what if I have a query
> that wants all messages that has word "hehe" in from header?
> Searching by "from::mirko" doesn't get any results, using wildcards
> doesnt help either.

If you generate suitable "from::"-prefixed terms this will work.

For example, "From: olly at survex.com" might produce the terms
from::olly, from::survex, from::com and from::olly at survex.com, so a
search for from::olly matches the first of these.
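
As a rough sketch of that term generation (the function name, the
splitting rule and the example address are my assumptions, not anything
Xapian provides), something like this would produce one term per word of
the address plus one for the whole address:

```python
import re


def from_terms(address, prefix="from::"):
    """Return prefixed terms for a sender address (hypothetical helper).

    Yields one term per alphanumeric run in the address, followed by one
    term for the complete address. Duplicates are dropped, order is kept.
    """
    address = address.strip().lower()
    # Split on anything that is not a letter or digit: "@", ".", "-", etc.
    parts = [p for p in re.split(r"[^a-z0-9]+", address) if p]
    terms = [prefix + p for p in parts]
    if address:
        terms.append(prefix + address)
    # dict.fromkeys() removes repeats while preserving insertion order.
    return list(dict.fromkeys(terms))


if __name__ == "__main__":
    for term in from_terms("alice@example.org"):
        print(term)
```

Each returned term would then be added to the document (for instance
with Document::add_term()), and a query for from::alice would match it.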

Incidentally, the convention is to use capital letters as prefixes (as
Omega does) but nothing in the core library forces you to do this - it
makes interworking with Omega much easier though.  The QueryParser class
has a small amount of special handling for capitalised prefixes, but
should work with any prefix I think.

> I will be syncing mailbox with xapian index so I will try to use its
> batching mechanism using flush() etc. I'm just wondering if anyone has
> any experience and tips about handling this problem using xapian. I
> will probably just call flush() on some delay.

I'd suggest a fairly simple approach: decide on an acceptable delay
before a message becomes searchable, and then flush() whenever you're
idle and at least that long has passed since you added the first
message of the current (unflushed) batch.
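
A minimal sketch of that policy, independent of Xapian itself: the class
and method names here are hypothetical, and the flush callback stands in
for whatever calls flush() on your writable database.

```python
import time


class BatchFlusher:
    """Flush when idle, once the oldest unflushed message is old enough.

    `flush` is any zero-argument callable (assumed to wrap the database's
    flush()); `max_delay` is the acceptable delay in seconds; `clock` is
    injectable so the policy can be tested without sleeping.
    """

    def __init__(self, flush, max_delay, clock=time.monotonic):
        self._flush = flush
        self._max_delay = max_delay
        self._clock = clock
        self._batch_started = None  # time the first unflushed message arrived

    def message_added(self):
        """Call after adding a message to the index."""
        if self._batch_started is None:
            self._batch_started = self._clock()

    def on_idle(self):
        """Call when there is no pending work; returns True if it flushed."""
        if self._batch_started is None:
            return False  # nothing waiting to be flushed
        if self._clock() - self._batch_started < self._max_delay:
            return False  # batch is still younger than the allowed delay
        self._flush()
        self._batch_started = None
        return True
```

The timer starts at the first message of a batch rather than the last,
so a steady trickle of new mail cannot postpone the flush indefinitely.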

Xapian will auto-flush periodically anyway unless you stop it from doing
so, but you can probably ignore that in tracking when you think you need
to flush.

