[Xapian-discuss] Search and batch updates

Olly Betts olly at survex.com
Sat Aug 20 17:51:11 BST 2005


On Fri, Aug 12, 2005 at 01:28:58PM +0200, Sebastjan Trepca wrote:
> I will be using xapian to index mailboxes and the first problem is
> that I will have to index headers somehow. As I read from previous
> messages the best way is to create some unique terms like
> "from::hehe at hehe.net" and then index that. But what if I have a query
> that wants all messages that have the word "hehe" in the From header?
> Searching by "from::mirko" doesn't get any results, and using wildcards
> doesn't help either.

This will work if you generate suitable "from::"-prefixed terms: one for
each word in the header as well as one for the whole address.

So "From: olly at survex.com" might produce the terms from::olly,
from::survex, from::com and from::olly at survex.com.
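
As a rough illustration of the term generation described above, here is
a minimal sketch in plain Python.  The function name from_header_terms
and the rule of splitting on non-alphanumeric characters are assumptions
made for this example, not anything Xapian prescribes.  The resulting
strings would then be added to the document as terms, for instance with
Document.add_term().

```python
import re


def from_header_terms(address, prefix="from::"):
    """Return prefixed terms for one address: the whole address, then each word.

    Hypothetical helper for illustration only; it handles ASCII letters
    and digits and treats everything else as a separator.
    """
    address = address.strip().lower()
    # Split on "@", ".", "-" and any other non-alphanumeric character.
    parts = [p for p in re.split(r"[^a-z0-9]+", address) if p]
    terms = [prefix + address]
    for part in parts:
        term = prefix + part
        if term not in terms:  # avoid duplicate terms
            terms.append(term)
    return terms


# The four terms for the example address: the whole address plus
# one term per word.
print(from_header_terms("olly@survex.com"))
```

With terms like these in the index, a search for from::olly matches the
message without needing wildcards.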

Incidentally, the convention is to use capital letters as prefixes (as
Omega does) but nothing in the core library forces you to do this - it
makes interworking with Omega much easier though.  The QueryParser class
has a small amount of special handling for capitalised prefixes, but
should work with any prefix I think.

> I will be syncing mailbox with xapian index so I will try to use its
> batching mechanism using flush() etc. I'm just wondering if anyone has
> any experience and tips about handling this problem using xapian. I
> will probably just call flush() on some delay.

I'd suggest a fairly simple approach - decide on an acceptable delay
before a message becomes searchable, and then flush() if you're idle and
haven't flushed for that length of time since adding the first message
of a batch.

Xapian will auto-flush periodically anyway unless you stop it from doing
so, but you can probably ignore that in tracking when you think you need
to flush.
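
The policy above could be sketched roughly as follows.  This is only an
outline under stated assumptions: the class name BatchFlusher, its
methods and the injected clock are all hypothetical, and the flush
argument stands in for whatever actually flushes the database (for
example the flush() method of a WritableDatabase).  It deliberately
ignores Xapian's own periodic auto-flush.

```python
import time


class BatchFlusher:
    """Flush when idle, once max_delay has passed since the first
    message of the current batch was added (hypothetical helper)."""

    def __init__(self, flush, max_delay, clock=time.monotonic):
        self._flush = flush          # callable that performs the real flush
        self._max_delay = max_delay  # acceptable delay in seconds
        self._clock = clock          # injectable so the policy can be tested
        self._batch_started = None   # time the first unflushed message arrived

    def message_added(self):
        """Call after adding a message to the index."""
        if self._batch_started is None:
            self._batch_started = self._clock()

    def on_idle(self):
        """Call whenever the indexer has nothing else to do.

        Returns True if a flush was performed.
        """
        if self._batch_started is None:
            return False  # nothing pending
        if self._clock() - self._batch_started < self._max_delay:
            return False  # batch is still younger than the acceptable delay
        self._flush()
        self._batch_started = None
        return True
```

The indexer would call message_added() after each message it adds and
on_idle() from its idle loop, so a flush only happens when there is
pending work that has waited long enough.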

Cheers,
    Olly


