[Xapian-discuss] Returning "fresh" results only from multiple DBs

Henry henka at cityweb.co.za
Wed Jan 14 08:02:37 GMT 2009


Crikey, my new webmail (atmail), which I've been testing, doesn't
word-wrap at 80 columns... apologies for that. Here's a repost with nice,
fresh newlines:


Let's say you have the following scenario:


DB1:  large corpus with rarely changing data (typically split across a
cluster).

DB2:  small corpus with frequently changing data (fresher versions of pages in DB1).

DBn:  ditto.


Since DB1 is so large and so heavily accessed, we want to keep it simple and
foolproof, so its contents are rarely changed. Instead, newer, fresher versions
of DB1 pages go into DB2..n. Each duplicate page (fresher, so preferred) has a
numeric field that increments with each refresh (1, 2, 3...), which identifies
the most up-to-date copy of a page across all DBs.

How can I run an enquiry that collapses on a key (as we currently do) to
remove duplicate pages, but returns the freshest copy of each duplicated page?

Similar to SQL:    SELECT MAX(freshness_num), * FROM table...
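To pin down the semantics being asked for, here is a minimal sketch of "collapse on a key, keep the freshest" done as application-side post-processing over a merged result list. It is not Xapian's own collapse feature; the `Hit` shape, the field names and the sample values are all hypothetical. Two caveats of doing it this way: it only sees the candidates you fetched (if the fresh copy ranked below your cutoff, the stale one wins), and it cannot return a fresh copy that did not match the query at all.

```python
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One search result (hypothetical shape): source DB, collapse key
    (e.g. the page URL), freshness counter and relevance score."""
    db: str
    key: str
    freshness: int
    score: float


def freshest_per_key(hits: Iterable[Hit]) -> List[Hit]:
    """Keep only the hit with the highest freshness for each key
    (ties broken by higher score), then order by score, descending."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.key)
        if current is None or (hit.freshness, hit.score) > (current.freshness, current.score):
            best[hit.key] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)


# Hypothetical merged hits from DB1, DB2 and DB3.
hits = [
    Hit("DB1", "page-a", 1, 0.9),
    Hit("DB2", "page-a", 2, 0.7),
    Hit("DB3", "page-a", 3, 0.5),
    Hit("DB1", "page-b", 1, 0.8),
]
for h in freshest_per_key(hits):
    print(h)
```

Here "page-a" resolves to its DB3 copy (freshness 3) even though the stale DB1 copy scored higher, which is exactly the MAX(freshness_num) behaviour the SQL analogy describes.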


I know we can perform updates on DB1, but I don't want to go down that
path because of the volumes/sizes involved.

Any ideas?

Thanks
Henry

