[Xapian-discuss] high update-frequency strategy

jan_web at gmx.net jan_web at gmx.net
Thu Aug 13 08:22:26 BST 2009


Hi Everyone,

I'm evaluating Xapian for the following -hard- use-case:

1) document structure: avg. 100 KB full-text, 5 meta-data fields of about
100 bytes each, 3 boolean flags
2) big index, i.e. full-text volume ~1 TB per disk (2x HD, mirrored)
3) low query frequency (<1/sec)
4) 10 inserts/sec (on a 4-core host)
5) *high update frequency of meta-data*, mostly on the boolean flags:
~20-30/sec
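
For scale, the figures above can be turned into rough totals. This is only
a back-of-the-envelope sketch: it assumes decimal units (1 TB = 10**12
bytes, 100 KB = 10**5 bytes), and all constant names are made up for
illustration.

```python
# Rough sizing from the numbered requirements above.
# Assumption: decimal units (1 TB = 10**12 bytes, 100 KB = 10**5 bytes).
TOTAL_TEXT_BYTES = 10**12         # requirement 2: ~1 TB of full-text
AVG_DOC_BYTES = 10**5             # requirement 1: avg. 100 KB per document
INSERTS_PER_SEC = 10              # requirement 4
FLAG_UPDATES_PER_SEC = (20, 30)   # requirement 5: low and high estimate

# Number of documents once the index holds the full text volume.
doc_count = TOTAL_TEXT_BYTES // AVG_DOC_BYTES

# Raw text arriving per second at the stated insert rate.
insert_bytes_per_sec = INSERTS_PER_SEC * AVG_DOC_BYTES

# Time to reach the full volume at a constant insert rate.
seconds_to_fill = doc_count // INSERTS_PER_SEC
days_to_fill = seconds_to_fill / 86400

# Meta-data updates per day at the low and high rate.
updates_per_day = tuple(rate * 86400 for rate in FLAG_UPDATES_PER_SEC)

print(f"documents at full size : {doc_count:,}")
print(f"insert volume          : {insert_bytes_per_sec:,} bytes/sec")
print(f"time to fill           : {days_to_fill:.1f} days")
print(f"updates per day        : {updates_per_day[0]:,} to {updates_per_day[1]:,}")
```

So the update stream in 5) would touch a sizeable share of the documents
every day, which is why per-update disk seeks matter here.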

Requirements 3 and 4 are no problem: inserts can be cached and mostly
deferred to bulk disk I/O when the load allows for it.



The question is whether 5) can be achieved. It seems that an

	updateMyDoc(myDocId, meta-key, meta-value)

implementation invariably ends up with the (Flint) backend running some
variation of the following:

	docid = query(myDocId)
	doc = get_document(docid)
	// "updating" then maps to:
	* replace the doc's meta-data in memory
	* delete (mark as deleted?) the old doc in the index
	* re-insert the new doc

The last two ops work on the index cache. The bottleneck seems to be the
get_document operation, which apparently causes (uncached**) disk seeks.

**Our RAM-to-disk ratio is too small for the OS disk cache to be effective.
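
To put a rough number on the seek cost, here is a small sketch. The seek
time (10 ms) and the number of uncached seeks per get_document are pure
assumptions chosen for illustration, not measurements of Flint, and the
function names are hypothetical. Mirroring, which might spread reads over
both disks, is not modelled.

```python
def disk_busy_ms_per_sec(updates_per_sec, seeks_per_update, seek_ms):
    """Milliseconds of seek time consumed per wall-clock second."""
    return updates_per_sec * seeks_per_update * seek_ms


def max_updates_per_sec(seeks_per_update, seek_ms):
    """Upper bound on updates/sec if one disk did nothing but these seeks."""
    return 1000 // (seeks_per_update * seek_ms)


ASSUMED_SEEK_MS = 10   # assumption: typical order of magnitude for a hard disk
TARGET_RATE = 30       # upper end of the 20-30 updates/sec requirement

for seeks in (1, 2, 4):  # assumed uncached seeks per get_document
    busy = disk_busy_ms_per_sec(TARGET_RATE, seeks, ASSUMED_SEEK_MS)
    limit = max_updates_per_sec(seeks, ASSUMED_SEEK_MS)
    verdict = "ok" if busy <= 1000 else "exceeds one disk"
    print(f"{seeks} seek(s)/update: {busy} ms busy per sec, "
          f"limit {limit} updates/sec ({verdict})")
```

Under these assumptions the target rate is feasible with one seek per
update but not with four, so the real number of seeks per get_document is
the figure worth measuring.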



Is there any way to make get_document "lazier", i.e. not do lookups in
the persistent index, and to do the meta-data replace "dirty", i.e. simply
write the new value into the cache and not make it persistent until
flush()?
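
The "dirty" update idea can be sketched at application level as a
write-behind buffer. This is not Xapian API: the class, its methods and the
plain dict standing in for the persistent index are all hypothetical, and
the sketch only illustrates deferring and coalescing the read-modify-write
until flush.

```python
class WriteBehindFlags:
    """Buffers meta-data updates in memory; touches the store only when needed."""

    def __init__(self, store):
        self.store = store      # stand-in for the persistent index: {doc_id: {key: value}}
        self.pending = {}       # doc_id -> {key: newest unflushed value}
        self.store_reads = 0    # counts simulated (expensive) document fetches

    def update(self, doc_id, key, value):
        # No lookup in the store: only remember the newest value per key.
        self.pending.setdefault(doc_id, {})[key] = value

    def get(self, doc_id, key):
        # Consult the buffer first so readers see unflushed values.
        if key in self.pending.get(doc_id, {}):
            return self.pending[doc_id][key]
        self.store_reads += 1
        return self.store[doc_id][key]

    def flush(self):
        # One read-modify-write per dirty document, however many updates it got.
        for doc_id in sorted(self.pending):
            self.store_reads += 1
            record = dict(self.store[doc_id])
            record.update(self.pending[doc_id])
            self.store[doc_id] = record
        flushed = len(self.pending)
        self.pending.clear()
        return flushed


store = {1: {"read": False, "starred": False},
         2: {"read": False, "starred": False}}
buf = WriteBehindFlags(store)
buf.update(1, "read", True)
buf.update(1, "starred", True)
buf.update(1, "read", False)   # overwrites the earlier buffered value
buf.update(2, "read", True)
reads_before_flush = buf.store_reads
flushed_docs = buf.flush()
print(reads_before_flush, flushed_docs, buf.store_reads, store[1])
```

Four updates cost two document fetches here, and none before flush. The
obvious trade-off is that searches against the index itself would not see
the buffered values until they are flushed.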

What are the performance advantages and disadvantages of modeling
meta-data as prefixed terms vs. document values?


Did I leave out any important constraints or facts?
Otherwise: any help, hints, or experiences would be *greatly* appreciated.


Thanks,
--jan
