[Xapian-discuss] Re: xapian's cache

Andrey alpha04 at netvigator.com
Sat Nov 24 17:43:00 GMT 2007


Thank you very much James for your detailed information.

My current development configuration is as follows; both machines run CentOS 5:
machine1: RAID 0, 2x SATA hard disks, 2 GB RAM, 3 GHz Pentium D, Dell 860
machine2: RAID 5, 3x 15k rpm SAS disks, 8 GB RAM, 2x 2 GHz Xeon, Dell 2950

Can you suggest any utilities for monitoring the OS filesystem cache, so I
can try to monitor and tune it?
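On Linux, tools like free and vmstat report this, and the underlying counters live in /proc/meminfo. A minimal sketch for watching the cache-related fields (the helper names here are illustrative, not part of any standard tool):

```python
# Sketch: poll Linux page-cache counters from /proc/meminfo.
# Assumes a Linux /proc filesystem; helper names are illustrative.

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of kB values."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])  # value is in kB
    return info

def cache_stats(path="/proc/meminfo"):
    """Return the fields most relevant to filesystem-cache tuning."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return {k: info.get(k, 0) for k in ("Cached", "Buffers", "Dirty")}

if __name__ == "__main__":
    print(cache_stats())
```

Polling this in a loop while running searches gives a crude picture of how much memory the cache is using and how much dirty data is waiting to be written back.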

Cheers
Andrey


"James Aylett" <james-xapian at tartarus.org> wrote in message 
news:20071123233258.GE25387 at tartarus.org...
> On Fri, Nov 23, 2007 at 02:25:31PM -0800, Andrey wrote:
>
>> Regarding the "warming up" of Xapian over the first few queries: at which
>> level is the data cached?
>> Xapian / the Xapian bindings / filesystem I/O?
>
> Right now Xapian does (effectively) no explicit caching; it lets the
> operating system cache whatever it likes. This makes it difficult to
> answer most of your questions without knowing exactly what your
> operating system is (and details of how it caches). However in
> general, assuming there is enough core (physical memory) for the
> processes to never go into swap, the remaining memory will be used to
> cache blocks from the filesystem. From now on when I say 'cache' I
> mean 'operating system filesystem cache'.
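One way to see that cache in action: on Linux, os.posix_fadvise can ask the kernel to drop a file's cached pages, so the same read can be timed cold and then warm. A rough sketch (the scratch-file approach is just illustrative, and timings are machine-dependent):

```python
# Sketch: observe the OS filesystem cache warming up on a scratch file.
# Assumes Linux for os.posix_fadvise; guarded so other platforms still run.
import os
import tempfile
import time

def timed_read(path):
    """Read the whole file, returning (elapsed seconds, bytes read)."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        data = f.read()
    return time.perf_counter() - start, len(data)

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(16 * 1024 * 1024))  # 16 MB scratch file
    path = f.name
try:
    # Ask the kernel to drop this file's cached pages (length 0 = whole file).
    fd = os.open(path, os.O_RDONLY)
    if hasattr(os, "posix_fadvise"):
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    os.close(fd)

    cold, size = timed_read(path)  # likely has to hit disk
    warm, _ = timed_read(path)     # likely served from the page cache
    print(f"cold read: {cold:.4f}s, warm read: {warm:.4f}s for {size} bytes")
finally:
    os.unlink(path)
```

The second read is typically much faster, which is exactly the effect the first few Xapian queries exploit once the relevant btree blocks are cached.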
>
> [Right now I'll point out that I can't remember any of the deep
> details of how flint btrees are likely to map onto disk blocks, and so
> some of this may need to be elaborated on or corrected by Olly or
> Richard.]
>
> When a Xapian writer flushes the database to disk, a number of file
> system blocks will change. How many cached blocks in the reader
> operating system become invalidated at that point will depend on
> details of your database and indexing and search profiles; you're best
> off measuring the effects of various changes here.
>
> If you have a writer using local disk and exporting to a remote reader
> (presumably using NFS), you are using the memory in the writer for two
> distinct things: caching the blocks off disk of the revision being
> used by the reader (so that requests from the reader that aren't in
> the reader's cache already will incur only the network overhead, not a
> hit to disk on the writer as well) and caching the blocks onto disk of
> the revision being assembled by the writer. (It's a little more
> complex than that because of the way revisions work, but hopefully
> that's a helpful view.)
>
> In very high performance situations, you /may/ get better mileage out
> of having the storage local to the reader, not the writer (throw lots
> of memory at the reader), or in a different box altogether (throw lots
> of memory at both reader and backend storage). However there may also
> be advantages to having the storage local to the writer (see below).
>
> Note that if your continual indexing process is 'sane' (by which I
> mean it's nowhere near intensive enough to risk getting behind - ie
> it's mostly sleeping, not actually doing work) then the memory in the
> writer isn't so important (but if the writer is also the final storage
> machine, the memory for that is important).
>
>> What happens to the cache when the DB is flushed? Does the cache in
>> memory disappear, or is it added to incrementally?
>
> That depends on lots of things. Whatever has the storage local to it
> will do a pretty good job of throwing away invalidated cache blocks
> and, where necessary, reloading the freshened blocks from disk. (If
> the writer is on the same operating system instance, those freshened
> blocks are likely to already be in cache because of write-behind, in
> which case: win! Nothing has to hit disk to get them into core,
> assuming you have enough memory.)
>
> If the reader doesn't have local storage, it will have its own (now
> invalid) blocks cached. A good NFS implementation will deal with this
> fairly efficiently (NFSv4 more so than NFSv3, with the caveat that
> some NFSv4 implementations seem less stable in all sorts of nasty edge
> cases; however when you're pushing stuff that hard you're always going
> to have to do more work, so I'd ignore that for the time being). It'll
> need to go back across the network to freshen the block (assuming it
> needs that block again) or to fetch a new one (if that block is no
> longer used, which is a minor pain as it might not be invalidated if
> it's no longer used but unchanged; you can probably trust your OS to
> do the sensible thing here and just throw it away eventually in favour
> of blocks that are still being used). With luck you'll have enough
> memory on your storage box that the majority of these (ie: the most
> common blocks, ie those blocks needed for the most common searches)
> will be in core, so you won't actually hit disk there.
>
> (Some NFS implementations allow you to cache on disk, either by an
> extension layer above NFS or built into the file system implementation
> itself. The same kind of thing applies there, except that you might
> get better speed than having to do a network hit, depending on the
> relative speed of network vs local disk, and your disk loading.)
>
> It would be nice to be able to point a monitoring system at a running
> OS and figure out what's going on in its cache usage. You can get this
> kind of data to an extent on some systems, with the caveats that (a)
> it will take up memory, and so slow things down if you're running
> short on core, and (b) it will take up processor time. However, given
> a bit of time (and perhaps the risk that sometimes your system will
> respond much slower than it should as you work out the right tuning
> parameters), you can do it externally by measuring what you care about
> and tuning to improve that measurement. (This has the added advantage
> that you don't need to know intimately how your OS caches work.)
>
> The big message is: measure it, change it a bit, measure it
> again. Empirical data coming out of realistic simulated (or actual
> real live) searches and indexing using your code is the only real way
> to know that you're improving things.
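That loop can be as simple as timing the same search before and after the caches warm up. A sketch, where run_query stands in for whatever issues a real search against your database (it is a placeholder, not a Xapian API):

```python
# Sketch: time a search function cold vs. warm to quantify cache effects.
# run_query is a placeholder for whatever issues a real search.
import statistics
import time

def profile(run_query, warm_runs=10):
    """Return (first_latency, median_warm_latency) in seconds."""
    start = time.perf_counter()
    run_query()
    first = time.perf_counter() - start  # cold-ish: caches not yet warm

    warm = []
    for _ in range(warm_runs):
        start = time.perf_counter()
        run_query()
        warm.append(time.perf_counter() - start)
    return first, statistics.median(warm)

if __name__ == "__main__":
    # Stand-in workload; substitute a closure that runs a real search.
    first, warm = profile(lambda: sum(range(100_000)))
    print(f"first run: {first:.6f}s, median warm run: {warm:.6f}s")
```

Recording those two numbers before and after each configuration change gives you the empirical data to compare.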
>
>> I notice that the DB keeps flushing every 10,000 docs (about every 5
>> minutes). Will search performance be better off if it's separated into
>> 2 DBs, searched over together like this? Will the cache for db1 stay
>> warm and provide a benefit?
>> db1 < very large
>> db2 < only today's documents, flushed every 5 mins / 10,000 docs
>
> Possibly, but not necessarily for caching reasons. I *think* (Olly or
> Richard should jump in here) that, provided your underlying filesystem
> block size is the same as the btree block size, you won't see a huge
> difference in terms of caching efficiency. You should get other
> benefits, though, particularly around inserting into db2 (because its
> btree is nowhere near as big).
>
>
>
> Finally, note that there are many other routes you can take. Without
> knowing anything about what scale you're trying to achieve, what your
> budget is, and so on, no one's going to be able to give you a set of
> instructions on how to build the best system for your needs. (And even
> if someone could, they'd probably want to charge you a consulting fee
> for it ;-)
>
> J
>
> -- 
> /--------------------------------------------------------------------------\
>  James Aylett                                                  xapian.org
>  james at tartarus.org                               uncertaintydivision.org 






More information about the Xapian-discuss mailing list