[Xapian-discuss] Mime2Text library, derived from omindex

Liam xapian at networkimprov.net
Sat Mar 3 22:16:06 GMT 2012


Ping...

On Thu, Feb 9, 2012 at 3:50 PM, Liam <xapian at networkimprov.net> wrote:

> On Tue, Nov 22, 2011 at 10:26 PM, Liam <xapian at networkimprov.net> wrote:
>
>>
>> load_file() in omega/loadfile.cc (part of the pending Mime2Text lib) calls
>>
>>   posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
>>
>> once, before closing the fd. To minimize the impact on the filesystem
>> cache, I suspect it should call that after each read().
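A minimal sketch of the per-read variant, written as a standalone function rather than omega's actual load_file(). `load_file_nocache()` is a hypothetical name, and the 4KB buffer mirrors the size described in this thread. It keeps the offset 0 and length 0 arguments. POSIX defines length 0 as "all data following offset", so each call re-advises the whole file. The call is only advice, and the kernel is free to ignore it.

```cpp
// Sketch only: a simplified loader that calls posix_fadvise() after every
// read() instead of once before close(). Not omega's actual code.
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Read the whole file at `path` into `out`, asking the kernel to drop cached
// pages as we go. Returns false if the file can't be opened or read.
bool load_file_nocache(const char* path, std::string& out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;
    out.clear();
    char buf[4096];  // same 4KB buffer size as discussed above
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n < 0) {
            if (errno == EINTR) continue;  // retry interrupted reads
            close(fd);
            return false;
        }
        if (n == 0) break;  // EOF
        out.append(buf, static_cast<std::size_t>(n));
#ifdef POSIX_FADV_DONTNEED
        // offset 0, len 0 = "from offset 0 to end of file": drop cached pages
        // for everything read so far. Advisory only, so the result is ignored.
        (void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
#endif
    }
    close(fd);
    return true;
}
```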
>>
>> Also, the read buffer is only 4KB. Might it be considerably more
>> efficient if sized to the filesystem block size?
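For the buffer-size question, one option is the `st_blksize` field that `fstat()` reports as the file's preferred I/O block size. The helper below is a hypothetical sketch, not part of loadfile.cc. It returns the smallest multiple of that block size that is at least 4KB. On many Linux filesystems `st_blksize` is itself 4096, so in practice this may not change anything.

```cpp
#include <cstddef>
#include <sys/stat.h>

// Hypothetical helper (not in loadfile.cc): pick a read() buffer size that is
// a whole multiple of the file's preferred I/O block size and at least 4KB.
std::size_t read_buffer_size(int fd) {
    const std::size_t fallback = 4096;  // the current fixed size
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_blksize <= 0) return fallback;
    const std::size_t blk = static_cast<std::size_t>(st.st_blksize);
    // Smallest multiple of blk that is >= 4KB (just blk when blk >= 4KB).
    return ((fallback + blk - 1) / blk) * blk;
}
```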
>>
>
> I believe calling posix_fadvise() after each read() is wise, as 100MB PDFs
> are not uncommon and would otherwise pollute the filesystem cache. If you
> agree, given the benchmarks below, I'll commit my edits to loadfile.cc and
> my test program to my GitHub branch.
>
> Here are benchmarks from a test program that walks a tree calling
>    load_file(pathname, output_string, NOCACHE | NOATIME)
> The test machine is a Core 2 Duo with a low-end disk, running Linux kernel
> 2.6.32-32-generic.
> Note: the pattern of alternating slower/faster runs repeats over many tries.
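The test program itself isn't shown in the thread. Below is a rough sketch of a loadfile-test-style walker, assuming `nftw()` for the tree walk. `total_bytes_under()`, `open_noatime()` and `visit()` are hypothetical names. NOATIME is approximated with Linux's `O_NOATIME` open flag, which `open()` rejects with EPERM for files the caller doesn't own, hence the retry. NOCACHE is approximated by a single `posix_fadvise()` before `close()`, as the current loadfile.cc is described as doing.

```cpp
#include <cerrno>
#include <cstdint>
#include <fcntl.h>
#include <ftw.h>
#include <sys/stat.h>
#include <unistd.h>

// nftw() callbacks can't capture state, so the running total is a global.
static std::uint64_t g_total_bytes = 0;

// Hypothetical stand-in for NOATIME: try O_NOATIME (Linux-only), and retry
// without it if open() refuses with EPERM (caller doesn't own the file).
static int open_noatime(const char* path) {
#ifdef O_NOATIME
    int fd = open(path, O_RDONLY | O_NOATIME);
    if (fd >= 0 || errno != EPERM) return fd;
#endif
    return open(path, O_RDONLY);
}

// nftw() callback: read each regular file, adding the bytes read to the total.
static int visit(const char* path, const struct stat* sb, int type, struct FTW*) {
    // FTW_F also covers FIFOs and devices, so check S_ISREG to avoid blocking.
    if (type != FTW_F || !S_ISREG(sb->st_mode)) return 0;
    int fd = open_noatime(path);
    if (fd < 0) return 0;  // unreadable file: skip it, keep walking
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0)  // stops at EOF or on error
        g_total_bytes += static_cast<std::uint64_t>(n);
#ifdef POSIX_FADV_DONTNEED
    // Stand-in for NOCACHE: advise once before close().
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
#endif
    close(fd);
    return 0;  // 0 tells nftw() to continue
}

// Total bytes read from regular files under `root`; -1 if the walk failed.
long long total_bytes_under(const char* root) {
    g_total_bytes = 0;
    // FTW_PHYS: don't follow symlinks; 16 = max directories nftw() keeps open.
    if (nftw(root, visit, 16, FTW_PHYS) != 0) return -1;
    return static_cast<long long>(g_total_bytes);
}
```

A small main() that prints `total_bytes_under(argv[1])` would produce a figure comparable to the "total bytes read" lines below.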
>
>
> Current loadfile.cc, with a 4K buffer
>   (buffers of 8K, 16K, 32K and 64K showed only a 1-2s speedup)
>
> $ time ./loadfile-test ~
> total bytes read: 627344268
>
> real    0m55.267s
> user    0m0.424s
> sys    0m2.504s
>
> $ time ./loadfile-test ~
> total bytes read: 627344268
>
> real    0m18.937s
> user    0m0.360s
> sys    0m1.800s
>
> ------------
>
> Moved posix_fadvise() into the read loop
>   (the faster pass is slower than before, 42.4s vs 18.9s, though only the
> first pass is relevant here)
>
> $ time ./loadfile-test ~
> total bytes read: 627344302
>
> real    0m59.410s
> user    0m0.532s
> sys    0m2.696s
>
> $ time ./loadfile-test ~
> total bytes read: 627344302
>
> real    0m42.393s
> user    0m0.428s
> sys    0m2.376s
>
> ------------
>
> Increased the read() buffer to 32K to reduce the number of posix_fadvise()
> calls
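As rough arithmetic on how much the larger buffer can cut posix_fadvise() calls, assume it is called once per non-empty read(). Each read returns at most B bytes. Let s_i be the size of file i and N their total, which is 627344305 bytes in the 32K run. The call count R(B) is then bounded below as follows.

```latex
\begin{aligned}
R(B) &\ge \sum_i \left\lceil \frac{s_i}{B} \right\rceil \ge \left\lceil \frac{N}{B} \right\rceil,
  \qquad N = \sum_i s_i = 627\,344\,305 \\
R(4096) &\ge \left\lceil \frac{N}{4096} \right\rceil = 153\,161 \\
R(32768) &\ge \left\lceil \frac{N}{32768} \right\rceil = 19\,146
\end{aligned}
```

Per file, $\lceil s/4096 \rceil \le 8\lceil s/32768 \rceil$, so the real reduction is at most 8x. It is smaller when many files are only a few KB, since a file under 4KB needs one read at either buffer size.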
>
> $ time ./loadfile-test ~
> total bytes read: 627344305
>
> real    0m56.894s
> user    0m0.472s
> sys    0m2.300s
>
> $ time ./loadfile-test ~
> total bytes read: 627344305
>
> real    0m41.719s
> user    0m0.408s
> sys    0m1.948s
>
>
>

