[Xapian-discuss] Flint Backend
Olly Betts
olly at survex.com
Sat Jun 11 14:19:59 BST 2005
On Sat, Jun 11, 2005 at 11:53:05AM +0200, Arjen van der Meijden wrote:
> It looks good, but I have a few questions. First off, you talk about
> improved performance for adding documents, did this effort have any
> effect on updating (replacing) documents?
It should speed that up too, at least eventually.
More compact representations will tend to be a win all round. There are
also a few things we can do to improve updates: we can be smarter about
not rewriting the parts of an updated document which haven't changed
(for example, many of the posting entries will often be unchanged;
perhaps values too).
There's probably quite a lot more scope for speeding up the case of
adding documents in large batches though (and that's typically where the
speed matters most).
> Last night's run took about 31 minutes for indexing, so if that would've
> taken like 32 (or even 40) minutes with Flint, it'd be no problem at all
> though.
I don't think you need to worry then.
> Do you think the Flint backend will be better in terms of performance
> compared to (our setup with) a (zlib) compacted quartz database? Or is
> it too early in its development stage and should we wait a while for
> such things to become clear?
It'll include similar zlib compression, and there will be a
"flintcompact" tool. I've just realised I'd better write that now
actually, as I'm currently building about 60 small flint databases for
gmane, which I need to merge...
> As you know especially the position-table puts a lot of pressure on our
> machine, so significant improvements in that table are very interesting
> for us.
I've only run one example through it so far, which was artificial data.
Also I don't have a "flintcompact" yet, so it's not easy to make a
like-for-like comparison, but the uncompacted flint position table was
about 15% smaller than the compacted quartz one (if I remember
correctly). However, flint does a better job of being compact to start
with.
I'm certainly interested to hear results of converting real-world
databases to flint (especially on positionlist table size). You can
do this like so (assuming sh, bash, zsh or similar):
    XAPIAN_PREFER_FLINT=1 XAPIAN_FLUSH_THRESHOLD=1000000 copydatabase <qdir> <fdir>
Where <qdir> is the existing Quartz database and <fdir> is the directory
to create the flint database in.
Reduce 1000000 if you don't have loads of memory. If this number is
more than the number of documents, you'll get something roughly
equivalent to what "flintcompact -n" would give, if flintcompact
existed!
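As a rough sketch of lowering the threshold, the command above could be
wrapped in a small shell function. The name convert_to_flint and the
default of 100000 are illustrative assumptions, not anything shipped with
Xapian; only copydatabase and the two environment variables come from the
command above.

```shell
# Hypothetical helper: copy a Quartz database into a new flint one.
# $1 = existing Quartz directory, $2 = directory to create,
# $3 = optional flush threshold (assumed default: 100000 documents).
convert_to_flint() {
    qdir=$1
    fdir=$2
    threshold=${3:-100000}
    XAPIAN_PREFER_FLINT=1 XAPIAN_FLUSH_THRESHOLD=$threshold \
        copydatabase "$qdir" "$fdir"
}
```

For example, "convert_to_flint <qdir> <fdir> 50000" would flush every
50000 documents. A suitable value depends on how much memory you have.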
But beware that copydatabase is inherently a lot slower than
quartzcompact because copydatabase reinverts the data whereas quartzcompact
copies the already inverted data.
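To report position table sizes after a conversion, something like the
following might do. It's only a sketch: the function name position_bytes
is made up, and it assumes the position table's files all have names
starting with "position" in both backends, so check a directory listing
first.

```shell
# Hypothetical helper: print the total size in bytes of the files whose
# names start with "position" in the database directory given as $1.
# Assumption: those files make up the position table in both backends.
position_bytes() {
    total=0
    for f in "$1"/position*; do
        # Skip anything that isn't a regular file (including the
        # unexpanded pattern when nothing matches).
        [ -f "$f" ] || continue
        size=$(wc -c < "$f")
        total=$((total + size))
    done
    echo "$total"
}
```

Running "position_bytes <qdir>" and "position_bytes <fdir>" then gives two
numbers you can compare directly.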
Cheers,
Olly