[Xapian-discuss] Flint Backend
Arjen van der Meijden
acmmailing at tweakers.net
Thu Jun 23 15:13:23 BST 2005
Olly Betts wrote:
> On Thu, Jun 23, 2005 at 08:25:31AM +0200, Arjen van der Meijden wrote:
>
>>quartzcompact doesn't do that much for the position-table: it goes
>>from 7.8GB (a db that has been in use for quite some time now) to 7.0GB,
>>which is about 11% (actually more than I thought it would be).
>>Of course I can't tell which is overhead generated due to it being in
>>long use and what is actual compaction-gain.
>
> You can use "quartzcompact -n" to compact but not do tag splitting to
> fill blocks fuller (and "quartzcompact -F" to generate larger than
> normal tag chunks and reduce size further, but I'd not recommend
> using this if you plan to update the compacted database again).
>
> The difference between "quartzcompact -n" and "quartzcompact" (or the
> extra gain from running "quartzcompact" after "quartzcompact -n") is
> probably what you're thinking of as the "actual compaction-gain".
We don't update the compacted database. If that were ever necessary, it
would be an emergency situation, in which case we'd probably just rebuild
the entire index from scratch.
Will -n and -F work for tables other than the position table as well?
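
To make the distinction between "overhead from long use" and "actual
compaction-gain" concrete, here is a minimal sketch in Python of how the
three table sizes could be split up. The function name and the 7.3GB figure
for the "-n" run are made up for illustration; only the 7.8GB and 7.0GB
figures come from this thread.

```python
def compaction_breakdown(original_gb, no_split_gb, full_gb):
    """Split the total size reduction into two parts.

    original_gb: table size before compaction
    no_split_gb: table size after "quartzcompact -n" (no tag splitting)
    full_gb:     table size after plain "quartzcompact"
    """
    # Space reclaimed without tag splitting: taken here as the overhead
    # that built up while the database was in use.
    overhead_gb = original_gb - no_split_gb
    # Extra space saved by splitting tags to fill blocks fuller.
    gain_gb = no_split_gb - full_gb
    # Overall reduction as a percentage of the original size.
    total_pct = 100.0 * (original_gb - full_gb) / original_gb
    return overhead_gb, gain_gb, total_pct


# 7.3 is a hypothetical size for the "-n" run, not a measured value.
overhead_gb, gain_gb, total_pct = compaction_breakdown(7.8, 7.3, 7.0)
print(f"overhead {overhead_gb:.1f}GB, gain {gain_gb:.1f}GB, "
      f"total {total_pct:.1f}%")
```

Measuring the "-n" size on the real position table would replace the
hypothetical 7.3 and give the actual split.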
>>Will this give useable figures if I'd use the current flint-backend, or
>>are the bugs you found such that especially the size of the index is
>>negatively influenced?
>
>
> With 0.9.1, you can't open a flint index for reading. Also the
> positionlist packing missed out some information necessary to actually
> unpack the list again, so the size will be slightly underestimated if
> anything.
>
> If you want to try flint, it's probably best to use a snapshot from SVN.
> This also has the new "xapian-compact" which is like quartzcompact but
> for flint databases.
I've installed an SVN snapshot from this afternoon and I'm downloading
today's (compacted) index.
> I've now written "flintcompact" (but called it "xapian-compact" with
> an eye to the future!)
Great, I'll test it tomorrow as well on our database then.
>>We have about 1M documents indeed, but that takes up much more than the
>>4GB of memory the production machine has I guess. You can see above what
>>size our position-table is. Development-machines here 'only' have 1GB.
>
> You probably don't want to use XAPIAN_FLUSH_THRESHOLD=1000000 then,
> especially as your documents are large. Hopefully I can make this
> parameter self-tuning (and also greatly reduce the space needed for
> buffering).
The advantage of being able to specify such a variable yourself is that
you can depend on it. In our case we keep a counter recording which
document was last indexed/updated (and its last update time). But that
isn't very practical if you can't predict how many documents scriptindex
will actually process. For (quartz|xapian)compact it doesn't matter though:
that needs to finish or its work is kinda useless.
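
As a rough illustration of why a fixed threshold is convenient for this kind
of bookkeeping, here is a small Python sketch. It assumes that a flush
happens after exactly every flush_threshold documents, which is my
simplification; the Checkpoint class and both functions are hypothetical
and not part of Xapian or scriptindex.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    last_doc_id: int          # highest document id known to be flushed
    last_update_time: float   # update time of that document (epoch seconds)


def flushed_count(docs_sent, flush_threshold):
    """Number of documents covered by completed flushes.

    Assumes one flush after every flush_threshold documents.
    """
    return (docs_sent // flush_threshold) * flush_threshold


def advance(checkpoint, batch, flush_threshold):
    """Move the checkpoint to the last document that was certainly flushed.

    batch is a list of (doc_id, update_time) pairs in indexing order.
    """
    safe = flushed_count(len(batch), flush_threshold)
    if safe == 0:
        # No complete flush happened, so the old checkpoint still holds.
        return checkpoint
    doc_id, update_time = batch[safe - 1]
    return Checkpoint(doc_id, update_time)


# 25 documents sent with a threshold of 10: only the first 20 are
# counted as flushed, so the checkpoint lands on the 20th document.
batch = [(i, 1000.0 + i) for i in range(1, 26)]
print(advance(Checkpoint(0, 0.0), batch, flush_threshold=10))
```

With a self-tuning threshold the value of flush_threshold would not be
known in advance, so the counter could not be derived this way.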
Best regards,
Arjen