[Xapian-discuss] Document::set_data() Limitations?
Richard Boulton
richard at lemurconsulting.com
Mon Jun 25 08:42:48 BST 2007
David wrote:
> I'm wondering if there are any limitations (hard or soft) to what you can shove
> into Document's set_data?
>
> Can I put in binary data? Or is it really just meant for text? Is there a
> practical limit to how much information we can put in there?
Binary data is fine, but the system will attempt to compress data put
into a document; this will work correctly if the data is already
compressed, but it might be worth turning off the compression to avoid
wasting CPU.
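
To illustrate why a second round of compression buys little, here is a
minimal sketch. It uses Python's zlib purely as a stand-in (an assumption on
my part, not a statement about what Xapian does internally), and the sample
text is made up for the demonstration:

```python
import random
import zlib

# Build some compressible sample text (hypothetical data, seeded so the
# run is repeatable).
rng = random.Random(0)
words = ["xapian", "document", "index", "term",
         "query", "database", "value", "posting"]
raw = " ".join(rng.choice(words) for _ in range(50_000)).encode("ascii")

once = zlib.compress(raw)    # first pass: large saving
twice = zlib.compress(once)  # second pass: CPU spent for little or no gain

print("raw bytes:       ", len(raw))
print("compressed once: ", len(once))
print("compressed twice:", len(twice))
```

The first pass shrinks the text considerably; the second pass still costs
CPU time but leaves the size roughly unchanged.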
> I suspect that I'll be putting in quite a lot, as in a couple to maybe a hundred
> MB. Is this silly?
Quite possibly, though it may work.
There is an upper limit imposed on the maximum length of data which can
be stored in the document data, but it's not simple to give a value for
it. Currently, I believe the limit is:
((block_size - 19) / 4 - key_length - 7) * 65536
Where block_size is the length of a block in the database (which is 8192
by default) and key_length is the length of the key being used in the
table to look up the document data; this will usually be around 4 bytes.
This comes out to about 125 MB, so storing 2 MB is fine, but 200 MB will
be a problem.
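
As a rough check of the arithmetic, the formula can be evaluated directly.
This is only a sketch: the function name is made up, integer division is
assumed, and the defaults assume an 8192-byte (8 KiB) block, which as far as
I know is Xapian's default, and a 4-byte key:

```python
def max_document_data_bytes(block_size: int = 8192, key_length: int = 4) -> int:
    """Evaluate ((block_size - 19) / 4 - key_length - 7) * 65536.

    Integer division is an assumption; the formula above does not say
    how the division is rounded.
    """
    return ((block_size - 19) // 4 - key_length - 7) * 65536


limit = max_document_data_bytes()
print(f"limit: {limit} bytes, about {limit / 2**20:.0f} MiB")

# Compare against the sizes mentioned in the question.
for size_mib in (2, 100, 200):
    verdict = "fits" if size_mib * 2**20 <= limit else "too large"
    print(f"{size_mib} MiB: {verdict}")
```

Under those assumptions the limit works out to 133169152 bytes, in the same
ballpark as the figure quoted above.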
It's probably a mistake to try storing that much data, anyway; while it
should work, you'll end up with a single very large file in the Xapian
database directory holding the records, which might be a pain when
taking backups, etc. Also, Xapian doesn't provide you with any ability
to perform random access on the document data - you have to read it
all into memory to access it; if the data were stored in a file, the
operating system could access it much more efficiently.
Without knowing details of what you're trying to do, I'd probably
recommend that you store the data for each document in a separate file,
and store a pointer to the file in the document data.
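
A minimal sketch of that layout follows. The helper names and the choice of
naming each file after a SHA-256 digest of its contents are my own
assumptions, not anything Xapian prescribes. The returned path is the
"pointer" that would go into the document data:

```python
import hashlib
import os
import tempfile


def store_payload(storage_dir: str, payload: bytes) -> str:
    """Write payload to its own file and return the path (the 'pointer')."""
    os.makedirs(storage_dir, exist_ok=True)
    # Hypothetical naming scheme: name the file after a digest of its contents.
    path = os.path.join(storage_dir, hashlib.sha256(payload).hexdigest())
    with open(path, "wb") as f:
        f.write(payload)
    return path


def read_range(path: str, offset: int, length: int) -> bytes:
    """Read only a slice of the file, without loading all of it into memory."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


storage_dir = tempfile.mkdtemp()
payload = b"header" + bytes(1024) + b"trailer"
pointer = store_payload(storage_dir, payload)

# With the Xapian bindings, this short string is what would be passed to
# Document.set_data() in place of the payload itself.
chunk = read_range(pointer, len(payload) - 7, 7)
print(pointer)
print(chunk)
```

The document data then stays tiny, and the bulk data can be backed up,
moved, or read in pieces with ordinary file tools.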
> I'm still in the investigation stage, and would just like to know where my
> limits are so I can design this properly.
--
Richard