[Xapian-discuss] BTREE_MAX_KEY_LEN=252
Josef Novak
josef.robert.novak at gmail.com
Tue May 1 04:20:08 BST 2007
Hi,
> The btree keys need to contain a term and other information, so the
> actual safe term length limit is less than 252 bytes - I recommend
> imposing a limit of 240. Read this for the full details:
That's funny, I just ran into this problem and had to set it to 240 to
get past it! Thanks though.
>
> http://article.gmane.org/gmane.comp.search.xapian.general/3656
Thanks.
>
> > I have no problems. However, I noticed that if I run the program
> > without checking the posting token length before attempting to add it,
> > it will sometimes throw the exception and keep on trucking, yet
> > sometimes it will throw the exception and then throw a segmentation
> > fault and unceremoniously die.
>
> You shouldn't get a SEGV, so that sounds like a bug. Can you provide a
> small self-contained example which demonstrates this?
I tried doing this a bunch of times while trying to figure out what
the problem was (before I realized it was the BTREE_MAX_KEY_LEN
problem). I tried taking out a large chunk of text around the
offending key (1000 documents before and after) and reindexing just
this subsection. However I can't seem to reproduce the problem. I'm
sure I have the right subsection, where the problem is occurring,
because when I index the entire set, it invariably segmentation faults
at the same line. Yet, aside from the exception regarding the byte
length, the subsection seems to get indexed properly. Incidentally
this time, using the same code I posted in the previous mail, I also
got a
Exception: Error seeking to block: Invalid argument
exception. After setting the max token size to 240 I was finally able
to index the entire document set (in a pretty timely fashion to boot!).
I will continue to try to duplicate the segmentation fault. In the
meantime, I have a bit of strace output from the offending process.
This was generated using the bit of code from my first mail, where I
am not checking the length of the keys:
read(4, "\0\0\0\1\0\4?\4?\0\37\36R\34\205\32u\31l\30\300\21Z\17"..., 8192) = 8192
_llseek(7, 85155840, [85155840], SEEK_SET) = 0
write(7, "\0\0\0\1\0\2S\2S\3\23\37\362\37\342\37\320\37\277\37\247"..., 8192) = 8192
_llseek(7, 84598784, [84598784], SEEK_SET) = 0
read(7, "\0\0\0\1\0\7\244\7\244\2u\37\350\37\321\37\273\37\255\37"..., 8192) = 8192
read(3, "\343\202\203\343\202\223 \343\201\235\343\202\214 \343"..., 8192) = 8192
read(3, " \343\201\202\343\202\212 \343\201\276\343\201\231 \343"..., 8192) = 8192
brk(0) = 0x82e7000
brk(0x82eb000) = 0x82eb000
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Those final 3 reads in a row look funny. Thanks again for your
comments. At the moment, with the size check in place things work.
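
For anyone hitting the same thing, here is a rough sketch of the kind of size check I mean. It is not the code from my earlier mail, and the names (MAX_TERM_BYTES, term_fits, filter_terms) are made up for illustration. The one thing worth noting is that the limit is in bytes, not characters, so for UTF-8 text such as the Japanese in the reads above the length has to be measured on the encoded form. It only shows the check itself and deliberately makes no Xapian calls.

```python
# Hypothetical helper names; 240 is the limit recommended in the quoted reply,
# chosen to stay safely below BTREE_MAX_KEY_LEN (252 bytes).
MAX_TERM_BYTES = 240


def term_fits(term: str, limit: int = MAX_TERM_BYTES) -> bool:
    """Return True if the UTF-8 encoding of `term` is at most `limit` bytes."""
    return len(term.encode("utf-8")) <= limit


def filter_terms(terms, limit: int = MAX_TERM_BYTES):
    """Split `terms` into (kept, skipped) lists by encoded byte length."""
    kept, skipped = [], []
    for term in terms:
        if term_fits(term, limit):
            kept.append(term)
        else:
            skipped.append(term)
    return kept, skipped


if __name__ == "__main__":
    # 100 three-byte characters encode to 300 bytes, which is over the limit.
    kept, skipped = filter_terms(["short", "\u3042" * 100])
    print(len(kept), "kept,", len(skipped), "skipped")
```

Only the terms in the kept list would then be handed to the indexer. Whether over-long terms should be skipped or truncated instead is a separate choice; skipping is just the simplest thing to show.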
Joe