[Xapian-discuss] Problems with positions and replace_document
Olly Betts
olly at survex.com
Sat Nov 11 11:08:16 GMT 2006
On Fri, Nov 10, 2006 at 09:18:15PM -0200, Fernando Nemec wrote:
> I dug around in the flint code and made a change that seems to fix the
> problem, at least in my context.
Incidentally, the "add_value()" call in your example code isn't required.
Just replacing a document with itself is enough to trigger the problem!
Thanks for the patch. I think you've probably found the right spot
(there's even a FIXME comment saying this might not work!)
It's certainly a little wasteful that we call positionlist_begin() twice
here when the document has positional information. This call could
require a reasonable amount of work when the positions come from a
database (for a new document or one with modified postings, it will be
pretty cheap and positionlist_end() is very cheap).
But I don't see that this double call could cause the problem, so I
think your patch probably just avoids the bug manifesting rather than
fixing it completely.
> On the other hand, I'm not sure if it works when the positionlist has
> more than one element -- actually I didn't test that, simply because I
> don't know how to add more than one position for a single term.
Call Document::add_posting() for each position of each term.
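As a rough sketch of that advice, the loop below adds one posting per
position of each term. The helper name add_term_positions and the
RecordingDoc stand-in are hypothetical and not part of Xapian; they are
only there so the example compiles without the library installed. With
Xapian itself you would pass a Xapian::Document, whose add_posting()
takes the term and the position (plus an optional wdf increment). The
1-based word positions are an arbitrary choice for this example.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Add one posting per position of `term`. Doc is any type providing
// add_posting(term, position), for example Xapian::Document.
template <class Doc>
void add_term_positions(Doc& doc, const std::string& term,
                        const std::vector<unsigned>& positions) {
    for (unsigned pos : positions) {
        doc.add_posting(term, pos);
    }
}

// Hypothetical stand-in for Xapian::Document: it only records the calls.
struct RecordingDoc {
    std::vector<std::pair<std::string, unsigned>> postings;
    void add_posting(const std::string& term, unsigned pos) {
        postings.emplace_back(term, pos);
    }
};

// Index the text "to be or not to be", so that "to" and "be" each end
// up with a position list of more than one element.
inline RecordingDoc example_doc() {
    RecordingDoc doc;
    add_term_positions(doc, "to", {1, 5});
    add_term_positions(doc, "be", {2, 6});
    add_term_positions(doc, "or", {3});
    add_term_positions(doc, "not", {4});
    return doc;
}
```

A document built this way, added to a database and then replaced with
itself, should be a reasonable test case for position lists longer than
one element.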
> If it doesn't work for a larger positionlist, then I think the
> approach is to make a new PositionIterator vector and copy the
> positions from term to the new vector. However, I don't know if there
> is any performance issue doing that.
The main issue is one of space. Ideally we should handle this situation
by being lazy and not updating the positional information at all!
That's not currently done because changing the values and/or document
data of an existing document without changing the indexed terms hasn't
been a performance sensitive operation for anyone. We've concentrated
our efforts on making the common cases fast.
Cheers,
Olly