[Xapian-discuss] Problems with positions and replace_document
Fernando Nemec
fernando.nemec at folha.com.br
Tue Nov 14 16:59:06 GMT 2006
Hi Olly,
> Phew! That's hard to explain in words, but it's actually pretty
> straightforward. Easier than I expected it to be anyway.
hehe, I think I got it. :)
> I think the above is probably both the simplest and fastest way. If
> you want to try implementing it, that'd be cool. Otherwise I'll have
> a look once I've sorted out the things I'm currently working on.
I'm going to take a chance at digging deeper into Xapian's black magic
internals. I'll let you know.
Thanks,
Fernando
Tuesday, November 14, 2006, 12:16:36 AM, you wrote:
> On Mon, Nov 13, 2006 at 10:37:20AM -0200, Fernando Nemec wrote:
>> I'm glad to help. If we could have a way to check if the doc already
>> has a docid... But as far as I have dug into the code, a document alone
>> doesn't know its own docid, is that right?
> The Xapian::Document::Internal class which is the actual implementation
> knows the docid and Database::Internal* (if the document came from a
> database).
>> I was wondering if I can use the docid to fetch a new instance of the
>> document and, as documents use reference counting, compare this
>> instance with the one supplied in the argument list. This way, I
>> think, the method knows if that's a replace or an update operation. The
>> problem is I don't know how expensive such an operation is.
> Xapian::Document is reference counted, but you'll get two different
> underlying objects if you call Xapian::Database::get_document() twice on
> the same database, even if the docids are the same.
> But it's actually easier than that...
> You need to compare the Database::Internal pointer and also the docid
> (since it's legal to read a Document from one database and write it back
> to another). If those match, that should be enough to know that any
> parts of the document (values, postings, document data) which haven't been
> modified don't need to be rewritten (if terms_here is false, the terms
> are unmodified; similarly for values_here and data_here).
> If the Document isn't associated with a database, then the "database"
> pointer will be NULL and so will never match the Database::Internal
> pointer for the database the document is being added to.
> So I think you just need a new method (or maybe 4 new methods) to
> Document::Internal which check if this document is replacing itself
> and indicate which parts are modified. Then we can call these when
> handling replace_document to find out what we actually need to change.
> Phew! That's hard to explain in words, but it's actually pretty
> straightforward. Easier than I expected it to be anyway.
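
A minimal sketch of the check described above, in C++. DatabaseInternal,
DocumentInternal and all method names below are hypothetical stand-ins for
illustration, not Xapian's real classes or API. Only the idea of a stored
database pointer plus docid, and the flag names terms_here, values_here and
data_here, come from the message.

```cpp
#include <cstdint>

// Hypothetical stand-ins for illustration only; not Xapian's real classes.
struct DatabaseInternal {};
using DocId = std::uint32_t;

struct DocumentInternal {
    // Database this document was read from; nullptr if it has no database.
    const DatabaseInternal* database = nullptr;
    DocId did = 0;
    // If a flag is false, that part of the document is unmodified.
    bool data_here = false;
    bool values_here = false;
    bool terms_here = false;

    // True only if the document came from target_db and is being written
    // back under the same docid. A document with no database never matches.
    bool is_replacing_itself(const DatabaseInternal* target_db,
                             DocId target_did) const {
        return database != nullptr && database == target_db &&
               did == target_did;
    }

    // Each part must be rewritten unless the document replaces itself
    // and that part is untouched.
    bool data_needs_rewrite(const DatabaseInternal* db, DocId d) const {
        return !is_replacing_itself(db, d) || data_here;
    }
    bool values_need_rewrite(const DatabaseInternal* db, DocId d) const {
        return !is_replacing_itself(db, d) || values_here;
    }
    bool terms_need_rewrite(const DatabaseInternal* db, DocId d) const {
        return !is_replacing_itself(db, d) || terms_here;
    }
};
```

With something like this, a replace_document implementation could skip
rewriting postings and positions whenever terms_need_rewrite() returns
false, which is the case the original problem report was about.
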
>> If you want me to, perhaps I can try to think of a smarter and faster
>> way to replace positional information when the Documents involved are
>> the same.
> I think the above is probably both the simplest and fastest way. If you
> want to try implementing it, that'd be cool. Otherwise I'll have a look
> once I've sorted out the things I'm currently working on.
> Cheers,
> Olly
--
[]s
Fernando Nemec
fernando.nemec at folha.com.br
http://www.folha.com.br/