[Xapian-discuss] Problems with positions and replace_document

Fernando Nemec fernando.nemec at folha.com.br
Tue Nov 14 16:59:06 GMT 2006


Hi Olly,

> Phew!  That's hard to explain in words, but it's actually pretty
> straightforward.  Easier than I expected it to be anyway.

hehe, I think I got it. :)

> I think the above is probably both the simplest and fastest way. If
> you want to try implementing it, that'd be cool. Otherwise I'll have
> a look once I've sorted out the things I'm currently working on.

I'm going to take a chance at digging deeper into Xapian's black-magic
internals. I'll let you know.

Thanks,

Fernando




Tuesday, November 14, 2006, 12:16:36 AM, you wrote:

> On Mon, Nov 13, 2006 at 10:37:20AM -0200, Fernando Nemec wrote:
>> I'm glad to help. If only we had a way to check whether the doc already
>> has a docid... But as far as I can tell from digging into the code, a
>> document alone doesn't know its own docid, is that right?

> The Xapian::Document::Internal class which is the actual implementation
> knows the docid and Database::Internal* (if the document came from a
> database).

>> I was wondering if I could use the docid to fetch a new instance of the
>> document and, since documents are reference counted, compare this
>> instance with the one supplied in the argument list. This way, I
>> think, the method would know whether it's a replace or an update
>> operation. The problem is I don't know how expensive such an operation is,

> Xapian::Document is reference counted, but you'll get two different
> underlying objects if you call Xapian::Database::get_document() twice on
> the same database, even if the docids are the same.
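A minimal sketch of the reference-counting point above, using a hypothetical stand-in model rather than Xapian's real classes: `DbInternalSketch`, `DocInternalSketch` and `get_document_sketch` are illustrative names, and `std::shared_ptr` plays the role of the reference-counted Document handle. Copying a handle shares one internal object, but each fetch builds a new one, so comparing object identity cannot tell whether two handles refer to the same stored document.

```cpp
// Hypothetical model, NOT Xapian's API: a Document handle is a
// reference-counted pointer to an internal object that records
// where the document came from (database pointer + docid).
#include <memory>

struct DbInternalSketch {};  // stands in for Database::Internal

struct DocInternalSketch {   // stands in for Document::Internal
    const DbInternalSketch* database;  // nullptr if not read from a database
    unsigned did;                      // docid within that database
};

using DocHandle = std::shared_ptr<DocInternalSketch>;

// Each call allocates a fresh internal object, mirroring the point that
// two get_document() calls for the same docid give two different objects.
DocHandle get_document_sketch(const DbInternalSketch& db, unsigned did) {
    return std::make_shared<DocInternalSketch>(DocInternalSketch{&db, did});
}
```

For example, with `DocHandle a = get_document_sketch(db, 42); DocHandle copy = a; DocHandle b = get_document_sketch(db, 42);`, `a` and `copy` share one object (`a.use_count()` is 2) while `a` and `b` do not, even though both record the same database pointer and docid. That stored (database, docid) pair, not the object address, is what a sameness check has to compare.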

> But it's actually easier than that...

> You need to compare the Database::Internal pointer and also the docid
> (since it's legal to read a Document from one database and write it back
> to another).  If those match, that should be enough to know that any
> parts of the document (values, postings, document data) which haven't been
> modified don't need to be rewritten (if terms_here is false, the terms
> are unmodified; similarly for values_here and data_here).

> If the Document isn't associated with a database, then the "database"
> pointer will be NULL and so will never match the Database::Internal
> pointer for the database the document is being added to.

> So I think you just need to add a new method (or maybe 4 new methods) to
> Document::Internal which check whether this document is replacing itself
> and indicate which parts are modified.  Then we can call these when
> handling replace_document to find out what we actually need to change.
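A minimal sketch of the check described above, assuming a simplified stand-in for Document::Internal. The names `DocInternalSketch`, `replaces_itself`, `ReplacePlan` and `plan_replace` are hypothetical, not Xapian's API; only the database pointer/docid comparison, the NULL-database case, and the `terms_here`/`values_here`/`data_here` flags come from the discussion.

```cpp
// Hypothetical sketch only: not Xapian's real class or method names.
struct DbInternalSketch {};  // stands in for Database::Internal

struct DocInternalSketch {   // stands in for Document::Internal
    const DbInternalSketch* database = nullptr;  // nullptr for a new document
    unsigned did = 0;                            // docid it was read from
    // false means that part was never loaded, so it cannot have changed
    bool terms_here = false;
    bool values_here = false;
    bool data_here = false;

    // True when writing back to the same database and docid it came from.
    // A document with no source database (nullptr) never matches a real
    // target database, so it is never treated as replacing itself.
    bool replaces_itself(const DbInternalSketch* target,
                         unsigned target_did) const {
        return database != nullptr && database == target && did == target_did;
    }
};

struct ReplacePlan {
    bool rewrite_terms;   // postings and positional information
    bool rewrite_values;
    bool rewrite_data;
};

// Decide which parts a replace_document() call would need to rewrite.
ReplacePlan plan_replace(const DocInternalSketch& doc,
                         const DbInternalSketch* target, unsigned target_did) {
    if (!doc.replaces_itself(target, target_did)) {
        return {true, true, true};  // different origin: rewrite everything
    }
    // Same origin: only parts that were loaded (and so may be modified).
    return {doc.terms_here, doc.values_here, doc.data_here};
}
```

The plan is conservative: a loaded part is treated as modified even if it wasn't changed, so at worst some unchanged data is rewritten, while a part that was never loaded (for instance, unmodified terms and their positions) would be skipped entirely.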

> Phew!  That's hard to explain in words, but it's actually pretty
> straightforward.  Easier than I expected it to be anyway.

>> If you want me to, perhaps I can try to think of a smarter and faster
>> way to replace positional information when the Documents involved are
>> the same.

> I think the above is probably both the simplest and fastest way.  If you
> want to try implementing it, that'd be cool.  Otherwise I'll have a look
> once I've sorted out the things I'm currently working on.

> Cheers,
>     Olly

--
Regards,
Fernando Nemec
fernando.nemec at folha.com.br
http://www.folha.com.br/




