[Xapian-discuss] [ NUMBER OF SAMPLE ]

Boris Meyer boris.meyer at rom.fr
Wed Jul 21 17:59:26 BST 2004


Hello Eric, Hello Richard,

Eric B. Ridge wrote:

> On Jul 21, 2004, at 10:59 AM, Boris Meyer wrote:
> 
> <snip>
> 
>> The solution could be to retrieve the offsets of the words/phrases in 
>> the document and extract the text around each offset with a window (x 
>> chars before/x after), combined with a document-local weighting 
>> algorithm when there is more than one match in the same document.
> 
> It sounds like you want some bit of context around the first hit.

Exactly. More precisely, a meaningful result listing.

> I don't know if Omega can do this (doubt it, but I've never used Omega).  
> Personally, I'd like to see support for this in Xapian's API.

I'm digging into the API, looking for methods to retrieve this offset.

> Right now one must re-parse the document, joining up with the terms list 
> from the result to find and highlight any/all hits, let alone context 
> extraction.  A fairly expensive operation if you're going to do this on 
> a "summary display" of many documents.

Yes, a very expensive process, especially given that the documents I 
would have to parse average 3 MB each (PDF), and don't forget there are 
10 results per page ;-).
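
To make the re-parse approach concrete, here is a minimal sketch in 
Python. It assumes the document text has already been extracted to a 
plain string (for a PDF that extraction is a separate, costly step). The 
names find_offsets, best_snippet and radius are hypothetical and are not 
part of Xapian or Omega. When a document has several matches, the window 
that covers the most hits is chosen, as a crude local weighting.

```python
import re

def find_offsets(text, terms):
    """Return sorted (offset, term) pairs for case-insensitive whole-word hits."""
    hits = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            hits.append((m.start(), term.lower()))
    return sorted(hits)

def best_snippet(text, terms, radius=40):
    """Return the window (radius chars either side of a hit) covering the most hits.

    Returns None when no term occurs in the text.
    """
    hits = find_offsets(text, terms)
    if not hits:
        return None

    def score(center):
        # Crude local weight: number of hits starting within the window.
        return sum(1 for off, _ in hits if abs(off - center) <= radius)

    # On a tie, max() keeps the earliest hit.
    center = max(hits, key=lambda h: score(h[0]))[0]
    start = max(0, center - radius)
    end = min(len(text), center + radius)
    return text[start:end]
```

Every call scans the whole text once per term, which is exactly the cost 
that hurts with large documents and 10 results per page.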

> I think I suggested awhile back that Xapian be able to track byte 
> offsets for each term.  This would make grabbing hit contexts really 
> simple.  I know it would drastically increase the size of the index, but 
> I personally would be willing to take the storage hit.

As hard disks are now cheap and everybody today expects a Google-style 
meaningful result listing with highlighted terms, I would also be 
willing to store such an index. But maybe there is another way?
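
As an illustration of what storing offsets would buy, here is a small 
Python sketch of an offset index built once at indexing time. This is 
not how Xapian stores anything; build_offset_index and 
context_from_index are hypothetical names, and the offsets are 
character offsets into an already extracted string rather than byte 
offsets into the original file.

```python
import re
from collections import defaultdict

def build_offset_index(text):
    """Map each lowercased word to the list of character offsets where it starts."""
    index = defaultdict(list)
    for m in re.finditer(r"\w+", text):
        index[m.group().lower()].append(m.start())
    return dict(index)

def context_from_index(text, index, term, radius=20):
    """Return text around the first occurrence of term, using stored offsets only."""
    offsets = index.get(term.lower())
    if not offsets:
        return None
    first = offsets[0]
    start = max(0, first - radius)
    end = first + len(term) + radius  # slicing past the end is safe in Python
    return text[start:end]
```

With the offsets stored, showing a hit means one lookup and one slice 
(or one seek into a file) instead of a scan of the whole document; the 
price is one stored integer per term occurrence.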

> eric

-- 
Regards, Boris.
+---------------------------+----------------------+
| Boris Meyer               | Tel : 04 93 92 88 88 |
| Administration / Internet | Fax : 04 93 92 18 93 |
| Development               | Web : http://rom.fr  |
+---------------------------+----------------------+
| 19, bd Carabacel          | - - - - - x - - - -  |
| 06000 Nice                | - - - - - x - - - -  |
+---------------------------+----------------------+
| boris.meyer at rom.fr        | http://www.rom.fr    |
+---------------------------+----------------------+


