[Xapian-discuss] [ NUMBER OF SAMPLE ]

Boris Meyer boris.meyer at rom.fr
Wed Jul 21 15:59:25 BST 2004


Hello everybody, Hello Richard,

Boris Meyer wrote:

>> There is currently no command line option which allows you to change 
>> the sample size, but it is trivial to tweak the source so that a 
>> larger sample is produced.  Look at around line 423 of omindex.cc, and 
>> change the number 300 in the lines:
> 
> I had looked at exactly this line in omindex.cc and thought the same; I
> will try it that way.
> 
>>     if (sample.empty()) {
>>         sample = truncate_to_word(dump, 300);
>>     } else {
>>         sample = truncate_to_word(sample, 300);
>>     }
>>
>> to a larger value.  This number is the maximum size in bytes of the 
>> sample produced.
> 
> I'll recompile with a larger value, but ideally I would like a 100%
> reliable way to obtain a meaningful sample. I have to find out how to get
> the number of characters in a document during indexing.

Done, but... the result is not what we were expecting: it just increases
the number of characters displayed in the returned excerpt...

A solution could be to retrieve the offsets of the matching words/phrases
in the document and extract a window around each offset (x characters
before / x characters after), combined with a per-document local weighting
algorithm when there is more than one match in the same document.

But this solution requires a very large index (larger than the sum of the
documents), or perhaps a combination of a small index of distinct words
and a large one containing the index plus the offsets and references to
the documents.

-- 
Best regards, Boris.
+---------------------------+----------------------+
| Boris Meyer               | Tel : 04 93 92 88 88 |
| Administration / Internet | Fax : 04 93 92 18 93 |
| Development               | Web : http://rom.fr  |
+---------------------------+----------------------+
| 19, bd Carabacel          | - - - - - x - - - -  |
| 06000 Nice                | - - - - - x - - - -  |
+---------------------------+----------------------+
| boris.meyer at rom.fr        | http://www.rom.fr    |
+---------------------------+----------------------+


