[Xapian-discuss] Counting and statistics

Olly Betts olly at survex.com
Wed Mar 28 19:30:08 BST 2007


On Wed, Mar 28, 2007 at 01:00:27PM +0200, Andreas Marienborg wrote:
> I have managed to get some sort of result, by adding every doc in the  
> RSet, then using that to build an ESet.

If you mean you're adding every document in the database to the RSet,
that doesn't really achieve anything - the terms are generated by
looking at differences between "RSet" documents and the collection as
a whole, so if the RSet and the collection are the same, you won't get
good results!
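
To make that concrete, here's a rough sketch in plain Python (not Xapian
code).  It assumes the Robertson/Sparck Jones relevance weight w(t) and
ranks expansion terms by r * w(t); what Xapian actually computes may
differ in detail.  The function names and all the counts are made up for
illustration.

```python
import math

def rsj_weight(N, n, R, r):
    """Relevance weight w(t) of a term (assumed Robertson/Sparck Jones form).

    N: documents in the collection    n: documents containing the term
    R: documents in the RSet          r: RSet documents containing the term
    """
    return math.log(((r + 0.5) * (N - R - n + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

def expand_weight(N, n, R, r):
    """Score used to rank a candidate expansion term: r * w(t)."""
    return r * rsj_weight(N, n, R, r)

N = 1000  # hypothetical collection size

# A focused RSet of 10 documents: a rare, topical term beats a common one.
focused_rare = expand_weight(N, n=20, R=10, r=8)
focused_common = expand_weight(N, n=900, R=10, r=9)

# RSet == whole collection, so R == N and r == n for every term.
# The weight now depends only on how common the term is.
all_rare = expand_weight(N, n=20, R=N, r=20)
all_common = expand_weight(N, n=900, R=N, r=900)

print(focused_rare > focused_common)  # rare topical term ranks first
print(all_common > all_rare)          # common term ranks first instead
```

With R == N and r == n the formula collapses to log((n + 0.5) / (N - n + 0.5)),
so the "expansion" just orders terms by how many documents they occur in.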

> Is there any way to "skip" some terms when building the ESet? I tried  
> with:
> 
> 	my $eset = $enquire->get_eset(10, $rset,
> 	    sub { my $term = shift; warn "in decider!"; return 1; });
> 
> but that just gives me the following error upon execution:
> 
> 	Usage: Search::Xapian::Enquire::get_eset(THIS, maxitems, rset) at
> ./script/nyheter_search_word_count.pl line 74.

ExpandDecider isn't wrapped by Search::Xapian yet.  The wrapper should
be very similar to that for MatchDecider, which was wrapped as of
0.9.10.0, so if you know any XS you could probably add a wrapper easily
enough.  Otherwise, feel free to file a bug and I'll take a look once
Xapian 1.0 is taken care of.

> Also, on an ESetIterator, it is not possible to get the number of
> occurrences, or number of documents containing it, just the weight?

You can call Database::get_collection_freq() and Database::get_termfreq()
with the termname to find these out (the number of occurrences and the
number of documents containing the term, respectively).  They aren't
stored in the ESet though.

> Where can I read about how this weight is calculated?

http://www.xapian.org/docs/intro_ir.html

especially the section:

    Using the weights: the E set

Cheers,
    Olly


