[Xapian-discuss] Counting and statistics
Olly Betts
olly at survex.com
Wed Mar 28 19:30:08 BST 2007
On Wed, Mar 28, 2007 at 01:00:27PM +0200, Andreas Marienborg wrote:
> I have managed to get some sort of result, by adding every doc in the
> RSet, then using that to build an ESet.
If you mean you're adding every document in the database to the RSet,
that doesn't really achieve anything - the terms are generated by
looking at differences between "RSet" documents and the collection as
a whole, so if the RSet and the collection are the same, you won't get
good results!
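
To make that concrete, here is a small sketch in plain Python (not the Xapian API). It assumes the textbook Robertson/Sparck Jones relevance weight with 0.5 smoothing, which as far as I know is the form the introductory document describes; the exact expression Xapian uses for expansion may differ in detail. The function name and the numbers are made up for illustration.

```python
import math

def rsj_weight(r, n, R, N):
    """Textbook Robertson/Sparck Jones relevance weight (assumed form).

    r: number of RSet documents containing the term
    n: number of documents in the collection containing the term
    R: size of the RSet
    N: size of the collection
    """
    return math.log(((r + 0.5) * (N - R - n + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

N = 1000  # hypothetical collection size

# A term found in 50 documents overall, 8 of them inside a 10-document RSet:
# it is much more common in the RSet than in the collection as a whole.
focused = rsj_weight(r=8, n=50, R=10, N=N)

# The same term when the RSet is the whole collection (R == N, so r == n).
degenerate = rsj_weight(r=50, n=50, R=N, N=N)

# With R == N the weight collapses to log((n + 0.5) / (N - n + 0.5)),
# which depends only on how common the term is, not on any contrast
# between the RSet and the rest of the collection.
collapsed = math.log((50 + 0.5) / (N - 50 + 0.5))
```

Under these assumptions the first weight is positive and the second is negative and equal to `collapsed`, so with everything in the RSet the ranking only tells you which terms are common.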
> Is there any way to "skip" some terms when building the ESet? I tried
> with:
>
> my $eset = $enquire->get_eset(10, $rset, sub { my $term = shift;
> warn "in decider!"; return 1; });
>
> but that just gives me the following error upon execution:
>
> Usage: Search::Xapian::Enquire::get_eset(THIS, maxitems, rset) at
> ./script/nyheter_search_word_count.pl line 74.
ExpandDecider isn't wrapped by Search::Xapian yet. The wrapper should
be very similar to that for MatchDecider, which was wrapped as of
0.9.10.0, so if you know any XS you could probably add a wrapper easily
enough. Otherwise, feel free to file a bug and I'll take a look once
Xapian 1.0 is taken care of.
> Also, on an ESetIterator, it is not possible to get the number of
> occurrences, or the number of documents containing it, just the weight?
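You can call Database::get_collection_freq() and Database::get_termfreq()
with the term name to find these out: the first gives the total number of
occurrences of the term in the database, the second the number of
documents containing it. They aren't stored in the ESet though.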
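
In case the difference between the two statistics isn't obvious, here is a toy illustration in plain Python. It does not call Xapian at all; the three "documents" and the helper name term_stats are invented for the example, and terms are just whitespace-separated words.

```python
from collections import Counter

# Hypothetical three-document collection.
docs = [
    "xapian is a search library",
    "search search and more search",
    "perl bindings for xapian",
]

def term_stats(docs, term):
    """Return (documents containing term, total occurrences of term)."""
    counts = [Counter(d.split())[term] for d in docs]
    doc_freq = sum(1 for c in counts if c > 0)  # what a term frequency call reports
    coll_freq = sum(counts)                     # what a collection frequency call reports
    return doc_freq, coll_freq

search_stats = term_stats(docs, "search")
xapian_stats = term_stats(docs, "xapian")
```

Here "search" appears in 2 documents but 4 times in total, while "xapian" appears in 2 documents and 2 times in total, so the two numbers can differ a lot for the same term.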
> Where can I read about how this weight is calculated?
http://www.xapian.org/docs/intro_ir.html
especially the section:
Using the weights: the E set
Cheers,
Olly