[Xapian-discuss] Counting and statistics
Andreas Marienborg
andreas at startsiden.no
Thu Mar 29 08:31:04 BST 2007
On Mar 28, 2007, at 8:30 PM, Olly Betts wrote:
> On Wed, Mar 28, 2007 at 01:00:27PM +0200, Andreas Marienborg wrote:
>> I have managed to get some sort of result, by adding every doc in the
>> RSet, then using that to build an ESet.
>
> If you mean you're adding every document in the database to the RSet,
> that doesn't really achieve anything - the terms are generated by
> looking at differences between "RSet" documents and the collection as
> a whole, so if the RSet and the collection are the same, you won't get
> good results!
>
Well, I am not adding everything from the database, just everything
from the last month (or whatever period I choose to look at).
I am trying to figure out terms that are "popular" within a given set
of documents.
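Olly's point about the RSet needing to differ from the collection can be made concrete with the probabilistic term weight described in intro_ir.html (linked further down). The Perl sketch below is illustrative only. The figures are hypothetical, and ranking expand terms by r·w(t) is an assumption here (a common choice, often called the offer weight), so Xapian's actual ESet code may weight terms differently. It compares an RSet that is a one-month subset with an RSet that is the whole collection.

```perl
use strict;
use warnings;

# Relevance weight w(t) from the probabilistic model in intro_ir.html:
#   w(t) = log( (r+0.5)(N-n-R+r+0.5) / ((R-r+0.5)(n-r+0.5)) )
# N = documents in the collection, n = documents containing the term,
# R = documents in the RSet,        r = RSet documents containing the term.
sub relevance_weight {
    my ($N, $n, $R, $r) = @_;
    return log((($r + 0.5) * ($N - $n - $R + $r + 0.5))
             / (($R - $r + 0.5) * ($n - $r + 0.5)));
}

# Assumed selection value for ranking expand terms: r * w(t).
sub selection_value {
    my ($N, $n, $R, $r) = @_;
    return $r * relevance_weight($N, $n, $R, $r);
}

# Hypothetical figures: 10000 documents, 500 of them from the last month.
my $N = 10_000;
my %subset = (
    election => selection_value($N, 300,  500, 250),   # concentrated in the RSet
    the      => selection_value($N, 9900, 500, 495),   # spread over everything
);
# RSet == whole collection, so R = N and r = n for every term.
my %everything = (
    election => selection_value($N, 300,  $N, 300),
    the      => selection_value($N, 9900, $N, 9900),
);

printf "%-8s subset: %10.1f  whole collection: %10.1f\n",
    $_, $subset{$_}, $everything{$_} for qw(election the);
```

With the one-month subset, "election" scores well above "the". When the RSet is the whole collection the ordering flips. Setting R = N and r = n collapses w(t) to log((n+0.5)/(N-n+0.5)), which depends only on how common the term is, so the "best" expand terms are simply the commonest ones.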
>> Is there any way to "skip" some terms when building the ESet? I tried
>> with:
>>
>> my $eset = $enquire->get_eset(10, $rset, sub { my $term = shift;
>> warn "in decider!"; return 1; });
>>
>> but that just gives me the following error upon execution:
>>
>> Usage: Search::Xapian::Enquire::get_eset(THIS, maxitems, rset) at
>> ./script/nyheter_search_word_count.pl line 74.
>
> ExpandDecider isn't wrapped by Search::Xapian yet. The wrapper should
> be very similar to that for MatchDecider, which was wrapped as of
> 0.9.10.0, so if you know any XS you could probably add a wrapper
> easily enough. Otherwise, feel free to file a bug and I'll take a look once
> Xapian 1.0 is taken care of.
>
I will see what I can do. I haven't done any XS, but I suppose it's
never too late to learn :)
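Until ExpandDecider is wrapped, one possible workaround is to ask `get_eset` for more terms than you need (the usage message above confirms it takes `maxitems` and `rset`) and then do the skipping in Perl. The sketch below assumes the term names and weights have already been copied out of the ESet into `[term, weight]` pairs in ESet order. `filter_expand_terms`, the sample terms and the `XTnews` prefixed term are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an ExpandDecider: given [term, weight] pairs in
# ESet order (highest weight first), keep the first $max terms that $decider
# accepts. The pairs are assumed to come from an ESet fetched with a larger
# maxitems than you finally need.
sub filter_expand_terms {
    my ($candidates, $max, $decider) = @_;
    my @kept;
    for my $pair (@$candidates) {
        last if @kept >= $max;
        push @kept, $pair if $decider->($pair->[0]);
    }
    return @kept;
}

# Example data standing in for an ESet.
my @candidates = (
    [ 'the',      9.1 ],
    [ 'election', 7.2 ],
    [ 'XTnews',   6.0 ],   # a prefixed term we do not want to suggest
    [ 'budget',   5.5 ],
    [ 'tax',      4.8 ],
);
my %stopwords = map { $_ => 1 } qw(the a of);

# Skip stopwords and terms starting with a capital (prefixed) letter.
my @top = filter_expand_terms(\@candidates, 2, sub {
    my $term = shift;
    return !$stopwords{$term} && $term !~ /^[A-Z]/;
});
print "$_->[0] ($_->[1])\n" for @top;
```

If the decider rejects many terms, fewer than `$max` may survive, so request a generous `maxitems` from `get_eset`. Unlike a real ExpandDecider, rejected terms still occupy ESet slots before filtering.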
>> Also, on an ESetIterator, is it not possible to get the number of
>> occurrences, or the number of documents containing the term, rather than just the weight?
>
> You can call Database::get_collectionfreq() and Database::get_termfreq()
> with the termname to find these out. They aren't stored in the ESet
> though.
>
But will these work on a subset, or only on the complete database? I want to
know how many times a term occurred within a given search result.
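`get_collectionfreq()` and `get_termfreq()` as described above are whole-database figures. Getting the equivalent counts for one result set means aggregating over the matching documents yourself. The pure-Perl sketch below assumes each matching document has already been turned into a `term => wdf` hash, for example by walking its termlist after fetching an MSet large enough to hold every match. `count_terms_in_set` and the sample data are hypothetical. Summing wdf gives a per-set analogue of collection frequency, and counting documents gives a per-set analogue of term frequency.

```perl
use strict;
use warnings;

# Aggregate per-result-set statistics. Each element of @$docs is a hash ref
# mapping term => wdf for one matching document (hypothetically gathered by
# walking that document's termlist). Returns two hash refs:
#   occurrences: total wdf of each term across the set (cf. collection freq)
#   doc_count:   number of documents in the set containing it (cf. term freq)
sub count_terms_in_set {
    my ($docs) = @_;
    my (%occurrences, %doc_count);
    for my $doc (@$docs) {
        for my $term (keys %$doc) {
            $occurrences{$term} += $doc->{$term};
            $doc_count{$term}++;
        }
    }
    return (\%occurrences, \%doc_count);
}

# Three hypothetical matching documents.
my @docs = (
    { election => 3, budget => 1 },
    { election => 2 },
    { budget => 4, tax => 1 },
);
my ($occ, $df) = count_terms_in_set(\@docs);

# Most frequent first; ties broken alphabetically.
my @popular = sort { $occ->{$b} <=> $occ->{$a} || $a cmp $b } keys %$occ;
printf "%-8s %d occurrences in %d docs\n", $_, $occ->{$_}, $df->{$_} for @popular;
```

This costs one termlist walk per matching document, so it grows with the size of the result set. Raw counts like these also favour terms that are common everywhere, which the ESet weighting is designed to discount.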
>> Where can I read about how this weight is calculated?
>
> http://www.xapian.org/docs/intro_ir.html
>
> especially the section:
>
> Using the weights: the E set
>
Ok, will look into it, thanks :)
- andreas