[Xapian-discuss] Extremely high WDF's

David Morris-Oliveros dmorris at sirca.org.au
Wed Jun 27 07:48:34 BST 2007


Hi, I'm thinking of using the WDF of terms for something other than 
actual "frequency".

I have to index some pages over time. The actual content of the document 
isn't really that important, but a search needs to be able to find a term 
that may have appeared for only a 1-minute interval over the 10-year life 
of the document.

So I've devised a way to extract just the terms and the associated "life" 
of each term. That life could be contiguous, or the term could be popping 
in and out all the time.

Now I want to use the WDF to give more weight to terms that have 
appeared on that page throughout the life of the document, as opposed to 
terms that only appeared briefly. I thought of adding up all the seconds 
that the term has appeared on that page and using that total as its WDF.
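
A minimal sketch of that idea in Python, under some assumptions that are 
not part of the setup described above: each term's "life" is taken to be 
a list of (start, end) pairs in seconds, overlapping pairs are merged 
before summing, and a floor of 1 is applied so that a term seen for less 
than a second is not dropped. The names seconds_visible and 
wdf_from_lifetimes are made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of a term's "life": (start, end) in seconds.
Interval = Tuple[int, int]


def seconds_visible(intervals: List[Interval]) -> int:
    """Total seconds covered by the intervals, counting overlaps only once."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # A gap: close the previous run (if any) and start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def wdf_from_lifetimes(lifetimes: Dict[str, List[Interval]]) -> Dict[str, int]:
    """Map each term to a candidate WDF equal to its seconds on the page."""
    # Assumption: floor of 1 so every term that appeared gets a nonzero WDF.
    return {term: max(1, seconds_visible(iv)) for term, iv in lifetimes.items()}
```

The resulting integer would then be the value handed over as the WDF 
increment when the term is added to the document, for example through 
the wdfinc argument of Document.add_term in the Xapian bindings.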

However, this would give me WDFs well into the millions.

Since it's already been more than 24 hours since my last ludicrous idea, 
I thought it would be time for another one.

Plan B: normalize the time to a rank from 1..N, where N is the number of 
terms that have ever appeared on the page, and then just assign each term 
its position in that ordering as its WDF.
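
Plan B could be sketched roughly like this, assuming the total seconds 
per term have already been computed. The function name rank_wdf and the 
tie-breaking rule (alphabetical order among terms with equal time) are 
assumptions made only for the example.

```python
from typing import Dict


def rank_wdf(seconds_by_term: Dict[str, int]) -> Dict[str, int]:
    """Replace each term's total seconds with its rank in 1..N.

    The shortest-lived term gets 1 and the longest-lived gets N,
    where N is the number of terms seen on the page.
    """
    # Assumption: ties are broken alphabetically so ranks stay distinct.
    ordered = sorted(seconds_by_term, key=lambda t: (seconds_by_term[t], t))
    return {term: rank for rank, term in enumerate(ordered, start=1)}
```

This keeps the WDF bounded by the number of distinct terms on the page, 
at the cost of discarding how much longer one term lived than another.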


