[Xapian-discuss] MatchSpy:ing on a large recordset

Olly Betts olly at survex.com
Thu May 22 10:55:08 BST 2008


On Wed, May 21, 2008 at 11:03:04PM -0700, alexander lind wrote:
> I have a project in the works that will have 10-15M records with a
> set of arbitrary attributes on each record.
> 
> I need to build a system where a user can filter the recordset by
> selecting attribute values and/or negating them, and for each
> attribute value given, the number of matching records needs to be
> calculated in real time - 1-2 seconds lookup time is acceptable.

For the filtering options you're describing, I think the most
efficient approach would be to give each attribute its own term
prefix and filter on those terms.
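
To illustrate why prefixed terms make this kind of filtering cheap,
here is a minimal sketch in standard C++.  It is a toy in-memory model
and not Xapian's API: ToyIndex, filter() and the "XCOLOUR:"/"XSIZE:"
prefixes are all hypothetical names chosen for the example.  Each
attribute value becomes one prefixed term with a sorted posting list,
so selecting a value is a list intersection and negating it is a list
difference.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <numeric>
#include <string>
#include <vector>

using DocId = unsigned;
using Postings = std::vector<DocId>;  // kept sorted and free of duplicates

// Toy inverted index (hypothetical, not a Xapian class):
// prefixed term -> sorted list of the documents that carry it.
struct ToyIndex {
    std::map<std::string, Postings> postings;
    DocId doc_count = 0;

    // Add one record.  Each attribute is passed as "<PREFIX>:<value>".
    DocId add(const std::vector<std::string>& terms) {
        DocId id = doc_count++;
        for (const auto& t : terms) {
            Postings& p = postings[t];
            // Ids only grow, so appending keeps the list sorted.
            if (p.empty() || p.back() != id) p.push_back(id);
        }
        return id;
    }

    // Posting list for a term, or an empty list if the term is unknown.
    const Postings& lookup(const std::string& term) const {
        static const Postings empty;
        auto it = postings.find(term);
        return it == postings.end() ? empty : it->second;
    }
};

// Documents that contain every term in `required` and none in `excluded`.
Postings filter(const ToyIndex& idx,
                const std::vector<std::string>& required,
                const std::vector<std::string>& excluded) {
    // Start from every document, then narrow the set down.
    Postings result(idx.doc_count);
    std::iota(result.begin(), result.end(), DocId{0});

    for (const auto& term : required) {
        const Postings& p = idx.lookup(term);
        assert(std::is_sorted(p.begin(), p.end()));
        Postings kept;
        std::set_intersection(result.begin(), result.end(),
                              p.begin(), p.end(),
                              std::back_inserter(kept));
        result.swap(kept);
    }
    for (const auto& term : excluded) {
        const Postings& p = idx.lookup(term);
        assert(std::is_sorted(p.begin(), p.end()));
        Postings kept;
        std::set_difference(result.begin(), result.end(),
                            p.begin(), p.end(),
                            std::back_inserter(kept));
        result.swap(kept);
    }
    return result;
}
```

With three records added, a call such as
filter(idx, {"XCOLOUR:red"}, {"XSIZE:small"}) returns the ids of the
red records that are not small.  In Xapian itself the corresponding
query operators would be Xapian::Query::OP_FILTER and
Xapian::Query::OP_AND_NOT, applied to terms carrying these prefixes.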

> Can this be achieved with Xapian and the MatchSpy functionality?

You certainly could do it this way.  If there's enough RAM to cache
all the value data, you'll probably at least be near the performance
target, but without trying it I couldn't say for sure.  Using C++ here
is likely to help - calling from C++ to a scripting language and back
tens of millions of times will probably be a measurable overhead.
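
As a rough picture of the work a MatchSpy-style counter does, the
sketch below tallies attribute values over a set of matching documents
in standard C++.  It is a toy model under stated assumptions, not the
MatchSpy interface: ValueSlot and count_values() are hypothetical
names, and the values of one attribute are assumed to sit in memory,
indexed by document id.  The loop body runs once per matching
document, which is why per-call overhead matters at this scale.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory value storage for ONE attribute:
// slot[doc] is that document's value, or "" if it has none.
using ValueSlot = std::vector<std::string>;

// Count how many of the matching documents carry each value.
std::map<std::string, std::size_t>
count_values(const ValueSlot& slot, const std::vector<unsigned>& matches) {
    std::map<std::string, std::size_t> counts;
    for (unsigned doc : matches) {
        assert(doc < slot.size());   // every match must be a known document
        const std::string& value = slot[doc];
        if (!value.empty()) ++counts[value];  // skip documents with no value
    }
    return counts;
}
```

Running count_values() once per attribute over the filtered result set
gives the per-value counts to show next to each filter option.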

Cheers,
    Olly


