[Xapian-discuss] MatchSpy:ing on a large recordset

alexander lind malte at webstay.org
Thu May 22 20:24:50 BST 2008


On May 22, 2008, at 2:55 AM, Olly Betts wrote:

> On Wed, May 21, 2008 at 11:03:04PM -0700, alexander lind wrote:
>> I have a project in the works that will have 10-15M records with a
>> set of arbitrary attributes on each record.
>>
>> I need to build a system where a user can filter the recordset by
>> selecting attribute values and/or negating on them, and for each
>> attribute value given, the number of matching records needs to be
>> calculated in realtime - 1-2 seconds lookup time is acceptable.
>
> For the filtering options you're describing, making each attribute a
> term prefix and filtering on those terms would be the most efficient
> approach I think.
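
The term-prefix idea suggested above can be illustrated with a minimal sketch in plain Python. This is not Xapian's API; it only mimics the idea of an inverted index whose terms are an attribute prefix plus a value. The prefixes (XCOLOUR, XSIZE), the sample records, and the helper names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical prefixes, one per attribute (not taken from the thread).
PREFIXES = {"colour": "XCOLOUR", "size": "XSIZE"}


def build_index(records):
    """Map each prefixed term, e.g. 'XCOLOURred', to the set of record ids."""
    index = defaultdict(set)
    for doc_id, attrs in records.items():
        for name, value in attrs.items():
            index[PREFIXES[name] + value].add(doc_id)
    return index


def filter_ids(index, all_ids, include=(), exclude=()):
    """Keep records that have every 'include' term and no 'exclude' term."""
    result = set(all_ids)
    for term in include:
        result &= index.get(term, set())
    for term in exclude:
        result -= index.get(term, set())
    return result


def value_counts(index, matching, attribute):
    """Count the matching records for each value of one attribute."""
    prefix = PREFIXES[attribute]
    counts = {}
    for term, ids in index.items():
        if term.startswith(prefix):
            n = len(ids & matching)
            if n:
                counts[term[len(prefix):]] = n
    return counts


# Made-up sample data.
records = {
    1: {"colour": "red", "size": "large"},
    2: {"colour": "red", "size": "small"},
    3: {"colour": "blue", "size": "large"},
    4: {"colour": "green", "size": "large"},
}
index = build_index(records)

# Filter: size is large AND colour is NOT blue.
matching = filter_ids(index, records,
                      include=["XSIZElarge"], exclude=["XCOLOURblue"])
colour_counts = value_counts(index, matching, "colour")
```

Here the filter is a set intersection (selected values) followed by a set difference (negated values), and the per-value counts are taken over whatever survives the filter. A real index would do the same work over posting lists on disk rather than in-memory sets.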

For attributes that can be stored as values, would it be faster to
put them in values instead?  For example, the attribute age, which
could be a value between 1 and 100.

>
>
>> Can this be achieved with Xapian and the MatchSpy functionality?
>
> You certainly could do it this way.

Do you think there is a better way to do it with Xapian?

>  If there's enough RAM to cache
> all the value data, you'll probably at least be near the performance
> target, but without trying it I couldn't say for sure.

Would it help significantly if I had enough RAM to put the entire
Xapian index in a RAM partition?

> Using C++ here
> is likely to help - calling from C++ to a scripting language and back
> tens of millions of times will probably be a measurable overhead.

You mean when updating the recordset here, right?

Thanks
Alec


