[Xapian-discuss] search queries with less than 3 characters, memory goes nuts

chris chris at s-4-u.net
Sat Aug 15 11:58:53 BST 2009


Good morning list,

we have been evaluating Xapian with acts_as_xapian and Ruby for around
2 months, and it is really a great piece of software. Big thanks for
giving us such a high-quality turbo finder.

But we're facing a problem with queries like:  Top+schwarz+40

As soon as Mongrel hands the query over to Xapian, the memory usage of
the webserver process goes up until the box runs out of RAM, and if I
give the box 50GB of swap, it'll eat that too.

I could narrow the problem down to queries that contain parts which are
shorter than 3 characters. If no such queries come in, the webserver
process never needs more than 100MB, no matter how complicated the query
is or how long it runs.

The behaviour is not 100% consistent: sometimes such queries just
take "a few gigabytes" and even return results. But the webservers
still do not free the used memory, which is why they eat up all the RAM
after a few of these queries anyway.


I don't understand this behaviour and, more importantly, I don't know
what to do about it, as it's surprisingly difficult to prepare the query
before handing it over to Xapian, because of combinations like '+test
-"abcdef+1-2"'.
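
A naive pre-filter of the kind meant here could be sketched in Ruby
roughly as follows. This is only an illustration under assumptions: the
helper name drop_short_terms is hypothetical (it is not part of Xapian
or acts_as_xapian), the query is assumed to be URL-decoded already so
that terms are separated by spaces, and whether dropping short terms
actually avoids the memory growth is untested.

```ruby
# Hypothetical helper, not part of Xapian or acts_as_xapian.
# Drops bare terms shorter than min_len characters and leaves quoted
# phrases (with an optional leading + or -) untouched.
def drop_short_terms(query, min_len = 3)
  # Split into quoted phrases and whitespace-separated bare tokens.
  tokens = query.scan(/[+-]?"[^"]*"|\S+/)

  kept = tokens.select do |tok|
    # Anything containing a quote is passed through unchanged.
    next true if tok.include?('"')
    # Measure the term without its leading + or - operator.
    tok.sub(/\A[+-]/, '').length >= min_len
  end

  kept.join(' ')
end

puts drop_short_terms('Top schwarz 40')
puts drop_short_terms('+test -"abcdef+1-2"')
```

The sketch deliberately ignores everything else the query syntax may
allow (brackets, boolean operators, prefixes), which is exactly why
doing this properly by hand is awkward.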

It also seems a bit redundant to clean the query at all, as
Xapian is almost certainly doing this much better than I could.

So my questions are:
- Why does Xapian use countless gigabytes of RAM if I feed it such
  a query?

- Is there a need to clean the query beforehand? I mean, could someone
  do something nasty with it? (apart from the usual HTML security
  issues, which we take care of by escaping the query before display)

- What can I do to prevent this?

I'm thankful for any suggestions, ideas or even a 'finished'
solution ;)

Greets, Chris

PS. We're using v1.0.12 on Linux and the index has ~3 million documents
with around 1k of text each.







