[Xapian-discuss] Xapian support for huge data sets?

Charlie Hull charlie at juggler.net
Fri May 13 09:57:59 BST 2011


On 12/05/2011 19:18, Bill Hendrickson wrote:
> Hello,
>
> I’m currently using another open source search engine/indexer and am
> having performance issues, which brought me to learn about Xapian.  We
> have approximately 350 million docs/10TB data that doubles every 3
> years.  The data mostly consists of Oracle DB records, webpage-ish
> files (HTML/XML, etc.) and office-type docs (doc, pdf, etc.).  There
> are anywhere from 2 to 4 dozen users on the system at any one time.
> The indexing server has upwards of 28GB of memory, but even then it
> gets extremely taxed, and this will only get worse.
>
> In the opinion of this list, would Xapian be able to handle this kind
> of load, or should I evaluate more “enterprise”-like solutions (GSA,
> etc.)?

Xapian was originally written to power the Webtop web search engine, 
which indexed around 500 million pages on a farm of around 30 servers, 
back in 1999 or so. We've built 100m page indexes for clients. You 
shouldn't have any trouble indexing your content given sufficient 
hardware, arranged in the right way - a single server is probably not 
enough though!

Cheers

Charlie
www.flax.co.uk
