[Xapian-discuss] Xapian performance and xapian-python

Patrick Oliver Glauner patrick.oliver.glauner at cern.ch
Mon Jun 25 09:30:10 BST 2012


Hi.

I added the full texts of 400K bibliographic records (theses, papers, etc.) to a Xapian database with a total size of about 30 GB. My code is written in Python, using xapian-python with Xapian 1.2.5.

The test system is a Dell PowerEdge M600 0MY736 server with two Intel Xeon E5410 CPUs @ 2.33 GHz (eight cores in total), 16 GB of RAM and two 146 GB SCSI hard disks. Its operating system is Scientific Linux CERN 5 (SLC5).

My source code is:
-------------------------------
import xapian

QUERY = '"phys rev"'          # a phrase query
RANKED_RESULT_AMOUNT = 10     # number of ranked hits to retrieve


database = xapian.Database([...])
enquire = xapian.Enquire(database)
query_string = QUERY

# Parse the query with English stemming; FLAG_PHRASE enables "quoted" phrases.
qp = xapian.QueryParser()
stemmer = xapian.Stem("english")
qp.set_stemmer(stemmer)
qp.set_database(database)
qp.set_stemming_strategy(xapian.QueryParser.STEM_SOME)
pattern = qp.parse_query(query_string, xapian.QueryParser.FLAG_PHRASE)
enquire.set_query(pattern)

# %time is an IPython magic: it prints CPU and wall time for this single call.
%time matches = enquire.get_mset(0, RANKED_RESULT_AMOUNT)
-------------------------------
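The %time line only works inside IPython. For timing from a plain Python script, a minimal sketch using only the standard library could look like this. `time_call` is a hypothetical helper name; `time.process_time()` counts user plus system CPU time of the current process, which should be comparable to the "total" figure %time prints.

```python
import time


def time_call(func, *args, **kwargs):
    """Call func once and return (result, wall_seconds, cpu_seconds).

    cpu_seconds is user + system CPU time of this process, roughly
    the "total" figure printed by IPython's %time magic.
    """
    cpu_start = time.process_time()
    wall_start = time.perf_counter()
    result = func(*args, **kwargs)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return result, wall, cpu


if __name__ == "__main__":
    # Stand-in workload; with the script above this would be
    # time_call(enquire.get_mset, 0, RANKED_RESULT_AMOUNT)
    result, wall, cpu = time_call(sum, range(1_000_000))
    print(f"CPU total: {cpu:.2f} s, Wall time: {wall:.2f} s")
```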

The output is:
CPU times: user 1.82 s, sys: 2.16 s, total: 3.99 s
Wall time: 1.99 s

Querying an equivalent Solr instance is much faster:
CPU times: user 0.34 s, sys: 0.00 s, total: 0.34 s
Wall time: 0.21 s


Question 1
How would you assess this Xapian wall time?
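A single measurement may mix cold-cache disk reads with the actual query work, particularly for a 30 GB database on a machine with 16 GB of RAM. As a sketch under that assumption, repeating the same call several times and comparing the first run with the later ones can help separate the two. `repeat_wall_times` is a hypothetical helper, and the sorting workload below only stands in for `lambda: enquire.get_mset(0, RANKED_RESULT_AMOUNT)`.

```python
import statistics
import time


def repeat_wall_times(func, repeats=5):
    """Run func() `repeats` times and return the wall-clock duration of each run.

    The first entry may include cold-cache disk reads; later entries may
    reflect data already held in the OS page cache.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    times = repeat_wall_times(lambda: sorted(range(200_000), reverse=True))
    print(f"first run: {times[0]:.4f} s, "
          f"median of later runs: {statistics.median(times[1:]):.4f} s")
```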

Question 2
Is there anything in my source code that would explain this?

Question 3
Why is the Xapian time almost independent of RANKED_RESULT_AMOUNT? Even if I increase it to 10000, the wall time stays nearly the same.
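The observation above can be checked systematically by timing the same query for several result sizes. This is a minimal sketch: `sweep_result_sizes` is a hypothetical helper that accepts any callable taking `(first, maxitems)`, such as the bound method `enquire.get_mset`; the `fake_get_mset` stub is purely illustrative and is not Xapian.

```python
import time


def sweep_result_sizes(get_mset, sizes):
    """Time get_mset(0, n) for each n in sizes and return {n: wall_seconds}.

    get_mset is any callable taking (first, maxitems), for example
    the bound method enquire.get_mset from the script above.
    """
    timings = {}
    for n in sizes:
        start = time.perf_counter()
        get_mset(0, n)
        timings[n] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # Hypothetical stand-in for enquire.get_mset; its cost grows with maxitems.
    def fake_get_mset(first, maxitems):
        return list(range(first, first + maxitems))

    for n, secs in sweep_result_sizes(fake_get_mset, [10, 100, 1000, 10000]).items():
        print(f"RANKED_RESULT_AMOUNT={n:>6}: {secs:.5f} s")
```

With Xapian, the call would be `sweep_result_sizes(enquire.get_mset, [10, 100, 1000, 10000])`; running it twice in a row may also show how much of each figure is cache warm-up.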

Question 4
How can I improve Xapian's performance? Are there any configuration parameters I can tune?


Thanks
Patrick

--
Patrick GLAUNER [patrick.oliver.glauner at cern.ch]

CERN
Information Technology Department
CH-1211 Geneva 23

