I am working on creating an OS X Spotlight-like application.

The first task is to index fully qualified paths. As a learning exercise for Xapian and its Python bindings, I want to be able to search for filenames first.

I tried using Xapwrap from [divmod.org](http://divmod.org), but that didn't pan out: I could not get the actual data back after a search. A search would return a document UID, but I could never get `.get_document().get_data()` to return anything.

So I decided to use the "raw" Python bindings that ship with Xapian instead, starting with the simpleindex and simplesearch example programs.

I think indexing is happening in both cases (Xapwrap and the plain Xapian bindings), but I can't really tell, because I can't get any search results back to confirm it.

When I use the Xapian Python bindings directly, I can't get the search to work. Granted, the simplesearch example program is broken, so I am groping in the dark on how to get a search to return a list of documents and have `get_data()` actually return something.

I guess what I need is some simple example code that does the following. Given some data like

`/this/is/a/fully/qualified/path/to/a/filename`

how do I create a document and add it to an index so that I can search for it by `filename`?
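Xapian finds documents through the terms attached to them, not by scanning their stored data, so a path like the one above has to be broken into terms before it can be matched. Here is a minimal sketch in Python 3, assuming that lowercased path components plus the filename without its extension make good enough terms. `path_to_terms` is a hypothetical helper, not part of the Xapian bindings:

```python
import os.path


def path_to_terms(path):
    """Turn a fully qualified path into a list of lowercase search terms.

    Hypothetical helper (not part of the Xapian bindings).
    """
    # Every non-empty path component becomes a lowercase term.
    components = [part.lower() for part in path.split("/") if part]
    terms = list(components)
    if components:
        # Also add the filename without its extension, so that
        # "notes.txt" can be found by searching for "notes".
        stem, _ext = os.path.splitext(components[-1])
        if stem:
            terms.append(stem)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(terms))


print(path_to_terms("/this/is/a/fully/qualified/path/to/a/filename"))
# ['this', 'is', 'a', 'fully', 'qualified', 'path', 'to', 'filename']
```

With the Xapian bindings, you could attach each returned string via `doc.add_term(term)` before calling `db.add_document(doc)`, while `doc.set_data(path)` keeps the full path for display. Splitting only on `/` is a simplification. Names like `my-file_v2.txt` would need a smarter tokenizer (or `xapian.TermGenerator`) to be searchable by their parts.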

This is what I am doing to create documents and add them to the index:

```python
#!/usr/bin/python
# indexer.py

import sys
import xapian

# setup the file to index
fileToIndex = sys.argv[1]
if len(sys.argv) >= 3:
    maxRecordsToIndex = int(sys.argv[2])
else:
    maxRecordsToIndex = 0  # 0 means no limit
recordCount = 0

# setup the xapian database
try:
    db = xapian.WritableDatabase('/tmp/index', xapian.DB_CREATE_OR_OPEN)

    # index the file, one document per line
    for line in file(fileToIndex):
        doc = xapian.Document()
        doc.set_data(line)  # store the raw line as the document's data
        db.add_document(doc)
        recordCount = recordCount + 1

        # my input file is 70GB of data, this is to make testing faster
        if maxRecordsToIndex > 0 and recordCount >= maxRecordsToIndex:
            break
        elif recordCount % 1000 == 0:
            print 'processed %s records so far!' % recordCount
    print 'processed %s records' % recordCount

except Exception, e:
    print 'Exception: %s' % str(e)
    sys.exit(1)
```

And this is what I am doing to try to get the data back from a search. The problem is that I can't get it to find anything.

Given the example data above, when I run `python searcher.py /tmp/index filename`, it reports 0 results found:

```python
#!/usr/local/bin/python
# searcher.py
import sys
import xapian

if len(sys.argv) < 3:
    print "usage: %s <path to database> <search terms>" % sys.argv[0]
    sys.exit(1)

try:
    database = xapian.Database(sys.argv[1])

    # build a single-term query from the first search argument
    enquire = xapian.Enquire(database)
    query = xapian.Query(sys.argv[2])
    print "Performing query `%s'" % query.get_description()

    # fetch the top 10 matches
    enquire.set_query(query)
    matches = enquire.get_mset(0, 10)

    print "%i results found" % matches.get_matches_estimated()
    for match in matches:
        print "ID %i %i%% [%s]" % (match[xapian.MSET_DID], match[xapian.MSET_PERCENT], match[xapian.MSET_DOCUMENT].get_data())

except Exception, e:
    print "Exception: %s" % str(e)
    sys.exit(1)
```
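Compare the two scripts above. The indexer calls `set_data()` but never attaches any terms, and a Xapian term query can only match documents that carry that term. The toy model below is plain Python 3, not Xapian's implementation. `ToyIndex`, `add_document`, and `search` are hypothetical names chosen to mirror how a document's stored data is kept separate from its indexed terms:

```python
from collections import defaultdict


class ToyIndex:
    """Toy inverted index: stored data is opaque, only terms are searchable."""

    def __init__(self):
        self.data = {}                    # docid -> stored data (like set_data)
        self.postings = defaultdict(set)  # term -> docids carrying that term

    def add_document(self, data, terms=()):
        docid = len(self.data) + 1        # docids start at 1, as in Xapian
        self.data[docid] = data
        for term in terms:
            self.postings[term].add(docid)
        return docid

    def search(self, term):
        # Only the postings are consulted; stored data is never scanned.
        return [self.data[d] for d in sorted(self.postings.get(term, ()))]


path = "/this/is/a/fully/qualified/path/to/a/filename"

# Like the indexer in the question: data is stored, no terms are attached.
data_only = ToyIndex()
data_only.add_document(path)
print(data_only.search("filename"))   # []

# Same document, with its path components attached as terms.
with_terms = ToyIndex()
with_terms.add_document(path, terms=path.strip("/").split("/"))
print(with_terms.search("filename"))  # ['/this/is/a/fully/qualified/path/to/a/filename']
```

In the real bindings, the missing step would be calling `doc.add_term(...)` for each term before `db.add_document(doc)`. Calling `get_data()` on a match then returns whatever was passed to `set_data()`. Term matching is exact, so `xapian.Query('filename')` will not match a term indexed as `Filename`.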