[Xapian-discuss] A few questions wrt Xapian
Henka
henka at cityweb.co.za
Mon Nov 3 10:09:54 GMT 2008
Greetings all,
I'm about to evaluate Xapian for a future project and would appreciate
a few comments from those in the know:
Indexing
1. Is Xapian similar to Lucene in the sense that you can define as
many fields as you want, and assign various weights (which influence
search result sorting) to these fields? I gather from the docs that
you can, but I just need confirmation.
2. Let's say you're indexing websites; can you then merge/combine
many smaller indexes into larger ones for later searching?
Searching
1. I gather from the docs that you can sort results by your own
field(s), followed by the default document scoring (think
"page-rank"). Is that correct?
2. ~/docs/remote.htm mentions distributed searching. We want to
spread the search load around our cluster by splitting the index into
many manageable-sized indexes (to ensure sub-second performance), with
a "master" node that combines the search results which end users see.
Is my understanding correct, and are there any pitfalls/bottlenecks?
3. Removing duplicates: I know this can be done programmatically
(though it is slow on our chosen platform, Perl), but does Xapian
provide a built-in mechanism for it? For example, a search might return
several pages from the same web site, but we want to remove these
duplicates and show only a single result (the highest-ranking one) per
site (e.g., with a "More from this site..." link, à la Google, which
would run a separate search displaying all the pages from that site).
4. If the mechanism to remove duplicates exists, will this still work
cluster-wide in distributed searching?
5. Does Xapian provide a mechanism for identifying the actual field
in a search result that triggered the hit? E.g., say you have
TITLE, BODY and OTHER as fields in your index. If a search found your
term in the BODY field, does Xapian report this back?
6. This is difficult to answer, I know: how does Xapian compare
performance-wise? Has anyone done any basic benchmarking?
Thanks for any information you can provide.
Regards
Henry