[Xapian-discuss] Reasonable Time Expectation for Long Queries?

Josef Novak josef.robert.novak at gmail.com
Sun Apr 15 10:23:49 BST 2007


Thanks a bunch for your response, and sorry for my delayed reply.

> It seems more natural to say:
>
>     Xapian::Query query(Xapian::Query::OP_OR, string_tokens.begin(), string_tokens.end());

Thanks for the tip!  I'm still new to C++ so I don't know my way
around too well.  I appreciate all attempts to nudge me in the right
direction!

I did a bit more testing on my code, and indeed it does not seem that
query building is the problem.  It is actually the search/enquire
time. The following snippet outputs '0' even for my longest queries
(253 terms), which seems pretty lickety-split.  The times for the
enquire sessions, however, vary between 2-7 seconds for queries longer
than about 15 terms.  There seems to be a fairly strong correlation
between the length of queries and the amount of time required to
execute them.  Below is a snippet of the code which I'm using for the
test:

time_t start, end;
start = time(NULL);

// Build the query object
Xapian::Query query(Xapian::Query::OP_OR, string_tokens.begin(),
                    string_tokens.end());
end = time(NULL);
cout << "In ID " << string_tokens[0] << "\tTime: " << end - start << endl;

// Give the query object to the enquire session
enquire.set_query(query);
start = time(NULL);

// Get the top 5 results of the query
Xapian::MSet matches = enquire.get_mset(0, 5);
end = time(NULL);

// Output enquire time etc.
cout << "Time for enquire: " << end - start << endl;
cout << "Num terms: " << string_tokens.size() << endl;
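
A note on the measurement itself: time(NULL) only has one-second
resolution, so a '0' above means "under a second" rather than "no time
at all".  A finer-grained timer could be sketched as below.  This
assumes a C++11 or later compiler, and time_ms is a hypothetical helper
name, not anything provided by Xapian:

```cpp
#include <chrono>
#include <utility>

// Run `work` once and return the elapsed wall-clock time in milliseconds.
// `time_ms` is a hypothetical helper name, not part of Xapian.
template <typename Work>
double time_ms(Work&& work) {
    // steady_clock is monotonic, so the difference is never negative.
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

With the snippet above, the enquire step could then be timed with
something like:

    Xapian::MSet matches;
    double ms = time_ms([&] { matches = enquire.get_mset(0, 5); });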

The output for a set of test queries and their respective times to
completion is below:

QID      Query (s)  Enquire (s)  NumTerms
1000003  0          1            18
1000009  0          4            111
1000018  0          2            19
1000021  0          4            139
1000029  0          3            52
1000033  0          3            30
100006   0          3            40
1000077  0          4            100
1000084  0          5            78
1000100  0          3            60
1000109  0          2            40
1000117  0          2            13
1000122  0          3            44
1000152  0          4            64
1000163  0          2            48
100016   0          6            149
1000173  0          4            47
100017   0          6            253
1000188  0          4            55

My indexed data and queries are all in Japanese, and a typical query
(if you can see and/or read the characters) looks like:
"話題 の 人物 ミラー マン と は 、 なん です か ? エロ い ん です か ?"
roughly "Who/What is the story character 'Mirrorman'?  Is it somehow erotic?"

The string_tokens vector is built by segmenting this input sentence at spaces.
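
For reference, the segmentation step amounts to something like the
sketch below.  The function name split_at_spaces is hypothetical, and
it assumes the input is already pre-segmented with plain ASCII
whitespace (an ideographic space, U+3000, would not be treated as a
separator here):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split `sentence` at whitespace. Runs of spaces produce no empty tokens.
// `split_at_spaces` is a hypothetical name used only for illustration.
std::vector<std::string> split_at_spaces(const std::string& sentence) {
    std::istringstream in(sentence);
    std::vector<std::string> tokens;
    std::string token;
    // operator>> skips leading whitespace and reads up to the next whitespace.
    while (in >> token) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Since the bytes of a UTF-8 multi-byte character are never ASCII
whitespace, the Japanese tokens come through intact.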

So I guess that a better question would be, is there a way to speed up
the enquire sessions?  Or am I pretty much stuck with the above times,
meaning it's time to start looking at preprocessing approaches to
selectively trimming the set of query terms?
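
To make the preprocessing idea concrete, here is a minimal sketch of
one way the term list could be trimmed before building the query:
drop anything in a caller-supplied stop list (for example particles
and punctuation such as "の", "は", "、" and "?") and keep only the
first occurrence of each remaining term.  The name trim_terms and the
contents of the stop list are assumptions for illustration, and
dropping terms can change the ranking, so this is a trade-off rather
than a free speed-up:

```cpp
#include <set>
#include <string>
#include <vector>

// Keep the first occurrence of each term that is not in `stop_terms`,
// preserving the original order. `trim_terms` is a hypothetical helper.
std::vector<std::string> trim_terms(const std::vector<std::string>& terms,
                                    const std::set<std::string>& stop_terms) {
    std::set<std::string> seen;
    std::vector<std::string> kept;
    for (const std::string& term : terms) {
        if (stop_terms.count(term) != 0) continue;  // drop stop terms
        if (!seen.insert(term).second) continue;    // drop repeated terms
        kept.push_back(term);
    }
    return kept;
}
```

The trimmed vector can then be passed to the Xapian::Query
constructor in place of string_tokens.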

thanks again for your suggestions!

joe

