[Xapian-discuss] Remote databases slower than local?
Rocco Caputo
rcaputo at pobox.com
Wed Jun 28 05:36:48 BST 2006
On Jun 27, 2006, at 22:32, Olly Betts wrote:
> On Tue, Jun 27, 2006 at 02:06:12PM -0400, Rocco Caputo wrote:
>> Are there known cases where remote database access is 2-6 times
>> slower than local access to the same database?
>
> I'm not aware of any profiling work on the remote database since webtop
> were using it, which was quite some time ago.
>
> But I have a mostly complete reworked remote backend which removes
> unnecessary layers of classes in various places and simplifies how
> the matcher interfaces with the remote backend so it might be better
> for you to look at that than the current code. I'll try to sort
> out a patch so you can try it.
I'm handy with anonymous svn, if that will help. Currently I'm using
0.9.6 plus some minor changes I've already described.
>> I'm seeing cases where queries over Xapian::Remote can take several
>> times longer than identical queries run locally. I'm making sure
>> that my queries are not cached. I'm using the match set's fetch()
>> method, which has already sped up remote queries by an order of
>> magnitude.
>
> Hmm, I did some tests on my new code and found that calling fetch() was
> actually slower on average than not calling it, so it's currently a
> no-op pending reworking it to be a bit lazier which I think will help.
>
> I wonder if my test case is just different to yours, or if this is
> due to changed code elsewhere.
Here's an outline of my test code. It's used in both local and
remote queries.
    # Stopper, stemmer and scale created once and reused
    # for each query.
    my $Stopper = Search::Xapian::SimpleStopper->new();
    Search::Xapian::SimpleStopper::add($Stopper, $_)
        foreach keys %stopwords;
    my $Stemmer = Search::Xapian::Stem->new("en");
    my $Scale = Search::Xapian::BM25Weight->new(
        1,   # k1
        0,   # k2
        0,   # k3
        0,   # b
        0.5, # min_normlen
    );

    # For each query...
    my $parser = Search::Xapian::QueryParser->new();
    $parser->set_database($database);
    $parser->set_stopper($Stopper);
    $parser->set_default_op(OP_AND);
    $parser->set_stemmer($Stemmer);
    my $query = $parser->parse_query($string);
    my $enquire = $database->enquire($query);
    $enquire->set_weighting_scheme($Scale);
    my $match_set = $enquire->get_mset(
        $enquire_offset, $enquire_size
    );
    $match_set->fetch();
    my $mset_iterator = $match_set->begin();
    my $mset_size = $match_set->size();
    while ($mset_size) {
        my $document = $mset_iterator->get_document();
        # Pseudocode. No Xapian is used here.
        $document = do_some_stuff($document);
        push @results, {
            doc     => $document,
            rank    => $mset_iterator->get_rank(),
            percent => $mset_iterator->get_percent(),
        };
    }
    continue {
        $mset_iterator->inc() if --$mset_size;
    }
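For anyone wanting to reproduce the local-versus-remote comparison, a timing wrapper along these lines might do. This is only a sketch, not the code used for the measurements discussed in this thread: mean_query_time is a hypothetical helper, and the two callbacks are stand-ins that merely sleep. In practice each callback would run the per-query outline above against a local or a remote $database.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: call $run_query $runs times and return the
# mean wall-clock time per call, in seconds.
sub mean_query_time {
    my ($run_query, $runs) = @_;
    my $total = 0;
    for (1 .. $runs) {
        my $start = [gettimeofday];
        $run_query->();
        # With one argument, tv_interval measures from $start to now.
        $total += tv_interval($start);
    }
    return $total / $runs;
}

# Stand-in callbacks (assumptions): they only sleep, to keep the
# sketch runnable without a Xapian database.
my $local  = mean_query_time(sub { select(undef, undef, undef, 0.01) }, 5);
my $remote = mean_query_time(sub { select(undef, undef, undef, 0.03) }, 5);

printf "local %.3fs, remote %.3fs, ratio %.1fx\n",
    $local, $remote, $remote / $local;
```

Averaging over several runs with a different query string each time helps keep cache effects from skewing the ratio.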
>> Network speed doesn't seem to be the cause. Cached remote queries
>> take only 1-2 seconds longer than cached local ones.
>
> Do you have document values? The remote backend currently always sends
> them over with the document data so if you don't use the document values
> during the match or for displaying results then the remote backend will
> be slower especially if the values aren't cached.
No document values.
--
Rocco Caputo - rcaputo at pobox.com