[Xapian-discuss] Remote databases slower than local?

Rocco Caputo rcaputo at pobox.com
Wed Jun 28 05:36:48 BST 2006


On Jun 27, 2006, at 22:32, Olly Betts wrote:

> On Tue, Jun 27, 2006 at 02:06:12PM -0400, Rocco Caputo wrote:
>> Are there known cases where remote database access is 2-6 times
>> slower than local access to the same database?
>
> I'm not aware of any profiling work on the remote database since webtop
> were using it, which was quite some time ago.
>
> But I have a mostly complete reworked remote backend which removes
> unnecessary layers of classes in various places and simplifies how
> the matcher interfaces with the remote backend so it might be better
> for you to look at that than the current code.  I'll try to sort
> out a patch so you can try it.

I'm handy with anonymous svn, if that will help.  Currently I'm using  
0.9.6 plus some minor changes I've already described.

>> I'm seeing cases where queries over Xapian::Remote can take several
>> times longer than identical queries run locally.  I'm making sure
>> that my queries are not cached.  I'm using the match set's fetch()
>> method, which has already sped up remote queries by an order of
>> magnitude.
>
> Hmm, I did some tests on my new code and found that calling fetch() was
> actually slower on average than not calling it, so it's currently a
> no-op pending reworking it to be a bit lazier which I think will help.
>
> I wonder if my test case is just different to yours, or if this is
> due to changed code elsewhere.

Here's an outline of my test code.  It's used in both local and  
remote queries.

   # Stopper, stemmer and scale created once and reused
   # for each query.
   my $Stopper   = Search::Xapian::SimpleStopper->new();
   $Stopper->add($_) foreach keys %stopwords;

   my $Stemmer = Search::Xapian::Stem->new("en");

   my $Scale = Search::Xapian::BM25Weight->new(
     1,      # k1
     0,      # k2
     0,      # k3
     0,      # b
     0.5,    # min_normlen
   );

   # For each query...
   my $parser = Search::Xapian::QueryParser->new();
   $parser->set_database($database);
   $parser->set_stopper($Stopper);
   $parser->set_default_op(OP_AND);
   $parser->set_stemmer($Stemmer);

   my $query = $parser->parse_query($string);

   my $enquire   = $database->enquire($query);
   $enquire->set_weighting_scheme($Scale);

   my $match_set = $enquire->get_mset(
     $enquire_offset, $enquire_size
   );
   $match_set->fetch();

   my $mset_iterator = $match_set->begin();
   my $mset_size     = $match_set->size();

   while ($mset_size) {
     my $document = $mset_iterator->get_document();

     # Pseudocode.  No Xapian is used here.
     $document = do_some_stuff($document);

     push @results, {
       doc     => $document,
       rank    => $mset_iterator->get_rank(),
       percent => $mset_iterator->get_percent(),
     };
   }
   continue {
     $mset_iterator->inc() if --$mset_size;
   }

>> Network speed doesn't seem to be the cause.  Cached remote queries
>> take only 1-2 seconds longer than cached local ones.
>
> Do you have document values?  The remote backend currently always sends
> them over with the document data so if you don't use the document values
> during the match or for displaying results then the remote backend will
> be slower especially if the values aren't cached.

No document values.

-- 
Rocco Caputo - rcaputo at pobox.com




