[Xapian-discuss] Search::Xapian and term positions

Arne Georg Gleditsch argggh at linpro.no
Wed Jan 19 21:26:00 GMT 2005


I'm fooling around with the Xapian engine (via the Perl modules).  I'm
wondering if I can get Xapian to tell me where in a document my
queries match -- so, as a first approach I'm trying to look at the
positionlist.  It's not working for me.  The following snippet:

  my $db = Search::Xapian::Database->new("test");
  my $enq = $db->enquire("xapian");

  my @matches = $enq->matches(0, 100);
  foreach my $match (@matches) {
      my $terms = $enq->get_matching_terms_begin($match);
      my $pos = $terms->positionlist_begin();
  }

bombs out.  Firstly, get_matching_terms_begin1 and
get_matching_terms_begin2 seem to be switched around in Enquire.pm, but
even if I rectify that, things crash and burn.  Under gdb:

  Program received signal SIGABRT, Aborted.
  [Switching to Thread -1209842944 (LWP 12984)]
  0xb7e8aed9 in raise () from /lib/tls/libc.so.6
  (gdb) bt
  #0  0xb7e8aed9 in raise () from /lib/tls/libc.so.6
  #1  0xb7f98fcc in ?? () from /lib/tls/libc.so.6
  #2  0xbffff7e0 in ?? ()
  #3  0xb7e8c771 in abort () from /lib/tls/libc.so.6
  #43 0x080c30c6 in Perl_pp_entersub ()
  #44 0xb7c2cf84 in std::terminate () from /usr/lib/libstdc++.so.5
  #45 0xb7c2d0f6 in __cxa_throw () from /usr/lib/libstdc++.so.5
  #46 0xb7c93015 in Xapian::TermIterator::Internal::positionlist_begin ()
     from /usr/lib/libxapian.so.5
  #47 0xb7d2a19d in Xapian::TermIterator::positionlist_begin ()
     from /usr/lib/libxapian.so.5
  #48 0xb7dc8029 in XS_Search__Xapian__TermIterator_positionlist_begin ()
     from /usr/local/lib/perl/5.8.4/auto/Search/Xapian/Xapian.so
  #49 0x080c30c6 in Perl_pp_entersub ()
  #50 0x080bbbb9 in Perl_runops_standard ()
  #51 0x080635e8 in perl_run ()
  #52 0x080633f5 in perl_run ()
  #53 0x0805fb9f in main ()

Has anyone seen this before?  This is Xapian 0.8.5 and Search::Xapian

Perhaps, before I walk this line further: is the positionlist going to
be useful for me?  I haven't gotten far enough to see what it looks
like, but I get the impression that it is an index into the sequence of
tokens that a file is parsed into.  Is that correct?  Can this number be
manipulated when a file is indexed, and what would be the consequence
of doing so?  (E.g. letting it be <line number>*100 + <token position
in current line> or something?)
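
To make that last idea concrete, here is a rough sketch of the encoding
I have in mind.  The helper names encode_pos and decode_pos and the
constant are made up for illustration; they are not part of
Search::Xapian.  It also assumes no line ever holds 100 or more tokens,
which may well not be true for real files:

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of Search::Xapian.
# Assumption: every line has fewer than 100 tokens.
my $TOKENS_PER_LINE = 100;

# Pack a (line number, token index within line) pair into one position.
sub encode_pos {
    my ($line, $tok) = @_;
    die "token index $tok does not fit in one line slot\n"
        if $tok < 0 || $tok >= $TOKENS_PER_LINE;
    return $line * $TOKENS_PER_LINE + $tok;
}

# Recover the (line number, token index) pair from a packed position.
sub decode_pos {
    my ($pos) = @_;
    return (int($pos / $TOKENS_PER_LINE), $pos % $TOKENS_PER_LINE);
}

my $pos = encode_pos(12, 7);
my ($line, $tok) = decode_pos($pos);
print "pos=$pos line=$line token=$tok\n";   # pos=1207 line=12 token=7
```

One thing I would expect from such a scheme is that the last token of
one line and the first token of the next no longer get consecutive
positions, so anything that relies on adjacent positions would
presumably not match across a line break.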


