[Xapian-discuss] Search::Xapian and term positions
Arne Georg Gleditsch
argggh at linpro.no
Wed Jan 19 21:26:00 GMT 2005
Hi,
I'm fooling around with the Xapian engine (via the Perl modules). I'm
wondering if I can get Xapian to tell me where in a document my
queries match -- so, as a first approach I'm trying to look at the
positionlist. It's not working for me. The following snippet:
my $db = Search::Xapian::Database->new("test");
my $enq = $db->enquire("xapian");
my @matches = $enq->matches(0, 100);
foreach my $match (@matches) {
    my $terms = $enq->get_matching_terms_begin($match);
    my $pos = $terms->positionlist_begin();
}
bombs out. Firstly, get_matching_terms_begin1 and
get_matching_terms_begin2 seem to be switched around in Enquire.pm, but
even if I rectify that, things crash and burn. Under gdb:
Program received signal SIGABRT, Aborted.
[Switching to Thread -1209842944 (LWP 12984)]
0xb7e8aed9 in raise () from /lib/tls/libc.so.6
(gdb) bt
#0 0xb7e8aed9 in raise () from /lib/tls/libc.so.6
#1 0xb7f98fcc in ?? () from /lib/tls/libc.so.6
#2 0xbffff7e0 in ?? ()
#3 0xb7e8c771 in abort () from /lib/tls/libc.so.6
[..]
#43 0x080c30c6 in Perl_pp_entersub ()
#44 0xb7c2cf84 in std::terminate () from /usr/lib/libstdc++.so.5
#45 0xb7c2d0f6 in __cxa_throw () from /usr/lib/libstdc++.so.5
#46 0xb7c93015 in Xapian::TermIterator::Internal::positionlist_begin ()
    from /usr/lib/libxapian.so.5
#47 0xb7d2a19d in Xapian::TermIterator::positionlist_begin ()
    from /usr/lib/libxapian.so.5
#48 0xb7dc8029 in XS_Search__Xapian__TermIterator_positionlist_begin ()
    from /usr/local/lib/perl/5.8.4/auto/Search/Xapian/Xapian.so
#49 0x080c30c6 in Perl_pp_entersub ()
#50 0x080bbbb9 in Perl_runops_standard ()
#51 0x080635e8 in perl_run ()
#52 0x080633f5 in perl_run ()
#53 0x0805fb9f in main ()
Has anyone seen this before? This is Xapian 0.8.5 and Search::Xapian
0.8.4.
Perhaps, before I go further down this road: is the positionlist going to
be useful to me? I have not gotten far enough to see what it looks
like, but my impression is that it is an index into the sequence of
tokens that a file is parsed into. Is that correct? Can this number be
manipulated when a file is indexed, and what would be the consequence
of doing so? (E.g. letting it be <line number>*100 + <token position
in current line> or something?)
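
To make the encoding concrete, here is a minimal sketch in plain Perl. It
makes no Xapian calls at all; the sub names encode_pos and decode_pos and
the constant $TOKENS_PER_LINE are hypothetical, and the multiplier of 100
assumes that no line holds more than 99 tokens.

```perl
use strict;
use warnings;

# Hypothetical multiplier: assumes no line holds more than 99 tokens.
my $TOKENS_PER_LINE = 100;

# Pack a line number and a token offset within that line into a
# single integer position.
sub encode_pos {
    my ($line, $offset) = @_;
    die "offset $offset does not fit in a line slot\n"
        if $offset < 0 || $offset >= $TOKENS_PER_LINE;
    return $line * $TOKENS_PER_LINE + $offset;
}

# Recover (line, offset) from a packed position.
sub decode_pos {
    my ($pos) = @_;
    return (int($pos / $TOKENS_PER_LINE), $pos % $TOKENS_PER_LINE);
}

my $pos = encode_pos(12, 7);
my ($line, $offset) = decode_pos($pos);
print "$pos -> line $line, token $offset\n";
```

With this scheme, tokens on the same line keep consecutive positions, while
the last token of one line and the first token of the next are separated by
a gap. A line with 100 or more tokens would collide with the next line's
range, which is why the sketch refuses such offsets.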
Thanks,
Arne.