[Xapian-discuss] Problem getting Xapian working with Burmese

Emmanuel Engelhart emmanuel at engelhart.org
Fri Jul 17 18:30:43 BST 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

I use Xapian in my project with several Latin-script languages and it
works well. I have also tried it with Parsi, and that seems to work too.

With Burmese, however, things are a bit different. Here is what I do:

mkdir html
cd html
wget -O doc.html http://my.wikipedia.org
cd ..
omindex --db=./xapdb ./html/

To run a simple search against the database I use the following Perl
script (my real code is in C++, and it does not work either):

===================================================================
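#!/usr/bin/perl

use strict;
use warnings;
use utf8;

use Search::Xapian;

# Open the database created by omindex.
my $db = Search::Xapian::Database->new( './xapdb' );

# Search for the term given as the first command-line argument.
my $enq = $db->enquire( $ARGV[0] );

printf "Running query '%s'\n", $enq->get_query()->get_description();

# Fetch at most the first 10 matches.
my @matches = $enq->matches(0, 10);

print scalar(@matches) . " results found\n";

# Print the document ID, relevance percentage and stored data of each match.
foreach my $match ( @matches ) {
    my $doc = $match->get_document();
    printf "ID %d %d%% [ %s ]\n",
        $match->get_docid(), $match->get_percent(), $doc->get_data();
}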
===================================================================

./search.pl problems

... returns the document, because the page begins with an English
sentence that contains this word.

./search.pl ၁၂၆၆

... returns a result too.

./search.pl ဝီကီပိဒိယအကြောင်း
./search.pl ဗဟိုစာမျက်နှာ

... do not work. In fact, the search fails most of the time. It seems to
work only with Burmese words which are short and/or contain only certain
characters.
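
To put a number on "short", the encoded length of each search term can
be measured before it is passed to the script. The following is only a
minimal sketch, assuming a POSIX shell and a UTF-8 terminal; the helper
name term_bytes is made up for illustration and is not part of Xapian.

```shell
#!/bin/sh
# term_bytes is a hypothetical helper (not a Xapian tool): it prints the
# number of bytes in its first argument, exactly as the shell passes it on.
term_bytes() {
    printf '%s' "$1" | wc -c | tr -d '[:space:]'
    echo
}

# Print "<bytes> <term>" for every term given on the command line.
for term in "$@"; do
    printf '%s %s\n' "$(term_bytes "$term")" "$term"
done
```

Saved under an arbitrary name such as term_bytes.sh, it could be run as
"sh term_bytes.sh problems ၁၂၆၆ ဗဟိုစာမျက်နှာ" to compare the terms that
match with the ones that do not.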

Is that normal?

Regards
Emmanuel
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkpgtUAACgkQn3IpJRpNWtPRRgCfZukUGfG8Eliv6SKZDXoAWnlI
SP8Animz/5IUtSl9Ba2oV8vJLkjdLcDX
=QjZX
-----END PGP SIGNATURE-----


