[Xapian-discuss] UTF-8 becomes glibberish in searches
robert
robert at weborama.fr
Wed Oct 24 13:38:51 BST 2007
Olly Betts wrote:
> On Thu, Oct 18, 2007 at 12:47:21PM -0700, athlon athlonf wrote:
>
>> I'm using dbi2omega and scriptindex to index a database with Chinese
>> characters.
>> Searches are done with the PHP4 bindings.
>>
>> While the index file is in UTF-8, the results from the searches are
>> gibberish.
>>
>> These characters (changed to htmlencoding for this message)
>> ?????? becomes something like this: å??äº???ä¸
>>
>
> I just see "?" and inverse "?" here in mutt I'm afraid...
>
>
>> What am I doing wrong here? Is it the indexing, or is it the searching?
>>
>
> You need to step through the process, checking that everything is OK
> after each step. It could be dbi2omega is wrong, or scriptindex, or
> xapian itself, or the PHP bindings.
>
> First of all, I'd run dbi2omega redirected to a file, and then see if
> the UTF-8 is correct in that file.
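The check described above (dump dbi2omega's output to a file, then see whether it is correct UTF-8) could be sketched in Python roughly as follows. This is only an illustration: the helper names and the file name "dump.txt" are made up, and a strict decode is just one way of doing the check.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def check_file(path: str):
    """Read a file as raw bytes and report where UTF-8 decoding first fails."""
    with open(path, "rb") as handle:
        return first_invalid_utf8_offset(handle.read())


# Hypothetical usage, after something like: dbi2omega ... > dump.txt
# print(check_file("dump.txt"))
```

Note that a file which passes this check can still be wrong: text that has been encoded twice is valid UTF-8 too, so it is worth looking at a few of the Chinese strings by eye as well.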
>
>
>> How can I check if the database is indeed in utf-8?
>>
>
> Use the "delve" utility (in xapian-core, examples/delve) to look at the
> terms for a few documents.
>
> If both dbi2omega and the database look OK, then it's probably the PHP
> bindings. If you're writing the results as a web page, have you set
> the character set of the webpage to UTF-8 correctly? Check what your
> web browser says its character set is.
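For the web page step, one thing to look at is the charset declared in the Content-Type header the page is served with. A small Python sketch, using only the standard library, that pulls the charset out of such a header value (the function name is invented for this example):

```python
from email.message import Message


def declared_charset(content_type: str):
    """Return the lowercased charset named in a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()
```

If this returns None or something other than "utf-8" for the header your search page sends, the browser may be guessing the encoding, which would be consistent with the symptoms described.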
>
> Cheers,
> Olly
>
In your .script file, do you apply unhtml to this field?
In myhtmlparse.cc, line 42 sets charset = "ISO-8859-1";
so all the data from this field was converted to ISO-8859-1.
(Excuse my poor English, I'm French.)
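To show the kind of damage an ISO-8859-1 assumption can cause, here is a small Python sketch (the function names are made up for the example). It treats the UTF-8 bytes of a Chinese string as if they were ISO-8859-1, which gives output beginning with characters like "å" and "ä", similar to the garbled sample quoted above. I have not confirmed that this is exactly what happens in this case.

```python
def misread_as_latin1(text: str) -> str:
    """Simulate UTF-8 bytes being interpreted as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def undo_misread(garbled: str) -> str:
    """Reverse the misinterpretation, provided no bytes were lost on the way."""
    return garbled.encode("iso-8859-1").decode("utf-8")


garbled = misread_as_latin1("中文")
restored = undo_misread(garbled)
```

If the strings coming back from a search can be repaired with undo_misread, that suggests the data was read as ISO-8859-1 somewhere between the source database and the index.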
> _______________________________________________
> Xapian-discuss mailing list
> Xapian-discuss at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-discuss
>