[Xapian-discuss] UTF-8 becomes glibberish in searches

Olly Betts olly at survex.com
Wed Oct 24 16:16:19 BST 2007


On Wed, Oct 24, 2007 at 02:38:51PM +0200, robert wrote:
> In your .script, have you applied unhtml to this field?
>    In myhtmlparse.cc, line 42: charset = "ISO-8859-1";
>    All data from this field was converted to ISO-8859-1.

The original poster has now solved their problem - their PHP script
wasn't sending an HTTP "Content-Type" header specifying a character
set.
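
For illustration, here is a minimal sketch of what a Content-Type value
carrying a charset looks like and how a consumer might read the charset
back out. It is in Python rather than PHP (the poster's script isn't
shown in this thread), and the helper name charset_from_content_type is
made up for the example; the parsing itself uses the standard library's
email.message.Message.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value.

    Hypothetical helper for illustration; returns None when the header
    value carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


# A header that names the character set explicitly.
with_charset = charset_from_content_type("text/html; charset=UTF-8")

# A header with no charset: the consumer has to fall back to a default.
without_charset = charset_from_content_type("text/html")
```

With no charset parameter the helper returns None, which is the case
where a consumer has to fall back on some default.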

The line you indicate is just the default character set if none is
otherwise specified by an HTML document.

Files with XML declarations default to UTF-8, or use the encoding
specified there, e.g.:

    <?xml version="1.0" encoding="UTF-8"?>

And we also honour "meta http-equiv", e.g.:

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
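
The fallback rules described above could be sketched roughly as follows.
This is not the code from myhtmlparse.cc; it is a small Python
approximation, and the function name guess_charset, the regular
expressions, the 4096-byte sniffing window and the order in which the
XML declaration and the meta tag are checked are all assumptions made
for the example. The meta branch looks for a "charset=" parameter
inside the tag.

```python
import re

# Default used when the document specifies nothing (as discussed above).
DEFAULT_CHARSET = "ISO-8859-1"

# Illustrative patterns only; a real HTML parser is more careful.
XML_DECL_RE = re.compile(rb'^\s*<\?xml\b([^>]*)\?>', re.I)
ENCODING_RE = re.compile(rb'encoding\s*=\s*["\']([^"\']+)["\']', re.I)
META_RE = re.compile(
    rb'<meta\b[^>]*http-equiv\s*=\s*["\']?content-type["\']?[^>]*>', re.I)
CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.I)


def guess_charset(data: bytes) -> str:
    """Guess a document's character set from its first few KB."""
    head = data[:4096]

    # 1. XML declaration: use its encoding, else assume UTF-8.
    decl = XML_DECL_RE.match(head)
    if decl:
        enc = ENCODING_RE.search(decl.group(1))
        return enc.group(1).decode("ascii") if enc else "utf-8"

    # 2. <meta http-equiv="Content-Type" ...> carrying a charset.
    meta = META_RE.search(head)
    if meta:
        cs = CHARSET_RE.search(meta.group(0))
        if cs:
            return cs.group(1).decode("ascii")

    # 3. Nothing specified: fall back to the default.
    return DEFAULT_CHARSET
```

A document with neither an XML declaration nor a suitable meta tag ends
up at step 3, which is how UTF-8 text can get treated as ISO-8859-1.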

Cheers,
    Olly


