[Xapian-discuss] std::string arguments presumed to be UTF8?

James Aylett james-xapian at tartarus.org
Mon Nov 14 12:20:11 GMT 2011


On 14 Nov 2011, at 11:54, Liam wrote:

> I see that TermGenerator::index_text() can take a Utf8Iterator argument,
> but Document::add_term() etc simply take a std::string.
> 
> Are std::string arguments presumed to be UTF8 strings? If "sometimes,"
> where or where not?


I believe the situation is as follows:

 * std::string should never be presumed to be UTF8. Terms, for instance, are just treated internally as byte arrays (but are commonly used to store strings, hence using std::string for convenience in C++).

 * The TermGenerator, and a few other pieces of Xapian, *do* act on UTF8, since they operate at a level that deals with actual characters, so there has to be a defined encoding.
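
To illustrate the distinction, here is a minimal sketch in plain C++ (it does not call Xapian at all; the helper names byte_length and utf8_code_points and the sample strings are made up for this example). It shows that a std::string only knows about bytes, so the same word in two encodings is two different byte sequences, while anything that works with characters has to assume an encoding, here well-formed UTF-8:

```cpp
#include <cstddef>
#include <string>

// All that std::string itself knows: how many bytes it holds.
std::size_t byte_length(const std::string& s) {
    return s.size();
}

// Number of code points, ASSUMING s holds well-formed UTF-8.
// Every byte that is not a continuation byte (10xxxxxx) starts a code point.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// The word "café" in two encodings (hypothetical sample data).
const std::string cafe_utf8 = "caf\xC3\xA9";  // é encoded as UTF-8: 0xC3 0xA9
const std::string cafe_latin1 = "caf\xE9";    // é encoded as ISO-8859-1: 0xE9
```

Here cafe_utf8 and cafe_latin1 compare unequal and have different byte lengths, so if terms really are handled as byte arrays, I'd expect the two to end up as distinct terms. The code point count is only meaningful for the UTF-8 string, which is the reason the character-level pieces need a defined encoding.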

Unfortunately, this isn't terribly clear from the documentation. 

J

-- 
 James Aylett
 talktorex.co.uk - xapian.org - devfort.com



