[Xapian-discuss] UTF8 support plans (without stemming)

Olly Betts olly at survex.com
Thu Apr 28 14:05:45 BST 2005


On Thu, Apr 28, 2005 at 01:50:09PM +0100, James Aylett wrote:
> (I was confused a little by the fact that Xapian
> takes a std::string, but actually thinks of it as a byte array -
> std::string is string<char> isn't it? Where a byte array would be
> string<unsigned char>, if you were to use STL's string.)

It's std::string all the way down to the Btree manager these days, at
which point there's a byte typedef for unsigned char which is used quite
a bit.

I suspect that's largely because Martin originally wrote the Btree
manager and he's still a BCPL programmer at heart!  Explicitly using
unsigned char does help avoid problems with different behaviour when
char is signed or unsigned (ANSI C allows an implementation to pick
either).  But I think we now always use memcmp() to compare keys anyway,
and that compares bytes as unsigned char whatever the signedness of char.
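
To illustrate the signedness point, here is a minimal sketch. The helper
names (compare_keys, less_as_signed, less_as_unsigned, demo) are
hypothetical and are not part of Xapian's API; the only assumption about
the real code is the one stated above, that keys are compared bytewise
with memcmp().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not Xapian's API): order two keys as raw bytes.
// memcmp() interprets each byte as unsigned char, so the result does not
// depend on whether plain char is signed on this platform.
inline int compare_keys(const std::string& a, const std::string& b) {
    const std::size_t n = std::min(a.size(), b.size());
    const int r = std::memcmp(a.data(), b.data(), n);
    if (r != 0) return r < 0 ? -1 : 1;
    // Common prefix is equal: the shorter key sorts first.
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;
}

// For contrast: the ordering a signed plain char would give.
// Bytes >= 0x80 become negative and sort before ASCII.
inline bool less_as_signed(char x, char y) {
    return static_cast<signed char>(x) < static_cast<signed char>(y);
}

// The ordering an unsigned char (a "byte") gives.
inline bool less_as_unsigned(char x, char y) {
    return static_cast<unsigned char>(x) < static_cast<unsigned char>(y);
}

// Usage sketch: "a" is 0x61, and UTF-8 "é" is the bytes 0xC3 0xA9.
inline void demo() {
    assert(compare_keys("a", "\xc3\xa9") < 0);   // bytewise: 0x61 < 0xC3
    assert(less_as_unsigned('a', '\xe9'));       // 97 < 233
    assert(!less_as_signed('a', '\xe9'));        // 97 < -23 is false
}
```

The std::string holds the same bytes either way; only comparisons done
directly on char values are affected, which is why going through
memcmp() or unsigned char sidesteps the problem.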

Cheers,
    Olly


