[Xapian-discuss] UTF8 support plans (without stemming)

James Aylett james-xapian at tartarus.org
Thu Apr 28 13:50:09 BST 2005


On Thu, Apr 28, 2005 at 01:40:44PM +0100, Olly Betts wrote:

> > There may (I can't remember) be some practical issues about
> > putting NUL bytes in there
> 
> Not really.  In a term name, quartz internally encodes each zero byte
> using 2 bytes, so the maximum term length is reduced, but that's the
> only issue.  The "new quartz" won't have even that restriction.
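Olly's point is that escaping zero bytes costs space in each key. A minimal sketch of the idea is below. It uses a hypothetical scheme, assumed for illustration and not taken from quartz's source: each zero byte in a term is written as the two bytes 0x00 0xFF, and the encoded term ends with 0x00 0x00. `escape_term` and `unescape_term` are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical escaping scheme (an assumption, not quartz's documented format):
// each '\0' in the term is written as "\0\xff", and the encoded term ends
// with "\0\0" so its end can be found without a length prefix.
std::string escape_term(const std::string& term) {
    std::string out;
    for (char c : term) {
        out += c;
        if (c == '\0') out += '\xff';  // mark this zero as literal data
    }
    out.append(2, '\0');  // terminator
    return out;
}

// Reverse of escape_term: stops at the "\0\0" terminator.
std::string unescape_term(const std::string& encoded) {
    std::string out;
    for (std::size_t i = 0; i < encoded.size(); ++i) {
        char c = encoded[i];
        if (c == '\0') {
            if (i + 1 < encoded.size() && encoded[i + 1] == '\xff') {
                out += '\0';
                ++i;  // skip the escape marker
                continue;
            }
            break;  // terminator reached
        }
        out += c;
    }
    return out;
}
```

Under this scheme a term containing k zero bytes needs k extra bytes when stored, which is why the maximum term length shrinks.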

That's kind of what I expected, but I couldn't remember how it all
fitted together. (I was confused a little by the fact that Xapian
takes a std::string, but actually thinks of it as a byte array -
std::string is basic_string<char>, isn't it? Whereas a byte array would
be basic_string<unsigned char>, if you were to use the STL's string template.)
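A std::string can still serve as a byte array in practice. The sketch below illustrates this under stated assumptions: the helper names `make_bytes` and `byte_at` are hypothetical. It shows that std::string is std::basic_string<char>, that it keeps embedded NUL bytes when built from a pointer and a length, and that reading bytes through unsigned char gives values 0..255 whether or not plain char is signed on the platform.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// std::string is a typedef for std::basic_string<char>.
static_assert(std::is_same<std::string, std::basic_string<char>>::value,
              "std::string is basic_string<char>");

// The (pointer, length) constructor keeps every byte, including '\0';
// the constructor taking only a const char* would stop at the first NUL.
std::string make_bytes() {
    const char raw[] = {'x', '\0', 'y'};
    return std::string(raw, sizeof raw);
}

// Whether plain char is signed is implementation-defined, so convert
// through unsigned char to get a byte value in the range 0..255.
unsigned byte_at(const std::string& s, std::size_t i) {
    return static_cast<unsigned char>(s[i]);
}
```

The standard only guarantees std::char_traits specializations for the character types (char, wchar_t and friends), so std::basic_string<unsigned char> depends on the implementation supplying suitable traits.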

:-)

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


