[Xapian-discuss] UTF8 support plans (without stemming)
James Aylett
james-xapian at tartarus.org
Thu Apr 28 13:50:09 BST 2005
On Thu, Apr 28, 2005 at 01:40:44PM +0100, Olly Betts wrote:
> > There may (I can't remember) be some practical issues about
> > putting NUL bytes in there
>
> Not really. In a term name, quartz internally encodes each zero byte
> using 2 bytes, so the maximum term length is reduced, but that's the
> only issue. The "new quartz" won't have even that restriction.
That's kind of what I expected, but I couldn't remember how it all
fitted together. (I was a little confused by the fact that Xapian
takes a std::string but actually treats it as a byte array.
std::string is std::basic_string<char>, isn't it? Whereas a byte array
would be std::basic_string<unsigned char>, if you were to use the
STL's string.)
:-)
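
To make the byte-array point concrete, here is a minimal sketch in plain
standard C++ (nothing Xapian-specific). The helper names
make_term_with_nul and byte_at are made up for illustration. It shows that
std::string really is std::basic_string<char>, that it can carry an
embedded zero byte as long as the length is given explicitly, and that
casting to unsigned char gives the 0..255 view of each byte.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// std::string is a typedef for std::basic_string<char>.
static_assert(std::is_same<std::string, std::basic_string<char>>::value,
              "std::string is basic_string<char>");

// Hypothetical helper: build a 5-byte term with a zero byte in the middle.
// The (pointer, length) constructor is required here; constructing from the
// bare literal would stop at the first NUL and yield just "ab".
inline std::string make_term_with_nul() {
    return std::string("ab\0cd", 5);
}

// Hypothetical helper: read byte i of the term as an unsigned value (0..255),
// regardless of whether plain char is signed on this platform.
inline unsigned byte_at(const std::string& term, std::size_t i) {
    assert(i < term.size());
    return static_cast<unsigned char>(term[i]);
}
```

So make_term_with_nul().size() is 5, and byte_at(make_term_with_nul(), 2)
is 0. The char versus unsigned char difference only matters when the bytes
are interpreted as numbers; the storage is the same either way.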
J
--
/--------------------------------------------------------------------------\
James Aylett xapian.org
james at tartarus.org uncertaintydivision.org