[Snowball-discuss] a simple algorithm problem
James Aylett
james at tartarus.org
Thu Jan 6 09:29:07 GMT 2005
On Thu, Jan 06, 2005 at 09:12:34AM +0000, Martin Porter wrote:
> So one idea is to declare 'utf8' in the Snowball script, allowing character
> defs in the range 0-64K, as in the 2-byte character version. Characters
> could be written with their Unicode values.
Presumably this still restricts Snowball to code points in the BMP
(U+0000 to U+FFFF)? Or does it only restrict which characters Snowball
can recognise and operate on, passing any others through unchanged?
There's not a huge amount outside the BMP yet, so this may not matter
at all.
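
To make the two possibilities concrete, here is a rough sketch in Python.
It is not how Snowball does or would do this; the names BMP_MAX,
split_by_plane and transform_bmp_only are made up for illustration. It
assumes the input is valid UTF-8 and shows the "pass through" reading:
characters above U+FFFF are left alone, everything else is handed to the
transform.

```python
BMP_MAX = 0xFFFF  # highest code point in the Basic Multilingual Plane


def split_by_plane(utf8_bytes):
    """Decode UTF-8 and pair each character with True if it is in the BMP."""
    text = utf8_bytes.decode("utf-8")
    return [(ch, ord(ch) <= BMP_MAX) for ch in text]


def transform_bmp_only(text, transform):
    """Apply transform to BMP characters; pass all other characters through."""
    return "".join(
        transform(ch) if ord(ch) <= BMP_MAX else ch
        for ch in text
    )


def utf8_length(ch):
    """Number of bytes a single character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))
```

A character definition range of 0-64K covers everything that takes up to
three bytes in UTF-8; anything outside the BMP takes four bytes, which is
where the two readings above would start to differ.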
> and encoded in utf-8 form in strings.
What's the character encoding of Snowball scripts at the moment? It
isn't covered in the manual, so I'm guessing that at present they're
expected to be ASCII or similar.
Cheers,
James
--
/--------------------------------------------------------------------------\
James Aylett xapian.org
james at tartarus.org uncertaintydivision.org