[Snowball-discuss] a simple algorithm problem

James Aylett james at tartarus.org
Thu Jan 6 09:29:07 GMT 2005


On Thu, Jan 06, 2005 at 09:12:34AM +0000, Martin Porter wrote:

> So one idea is to declare 'utf8' in the Snowball script, allowing character
> defs in the range 0-64K, as in the 2-byte character version. Characters
> could be written with their Unicode values.

Presumably this still restricts Snowball to code points in the BMP? Or
does it just restrict it to recognising and doing things with
characters at code points in the BMP, passing through any others?
There's not a huge amount outside it yet, so this may not matter at
all.
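
To make the "pass through" option concrete, here is a minimal sketch (not
anything Snowball does today, and the function names are made up for
illustration). In UTF-8, any code point above U+FFFF is encoded as a
4-byte sequence, so the lead byte alone tells you whether a character is
in the BMP. The sketch assumes well-formed UTF-8 input and does not
validate continuation bytes.

```python
def utf8_char_len(lead: int) -> int:
    """Return the byte length of the UTF-8 sequence that starts with `lead`."""
    if lead < 0x80:
        return 1              # ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2              # U+0080..U+07FF
    if 0xE0 <= lead <= 0xEF:
        return 3              # U+0800..U+FFFF (rest of the BMP)
    if 0xF0 <= lead <= 0xF4:
        return 4              # U+10000..U+10FFFF (outside the BMP)
    raise ValueError(f"invalid UTF-8 lead byte: {lead:#x}")


def split_bmp(data: bytes):
    """Yield (chunk, in_bmp) for each UTF-8 character in `data`.

    Hypothetical helper: a stemmer could act on chunks where in_bmp is
    True and copy the others through unchanged.
    """
    i = 0
    while i < len(data):
        n = utf8_char_len(data[i])
        yield data[i:i + n], n < 4
        i += n
```

A caller would loop over split_bmp(text), apply its rules only to the
chunks flagged as in the BMP, and write everything else out untouched.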

> and encoded in utf-8 form in strings.

What's the character encoding of Snowball scripts at the moment? It
isn't covered in the manual, so I'm guessing that at present they're
expected to be ASCII or similar.

Cheers,
James

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


