[Snowball-discuss] Unicode support

James Aylett james-xapian@tartarus.org
Mon, 20 May 2002 14:05:13 +0100

On Mon, May 20, 2002 at 06:29:35AM -0600, Martin Porter wrote:

> BOM is 'byte order mark' - some special character of termination?
> The answer then is no.

It's actually usually at the beginning: 0xfeff, or 0xfffe when
byte-swapped, I think. Snowball doesn't want to do this, because it's
somewhat too much overhead for each individual stemming call, I'd have 
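
For illustration only, here is a rough sketch of checking for the mark
once, when the text is read in, rather than on every stemming call. It
is Python, it assumes UTF-16 input, and the function name
strip_utf16_bom is made up for this example; none of it is part of
Snowball.

```python
import codecs


def strip_utf16_bom(data: bytes):
    """Split a UTF-16 byte string into (byte_order, payload).

    byte_order is "big", "little", or None when no BOM is present.
    Hypothetical helper for illustration; not a Snowball API.
    """
    # U+FEFF serialised big-endian is the byte pair FE FF.
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big", data[len(codecs.BOM_UTF16_BE):]
    # The byte-swapped form, FF FE, signals little-endian data.
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little", data[len(codecs.BOM_UTF16_LE):]
    # No mark at the start: leave the data untouched.
    return None, data
```

The caller would run this once per document and hand only the payload
on to whatever does the stemming, so the per-word path never has to
look for the mark.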


  James Aylett                                            zap.tartarus.org
  james@tartarus.org                                        footlights.org

