[Snowball-discuss] Optimising among
Olly Betts
olly at survex.com
Mon Sep 18 18:47:36 BST 2006
On Mon, Sep 18, 2006 at 06:15:22PM +0200, Martin Porter wrote:
> For string-forward among, surely the byte to take is not byte 0, but byte
> n-1, where n is the size of the smallest string in the among.
Are you saying it's currently incorrect?
Or that taking this byte may give a better optimisation, because it
avoids the problem with Cyrillic characters always starting with one of
two bytes in UTF-8?
Assuming the latter, since we know the cases when we generate the
shortcut, we could actually look at all the different choices of
bytes between 0 and n-1 and potentially choose a different strategy
for each among.
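
As a rough illustration of that idea (not the Snowball compiler's actual
code), the sketch below collects, for a string-forward among, the set of
byte values seen at each offset from 0 to n-1 and then picks one offset.
The function names and the scoring rule are hypothetical: the rule simply
assumes that an offset where every string has a UTF-8 lead byte is a poor
discriminator, which is the Cyrillic case described above.

```python
def candidate_byte_sets(strings):
    """For each offset 0..n-1, return the set of byte values seen there.

    n is the length in bytes of the shortest UTF-8 encoded string.
    """
    encoded = [s.encode("utf-8") for s in strings]
    n = min(len(b) for b in encoded)
    return [{b[i] for b in encoded} for i in range(n)]


def pick_offset(strings):
    """Pick a byte offset to test (hypothetical heuristic, an assumption).

    Offsets where every byte is a UTF-8 lead byte (0xC0 or above) are
    ranked last; among the rest, the offset with the fewest distinct
    byte values wins, with ties going to the lowest offset.
    """
    sets = candidate_byte_sets(strings)

    def score(i):
        all_lead = all(v >= 0xC0 for v in sets[i])
        return (all_lead, len(sets[i]), i)

    return min(range(len(sets)), key=score)


# Cyrillic strings: byte 0 is always 0xD0 or 0xD1, so offset 1 is chosen.
cyrillic_offset = pick_offset(["а", "я", "ов"])
# ASCII strings with a one-byte shortest entry: only offset 0 is available.
ascii_offset = pick_offset(["ing", "ed", "s"])
```

A real generator would presumably weigh each offset against the bytes
likely to occur in input text rather than use a fixed rule like this one,
and a backward among would need the offsets counted from the end.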
Cheers,
Olly