[Snowball-discuss] New, and a couple of questions

Martin Porter martin.porter@grapeshot.co.uk
Wed Mar 10 09:52:02 2004


Your assumptions about memory allocation are correct. To change the initial
creation size from 1 to 1000 (say) you alter

#define CREATE_SIZE 1

to

#define CREATE_SIZE 1000

in the q/utilities.c module. But do not imagine that size changes happen
often. The buffers are increased in increase_size(...) in q/utilities.c, and
this is never called more than twice for any of the sample vocabularies I
use in the tests.
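
To illustrate why a small initial size costs so little, here is a minimal
sketch of a reusable word buffer that grows only when a longer word arrives.
It is not the code from q/utilities.c: struct word_buf, word_buf_init,
word_buf_set, word_buf_free and the GROW_SLACK headroom are all hypothetical
names and values chosen for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CREATE_SIZE 1   /* initial capacity, mirroring the #define above */
#define GROW_SLACK 20   /* hypothetical headroom added whenever we grow */

/* Hypothetical reusable buffer; not the Snowball data structure. */
struct word_buf {
    char *data;         /* NUL-terminated contents */
    size_t capacity;    /* characters that fit, excluding the NUL */
    int grow_count;     /* how many times the buffer had to be enlarged */
};

/* Allocate the buffer at its initial size. Returns 0 on success. */
static int word_buf_init(struct word_buf *b) {
    assert(b != NULL);
    b->data = malloc(CREATE_SIZE + 1);
    if (b->data == NULL) return -1;
    b->data[0] = '\0';
    b->capacity = CREATE_SIZE;
    b->grow_count = 0;
    return 0;
}

/* Copy a word into the buffer, enlarging it only if the word does not fit. */
static int word_buf_set(struct word_buf *b, const char *word) {
    size_t n;
    assert(b != NULL && word != NULL);
    n = strlen(word);
    if (n > b->capacity) {
        char *q = realloc(b->data, n + GROW_SLACK + 1);
        if (q == NULL) return -1;
        b->data = q;
        b->capacity = n + GROW_SLACK;
        b->grow_count++;
    }
    memcpy(b->data, word, n + 1);   /* includes the terminating NUL */
    return 0;
}

/* Release the buffer. */
static void word_buf_free(struct word_buf *b) {
    assert(b != NULL);
    free(b->data);
    b->data = NULL;
    b->capacity = 0;
}
```

Because the same buffer is reused from word to word, it stops growing once
it has held the longest word seen so far, however many words follow.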

There is no fast way of discovering if a word has been stemmed. You could
set a flag in the various functions of q/utilities.c that alter z->p, but
this is not a general solution, since Snowball can use auxiliary strings
that may be altered while the main string remains unaltered - although none
of the current stemmers would do that. So you have to use strcmp or equivalent. 
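
As a rough sketch of the "strcmp or equivalent" test, one might compare the
word before and after stemming like this. The function name word_changed and
the idea of passing explicit lengths are my assumptions for the example, not
part of the Snowball API; the lengths are there in case the stemmed result is
not NUL-terminated.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the stemmed form differs from the original word, 0 if the
   two are identical. Hypothetical helper, not a Snowball function. */
static int word_changed(const char *orig, size_t orig_len,
                        const char *stemmed, size_t stemmed_len) {
    assert(orig != NULL && stemmed != NULL);
    /* Different lengths mean the word was altered; no need to look
       at the characters. */
    if (orig_len != stemmed_len) return 1;
    /* Same length: fall back to a byte-by-byte comparison. */
    return memcmp(orig, stemmed, orig_len) != 0;
}
```

The length check is only a cheap shortcut for the common case where stemming
removes a suffix; the memcmp is what decides the equal-length case.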

I find on my machine,

    for (i = 0; i < one_hundred_million; i++) 

takes about 24 secs, of which 1 sec is spent in the mechanics of the loop. I
suppose the words are tested from the beginning, and

    for (i = 0; i < one_hundred_million; i++) 

by contrast takes about 4 secs. But my machine is fairly slow: on Richard
Boulton's machine it would take a tenth of that time.
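
If you want to repeat this sort of measurement on your own machine, a
harness along the following lines should do. It is only a sketch: the
function name time_compares is hypothetical, the loop body is my own choice
and not necessarily what I timed above, and an optimizing compiler may still
hoist the strcmp call out of the loop, so check the results with
optimization switched off as well.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Time `iterations` calls of strcmp(a, b) and return the elapsed CPU time
   in seconds. Hypothetical benchmark helper. */
static double time_compares(const char *a, const char *b, long iterations) {
    volatile int sink = 0;  /* keeps the comparison result observable */
    clock_t start, end;
    long i;

    assert(a != NULL && b != NULL && iterations >= 0);

    start = clock();
    for (i = 0; i < iterations; i++) {
        sink += (strcmp(a, b) != 0);
    }
    end = clock();

    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Timing the same loop with an empty body gives the cost of the loop mechanics,
which can then be subtracted from the total.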
