[Snowball-discuss] 16 bit characters in Snowball
Martin Porter
martin_porter@SoftHome.net
Thu, 23 May 2002 04:18:00 -0600
/*
Andreas,
Sorry to have been an age in replying. Your little test program needs
casts and a bit more care with the counting. The following example program
should make everything clear.
Assume andreas/ contains
api.c api.h driver.c header.h stem.c stem.h utilities.c
where stem.c is the standard Porter stemmer, and you compile with
    gcc -o andreas/test andreas/*.c
Martin
----------------------------------------------------------------------
*/
#include <stdio.h>
#include "api.h"
#include "stem.h"
/* Print 'length' bytes of b, showing each zero byte as [0] */
static void report(int length, char * b) {
    int i;
    for (i = 0; i < length; i++) printf(b[i] == 0 ? "[0] " : "%c ", b[i]);
    printf("\n");
}
int main(void) {
    char b[16] = {'s',0, 'p',0, 'l',0, 'i',0,
                  'c',0, 'i',0, 'n',0, 'g',0 };

    /* There is an obvious machine dependency here: 2 chars per short, and
       little-endian order in the byte pairs making up the 16 bit characters
    */

    struct SN_env * z = create_env();

    /* 'symbol' is typedeffed to 'unsigned short'. There are 8 == 16/2
       symbols in b
    */

    SN_set_current(z, 16/2, (symbol *) b);

    printf("Standard Porter stemmer:\n");
    report(16, b);
    printf("stems to\n");
    stem(z);

    /* z->l measures symbols, so it needs doubling in the next call. z->p
       is a symbol *, so it needs casting
    */

    report(2 * z->l, (char *) z->p);

    /* All this gives:

       Standard Porter stemmer:
       s [0] p [0] l [0] i [0] c [0] i [0] n [0] g [0]
       stems to
       s [0] p [0] l [0] i [0] c [0] e [0]
    */

    close_env(z);
    return 0;
}
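
The machine dependency noted in the comments above (2 chars per short, little-endian byte pairs) comes from filling a char array and casting it to symbol *. One way to sidestep it is to fill a symbol array directly, one character per symbol, so that lengths are counted in symbols throughout. What follows is only a minimal sketch under stated assumptions: the typedef repeats what the comments say about 'symbol' (in a real build it would come from api.h), and widen and narrow are hypothetical helper names, not part of the Snowball API.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: mirrors the 'symbol' typedef described in the message
   (unsigned short); in a real build this would come from api.h. */
typedef unsigned short symbol;

/* Hypothetical helper: widen n plain chars into n symbols, one symbol per
   character, with no dependence on byte order. Returns the number of
   symbols written. */
static size_t widen(symbol * dst, const char * src, size_t n) {
    size_t i;
    assert(dst != NULL && src != NULL);
    for (i = 0; i < n; i++) dst[i] = (symbol) (unsigned char) src[i];
    return i;
}

/* Hypothetical helper: narrow n symbols back to chars, writing '?' for any
   symbol above 255. dst must have room for n + 1 chars, since a
   terminating '\0' is appended. Returns the number of symbols read. */
static size_t narrow(char * dst, const symbol * src, size_t n) {
    size_t i;
    assert(dst != NULL && src != NULL);
    for (i = 0; i < n; i++) dst[i] = src[i] > 255 ? '?' : (char) src[i];
    dst[n] = '\0';
    return i;
}
```

With helpers along these lines, the test program could presumably pass a symbol buffer and its symbol count straight to SN_set_current, and read z->l symbols back from z->p, without the casts or the doubling.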