[Snowball-discuss] Encoding in browser question. StopWords.
Praveen Hombaiah
ph_one at hotmail.com
Sun Jul 4 17:34:16 BST 2004
Hello,
I'm trying to add Kannada language support to the Perl
Lingua::StopWords module. To do this, I want to put together a list of
Kannada stop words by writing a program that goes through a few Kannada
newspapers (www.prajavani.net) and outputs the list of the most commonly
used words.
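The counting step described above could be sketched roughly as follows. This is only an illustration in Python, not part of Lingua::StopWords; the function name `most_common_words` and the `top_n` parameter are hypothetical, and it assumes the newspaper text has already been obtained as Unicode text with words separated by whitespace.

```python
from collections import Counter
import string


def most_common_words(text, top_n=10):
    """Return the top_n most frequent whitespace-separated words in text.

    Hypothetical helper: assumes `text` is already decoded Unicode.
    Tokens are split on whitespace rather than with a \\w regex, so that
    Kannada vowel signs and other combining marks stay attached to words.
    """
    counts = Counter()
    for token in text.split():
        # Strip ASCII punctuation from both ends of the token.
        word = token.strip(string.punctuation)
        if word:
            counts[word] += 1
    return counts.most_common(top_n)
```

The most frequent entries would be candidates for a stop word list, though they would still need to be reviewed by hand.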
I need the list of words in UTF-8 (since that is the encoding used by
the Lingua::StopWords module). When I go to the newspaper website using
Internet Explorer, the characters are readable, and the encoding is set to
Western European (Windows). Is there any way to convert this to UTF-8?
Exactly what format is this currently in?
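If the page bytes really are in the Western European (Windows) code page, usually called Windows-1252 or cp1252, a plain transcoding to UTF-8 might look like the sketch below. This is an assumption rather than a confirmed answer: the function name `transcode_to_utf8` is hypothetical, and if the site instead relies on a custom Kannada font that draws Kannada glyphs for Latin code points, this conversion would only produce Latin characters in UTF-8, and a font-specific mapping table would be needed.

```python
def transcode_to_utf8(raw_bytes, source_encoding="cp1252"):
    """Decode raw_bytes from source_encoding and re-encode them as UTF-8.

    Hypothetical helper: cp1252 is only an assumption based on what the
    browser reports; bytes that are undefined in the source encoding are
    replaced with U+FFFD rather than raising an error.
    """
    text = raw_bytes.decode(source_encoding, errors="replace")
    return text.encode("utf-8")
```

For example, `transcode_to_utf8(b"caf\xe9")` gives the UTF-8 bytes for "café".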
I apologize for the cross-posting. Any help anybody could provide is very
much appreciated.
Regards,
Praveen Hombaiah.