[Snowball-discuss] Encoding in browser question. StopWords.

Praveen Hombaiah ph_one at hotmail.com
Sun Jul 4 17:34:16 BST 2004


Hello,
     I'm trying to add Kannada language support to the Perl
Lingua::StopWords module.  To do this, I want to put together a list of
Kannada stop words by writing a program that goes through a few Kannada
newspapers (www.prajavani.net) and outputs a list of the most commonly
used words.
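
    A minimal sketch of the counting step I have in mind, written in Python
for brevity and assuming the article text has already been fetched and
decoded into Unicode strings.  The function name and the top_n parameter are
just placeholders of mine.  Tokens are split on whitespace rather than with
a \w+ regex, because Kannada vowel signs are combining marks that \w may not
match:

```python
from collections import Counter
import string


def most_common_words(texts, top_n=50):
    """Return the top_n most frequent whitespace-separated tokens in texts.

    texts is an iterable of already-decoded Unicode strings (hypothetical
    input; fetching and decoding the pages is not shown here).
    """
    counts = Counter()
    for text in texts:
        for token in text.split():
            # Strip ASCII punctuation from both ends of each token.
            word = token.strip(string.punctuation)
            if word:
                counts[word] += 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = ["ಮತ್ತು ಇದು", "ಮತ್ತು ಅದು."]
    for word, count in most_common_words(sample, top_n=3):
        print(word, count)
```

The most frequent words from a large enough sample would only be candidates;
the list would still need manual review before going into the module.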
    I need the list of words in UTF-8, since that is the encoding used by
the Lingua::StopWords module.  When I go to the newspaper website using
Internet Explorer, the characters are readable, and the encoding is set to
Western European (Windows).  Is there any way to convert this to UTF-8?
Exactly what format is it currently in?
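
    If the bytes really were Windows-1252 text, the re-encoding itself would
be mechanical, as in the Python sketch below (the function name is a
placeholder of mine).  I am not sure that assumption holds: if the site
displays Kannada through a custom font over Western code points, this would
only produce Latin characters in UTF-8, and a separate glyph-to-Unicode
mapping would be needed.

```python
def reencode_to_utf8(raw_bytes, source_encoding="cp1252"):
    """Decode raw_bytes from source_encoding and return them as UTF-8 bytes.

    Assumes the input really is text in source_encoding; bytes that are
    undefined in that encoding raise UnicodeDecodeError.
    """
    text = raw_bytes.decode(source_encoding)
    return text.encode("utf-8")


if __name__ == "__main__":
    # 0xE9 is "é" in Windows-1252; in UTF-8 it becomes the two bytes C3 A9.
    print(reencode_to_utf8(b"caf\xe9"))
```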

   I apologize for the cross-posting.  Any help anybody could provide is very
much appreciated.


Regards,
Praveen Hombaiah.




