[Snowball-discuss] Mobile phone implementation of the English Stemmer
Alexandra Elizabeth Duncan
aed02@doc.ic.ac.uk
Thu Aug 28 17:55:01 2003
Hi
I wrote a couple of emails to this mailing list back in July - I am an
MSc student studying Computer Science at Imperial College, London. I
have just about completed my thesis/project which has been concerned
with writing a mobile phone translator (English to French and French to
English). Please excuse the length of this email, but I thought you
might be interested in the work I have done using the Porter algorithm.
Very briefly, there is a small dictionary of words stored as part of the
application on the mobile phone. A user inputs a word to be translated
and the application returns the translation if the word is found in the
phone dictionary. If the word is not in the dictionary, the application
queries a remote dictionary and returns the translation.
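As a rough illustration of that lookup order (a sketch only, not the project code): the class name TwoTierLookup and the remote function passed to it are hypothetical, and the sketch uses modern Java collections for readability rather than the restricted API available on a phone.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: try the on-phone dictionary first, then a remote one.
final class TwoTierLookup {
    private final Map<String, String> local;          // small dictionary shipped with the app
    private final Function<String, String> remote;    // assumed stand-in for the remote query

    TwoTierLookup(Map<String, String> local, Function<String, String> remote) {
        this.local = new HashMap<>(local);
        this.remote = remote;
    }

    // Returns the translation, or null if neither dictionary knows the word.
    String translate(String word) {
        String key = word.trim().toLowerCase(Locale.ROOT);
        String hit = local.get(key);
        if (hit != null) {
            return hit;                // found on the phone, no network needed
        }
        return remote.apply(key);      // fall back to the remote dictionary
    }
}
```

The point of the ordering is that the network is only touched when the local dictionary misses.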
Given the constrained resources of mobile phones, I have had to
compress the words that are stored on the phone. For this I
used the Porter algorithm and the Java implementation from the website.
The words that make up the dictionary are stemmed and stored on the
phone. When the user inputs a word, that word is then stemmed (using
the Java implementation modified slightly for the mobile phone) and then
matched against the stemmed words in the dictionary.
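The matching step can be sketched as follows. This is an illustration under stated assumptions, not the project code: StemmedDictionary is a hypothetical name, and toyStem is a deliberately crude stand-in that strips a few suffixes. It is NOT the Porter algorithm; a real build would call the Porter stemmer in its place.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: dictionary entries are keyed by stem, and user input
// is stemmed with the same function before the lookup.
final class StemmedDictionary {
    private final Map<String, String> byStem = new LinkedHashMap<>();

    // Stand-in stemmer (assumption): strips "ing", "ed" or "s" when at least
    // three characters remain. This is NOT the Porter algorithm.
    static String toyStem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        String[] suffixes = {"ing", "ed", "s"};
        for (String suffix : suffixes) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Inflected forms that share a stem collapse into a single stored entry.
    void add(String englishWord, String translation) {
        byStem.putIfAbsent(toyStem(englishWord), translation);
    }

    // Stem the input the same way, then match against the stored stems.
    String lookup(String userInput) {
        return byStem.get(toyStem(userInput));
    }

    int size() {
        return byStem.size();
    }
}
```

With this sketch, adding "walk", "walked" and "walking" stores one entry, and a lookup for "walks" still finds it. The trade-off is that the inflection of the input is lost, so the translation returned is for the base form.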
By doing this, I was able to get about 25% compression on the English
words I had.
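One way such a figure could be estimated is to compare the total characters of the distinct words with the total characters of the distinct stems. This is only an assumed measure, since how the 25% was calculated is not stated here; CompressionEstimate and savedFraction are hypothetical names, and the stemmer is passed in as a parameter.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical sketch: fraction of characters saved by storing stems
// instead of full words (an assumed measure of "compression").
final class CompressionEstimate {

    static int totalChars(Set<String> items) {
        int total = 0;
        for (String item : items) {
            total += item.length();
        }
        return total;
    }

    // Returns a value in [0, 1): 0.25 would mean 25% fewer characters stored.
    static double savedFraction(List<String> words, UnaryOperator<String> stemmer) {
        Set<String> originals = new HashSet<>(words);
        Set<String> stems = new HashSet<>();
        for (String word : originals) {
            stems.add(stemmer.apply(word));   // duplicates collapse in the set
        }
        int before = totalChars(originals);
        int after = totalChars(stems);
        return before == 0 ? 0.0 : 1.0 - (double) after / before;
    }
}
```

The saving comes from two sources: stems are shorter than the words they replace, and several inflected forms share one stem.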
I only implemented stemming for the English words, and therefore only
the English words are compressed. I did try to implement the French
stemmer, but I found it was more complicated and too large for the
mobile phone.
I would like to say thank you for the excellent and informative website
- it has been of great use to me in the past 3 months.
I was also wondering if you know of anyone who has implemented the
stemmer on a mobile phone. If not, this would lend my project a bit of
extra kudos, I have to say!
I will be finalising the code and writing the actual thesis in the next
2 weeks. If anyone is interested in the work that I have done on it,
please let me know as I would be more than happy to supply the code
and/or the report.
Thank you once again
Alex Duncan