[Snowball-discuss] TM (trademark) removal

Grant Ingersoll gsingers at apache.org
Mon Nov 2 13:06:37 GMT 2009


On Nov 2, 2009, at 2:03 AM, Steve Jones wrote:

> Hi,
>
> I don't think this really counts as stemming, but it seems a  
> sensible place to ask my question: if a word has a TM suffix (such  
> as JavaTM => javatm), is it wise or unwise to remove it? Or could it  
> be argued you need to split the word and index it as two tokens,  
> 'java' and 'tm'?
>

I'd argue for splitting. This would still let you find phrases
containing Java and TM (assuming you are talking about search here).
If you were using Lucene, I'd probably suggest that you both keep the
original token and split it (which can be done by manipulating the
position information).

-Grant
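
To make the "keep it and split it" idea concrete, here is a minimal
sketch in plain Python. It is not Lucene code and not necessarily what
Grant had in mind. It only models the position-increment idea: the
unsplit token and the first split part share one position (increment 0),
so both 'javatm' and the phrase 'java tm' can match. The Token type, the
analyze and positions helpers, and the "lowercase letter followed by TM"
rule are all assumptions made up for illustration.

```python
import re
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    text: str
    # 1 = advance to the next position, 0 = stack on the previous token
    position_increment: int


# Hypothetical heuristic: the stem must end in a lowercase letter, so
# "JavaTM" is split but an acronym such as "ATM" is left alone.
TM_SUFFIX = re.compile(r"^(.*[a-z])TM$")


def analyze(text: str) -> List[Token]:
    """Whitespace-tokenize, lowercase, and expand TM-suffixed words."""
    tokens: List[Token] = []
    for word in text.split():
        match = TM_SUFFIX.match(word)
        if match:
            # Keep the original token at the next position ...
            tokens.append(Token(word.lower(), 1))
            # ... stack the stem on the same position ...
            tokens.append(Token(match.group(1).lower(), 0))
            # ... and put 'tm' one position later.
            tokens.append(Token("tm", 1))
        else:
            tokens.append(Token(word.lower(), 1))
    return tokens


def positions(tokens: List[Token]) -> List[Tuple[str, int]]:
    """Turn position increments into absolute positions for inspection."""
    result = []
    pos = -1
    for token in tokens:
        pos += token.position_increment
        result.append((token.text, pos))
    return result


if __name__ == "__main__":
    for text, pos in positions(analyze("Sun JavaTM platform")):
        print(pos, text)
```

With this layout 'javatm' and 'java' sit at the same position and 'tm'
follows them, so a query for either form lines up with the surrounding
words. In Lucene the same effect would come from the token stream's
position increments; the exact filter to use is left open here.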

