[Snowball-discuss] TM (trademark) removal
Grant Ingersoll
gsingers at apache.org
Mon Nov 2 13:06:37 GMT 2009
On Nov 2, 2009, at 2:03 AM, Steve Jones wrote:
> Hi,
>
> I don't think this really counts as stemming, but it seems a
> sensible place to ask my question: if a word has a TM suffix (such
> as JavaTM => javatm), is it wise or unwise to remove it? Or could it
> be argued you need to split the word and index it as two tokens,
> 'java' and 'tm'?
>
I'd argue for splitting. This would still let you find phrases
containing Java and TM (assuming you are talking about search here). If
you were using Lucene, I'd probably suggest you both keep it and split
it, which can be done by manipulating the position information.
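The "keep it and split it" idea can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Lucene's API: the `Token` record, the `TrademarkSplitter` class and the suffix heuristic (a trailing ™ sign, or a capital "TM" right after a lowercase letter, so acronyms like "ATM" are left alone) are all hypothetical. Each token carries a position increment, where 0 means "same position as the previous token". The original form is kept, the base word is stacked on the same position, and "tm" goes on the next position, so both a search for "javatm" and the phrase "java tm" can match. Requires Java 16+ for the record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TrademarkSplitter {

    // Hypothetical stand-in for an analyzed token: its text plus the position
    // increment from the previous token (0 = same position, 1 = next position).
    // This mimics the idea of position increments; it is not Lucene's API.
    record Token(String text, int positionIncrement) {}

    // Assumed heuristic: the word ends in the trademark sign, or in capital "TM"
    // directly after a lowercase letter ("JavaTM"), so "ATM" is not split.
    static int trademarkSuffixLength(String word) {
        if (word.length() > 1 && word.endsWith("\u2122")) {
            return 1;
        }
        if (word.length() > 2 && word.endsWith("TM")
                && Character.isLowerCase(word.charAt(word.length() - 3))) {
            return 2;
        }
        return 0;
    }

    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens for blank input
            }
            int suffixLen = trademarkSuffixLength(word);
            if (suffixLen == 0) {
                tokens.add(new Token(word.toLowerCase(Locale.ROOT), 1));
                continue;
            }
            String base = word.substring(0, word.length() - suffixLen)
                    .toLowerCase(Locale.ROOT);
            // Keep the original form, normalized to "javatm", at a new position...
            tokens.add(new Token(base + "tm", 1));
            // ...stack the base word "java" on that same position...
            tokens.add(new Token(base, 0));
            // ...and put "tm" on the following position so "java tm" is a phrase.
            tokens.add(new Token("tm", 1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        int position = -1;
        for (Token t : analyze("I use JavaTM daily")) {
            position += t.positionIncrement();
            System.out.println(position + ": " + t.text());
        }
    }
}
```

Running `main` prints `0: i`, `1: use`, `2: javatm`, `2: java`, `3: tm`, `4: daily`. One trade-off of this flat layout: a phrase spanning the original token and the next word ("javatm daily") no longer sees adjacent positions, because "tm" sits between them.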
-Grant