<div>Olly,</div>

<div>&nbsp;</div>

<div>Yes, of course I am sanitizing the HTML. In fact, the HTML is stripped at the beginning of the process and only text is placed into the database, which also makes the database a lot smaller.</div>

<div>&nbsp;</div>

<div>I would like to ask you a question. When I try to search for &quot;hiking&quot; </div>

<div><a href="http://nitra.net/cgi-bin/hladaj.cgi?a=q&amp;q=hiking&amp;c=sk">http://nitra.net/cgi-bin/hladaj.cgi?a=q&amp;q=hiking&amp;c=sk</a>&nbsp;</div>

<div>I am not getting any results back. </div>

<div>&nbsp;</div>

<div>However, if I search for &quot;hike&quot; only, I do get results back, including the one containing &quot;hiking&quot;, which is displayed at the top of the results.</div>

<div><a href="http://nitra.net/cgi-bin/hladaj.cgi?a=q&amp;q=hike&amp;c=sk">http://nitra.net/cgi-bin/hladaj.cgi?a=q&amp;q=hike&amp;c=sk</a></div>

<div>&nbsp;</div>

<div>Is it possible at index time to have every word indexed as it is and to disable the stemming algorithm, or is stemming part of the package whether we like it or not?</div>
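
<div>To show what I suspect is going on, here is a toy sketch in Python (not Xapian code). It assumes the terms were stemmed at index time while the query term is looked up exactly as typed. The names TOY_STEMS, toy_stem, build_index and search, and the two sample documents, are made up for illustration, and the lookup table is only a stand-in for a real stemming algorithm.</div>

```python
# Toy stand-in for a stemmer: a lookup table, NOT the algorithm Xapian uses.
TOY_STEMS = {"hiking": "hike", "hikes": "hike", "hiked": "hike"}


def toy_stem(word):
    """Lowercase the word and map it to its toy stem, if one is listed."""
    word = word.lower()
    return TOY_STEMS.get(word, word)


def build_index(docs, stem=True):
    """Map each indexed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            term = toy_stem(word) if stem else word.lower()
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, stem_query):
    """Look up a single query word, optionally stemming it first."""
    term = toy_stem(query) if stem_query else query.lower()
    return sorted(index.get(term, set()))


# Hypothetical sample documents.
docs = {1: "Hiking in the hills above Nitra", 2: "A short hike by the river"}

# Index stemmed, query not stemmed: the mismatch I think I am seeing.
stemmed_index = build_index(docs, stem=True)
print(search(stemmed_index, "hiking", stem_query=False))  # []
print(search(stemmed_index, "hike", stem_query=False))    # [1, 2]

# Same index, but the query is stemmed the same way as the documents.
print(search(stemmed_index, "hiking", stem_query=True))   # [1, 2]

# Index built without stemming: only exact words match.
raw_index = build_index(docs, stem=False)
print(search(raw_index, "hiking", stem_query=False))      # [1]
print(search(raw_index, "hike", stem_query=False))        # [2]
```

<div>If that assumption holds, either stemming the query the same way or indexing the words unstemmed would make &quot;hiking&quot; match again. In the unstemmed case, though, a search for &quot;hike&quot; would no longer find the &quot;hiking&quot; page.</div>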

<div>&nbsp;</div>

<div>Thanks,</div>

<div>Kevin</div>

<div><a href="http://nitra.net">http://nitra.net</a></div>

<div><br>&nbsp;</div>

<div><span class="gmail_quote">On 3/8/06, <b class="gmail_sendername">Olly Betts</b> &lt;<a href="mailto:olly@survex.com">olly@survex.com</a>&gt; wrote:</span>

<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">On Wed, Mar 08, 2006 at 09:43:53AM -0800, Kevin SoftDev wrote:<br>&gt;&nbsp;&nbsp; my $total&nbsp;&nbsp;= $db-&gt;get_termfreq($terms);

This looks up the frequency of a single term, so it'll be fine for a one term query, but will return zero for anything more complicated (unless you happen to have terms with spaces, etc in). As I explained just now, you want MSet::get_matches_estimated().

<br><br>&gt;&nbsp;&nbsp;&nbsp;&nbsp; $html = $doc-&gt;get_data();<br>&gt;<br>&gt;&nbsp;&nbsp;&nbsp;&nbsp; $html&nbsp;&nbsp;&nbsp;&nbsp;=~ m/body=(.*)/;&nbsp;&nbsp; $body&nbsp;&nbsp;= $1;<br><br>That's kind of risky - you only want to match body at the start of a<br>line, but this doesn't specify that, so it'll match wrongly if there's

<br>an earlier line containing &quot;body=&quot; anywhere in it.&nbsp;&nbsp;I suggest:<br><br>&nbsp;&nbsp;&nbsp;&nbsp; my ($body) = $html =~ m/^body=(.*)/m;<br><br>&gt;&nbsp;&nbsp;&nbsp;&nbsp; print &quot;&lt;a href=\&quot;$url\&quot;<br>&gt; target=_blank&gt;&lt;b&gt;$title&lt;/b&gt;&lt;BR&gt;&lt;i&gt;$url&lt;/i&gt;&lt;/a&gt;&lt;BR&gt;$body&quot;;

<br><br>You really want to be escaping values put into HTML output, unless<br>you've carefully sanitised them at indexing time.&nbsp;&nbsp;Otherwise you're<br>opening yourself to cross-site scripting type exploits.<br><br>Cheers,<br>

&nbsp;&nbsp; Olly<br></blockquote></div><br>