<div>Olly,</div>
<div> </div>
<div>Yes, of course I am sanitizing the HTML. In fact, the HTML is stripped at the beginning of the process and only the text is stored in the database, which also makes the database a lot smaller. </div>
<div> </div>
<div>I would like to ask you a question. When I try to search for "hiking" </div>
<div><a href="http://nitra.net/cgi-bin/hladaj.cgi?a=q&q=hiking&c=sk">http://nitra.net/cgi-bin/hladaj.cgi?a=q&q=hiking&c=sk</a> </div>
<div>I am not getting any results back. </div>
<div> </div>
<div>However, if I search for just "hike", I do get results back, including the one containing "hiking", which is displayed at the top of the results. </div>
<div><a href="http://nitra.net/cgi-bin/hladaj.cgi?a=q&q=hike&c=sk">http://nitra.net/cgi-bin/hladaj.cgi?a=q&q=hike&c=sk</a></div>
<div> </div>
<div>Is it possible at index time to index every word as-is and disable the stemming algorithm, or is stemming part of the package whether we like it or not?</div>
<div> </div>
<div>Thanks,</div>
<div>Kevin</div>
<div><a href="http://nitra.net">http://nitra.net</a></div>
<div><br> </div>
<div><span class="gmail_quote">On 3/8/06, <b class="gmail_sendername">Olly Betts</b> &lt;<a href="mailto:olly@survex.com">olly@survex.com</a>&gt; wrote:</span>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">On Wed, Mar 08, 2006 at 09:43:53AM -0800, Kevin SoftDev wrote:<br>> my $total = $db->get_termfreq($terms);
<br><br>This looks up the frequency of a single term, so it'll be fine for a one<br>term query, but will return zero for anything more complicated (unless<br>you happen to have terms with spaces, etc in).<br><br>As I explained just now, you want MSet::get_matches_estimated().
<br><br>> $html = $doc->get_data();<br>><br>> $html =~ m/body=(.*)/; $body = $1;<br><br>That's kind of risky - you only want to match body at the start of a<br>line, but this doesn't specify that, so it'll match wrongly if there's
<br>an earlier line containing "body=" anywhere in it. I suggest:<br><br> my ($body) = $html =~ m/^body=(.*)/m;<br><br>&gt; print "&lt;a href=\"$url\"<br>&gt; target=_blank&gt;&lt;b&gt;$title&lt;/b&gt;&lt;BR&gt;&lt;i&gt;$url&lt;/i&gt;&lt;/a&gt;&lt;BR&gt;$body";
<br><br>You really want to be escaping values put into HTML output, unless<br>you've carefully sanitised them at indexing time. Otherwise you're<br>opening yourself to cross-site scripting type exploits.<br><br>Cheers,<br>
Olly<br></blockquote></div><br>