[Xapian-discuss] Text snippets
Jim
jim at fayettedigital.com
Sat Dec 26 17:02:15 GMT 2009
karpet at localhost.com wrote:
>> On Thu, Dec 17, 2009 at 11:29:52AM +0300, Do. wrote:
>>
>> There's a ticket in trac as well as the FAQ entry. The FAQ entry had some
>> rough edges (e.g. the sample thread it linked to wasn't about snippets at
>> all)
>> so I've overhauled it, and linked to the ticket as part of that:
>>
>> http://trac.xapian.org/wiki/FAQ/Snippets
>>
>>
>
> FWIW, the Search::Tools modules mentioned in that FAQ entry have gotten a
> lot of work in the last six months, and many of the slow parts moved to
> C/XS. The FAQ entry mentions problems with phrases and stemming, and to
> the best of my knowledge those have been resolved.
>
> I use Search::Tools with Xapian quite successfully. I store the entire
> plain (no HTML) text of each document in the 'data' entry for each
> document, and can snip and highlight very easily with Search::Tools +
> Search::Xapian. If I want to highlight terms in the original document, I
> use HTML::HiLiter.
>
> http://search.cpan.org/dist/Search-Tools/
>
> I would be happy to change the FAQ entry to reflect the above, but of
> course as the author of Search::Tools I am biased, so if you find that
> Search::Tools doesn't work well with Xapian I'd like to hear about it.
>
> pek
>
>
I also use Search::Tools with Xapian, with good results. However, I do
not store my data in the Xapian database: the data is already on disk in
HTML format (3+ GB), so to generate samples for inclusion in the results
page I simply run html2text. The speed is satisfactory, so I saw no need
to keep duplicate data. Another advantage is that indexing is faster when
the data is not stored in the database.
I only mention this as an alternative solution, for future reference to
anyone searching for ways to solve the same problem.
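
For anyone who wants to try the idea, here is a rough sketch of the
"convert the HTML at query time, then cut a sample" approach. It is not
Search::Tools and it does not call html2text: it uses only the Python
standard library as a stand-in, and the names html_to_text and
make_snippet are made up for illustration. It also ignores stemming and
phrases, so treat it as a starting point only.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace so the sample reads as a single line.
        return re.sub(r"\s+", " ", " ".join(self._chunks)).strip()


def html_to_text(html):
    """Hypothetical stand-in for html2text: strip markup, keep the text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def make_snippet(text, terms, width=80):
    """Return context around the first query term found (case-insensitive).

    About width // 2 characters are kept on each side of the match. If no
    term matches, the first `width` characters of the text are returned.
    """
    if not terms:
        return text[:width]
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        return text[:width]
    start = max(0, match.start() - width // 2)
    end = min(len(text), match.end() + width // 2)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix
```

In use, the HTML file for each hit would be read from disk, passed
through html_to_text, and the result handed to make_snippet together
with the query terms. Only the documents shown on the current results
page need converting, which is why nothing has to be stored in the
database.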
Jim.