[Xapian-discuss] making my db leaner and meaner

Ben Campbell ben at scumways.com
Tue Mar 31 10:58:44 BST 2009


Olly Betts wrote:
> On Thu, Mar 26, 2009 at 04:30:09PM +0000, Ben Campbell wrote:
> It's worth taking a look at the terms indexed for each document (the
> delve tool in xapian-core/examples is good for this) and seeing if
> you can get rid of any junk.  It depends on the nature of the data,
> but things like ASCII art, OCRed documents, files with the wrong
> extensions, etc. can result in terms which aren't useful for searches.

Ahh good point - there is probably a lot of cruft in there.

Is it actually possible to block terms entirely when using
TermGenerator::index_text()? TermGenerator seems to add even stopped
terms (stopwords), albeit only in their unstemmed form.
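
One possible workaround, sketched here as an assumption rather than as a TermGenerator feature, is to filter the text before it ever reaches index_text(). The helper names (looks_like_junk, strip_junk) and both thresholds below are hypothetical and would need tuning against what delve shows for the real data. The sketch is plain Python and does not call Xapian at all; the returned string is what would then be handed to index_text().

```python
import string

# Hypothetical thresholds, not Xapian settings; tune against real data.
MAX_TOKEN_LEN = 30      # very long tokens are usually OCR or encoding debris
MIN_ALNUM_RATIO = 0.6   # mostly-punctuation tokens look like ASCII art


def looks_like_junk(token, stopwords=frozenset()):
    """Return True if a whitespace-separated token should be dropped."""
    # Ignore leading/trailing punctuation so "the," still matches "the".
    core = token.strip(string.punctuation)
    if not core:
        return True                      # pure punctuation, e.g. "+---+"
    if core.lower() in stopwords:
        return True                      # blocked outright, not just unstemmed
    if len(core) > MAX_TOKEN_LEN:
        return True
    alnum = sum(ch.isalnum() for ch in core)
    return alnum / len(core) < MIN_ALNUM_RATIO


def strip_junk(text, stopwords=frozenset()):
    """Drop junk tokens and return the remaining text, single-spaced."""
    kept = [tok for tok in text.split() if not looks_like_junk(tok, stopwords)]
    return " ".join(kept)
```

Note that dropping words before indexing shifts the positions of the words that remain, so phrase searches spanning a removed word would behave differently; whether that matters depends on the application.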

Thanks,
Ben.


