[Xapian-discuss] Rqt for Features

Richard Boulton richard at tartarus.org
Fri Jul 9 17:06:00 BST 2004


Tim Brody wrote:
> Having added wrappers for QueryParser I wonder whether it would be
> worthwhile revising Stopper. I can't think of a situation where a stopper
> would need to be more intelligent than containing a list of words to stop,
> so seems a little pointless distributing a class in Xapian that doesn't do
> this.

I think the actual process of stopping is always going to be this 
simple, but the selection of words to stop isn't necessarily so simple.
In particular, it would be useful to have prebuilt lists of common
stopwords for (at least) each of the languages which we provide stemmers
for.  The user might then create, for example, a StandardStopper object, 
passing the name of a language, rather than having to keep a list of 
words in their application.
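
To make the idea concrete, here is a minimal sketch of what I mean. It is 
not existing Xapian code: the names WordListStopper and 
make_standard_stopper are hypothetical, and the word lists are tiny 
placeholders rather than real stopword lists.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical class: a stopper that is nothing more than a list of words.
class WordListStopper {
  public:
    WordListStopper() = default;
    explicit WordListStopper(std::set<std::string> words)
        : stopwords(std::move(words)) {}

    // Add one more word to stop.
    void add(const std::string &word) { stopwords.insert(word); }

    // Returns true if the term is in the list and should be dropped.
    bool operator()(const std::string &term) const {
        return stopwords.count(term) != 0;
    }

  private:
    std::set<std::string> stopwords;
};

// Hypothetical factory standing in for "StandardStopper(language)".
// The lists below are illustrative placeholders only.
inline WordListStopper make_standard_stopper(const std::string &language) {
    if (language == "english")
        return WordListStopper(std::set<std::string>{"a", "and", "of", "the"});
    if (language == "french")
        return WordListStopper(std::set<std::string>{"de", "et", "la", "le"});
    return WordListStopper();  // Unknown language: stop nothing.
}
```

An application would then call make_standard_stopper("english") and test 
each term with the returned object, without keeping any word list of its 
own.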

However, there's a strong argument for providing a class such as yours 
as part of Xapian, since it would be useful to many users.  Could you 
add this to the bugzilla too, so it won't get forgotten?

> Of course if I could wave a magic wand I would modify QueryParser's API
> anyway .... :-)

QueryParser is a great deal less polished than other parts of Xapian's
interface, which is partly why it lives in a separate library.  It was
originally written for a specific application (omega) and only later
extracted, so it is due for a good look.  In other words, its API is
open for discussion.

Certainly, it is weird to have "set_stemming_options()" take a stopper: 
I'd like to see that fixed.  It also has a load of public members which 
really should be private...

Additionally, I'd like to see some code for indexing a chunk of text in 
a manner compatible with the query parser put into a library. 
Currently, the easiest approach for application writers is to cut and 
paste blocks of code from omindex...
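
As a rough illustration of the kind of shared code I have in mind (this is 
not what omindex or QueryParser actually do; split_terms is a hypothetical 
name, and the rule of lowercasing and splitting on non-alphanumeric ASCII 
characters is an assumption made for the sketch), the point is that the 
indexer and the query side both call one function:

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: split text into lowercase ASCII alphanumeric terms.
// Any other byte (punctuation, whitespace, non-ASCII) acts as a separator.
inline std::vector<std::string> split_terms(const std::string &text) {
    std::vector<std::string> terms;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) {
            current += static_cast<char>(std::tolower(ch));
        } else if (!current.empty()) {
            // A separator ends the term collected so far.
            terms.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) terms.push_back(current);
    return terms;
}
```

If the same routine produces terms at index time and at query time, the 
two cannot drift apart, which is the risk with cutting and pasting.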

Patches for any of these things would be most welcome - but discussion 
and other suggestions are also appreciated.

-- 
Richard


