[Xapian-discuss] Rqt for Features

Tim Brody tdb01r at ecs.soton.ac.uk
Mon Jul 12 18:14:57 BST 2004


----- Original Message ----- 
From: "Richard Boulton" <richard at tartarus.org>

> > Of course if I could wave a magic wand I would modify QueryParser's API
> > anyway .... :-)
>
> Certainly, it is weird to have "set_stemming_options()" take a stopper:
> I'd like to see that fixed.  It also has a load of public members which
> really should be private...
>
> Additionally, I'd like to see some code for indexing a chunk of text in
> a manner compatible with the query parser put into a library.
> Currently, the easiest approach for application writers is to cut and
> paste blocks of code from omindex...
>
> Patches for any of these things would be most welcome - but discussion
> and other suggestions are also appreciated.

Here's a completely untested (but probably compiles) patch for the header:
http://santos.ecs.soton.ac.uk/queryparser.h.patch

If I can get bison up to date I will test it, but I suspect the revision
needed is more complex than I know how to do.

I *guess* that Stem should be called Stemmer to be consistent (e.g. Indexer,
MSetIterator etc.)

This causes a segfault (because QP's destructor destroys the stemmer):
QueryParser qp;
Stem stem("english");
qp.stemmer = &stem;
Is it preferable to pass a language string or an object to QP? My naive
opinion is that objects should be passed.
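
To make the ownership problem concrete, here is a minimal sketch using
stand-in classes. Stem, OwningParser and CopyingParser below are hypothetical
and are not the real Xapian classes. The sketch assumes the parser's
destructor calls delete on its stemmer pointer, which is what the segfault
above suggests. It shows one safe way to use an owning parser, and an
alternative where the parser copies the object it is given:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a stemmer; not the real Xapian class.
struct Stem {
    std::string language;
    explicit Stem(const std::string &lang) : language(lang) {}
};

// Owning variant (assumed behaviour): the destructor deletes the stemmer,
// so pointing it at a stack object, as in the snippet above, is undefined
// behaviour and typically crashes.
struct OwningParser {
    Stem *stemmer = nullptr;
    OwningParser() = default;
    OwningParser(const OwningParser &) = delete;            // avoid double delete
    OwningParser &operator=(const OwningParser &) = delete;
    ~OwningParser() { delete stemmer; }
};

// Copying variant: the parser keeps its own copy of the stemmer, so the
// caller's object is never deleted by the parser.
class CopyingParser {
    Stem stemmer_;
  public:
    CopyingParser() : stemmer_("none") {}
    void set_stemmer(const Stem &s) { stemmer_ = s; }
    const std::string &stemmer_language() const { return stemmer_.language; }
};

// Safe use of the owning variant: hand over a heap-allocated stemmer.
inline std::string owning_example() {
    OwningParser qp;
    qp.stemmer = new Stem("english");   // qp's destructor will delete this
    return qp.stemmer->language;        // copied out before qp is destroyed
}

// Use of the copying variant with a stack-allocated stemmer.
inline std::string copying_example() {
    Stem stem("english");
    {
        CopyingParser qp;
        qp.set_stemmer(stem);
        assert(qp.stemmer_language() == "english");
    }   // qp is destroyed here; stem is untouched
    return stem.language;
}
```

With the copying variant the caller never has to think about who deletes
what, at the cost of one copy per call to set_stemmer().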

Should stop terms be applied before or after stemming? (I think it's
currently before.)
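
The order matters whenever the stop list holds unstemmed words. Below is a
small sketch of the two orderings. toy_stem() is a made-up stemmer that just
strips one trailing "s"; it is only an assumption for illustration and has
nothing to do with the real Stem class, and the function names are
hypothetical too:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

using Terms = std::vector<std::string>;
using StopList = std::set<std::string>;

// Toy stemmer (illustration only): strips a single trailing "s".
inline std::string toy_stem(const std::string &word) {
    if (word.size() > 2 && word.back() == 's')
        return word.substr(0, word.size() - 1);
    return word;
}

// Stop first: drop words found in the stop list, then stem the survivors.
inline Terms stop_then_stem(const Terms &words, const StopList &stop) {
    Terms out;
    for (const auto &w : words)
        if (stop.count(w) == 0) out.push_back(toy_stem(w));
    return out;
}

// Stem first: stem every word, then compare the stems to the stop list.
inline Terms stem_then_stop(const Terms &words, const StopList &stop) {
    Terms out;
    for (const auto &w : words) {
        const std::string s = toy_stem(w);
        if (stop.count(s) == 0) out.push_back(s);
    }
    return out;
}
```

For the words "this" and "cats" with a stop list containing only "this",
stop_then_stem() yields just "cat", while stem_then_stop() yields "thi" and
"cat": the stem "thi" no longer matches the stop list entry. Stemming first
would therefore need the stop list itself to be stored in stemmed form.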

Is there a central configuration for languages, i.e. somewhere closer to
Stem that stopwords could be placed so that those adding language support
don't need to change multiple header files?

I would be happy to do some documenting too (having had to read the C++ code
to understand what prefixes was for ...), but I suspect QP could use some
root-canal work :-)

All the best,
Tim.



