[Xapian-discuss] Search queries with wildcards

James Aylett james-xapian at tartarus.org
Wed Dec 15 09:50:29 GMT 2004


On Wed, Dec 15, 2004 at 08:01:56AM +0100, Timo Haberkern wrote:

> A wildcard search would be great. In Germany we have a lot of
> compound words. A purely stemmer-based search doesn't find a lot of
> matches. Think of the word "Fehlercode": if I use "Fehler" as a search
> query I won't find the documents with "Fehlercode" in them, right? But
> I need such a solution, and wildcards seem to be the only one.

A thought: this is perhaps impractical because of dictionary sourcing
issues (and management, too, come to think of it), but you could look
for compound prefixes while turning longer words into terms, and split
(and stem) them based on possible compound constructions. So on
indexing "Fehlercode" you first detect that "Fehler" is an acceptable
fragment within a compound word, and store s("Fehler"), s("code"),
s("Fehlercode") (where s() stems).
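
A rough sketch of that indexing step, in Python. Everything named here
is an assumption for illustration: FRAGMENTS stands in for whatever
compound dictionary you can source, stem() is a placeholder for a real
German stemmer, and German linking elements (such as the "s" in
"Arbeitszeit") are ignored.

```python
# Hypothetical dictionary of fragments allowed inside compound words.
# A real one would have to be sourced and maintained separately.
FRAGMENTS = {"fehler", "code", "meldung"}


def stem(word):
    # Placeholder for s(): a real indexer would call a German stemmer here.
    return word.lower()


def split_compound(word, fragments=FRAGMENTS):
    """Return the parts of word if it splits fully into known fragments.

    Returns an empty list when no complete split is found.
    """
    w = word.lower()
    # Try the longest candidate prefix first, then shorter ones.
    for i in range(len(w) - 1, 0, -1):
        head, tail = w[:i], w[i:]
        if head not in fragments:
            continue
        if tail in fragments:
            return [head, tail]
        rest = split_compound(tail, fragments)
        if rest:
            return [head] + rest
    return []


def index_terms(word):
    """Terms to store for one word: stemmed parts, then the stemmed whole."""
    terms = [stem(part) for part in split_compound(word)]
    terms.append(stem(word))
    return terms


print(index_terms("Fehlercode"))
```

A word that does not split against the dictionary just yields its own
stemmed form, so non-compounds are indexed as before.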

This would reduce your index size compared with more flexible
wildcards, where I think you'd have to store all possible substrings
you want to search for. If you wanted to be able to search for "code"
and have it match "Fehlercode", you'd end up generating lots of
substrings (up to n(n+1)/2 for a word of n letters, so as many as 55
for "Fehlercode"). The substring approach is perhaps easier to index,
though, since it needs no dictionary; it may also make constructing
effective searches easier (because you just use the words from the
query straight off).
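
For a sense of scale, a word of n letters has at most n(n+1)/2 distinct
non-empty substrings. The sketch below (plain Python, with the helper
name substrings() made up for this example) enumerates them for a
single word; an index built this way would store each one as a term.

```python
def substrings(word):
    """Return the set of distinct non-empty substrings of word, lowercased."""
    w = word.lower()
    return {w[i:j] for i in range(len(w)) for j in range(i + 1, len(w) + 1)}


subs = substrings("Fehlercode")
n = len("Fehlercode")

# Upper bound on the number of terms this one word would contribute.
print(len(subs), "distinct substrings, bound is", n * (n + 1) // 2)

# Both fragments of the compound are now directly searchable.
print("code" in subs, "fehler" in subs)
```

Repeated letters make the distinct count a little lower than the bound,
but the growth per word is still quadratic in its length.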

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


