[Xapian-discuss] Adding terms of more than one word with PHP bindings

Yannick Warnier ywarnier at beeznest.org
Sun Sep 14 16:00:12 BST 2008


On Saturday 13 September 2008 at 09:38 +0100, James Aylett wrote:
> On Fri, Sep 12, 2008 at 08:57:27PM -0500, Yannick Warnier wrote:
> 
> > One of my employees working on a script with the PHP bindings for Xapian
> > (1.0.5) is having a hard time finding how to add terms of more than one
> > word to a Xapian document. So far we have the right procedure to add
> > one-word terms, but somehow adding them with the:
> >   term: firstword secondword term: secondterm
> > doesn't seem to work.
> 
> I don't entirely understand what you mean by the last bit. Most of the
> time you don't actually need multi-word terms - what's your use case?

My use case is that I offer a set of scripts that let users add
"tags" to the documents they index. These tags are then stored in the
Xapian database as terms.
Some of these tags consist of several words (say, "summer
holiday"). I then offer a search interface which allows a search
based on a combination of tags (boolean search) and normal (statistical)
search.
The tags are stored correctly using the XapianDocument::add_term()
method. They are retrieved correctly using the
XapianDatabase::allterms_begin() method.

However, when I query the Xapian database with a search string (the
first parameter of my xapian_query function, see the code appended
below) such as

  sea sex sun T:summer holiday T:beach

the search does not match the tag "summer holiday".

Any idea of what I am doing wrong?

Yannick


Code excerpt:

/**
 * Queries the database.
 * The xapian_query function queries the database using both a query string
 * and application-defined terms. Based on drupal-xapian.
 *
 * @param   string          $query_string   The search string. This string will
 *                                          be parsed and stemmed automatically.
 * @param   XapianDatabase  $db             Xapian database to connect to
 * @param   int             $start          An integer defining the first
 *                                          document to return
 * @param   int             $length         The number of results to return.
 * @param   array           $extra          An array containing arrays of
 *                                          extra terms to search for.
 * @param   int             $count_type     Which match count to return:
 *                                          0 = estimate, 1 = lower bound,
 *                                          2 = upper bound
 * @return  array                           array($count, $results), or NULL
 *                                          on error
 */
function xapian_query($query_string, $db = NULL, $start = 0, $length = 10,
  $extra = array(), $count_type = 0) {
        
    try {
        if (!is_object($db)) {
            $db = new XapianDatabase(XAPIAN_DB);
        }

        $enquire = new XapianEnquire($db);
        $query_parser = new XapianQueryParser();
        $stemmer = new XapianStem("english");
       // $query_parser->feature_flag(FLAG_SPELLING_CORRECTION);
       // $query_parser-> debug = true;
       // $query_parser-> set_default_op(XapianQuery::OP_OR);
        
        $query_parser->set_stemmer($stemmer);
        $query_parser->set_database($db);

        $query_parser->set_stemming_strategy(XapianQueryParser::STEM_SOME);
        $query_parser->add_boolean_prefix('filetype', 'F');
        $query_parser->add_boolean_prefix('tag', 'T');
        $query_parser->add_boolean_prefix('courseid', 'C');
        $query = $query_parser->parse_query($query_string);
        //var_dump($query_parser->get_description());
        //var_dump($query_parser->get_corrected_query_string());
        //print_r($query); //exit;
        // Build subqueries from $extra array.
        foreach ($extra as $subq) {
          if (!empty($subq)) {
            /* TODO: review if we want to use this constructor
             * deprecated in C:
             * http://xapian.org/docs/apidoc/html/classXapian_1_1Query.html#f85d155b99f1f2007fe75ffc7a8bd51e
             * maybe use: Query (Query::op op_, const Query &left, const Query &right) ?
             */
            $subquery = new XapianQuery(XapianQuery::OP_OR, $subq);
            $query = new XapianQuery(XapianQuery::OP_AND,
              array($subquery, $query));
          }
        }

        $enquire->set_query($query);
        $matches = $enquire->get_mset((int)$start, (int)$length);

        $results = array();
        $i = $matches->begin();
        $count = 0;
        while (!$i->equals($matches->end())) {
          $count++;
          $document = $i->get_document();
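          if (is_object($document)) {
            // Create the result object explicitly before setting properties.
            $results[$count] = new stdClass();
            $results[$count]->ids = ($document->get_data());
            $results[$count]->score = ($i->get_percent());
            $results[$count]->terms = xapian_get_doc_terms($document);
          }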
          $i->next();
        }

        switch ($count_type) {
          case 1: // Lower bound
            $count = $matches->get_matches_lower_bound();
            break;
            
          case 2: // Upper bound
            $count = $matches->get_matches_upper_bound();
            break;

          case 0: // Best estimate
          default:
            $count = $matches->get_matches_estimated();
            break;
        }

        return array($count, $results);
    }
    catch (Exception $e) {
        Display::display_error_message('xapian error message: '.
            $e->getMessage());
        return NULL;
    }
}
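
For illustration only, here is a minimal sketch of one possible way to get
a multi-word tag past the query parser: instead of putting the tag in the
search string, pass it as a raw term through the $extra parameter of
xapian_query() above. The helper name tags_to_terms() is hypothetical and
not part of the original script. The sketch assumes the tags were indexed
with XapianDocument::add_term() as the 'T' prefix followed by the tag text,
with exactly the same case and spacing; if they were indexed differently,
the terms will not match.

```php
<?php
/**
 * Hypothetical helper (not part of the original script): turns user-facing
 * tags into raw term names by prepending the term prefix.
 * Assumes tags were indexed as prefix + tag, with identical case and spacing.
 */
function tags_to_terms($tags, $prefix = 'T') {
    $terms = array();
    foreach ($tags as $tag) {
        // Skip empty tags so that no bare-prefix term is generated.
        if ($tag !== '') {
            $terms[] = $prefix . $tag;
        }
    }
    return $terms;
}

// One inner array per tag: xapian_query() ORs the terms inside each inner
// array and ANDs each group with the parsed query, so both tags are required.
$extra = array(
    tags_to_terms(array('summer holiday')),
    tags_to_terms(array('beach')),
);

// Example call (needs the Xapian extension and an existing database):
// list($count, $results) = xapian_query('sea sex sun', NULL, 0, 10, $extra);
```

With this approach only the free-text part ("sea sex sun") goes through
the query parser, so the space inside "summer holiday" is never seen as a
word separator.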



