[Xapian-discuss] How to really make use of omega/xapian? (for omega with PHP Mysql)

Olly Betts olly at survex.com
Thu Oct 5 20:53:51 BST 2006


On Thu, Oct 05, 2006 at 02:44:54AM -0700, ath wrote:
>Olly Betts wrote:
>>Do you actually want to store the whole field in Xapian?  You can, but
>>it's not required in order to index it, and it's potentially large...
> 
> You mean that even though I don't store the complete field (it's truncated),
> the index will still take the complete field into account?

The index terms are generated by the "index" command (and also by
"boolean").  The stored text is controlled by "field".  There's no direct
connection between the two.

> I've tried this index script line:
> first-post: unhtml weight=3 index truncate=10 field=sample
> But I'm not getting any results when I search for words after the 10th
> word.

The parameter passed to truncate is measured in characters, not words.
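A rough model of how that index script line behaves, written as a Python sketch. This is not scriptindex's code: `process_field` and the regex word splitting are hypothetical stand-ins. Actions are applied left to right to the same value, so "index" sees the whole text before "truncate" shortens it, and the truncation length counts characters. The real truncate may also take care not to split a word, which this sketch ignores.

```python
import re


def process_field(text, truncate_len=10):
    """Toy model of 'index truncate=10 field=sample' (hypothetical helper).

    Actions run left to right on the same value, so 'index' sees the
    full text and only the stored 'sample' field is shortened.
    """
    # 'index': generate lowercase word terms from the complete value.
    terms = set(re.findall(r"\w+", text.lower()))
    # 'truncate=10': the length is measured in characters, not words.
    stored = text[:truncate_len]
    # 'field=sample': store whatever value is left at this point.
    return terms, {"sample": stored}


terms, fields = process_field("Pendulum swings slowly over the ancient pit")
print(fields["sample"])    # prints: Pendulum s
print("ancient" in terms)  # prints: True (words past the cut-off are indexed)
```

So with "index" placed before "truncate", searching for a word beyond the first 10 characters should still match; only the stored sample is cut short.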

I don't know why you're seeing what you are.  You can look at the
database with the delve tool (examples/delve in xapian-core) to see
which terms index a particular document, and which documents are indexed
by a particular term.

>>> In the end I want to be able to search on topictitle, author and forum.
>>> Is this indexscript suitable for that?
> 
>>It looks plausible, though I don't know exactly what's in each field.
> 
> topictitle contains the topic title, which is a long string.
> Author and forum are integers.

Then you don't want to index author and forum as text - you want them
as boolean terms for filtering (so use boolean=A instead of index=A).
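A toy Python illustration of what boolean terms buy you for integer fields: each document carries an exact-match term such as `A42`, and a filter keeps only documents that have that term, without contributing to the relevance ranking. The `A` prefix follows the `boolean=A` suggestion above; `XFORUM` for the forum is a hypothetical choice (see termprefixes.txt for the prefix conventions), and the dictionary is only a stand-in for a real Xapian database.

```python
def boolean_terms(author_id, forum_id):
    # Exact-match filter terms; 'A' follows the boolean=A suggestion,
    # 'XFORUM' is a hypothetical user-defined prefix.
    return {f"A{author_id}", f"XFORUM{forum_id}"}


# Toy "database": document id -> (title words, boolean terms).
docs = {
    1: ({"pendulum", "pit"}, boolean_terms(42, 7)),
    2: ({"pendulum", "clock"}, boolean_terms(13, 7)),
    3: ({"raven"}, boolean_terms(42, 9)),
}


def search(word, filters=()):
    """Match a free-text word, then keep only docs having every filter term."""
    return sorted(
        doc_id
        for doc_id, (words, bool_terms) in docs.items()
        if word in words and set(filters) <= bool_terms
    )


print(search("pendulum"))               # [1, 2]
print(search("pendulum", ["A42"]))      # [1]
print(search("pendulum", ["XFORUM7"]))  # [1, 2]
```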

>>See the Omega documentation "docs/termprefixes.txt", in particular the
>>last section on "Probabilistic Fields".
>>
>>If you want separate form fields for "author" and "body" queries, you
>>can't quite achieve this using Omega unmodified at present.  That really
>>should be possible - file a wishlist bug and I'll take a look when I'm
>>not in the middle of a release.  
> 
> I'll file a wishlist bug then. But I've seen websites using XO that allow
> searching for authors and groups. Does that mean that they have separate
> indexes for authors, groups AND topics?

Are those sites using an unmodified version of Omega?  Gmane uses a
modified version, for example.

You can support searching for prefixed terms in the query string (e.g.
`pendulum author:poe'), which could be what you've seen.  The
documentation I referred you to above describes how to set that up.
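The idea behind prefixed terms in the query string can be sketched with a toy parser: a `field:value` token becomes a term carrying that field's prefix, while bare words are left unprefixed. This is not Xapian's QueryParser, which handles this properly (including stemming and phrases); the `PREFIXES` mapping of `author` to `A` is an assumption for illustration only.

```python
# Hypothetical mapping from query-string field names to term prefixes.
PREFIXES = {"author": "A"}


def parse_query(query):
    """Split a query string into terms, prefixing known 'field:value' tokens."""
    terms = []
    for token in query.lower().split():
        field, sep, value = token.partition(":")
        if sep and field in PREFIXES and value:
            # e.g. 'author:poe' -> 'Apoe'
            terms.append(PREFIXES[field] + value)
        else:
            # Ordinary words (and unknown fields) are used as-is.
            terms.append(token)
    return terms


print(parse_query("pendulum author:poe"))  # ['pendulum', 'Apoe']
```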

>>Or if you want to work on a patch, I
>>can point you in the right direction.
> 
> I'd like to try modifying it too, if you'd be kind enough to point me in
> the right direction :)
 
I'll need to take a look at the code first.  I'll get back to you on
that.

> but the XML breaks at this phrase: "classical drama ??K{q? Reporters "
> (naturally, it's the weird characters ??K{q? here that are causing the
> problem)

Currently we only escape <, >, &, and " (I think escaping " was added
after 0.9.6).  If you need more characters escaped, you'll need to modify
html_escape() in query.cc.
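The kind of logic an extended escape function might contain can be modelled in Python (the real html_escape() is C++ in query.cc; `xml_escape` here is a hypothetical name and only a sketch of the idea). Besides the four characters already handled, it writes non-ASCII characters as numeric character references and drops C0 control characters other than tab, newline and carriage return, since XML 1.0 does not allow those even as references. It assumes the text has already been decoded correctly; if the stored bytes are in an unknown encoding, escaping alone won't recover the intended characters.

```python
# Replacements for the characters Omega already escapes.
ENTITIES = {"<": "&lt;", ">": "&gt;", "&": "&amp;", '"': "&quot;"}


def xml_escape(text):
    """Escape text for XML output (hypothetical, extended version)."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in ENTITIES:
            out.append(ENTITIES[ch])
        elif code < 0x20 and ch not in "\t\n\r":
            # XML 1.0 forbids these control characters, even as &#N;.
            continue
        elif code > 0x7F:
            # Write non-ASCII characters as numeric character references.
            out.append(f"&#{code};")
        else:
            out.append(ch)
    return "".join(out)


print(xml_escape('drama \x0b\u00e9 "Reporters" <b>'))
# drama &#233; &quot;Reporters&quot; &lt;b&gt;
```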

Cheers,
    Olly


