[Xapian-discuss] How to really make use of omega/xapian? (for omega with PHP Mysql)

ath athlonkmf at yahoo.com
Thu Oct 5 10:44:54 BST 2006




Olly Betts wrote:
> 
> On Wed, Oct 04, 2006 at 01:35:09PM -0700, ath wrote:
>>> first-post: unhtml  truncate=300 field=sample
>>> first-post: unhtml weight=3 index field=body
> 
>>It would be more efficient to only "unhtml" once:
> 
>>first-post: unhtml weight=3 index field=body truncate=300 field=sample
>>
>>Do you actually want to store the whole field in Xapian?  You can, but
>>it's not required in order to index it, and it's potentially large...
>>
> 
> You mean even though I don't store the complete field (truncated), the
> index will take the complete field into consideration? 
> I've tried this index-setting
> first-post: unhtml weight=3 index  truncate=10  field=sample
> But I'm not getting any results when I search for words after the 10th
> word.
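
A minimal sketch of why the order of actions in the index script matters, assuming scriptindex applies the actions on a line from left to right and that truncate=N counts characters (bytes in a single-byte encoding) rather than words. The names Doc, truncate_bytes and apply_index_then_truncate are made up for illustration and are not part of Omega:

```cpp
#include <cstddef>
#include <set>
#include <sstream>
#include <string>

// Hypothetical stand-in for the document an index script builds.
struct Doc {
    std::set<std::string> terms;  // what a search can match
    std::string sample;           // what is stored for display
};

// Cut text to at most max_bytes, backing up to the previous space if there is one.
std::string truncate_bytes(const std::string& text, std::size_t max_bytes) {
    if (text.size() <= max_bytes) return text;
    std::size_t cut = text.rfind(' ', max_bytes);
    if (cut == std::string::npos || cut == 0) cut = max_bytes;
    return text.substr(0, cut);
}

// Mimics "index truncate=N field=sample": index, then truncate, then store.
Doc apply_index_then_truncate(const std::string& value, std::size_t n) {
    Doc doc;
    std::istringstream words(value);
    for (std::string w; words >> w; ) {
        doc.terms.insert(w);                // "index" sees the full value
    }
    doc.sample = truncate_bytes(value, n);  // "truncate" shortens the value
    return doc;                             // "field=sample" stores the short form
}
```

Under these assumptions, indexing "classical drama reporters" with n=10 stores only "classical" as the sample, while "reporters" is still a searchable term. If truncate came before index, only the shortened text would be indexed.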
> 
>>> In the end I want to be able to search on topictitle, author and forum.
>>> Is this indexscript suitable for that?
> 
>>It looks plausible, though I don't know exactly what's in each field.
> 
> topictitle holds the topic title, a long string.
> Author and forum are integers. 
> The sql-query would then be something like this: select * from topics
> where author=1 and forum=2 and topictitle like 'something%'
> 
>>> 2) How can I search on the indexes with the given indexscheme?
>>> 
>>> If, let's say, I want to search for topics started by a certain author
>>> (testwriter), I'd assume I only have to do such a search in omega.
>>> omega?A=testwriter&DEFAULTOP=or&DB=default&FMT=query&xDB=default&xFILTERS=--O
>>> However, I'm not getting any results back if I do so.
>>
>>Term prefixes aren't the same as CGI parameters.
>>
>>See the Omega documentation "docs/termprefixes.txt", in particular the
>>last section on "Probabilistic Fields".
>>
>>If you want separate form fields for "author" and "body" queries, you
>>can't quite achieve this using Omega unmodified at present.  That really
>>should be possible - file a wishlist bug and I'll take a look when I'm
>>not in the middle of a release.  
> 
> I'll file a wishlist then. But I've seen websites using XO that allow
> searching for authors and groups. Does that mean that they have indexes
> made per author, group AND topics?
> 
>>Or if you want to work on a patch, I
>>can point you in the right direction.
>>
> 
> I'd like to try and modify it too, if you'd be so kind as to point me in
> the right direction :)
> 
>>> 3) How can I safely integrate omega on my site?
>>> 
>>> I have a group permission scheme on my site. You need to be in a
>>> certain group to search for content in certain forums. 
>>> I found this post
>>> http://thread.gmane.org/gmane.comp.search.xapian.devel/112/focus=113
>>> but it didn't really help. How can I put (<QUERY>) AND (XWORLD:yes OR
>>> XUSER:bill OR XGROUP:users OR XGROUP:wheel) into use with omega?
>>> Where do I put the XWORLD, XUSER, XGROUP things in the index?
>>> And doesn't that mean that a user only has to put XGROUP:wheel in the
>>> query and still gets to see everything?
>>
>>You'll need to modify Omega for this.
>>
>>The query string is passed to the Xapian::QueryParser object which
>>returns a Xapian::Query object (function set_probabilistic in query.cc).
>>You then just need to combine this with your permissions filter,
>>something like this:
>>
>>    Xapian::Query permissions("XGROUP:squirrels"); // Or whatever
>>    query = qp.parse_query(query_string);
>>    query = Xapian::Query(Xapian::Query::OP_FILTER, query, permissions);
>>
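
A rough sketch of where the permission terms could come from, assuming the site already knows the logged-in user's name and groups on the server side. permission_terms is a hypothetical helper, not something Omega provides, and the term spelling simply follows the "XGROUP:squirrels" example above:

```cpp
#include <string>
#include <vector>

// Build the permission terms from data the server already trusts (the
// logged-in session), never from the text typed into the search box.
std::vector<std::string> permission_terms(const std::string& user,
                                          const std::vector<std::string>& groups) {
    std::vector<std::string> terms;
    terms.push_back("XWORLD:yes");      // documents readable by everyone
    terms.push_back("XUSER:" + user);   // documents owned by this user
    for (const std::string& g : groups) {
        terms.push_back("XGROUP:" + g); // documents visible to each group
    }
    return terms;
}

// The terms would then be OR-ed into a single Xapian::Query and combined
// with the parsed user query via OP_FILTER, as in the snippet above.
```

Because the filter is applied on top of whatever the query parser returns, anything the user types can only narrow the results further. Typing XGROUP:wheel into the search box does not add that group to the filter.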
>>> 4) How can I make sure that illegal characters are filtered out in
>>> omega?
>>> I sometimes have multilingual characters in the content, and these have
>>> caused the xml-output of omega to go haywire. How can I make sure that
>>> these kinds of characters are filtered out? I've already used unhtml,
>>> what else can I do?
>>
>>Bear in mind that the released versions of Omega assume iso-8859-1
>>(utf-8 support will be in the 1.0 release) so wide and multibyte
>>characters won't currently be handled correctly.
>>
>>Using $html{} in your query template should escape characters which
>>are problematic in HTML and XML.  If you're not using that, you really
>>need to so as to avoid potential cross-site scripting type attacks.
>>
>>If you're already using that, can you give an example of this problem?
> 
> I'm using this code already in the xml-template
> score="$html{$score}"
> sample="$html{$htmlstrip{$field{sample}}}"
> 
> but the xml breaks at this phrase: "classical drama ??K{q? Reporters "
> (naturally, it's the weird characters here ??K{q?  that are causing the
> problem)
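
The actual bytes behind ??K{q? are not visible here, so the following is only a guess at the cause: it assumes single-byte (iso-8859-1) input and that the XML breaks on control bytes, which XML 1.0 does not allow even when markup characters are escaped. xml_safe is a hypothetical helper for cleaning a value before indexing or output, not an Omega function:

```cpp
#include <string>

// Escape markup characters and drop control bytes that XML 1.0 forbids.
// Assumes single-byte input; bytes >= 0x80 are passed through untouched.
std::string xml_safe(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        // Below 0x20, only tab, LF and CR are legal XML 1.0 characters.
        if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') continue;
        switch (c) {
            case '&': out += "&amp;";  break;
            case '<': out += "&lt;";   break;
            case '>': out += "&gt;";   break;
            case '"': out += "&quot;"; break;
            default:  out += static_cast<char>(c);
        }
    }
    return out;
}
```

If the stray bytes turn out to be multibyte text instead, this filter will not help, and the XML output would also need an encoding declaration that matches the data.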
> 
> 
> 
>>Cheers,
>>    Olly
> 
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/How-to-really-make-use-of-omega-xapian--%28for-omega-with-PHP-Mysql%29-tf2384872.html#a6655643
Sent from the Xapian - Discuss mailing list archive at Nabble.com.



