[Xapian-discuss] indexing and queryparsing: UTF-8 and PHP
Jim Lynch
jim at fayettedigital.com
Sun Feb 26 10:08:20 GMT 2006
All I can say is that it works. Take a look at
http://jim.lynch.name/cgi-bin/firelex.cgi and search for
höchstpersönlichen. You'll see that the complete word is found: neither
the indexer nor the search parser split it. To prove that, try searching
for nlichen. If the word were being split at the accented characters,
that search would match too, but it doesn't.
Jim.
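
To make the nlichen test concrete, here is a minimal sketch (in Python, purely
for illustration; it is not the code Xapian or Omega actually runs). The two
function names are hypothetical. It compares a splitter that only treats ASCII
letters and digits as word characters with one that is Unicode-aware:

```python
import re

WORD = "höchstpersönlichen"

def split_ascii_only(text):
    # Hypothetical naive splitter: only ASCII letters and digits are
    # word characters, so every "ö" acts as a separator.
    return re.findall(r"[A-Za-z0-9]+", text)

def split_unicode_aware(text):
    # On a Python str, \w matches Unicode letters and digits (plus "_"),
    # so accented letters stay inside the word.
    return re.findall(r"\w+", text)

print(split_ascii_only(WORD))     # fragments, including "nlichen"
print(split_unicode_aware(WORD))  # the whole word as a single term
```

With the naive splitter, nlichen would be an indexed term and a search for it
would match. Since that search finds nothing on the page above, the word is
evidently not being split at the accents.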
tata 668 wrote:
> But if the Xapian queryparser doesn't currently support UTF-8, that
> implies two possibilities:
>
> 1) The indexers from the Omega project don't support UTF-8 either
> or
> 2) The Xapian queryparser and the indexers from Omega don't use the
> same algorithms to split strings into words!
>
> My problem is still present: I want to be sure the indexed words are
> split the same way as the words from the query strings!
>
> Therefore I guess the best solution for now is to write your own
> queryparser and your own indexer, both using the SAME algorithm to
> split words.
>
> If I take that solution, the only remaining problem is to find a
> bulletproof way to split UTF-8 text into words in PHP.
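
The "same algorithm on both sides" idea from the quoted message can be
sketched as follows. This is only an illustration in Python, not PHP and not
Xapian's API: tokenize, build_index and search are hypothetical names, and the
toy index is an assumption made to keep the example self-contained. The point
is that the indexer and the query parser both call one shared function, and
that the text is normalised to NFC first so that a precomposed "ö" and an
"o" followed by a combining diaeresis produce the same term.

```python
import re
import unicodedata
from collections import defaultdict

def tokenize(text):
    """Single splitting rule shared by the indexer and the query parser."""
    # Normalise first so composed and decomposed accents give identical terms.
    text = unicodedata.normalize("NFC", text)
    return [term.lower() for term in re.findall(r"\w+", text)]

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {
    1: "Schutz der höchstpersönlichen Daten",
    2: "Notes on the Xapian query parser",
}
index = build_index(docs)
print(search(index, "höchstpersönlichen"))  # matches document 1
print(search(index, "nlichen"))             # matches nothing
```

Because both sides go through tokenize, any change to the splitting rule
applies to indexing and querying at once, which is the guarantee asked for
above.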