[Xapian-discuss] using Xapian as backend for google

Felix Antonius Wilhelm Ostmann ostmann at websuche.de
Wed Dec 13 15:06:57 GMT 2006


Olly Betts wrote:
> On Mon, Dec 11, 2006 at 09:56:10AM +0100, Felix Antonius Wilhelm Ostmann wrote:
>   
>> After one weekend I think RAID is the wrong way ... splitting the index across 
>> different drives would be faster and we don't lose the space :)
>>     
>
> You don't lose the space until a disk fails - then you lose the space
> and the data that was using it.
>
> Data loss doesn't always matter though - for some applications, the
> search can be down (or missing a segment of the documents) and the
> application can still be usable.
>
> As always when building systems, you have to balance the cost of
> reliability against the probability and costs of possible failures.
>   
Next week I will play a little bit with Xapian and the 
Wikipedia data. The hardware will be:

AMD Athlon(tm) 64 Processor 3700+
2GB DDR400 RAM
4 x SAMSUNG HD300LJ - 7200 rpm - Buffer: 8 MB

I will test it with RAID 1, with the index split over 4 partitions (on 4 
drives), and some other funky stuff ;)

I will report back here :)
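
To make the "split index over 4 partitions" idea concrete, here is a rough sketch in plain Python. It does not use Xapian at all: `shard_for` and `merge_top_k` are hypothetical helper names, and hashing a document key with CRC32 is just one assumed way to spread documents evenly over four drives. Each partition would be searched on its own and the per-partition hits merged into one ranked list.

```python
import heapq
import zlib

NUM_SHARDS = 4  # assumption: one partition per drive, as in the planned test


def shard_for(doc_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a partition deterministically from a document key (e.g. its URL)."""
    return zlib.crc32(doc_key.encode("utf-8")) % num_shards


def merge_top_k(per_shard_results, k):
    """Merge per-partition lists of (score, doc_key) into one global top-k list.

    Assumes scores from different partitions are directly comparable.
    """
    all_hits = (hit for hits in per_shard_results for hit in hits)
    return heapq.nlargest(k, all_hits)
```

A search front end would call `shard_for` at indexing time to decide where a document goes, and `merge_top_k` at query time after asking every partition for its own best hits.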


>   
>>> If you just want two documents from any one domain, it wouldn't be hard
>>> to extend the collapse feature to leave N documents behind instead of
>>> just one.
>>>
>>> Only collapsing "similar" results is harder - first you need to decide
>>> how to define "similar" I guess.
>>>       
>>  
>> Hmmm ... the problem is that one domain can include 100,000 or more 
>> documents. When a search matches 20,000 documents from this domain, the 
>> MatchDecider must access 20,000 values (with the domain name) and reject 
>> 19,998 documents. And perhaps the next domain has another 100,000 
>> documents with 15,000 matches. I don't know :( Is the MatchDecider the 
>> right way?
>>     
>
> If you want to collapse on a value but leave more than one document
> behind, I think the best approach is to enhance the collapse feature to
> allow the number of documents to keep to be specified.
>
> A search with collapsing is going to be more expensive than one without
> but I recommend trying this approach before deciding that it's to
> Cheers,
>     Olly
>
>
>   
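
For illustration, the "collapse but leave N documents behind" idea discussed above could look roughly like this in plain Python. This is only a sketch of the logic, not Xapian's implementation or API: `collapse_keep_n`, `key_of` and `keep` are hypothetical names, and the hits are assumed to arrive already sorted by rank.

```python
from collections import defaultdict


def collapse_keep_n(ranked_hits, key_of, keep=2):
    """Keep at most `keep` hits per collapse key, preserving rank order.

    ranked_hits: iterable of hits, best first (assumed already ranked).
    key_of:      function returning the collapse key of a hit (e.g. its domain).
    Returns (kept_hits, dropped_counts_per_key).
    """
    kept = []
    seen = defaultdict(int)     # hits kept so far, per key
    dropped = defaultdict(int)  # hits collapsed away, per key
    for hit in ranked_hits:
        key = key_of(hit)
        if seen[key] < keep:
            seen[key] += 1
            kept.append(hit)
        else:
            dropped[key] += 1
    return kept, dict(dropped)
```

Note that this still has to look at the key of every matching document, so the cost concern about 20,000 matches from one domain applies to this sketch as well; it only shows what "keep N per domain" means for the result list.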
Best regards
Felix Antonius Wilhelm Ostmann



-- 
Kind regards

Felix Antonius Wilhelm Ostmann
--------------------------------------------------
Websuche   Search   Technology   GmbH   &   Co. KG
Martinistraße 3  -  D-49080  Osnabrück  -  Germany
Tel.:   +49 541 40666-0 - Fax:    +49 541 40666-22
Email: info at websuche.de - Website: www.websuche.de
--------------------------------------------------
AG Osnabrück - HRA 200252 - Ust-Ident: DE814737310
Komplementärin:     Websuche   Search   Technology
Verwaltungs GmbH   -  AG Osnabrück  -   HRB 200359
Geschäftsführer:  Diplom Kaufmann Martin Steinkamp
--------------------------------------------------



