[Xapian-discuss] using Xapian as backend for google
Felix Antonius Wilhelm Ostmann
ostmann at websuche.de
Wed Dec 13 15:06:57 GMT 2006
Olly Betts wrote:
> On Mon, Dec 11, 2006 at 09:56:10AM +0100, Felix Antonius Wilhelm Ostmann wrote:
>
>> After one weekend I think RAID is the wrong way ... splitting the index across
>> different drives would be faster and we don't lose the space :)
>>
>
> You don't lose the space until a disk fails - then you lose the space
> and the data that was using it.
>
> Data loss doesn't always matter though - for some applications, the
> search can be down (or missing a segment of the documents) and the
> application can still be usable.
>
> As always when building systems, you have to balance the cost of
> reliability against the probability and costs of possible failures.
>
Next week I will play a little bit with Xapian and the
Wikipedia data. The hardware will be:
AMD Athlon(tm) 64 Processor 3700+
2GB DDR400 RAM
4 x SAMSUNG HD300LJ - 7200 rpm - Buffer: 8 MB
I will test it with RAID 1, with the index split over 4 partitions (on 4
drives), and some other funky stuff ;)
I will report the results here :)
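
For the split-index test, a search over several sub-indexes has to combine
the per-index hit lists into one ranking. Xapian can search several databases
as one, so the following is only a sketch of the idea in plain Python, not the
Xapian API. The function name, the (score, doc_id) layout and the sample hits
are assumptions made up for illustration.

```python
import heapq
from itertools import islice


def merge_shard_results(shard_results, limit):
    """Merge per-shard hit lists into one ranking.

    Each shard list must already be sorted by descending score.
    Hits are (score, doc_id) tuples (a hypothetical layout).
    """
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, limit))


# Hypothetical hits from four sub-indexes, one per drive.
shards = [
    [(0.91, "a1"), (0.40, "a2")],
    [(0.87, "b1")],
    [],
    [(0.95, "d1"), (0.10, "d2")],
]
top3 = merge_shard_results(shards, 3)
```

Only the first `limit` hits are pulled from the merged stream, so the tail of
each shard list is never touched.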
>
>>> If you just want two documents from any one domain, it wouldn't be hard
>>> to extend the collapse feature to leave N documents behind instead of
>>> just one.
>>>
>>> Only collapsing "similar" results is harder - first you need to decide
>>> how to define "similar" I guess.
>>>
>>
>> Hmmm ... the problem is that one domain can contain 100,000 or more
>> documents. When a search matches 20,000 documents from this domain, the
>> MatchDecider must access 20,000 values (with the domain name) and decline
>> 19,998 documents. And perhaps the next domain has another 100,000
>> documents with 15,000 matches. I don't know :( Is the MatchDecider the
>> right way?
>>
>
> If you want to collapse on a value but leave more than one document
> behind, I think the best approach is to enhance the collapse feature to
> allow the number of documents to keep to be specified.
>
> A search with collapsing is going to be more expensive than one without
> but I recommend trying this approach before deciding that it's to
> Cheers,
> Olly
>
>
>
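
The "collapse, but keep N documents per domain" idea discussed above can be
sketched as simple bookkeeping over a ranked hit list. This is plain Python
and not Xapian's matcher; the function name, the (doc_id, key) layout and the
sample domains are assumptions for illustration only.

```python
from collections import defaultdict


def collapse_keep_n(ranked_hits, keep=2):
    """Keep at most `keep` hits per collapse key, preserving rank order.

    ranked_hits: iterable of (doc_id, collapse_key), best match first.
    Returns (kept_hits, dropped_count_per_key).
    """
    seen = defaultdict(int)
    dropped = defaultdict(int)
    kept = []
    for doc_id, key in ranked_hits:
        if seen[key] < keep:
            seen[key] += 1
            kept.append((doc_id, key))
        else:
            dropped[key] += 1
    return kept, dict(dropped)


# Hypothetical ranked hits; the collapse key is the domain name.
hits = [
    (1, "example.org"),
    (2, "example.org"),
    (3, "example.com"),
    (4, "example.org"),
    (5, "example.com"),
    (6, "example.com"),
]
kept, dropped = collapse_keep_n(hits, keep=2)
```

The key of every matching document is still read once, so the cost concern
raised above remains; the sketch only shows how little state is needed to
leave N documents behind instead of one.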
--
Best regards
Felix Antonius Wilhelm Ostmann
--------------------------------------------------
Websuche Search Technology GmbH & Co. KG
Martinistraße 3 - D-49080 Osnabrück - Germany
Tel.: +49 541 40666-0 - Fax: +49 541 40666-22
Email: info at websuche.de - Website: www.websuche.de
--------------------------------------------------
AG Osnabrück - HRA 200252 - VAT ID: DE814737310
General partner: Websuche Search Technology
Verwaltungs GmbH - AG Osnabrück - HRB 200359
Managing director: Diplom-Kaufmann Martin Steinkamp
--------------------------------------------------