[Xapian-discuss] scriptindex on an internet crawl
Arjen van der Meijden
acmmailing at tweakers.net
Thu Jun 23 07:14:05 BST 2005
On 23-6-2005 0:34, Olly Betts wrote:
> On Wed, Jun 22, 2005 at 09:14:12PM +0100, Olly Betts wrote:
>
>>On Wed, Jun 22, 2005 at 03:21:32PM -0400, Georges Dupret wrote:
>>
>>>In a first try, I inserted in the command file url : field=url boolean=XURL
>>>unique=XURL and in the input file: url=www.dcc.uchile.cl/~gdupret for
>>>example, but scriptindex starts using 100% of the CPU and never finishes.
>>
>>You probably don't want to specify both uid and url as unique fields,
>>but this shouldn't cause a hang - I'll see if I can reproduce this.
>
>
> I can't seem to reproduce this. Can you run scriptindex under gdb (just
> add "gdb --args " in front of the scriptindex command, then "run" at the
> "(gdb)" prompt), and hit Ctrl-C when it's hung. Then "bt" should show a
> backtrace of where execution is.
Couldn't this simply be explained by scriptindex being very, very slow? I
can imagine that a unique check on a relatively long identifier whose
beginning is very similar to that of many other identifiers (as with URLs
sharing a host prefix) could be very time-consuming and/or cause quite a
bit more B-tree work, at least compared to more evenly distributed
identifiers.
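To make that intuition a bit more concrete, here is a toy model in Python. It is not Xapian's actual B-tree code: a sorted list stands in for the index, and each lookup is a binary search that counts the characters a naive left-to-right string compare has to examine. The prefix `www.dcc.uchile.cl/~` is borrowed from the example URL above; the helper names, key counts and suffix length are arbitrary assumptions for illustration only.

```python
import random
import string

# Prefix borrowed from the example URL in this thread; everything else is made up.
PREFIX = "www.dcc.uchile.cl/~"


def char_compares(a: str, b: str) -> int:
    """Characters examined by a naive left-to-right lexicographic compare."""
    count = 0
    for x, y in zip(a, b):
        count += 1
        if x != y:
            break
    return count


def lookup_cost(sorted_keys: list[str], target: str) -> int:
    """Binary-search sorted_keys for target, totalling characters examined."""
    lo, hi, total = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        total += char_compares(sorted_keys[mid], target)
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return total


def average_cost(keys: list[str], rng: random.Random, probes: int = 1000) -> float:
    """Mean characters examined per lookup of a randomly chosen existing key."""
    ordered = sorted(set(keys))
    return sum(lookup_cost(ordered, rng.choice(ordered)) for _ in range(probes)) / probes


def random_word(rng: random.Random, length: int) -> str:
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


rng = random.Random(42)
n = 10_000
# Same total key length in both sets, so only the shared prefix differs.
shared = [PREFIX + random_word(rng, 8) for _ in range(n)]
spread = [random_word(rng, len(PREFIX) + 8) for _ in range(n)]

shared_cost = average_cost(shared, rng)
spread_cost = average_cost(spread, rng)
print(f"shared prefix: {shared_cost:.1f} chars/lookup")
print(f"evenly spread: {spread_cost:.1f} chars/lookup")
```

In this model every comparison against a shared-prefix key has to walk the whole common prefix before it can decide, so each comparison costs roughly the prefix length more than it would with evenly distributed keys.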
Best regards,
Arjen