Hi,
I'm working on tracking down, or at least clarifying, the 'rm stability'
issue:
https://github.com/clearcontainers/tests/pull/491
It's a bit of a bugbear of a bug - a different pattern of failure on each run, etc.
One pattern I found was a timeout when talking to the VM, so I opened this:
https://github.com/containers/virtcontainers/issues/390
and modified my local timeouts to 10s (whilst I carry on debugging).
The next problem I've seen that I don't understand is getting one of these in my
runtime log:
time="2017-10-02 15:55:42.595593471 +0100 BST" pid=23999 name="cc-runtime" level="error" source="runtime" msg="No multicast available for CTL channel"
I had a dig, and it looks like the hyper layer has been told to close the sockets,
but then somebody has tried to use one.
I don't know that area of code well, but it smells of a race or threading/locking
issue. Does anybody who knows that code a little better have any ideas how we
might have gotten into a situation where the CTL socket had been closed, but was
still notionally in use?
I'll do a few more runs to see if this turns up often now that I have the timeout
extensions in place - and if so, I will start to breadcrumb more debug through the
relevant paths.
Thanks,
Graham