Updated: 9/22/2014
This document covers the intersection of pod states, the PodStatus type, the life-cycle of a pod, events, restart policies, and replication controllers. It is not an exhaustive document, but an introduction to the topics.
While PodStatus represents the state of a pod, it is not intended to form a state machine. PodStatus is an observation of the current state of a pod. As such, we discourage people from thinking about "transitions" or "changes" or "future states".
Since PodStatus is not a state machine, there are no edges which can be considered the "reason" for the current state. Reasons can be determined by examining the events for the pod. Events that affect containers, e.g. OOM, are reported as pod events.
TODO(@lavalamp) Event design
The only controller we have today is ReplicationController. ReplicationController is only appropriate for pods with RestartPolicy = Always. ReplicationController should refuse to instantiate any pod that has a different restart policy.
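The refusal rule above can be sketched as a small validation step. This is only an illustration of the rule, not the actual ReplicationController code; the RestartPolicy enum and the function name validate_pod_template are hypothetical names chosen for this example.

```python
from enum import Enum


class RestartPolicy(Enum):
    # The three restart policies named in this document.
    ALWAYS = "Always"
    ON_FAILURE = "OnFailure"
    NEVER = "Never"


def validate_pod_template(restart_policy: RestartPolicy) -> None:
    """Hypothetical check: reject any pod whose policy is not Always."""
    if restart_policy is not RestartPolicy.ALWAYS:
        raise ValueError(
            "ReplicationController requires RestartPolicy = Always, "
            f"got {restart_policy.value}"
        )
```

Calling validate_pod_template(RestartPolicy.ALWAYS) returns normally, while either of the other two policies raises ValueError before any pod would be created.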
There is a legitimate need for a controller which keeps pods with other policies alive. Pods with either of the other policies (OnFailure and Never) eventually terminate, at which point the controller should stop recreating them. Because of this fundamental distinction, let's hypothesize a new controller, called JobController for the sake of this document, which can implement this policy.
Containers can terminate with one of two statuses: success or failure.
TODO(@dchen1107) Define ContainerStatus like PodStatus
The number and meanings of PodStatus values are tightly guarded. Other than what is documented here, nothing should be assumed about pods with a given PodStatus.
pending: The pod has been accepted by the system, but one or more of the containers has not been started. This includes time before being scheduled as well as time spent downloading images over the network, which could take a while.
running: The pod has been bound to a node, and all of the containers have been started. At least one container is still running (or is in the process of restarting).
succeeded: All containers in the pod have terminated in success.
failed: All containers in the pod have terminated, and at least one container has terminated in failure.
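Because PodStatus is an observation rather than a state machine, the four descriptions above can be read as one pure function over a snapshot of container states. The following is a minimal sketch under that reading. The ContainerState enum, its member names, and the function observe_pod_status are assumptions made for this example, as is the label pending for the first status.

```python
from enum import Enum
from typing import List


class ContainerState(Enum):
    # Hypothetical per-container states, used only in this sketch.
    NOT_STARTED = "not_started"
    RUNNING = "running"  # also covers a container that is being restarted
    EXITED_SUCCESS = "success"
    EXITED_FAILURE = "failure"


class PodStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def observe_pod_status(containers: List[ContainerState]) -> PodStatus:
    """Derive a PodStatus from a snapshot; no history is consulted."""
    if not containers:
        raise ValueError("a pod must have at least one container")
    # One or more containers not yet started.
    if any(c is ContainerState.NOT_STARTED for c in containers):
        return PodStatus.PENDING
    # All started, and at least one still running or restarting.
    if any(c is ContainerState.RUNNING for c in containers):
        return PodStatus.RUNNING
    # All terminated; any failure makes the pod failed.
    if any(c is ContainerState.EXITED_FAILURE for c in containers):
        return PodStatus.FAILED
    return PodStatus.SUCCEEDED
```

The function takes only the current snapshot as input, which mirrors the point that nothing should be inferred from earlier observations of the same pod.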
In general, pods which are created do not disappear until someone destroys them. This might be a human or a ReplicationController. The only exception to this rule is that pods with a PodStatus of succeeded or failed for more than some duration (determined by the master) will expire and be automatically reaped.
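The expiry rule above amounts to a simple age check on pods in a terminal status. A minimal sketch follows, assuming the master records when a pod was first observed as succeeded or failed. The function is_expired and its parameters status_since and max_age are hypothetical; the document does not specify the actual duration.

```python
from datetime import datetime, timedelta

# Only these two statuses are subject to automatic reaping.
TERMINAL_STATUSES = {"succeeded", "failed"}


def is_expired(
    status: str,
    status_since: datetime,
    now: datetime,
    max_age: timedelta,
) -> bool:
    """Return True if a pod has been in a terminal status longer than max_age."""
    if status not in TERMINAL_STATUSES:
        # Pending and running pods never expire on their own.
        return False
    return now - status_since > max_age
```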
If a node dies or is disconnected from the rest of the cluster, some entity within the system (call it the NodeController for now) is responsible for applying policy (e.g. a timeout) and marking any pods on the lost node as failed.
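One way to picture the timeout policy described above is a periodic sweep that compares each node's last sign of life against a deadline. This is a sketch only: the Pod dataclass, the last_heartbeat map, and the function mark_pods_on_lost_nodes are hypothetical, and the document does not say how the NodeController actually detects a lost node.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pod:
    name: str
    node: str
    status: str


def mark_pods_on_lost_nodes(
    pods: List[Pod],
    last_heartbeat: Dict[str, float],
    now: float,
    timeout: float,
) -> List[str]:
    """Mark pods on lost nodes as failed; return the names that changed."""
    changed = []
    for pod in pods:
        seen = last_heartbeat.get(pod.node)
        # A node is treated as lost if it was never seen or has been
        # silent for longer than the timeout (an assumed policy).
        lost = seen is None or now - seen > timeout
        if lost and pod.status != "failed":
            pod.status = "failed"
            changed.append(pod.name)
    return changed
```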
Pod is running, 1 container, container exits success. Resulting PodStatus by RestartPolicy: Always: running (the container is restarted); OnFailure: succeeded; Never: succeeded.
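Pod is running, 1 container, container exits failure. Resulting PodStatus by RestartPolicy: Always: running (the container is restarted); OnFailure: running (the container is restarted); Never: failed.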
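Pod is running, 2 containers, container 1 exits failure. Resulting PodStatus by RestartPolicy: Always: running; OnFailure: running; Never: running (the other container is still running). If container 2 then also exits in failure: Always: running; OnFailure: running; Never: failed.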
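Pod is running, container runs out of memory (OOM). Resulting PodStatus by RestartPolicy: Always: running (the container is restarted); OnFailure: running (the container is restarted); Never: failed.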
Pod is running, a disk dies. The pod becomes failed.
Pod is running, its node is segmented out (disconnected from the rest of the cluster). The pod is marked failed.
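The single-container cases in the examples above can be condensed into one lookup. This sketch assumes that the three status values listed for each case correspond to the policies Always, OnFailure, and Never, in that order; the function name status_after_exit and its exit_ok parameter are hypothetical.

```python
def status_after_exit(restart_policy: str, exit_ok: bool) -> str:
    """PodStatus observed after the only container of a running pod exits."""
    if restart_policy == "Always":
        # The container is restarted regardless of how it exited.
        return "running"
    if restart_policy == "OnFailure":
        # Restart only on failure; a clean exit ends the pod.
        return "succeeded" if exit_ok else "running"
    if restart_policy == "Never":
        # Never restart; the exit result decides the final status.
        return "succeeded" if exit_ok else "failed"
    raise ValueError(f"unknown restart policy: {restart_policy}")
```

An OOM kill counts as a failure here, so it follows the same row as a container that exits in failure.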