it-swarm.com.de

coredns-Pods haben den Status CrashLoopBackOff oder Error

Ich versuche, den Kubernetes-Master einzurichten, indem er Folgendes ausgibt:

kubeadm init --pod-network-cidr = 192.168.0.0/16

  1. gefolgt von: Installieren eines Pod-Netzwerk-Add-Ons (Calico)
  2. gefolgt von: Master Isolation

problem: coredns-Pods haben CrashLoopBackOff oder Error-Status:

# kubectl get pods -n kube-system
NAME                                       READY   STATUS             RESTARTS   AGE
calico-node-lflwx                          2/2     Running            0          2d
coredns-576cbf47c7-nm7gc                   0/1     CrashLoopBackOff   69         2d
coredns-576cbf47c7-nwcnx                   0/1     CrashLoopBackOff   69         2d
etcd-suey.nknwn.local                      1/1     Running            0          2d
kube-apiserver-suey.nknwn.local            1/1     Running            0          2d
kube-controller-manager-suey.nknwn.local   1/1     Running            0          2d
kube-proxy-xkgdr                           1/1     Running            0          2d
kube-scheduler-suey.nknwn.local            1/1     Running            0          2d
# 

Ich habe es mit Troubleshooting kubeadm - Kubernetes versucht, jedoch läuft mein Knoten nicht SELinux und mein Docker ist auf dem neuesten Stand.

# docker --version
Docker version 18.06.1-ce, build e68fc7a
# 

kubectls describe:

# kubectl -n kube-system describe pod coredns-576cbf47c7-nwcnx 
Name:               coredns-576cbf47c7-nwcnx
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               suey.nknwn.local/192.168.86.81
Start Time:         Sun, 28 Oct 2018 22:39:46 -0400
Labels:             k8s-app=kube-dns
                    pod-template-hash=576cbf47c7
Annotations:        cni.projectcalico.org/podIP: 192.168.0.30/32
Status:             Running
IP:                 192.168.0.30
Controlled By:      ReplicaSet/coredns-576cbf47c7
Containers:
  coredns:
    Container ID:  docker://ec65b8f40c38987961e9ed099dfa2e8bb35699a7f370a2cda0e0d522a0b05e79
    Image:         k8s.gcr.io/coredns:1.2.2
    Image ID:      docker-pullable://k8s.gcr.io/[email protected]:3e2be1cec87aca0b74b7668bbe8c02964a95a402e45ceb51b2252629d608d03a
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Running
      Started:      Wed, 31 Oct 2018 23:28:58 -0400
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Wed, 31 Oct 2018 23:21:35 -0400
      Finished:     Wed, 31 Oct 2018 23:23:54 -0400
    Ready:          True
    Restart Count:  103
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-xvq8b (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-xvq8b:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-xvq8b
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                     From                       Message
  ----     ------     ----                    ----                       -------
  Normal   Killing    54m (x10 over 4h19m)    kubelet, suey.nknwn.local  Killing container with id docker://coredns:Container failed liveness probe.. Container will be killed and recreated.
  Warning  Unhealthy  9m56s (x92 over 4h20m)  kubelet, suey.nknwn.local  Liveness probe failed: HTTP probe failed with statuscode: 503
  Warning  BackOff    5m4s (x173 over 4h10m)  kubelet, suey.nknwn.local  Back-off restarting failed container
# kubectl -n kube-system describe pod coredns-576cbf47c7-nm7gc 
Name:               coredns-576cbf47c7-nm7gc
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               suey.nknwn.local/192.168.86.81
Start Time:         Sun, 28 Oct 2018 22:39:46 -0400
Labels:             k8s-app=kube-dns
                    pod-template-hash=576cbf47c7
Annotations:        cni.projectcalico.org/podIP: 192.168.0.31/32
Status:             Running
IP:                 192.168.0.31
Controlled By:      ReplicaSet/coredns-576cbf47c7
Containers:
  coredns:
    Container ID:  docker://0f2db8d89a4c439763e7293698d6a027a109bf556b806d232093300952a84359
    Image:         k8s.gcr.io/coredns:1.2.2
    Image ID:      docker-pullable://k8s.gcr.io/[email protected]:3e2be1cec87aca0b74b7668bbe8c02964a95a402e45ceb51b2252629d608d03a
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Running
      Started:      Wed, 31 Oct 2018 23:29:11 -0400
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Wed, 31 Oct 2018 23:21:58 -0400
      Finished:     Wed, 31 Oct 2018 23:24:08 -0400
    Ready:          True
    Restart Count:  102
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-xvq8b (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-xvq8b:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-xvq8b
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                     From                       Message
  ----     ------     ----                    ----                       -------
  Normal   Killing    44m (x12 over 4h18m)    kubelet, suey.nknwn.local  Killing container with id docker://coredns:Container failed liveness probe.. Container will be killed and recreated.
  Warning  BackOff    4m58s (x170 over 4h9m)  kubelet, suey.nknwn.local  Back-off restarting failed container
  Warning  Unhealthy  8s (x102 over 4h19m)    kubelet, suey.nknwn.local  Liveness probe failed: HTTP probe failed with statuscode: 503
# 

kubectls log:

# kubectl -n kube-system logs -f coredns-576cbf47c7-nm7gc 
E1101 03:31:58.974836       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:31:58.974836       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:31:58.974857       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:32:29.975493       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:32:29.976732       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:32:29.977788       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:00.976164       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:00.977415       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:00.978332       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
2018/11/01 03:33:08 [INFO] SIGTERM: Shutting down servers then terminating
E1101 03:33:31.976864       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:31.978080       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:31.979156       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
# 

# kubectl -n kube-system log -f coredns-576cbf47c7-gqdgd
.:53
2018/11/05 04:04:13 [INFO] CoreDNS-1.2.2
2018/11/05 04:04:13 [INFO] linux/AMD64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/AMD64, go1.11, eb51e8b
2018/11/05 04:04:13 [INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
2018/11/05 04:04:19 [FATAL] plugin/loop: Seen "HINFO IN 3597544515206064936.6415437575707023337." more than twice, loop detected
# kubectl -n kube-system log -f coredns-576cbf47c7-hhmws
.:53
2018/11/05 04:04:18 [INFO] CoreDNS-1.2.2
2018/11/05 04:04:18 [INFO] linux/AMD64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/AMD64, go1.11, eb51e8b
2018/11/05 04:04:18 [INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
2018/11/05 04:04:24 [FATAL] plugin/loop: Seen "HINFO IN 6900627972087569316.7905576541070882081." more than twice, loop detected
# 

describe (apiserver):

# kubectl -n kube-system describe pod kube-apiserver-suey.nknwn.local
Name:               kube-apiserver-suey.nknwn.local
Namespace:          kube-system
Priority:           2000000000
PriorityClassName:  system-cluster-critical
Node:               suey.nknwn.local/192.168.87.20
Start Time:         Fri, 02 Nov 2018 00:28:44 -0400
Labels:             component=kube-apiserver
                    tier=control-plane
Annotations:        kubernetes.io/config.hash: 2433a531afe72165364aace3b746ea4c
                    kubernetes.io/config.mirror: 2433a531afe72165364aace3b746ea4c
                    kubernetes.io/config.seen: 2018-11-02T00:28:43.795663261-04:00
                    kubernetes.io/config.source: file
                    scheduler.alpha.kubernetes.io/critical-pod: 
Status:             Running
IP:                 192.168.87.20
Containers:
  kube-apiserver:
    Container ID:  docker://659456385a1a859f078d36f4d1b91db9143d228b3bc5b3947a09460a39ce41fc
    Image:         k8s.gcr.io/kube-apiserver:v1.12.2
    Image ID:      docker-pullable://k8s.gcr.io/[email protected]:094929baf3a7681945d83a7654b3248e586b20506e28526121f50eb359cee44f
    Port:          <none>
    Host Port:     <none>
    Command:
      kube-apiserver
      --authorization-mode=Node,RBAC
      --advertise-address=192.168.87.20
      --allow-privileged=true
      --client-ca-file=/etc/kubernetes/pki/ca.crt
      --enable-admission-plugins=NodeRestriction
      --enable-bootstrap-token-auth=true
      --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
      --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
      --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
      --etcd-servers=https://127.0.0.1:2379
      --insecure-port=0
      --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
      --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
      --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
      --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
      --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
      --requestheader-allowed-names=front-proxy-client
      --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
      --requestheader-extra-headers-prefix=X-Remote-Extra-
      --requestheader-group-headers=X-Remote-Group
      --requestheader-username-headers=X-Remote-User
      --secure-port=6443
      --service-account-key-file=/etc/kubernetes/pki/sa.pub
      --service-cluster-ip-range=10.96.0.0/12
      --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
      --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
    State:          Running
      Started:      Sun, 04 Nov 2018 22:57:27 -0500
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 04 Nov 2018 20:12:06 -0500
      Finished:     Sun, 04 Nov 2018 22:55:24 -0500
    Ready:          True
    Restart Count:  2
    Requests:
      cpu:        250m
    Liveness:     http-get https://192.168.87.20:6443/healthz delay=15s timeout=15s period=10s #success=1 #failure=8
    Environment:  <none>
    Mounts:
      /etc/ca-certificates from etc-ca-certificates (ro)
      /etc/kubernetes/pki from k8s-certs (ro)
      /etc/ssl/certs from ca-certs (ro)
      /usr/local/share/ca-certificates from usr-local-share-ca-certificates (ro)
      /usr/share/ca-certificates from usr-share-ca-certificates (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  etc-ca-certificates:
    Type:          HostPath (bare Host directory volume)
    Path:          /etc/ca-certificates
    HostPathType:  DirectoryOrCreate
  k8s-certs:
    Type:          HostPath (bare Host directory volume)
    Path:          /etc/kubernetes/pki
    HostPathType:  DirectoryOrCreate
  ca-certs:
    Type:          HostPath (bare Host directory volume)
    Path:          /etc/ssl/certs
    HostPathType:  DirectoryOrCreate
  usr-share-ca-certificates:
    Type:          HostPath (bare Host directory volume)
    Path:          /usr/share/ca-certificates
    HostPathType:  DirectoryOrCreate
  usr-local-share-ca-certificates:
    Type:          HostPath (bare Host directory volume)
    Path:          /usr/local/share/ca-certificates
    HostPathType:  DirectoryOrCreate
QoS Class:         Burstable
Node-Selectors:    <none>
Tolerations:       :NoExecute
Events:            <none>
# 

syslog (Host):

4. November 22:59:36 suey kubelet [1234]: E1104 22: 59: 36.139538 1234 pod_workers.go: 186] Fehler beim Synchronisieren des Pods d8146b7e-de57-11e8-a1e2-ec8eb57434c8 ("coredns-576cbf47c7-hhmws_kube-system (d8146b7e-de57-11e8-a1e2-ec8eb57434c8)"), Überspringen: "StartContainer" für "Coredns" mit .__ fehlgeschlagen. CrashLoopBackOff: "40-Sekunden-Neustart des Back-Offs fehlgeschlagen.

Bitte beraten.

4
alexus

Dieser Fehler

[FATAL] plugin/loop: Seen "HINFO IN 6900627972087569316.7905576541070882081." more than twice, loop detected

wird verursacht, wenn CoreDNS eine Schleife in der Auflösungskonfiguration erkennt und das beabsichtigte Verhalten ist. Sie treffen dieses Problem:

https://github.com/kubernetes/kubeadm/issues/1162

https://github.com/coredns/coredns/issues/2087

Hacky-Lösung: Deaktivieren Sie die CoreDNS-Schleifenerkennung.

Bearbeiten Sie die CoreDNS-Configmap:

kubectl -n kube-system edit configmap coredns

Entfernen Sie die Zeile mit loop oder kommentieren Sie sie aus, speichern Sie und beenden Sie sie.

Dann entfernen Sie die CoreDNS-Pods, damit mit der neuen Konfiguration neue erstellt werden können:

kubectl -n kube-system delete pod -l k8s-app=kube-dns

Danach sollte alles in Ordnung sein.

Bevorzugte Lösung: Entfernen Sie die Schleife in der DNS-Konfiguration.

Prüfen Sie zunächst, ob Sie systemd-resolved verwenden. Wenn Sie Ubuntu 18.04 verwenden, ist dies wahrscheinlich der Fall.

systemctl list-unit-files | grep enabled | grep systemd-resolved

Wenn ja, prüfen Sie, welche resolv.conf-Datei Ihr Cluster als Referenz verwendet:

ps auxww | grep kubelet

Möglicherweise sehen Sie eine Zeile wie:

/usr/bin/kubelet ... --resolv-conf=/run/systemd/resolve/resolv.conf

Der wichtige Teil ist --resolv-conf - wir ermitteln, ob systemd resolv.conf verwendet wird oder nicht. 

Wenn es der resolv.conf von systemd ist, gehen Sie wie folgt vor:

Überprüfen Sie den Inhalt von /run/systemd/resolve/resolv.conf, um festzustellen, ob ein Datensatz wie folgt vorhanden ist:

nameserver 127.0.0.1

Wenn es 127.0.0.1 gibt, ist es derjenige, der die Schleife verursacht.

Um es loszuwerden, sollten Sie diese Datei nicht bearbeiten, sondern an anderen Stellen überprüfen, ob sie ordnungsgemäß generiert wird.

Überprüfen Sie alle Dateien unter /etc/systemd/network und ob Sie einen Datensatz finden

DNS=127.0.0.1

diesen Datensatz löschen. Überprüfen Sie auch /etc/systemd/resolved.conf und machen Sie dasselbe, falls erforderlich. Stellen Sie sicher, dass mindestens ein oder zwei DNS-Server konfiguriert sind, z. B. 

DNS=1.1.1.1 1.0.0.1

Starten Sie anschließend die systemd-Dienste neu, um die Änderungen zu übernehmen: Systemctl Neustart von systemd-networkd systemd-resolution

Vergewissern Sie sich danach, dass sich DNS=127.0.0.1 nicht mehr in der resolv.conf-Datei befindet:

cat /run/systemd/resolve/resolv.conf

Lösen Sie anschließend die DNS-Pods erneut aus

kubectl -n kube-system delete pod -l k8s-app=kube-dns

Summary: Die Lösung besteht darin, die DNS-Lookup-Schleife aus der Host-DNS-Konfiguration zu entfernen. Die Schritte unterscheiden sich zwischen den verschiedenen resolv.conf-Managern/Implementierungen.

8
Utku Özdemir

Hier sind einige Shell-Hacker, die Utku s answer automatisieren:

# remove loop from DNS config files
Sudo find /etc/systemd/network /etc/systemd/resolved.conf -type f \
    -exec sed -i '/^DNS=127.0.0.1/d' {} +

# if necessary, configure some DNS servers (use cloudfare public)
if ! grep '^DNS=.*' /etc/systemd/resolved.conf; then
    Sudo sed -i '$aDNS=1.1.1.1 1.0.0.1' /etc/systemd/resolved.conf
fi

# restart systemd services
Sudo systemctl restart systemd-networkd systemd-resolved

# force (re-) creation of the dns pods
kubectl -n kube-system delete pod -l k8s-app=kube-dns
0
rubicks