Latest Build - DNS Does not Resolve #356

Closed
twajr opened this issue May 30, 2018 · 26 comments

Comments

@twajr

twajr commented May 30, 2018

OK, I've run through this on GCP several times, and it fails in the same place each time. In other words, a step-by-step run-through of the current guide on GCP does not work, in my opinion.

The issue is within the kube-dns step (12).

[root@0c67614bb53b hard-way]# kubectl exec -ti $POD_NAME -- nslookup kubernetes
Server:    10.32.0.10
Address 1: 10.32.0.10
nslookup: can't resolve 'kubernetes'

The contents of the worker's resolv.conf:

root@worker-0:~# cat /etc/resolv.conf
# This file is managed by man:systemd-resolved(8). Do not edit.
...
nameserver 127.0.0.53
search c.my-project.internal google.internal
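
(For reference, 127.0.0.53 is just systemd-resolved's local stub listener; on a stock Ubuntu worker the real upstream nameservers live in a separate file, which is what the fix further down points the kubelet at:)

root@worker-0:~# cat /run/systemd/resolve/resolv.conf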

The default kube-dns YAML from https://storage.googleapis.com/kubernetes-the-hard-way/kube-dns.yaml shows version 1.14.7; I've also tried 1.14.10, but with the same result...

logs from kube-dns:

[root@0c67614bb53b hard-way]#  kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c kubedns
I0530 18:02:17.416592       1 dns.go:48] version: 1.14.10
I0530 18:02:17.417868       1 server.go:69] Using configuration read from directory: /kube-dns-config with period 10s
I0530 18:02:17.418040       1 server.go:121] FLAG: --alsologtostderr="false"
I0530 18:02:17.418111       1 server.go:121] FLAG: --config-dir="/kube-dns-config"
I0530 18:02:17.418171       1 server.go:121] FLAG: --config-map=""
I0530 18:02:17.418226       1 server.go:121] FLAG: --config-map-namespace="kube-system"
I0530 18:02:17.418281       1 server.go:121] FLAG: --config-period="10s"
I0530 18:02:17.418337       1 server.go:121] FLAG: --dns-bind-address="0.0.0.0"
I0530 18:02:17.418392       1 server.go:121] FLAG: --dns-port="10053"
I0530 18:02:17.418447       1 server.go:121] FLAG: --domain="cluster.local."
I0530 18:02:17.418504       1 server.go:121] FLAG: --federations=""
I0530 18:02:17.418558       1 server.go:121] FLAG: --healthz-port="8081"
I0530 18:02:17.418613       1 server.go:121] FLAG: --initial-sync-timeout="1m0s"
I0530 18:02:17.418669       1 server.go:121] FLAG: --kube-master-url=""
I0530 18:02:17.418811       1 server.go:121] FLAG: --kubecfg-file=""
I0530 18:02:17.418866       1 server.go:121] FLAG: --log-backtrace-at=":0"
I0530 18:02:17.418920       1 server.go:121] FLAG: --log-dir=""
I0530 18:02:17.418973       1 server.go:121] FLAG: --log-flush-frequency="5s"
I0530 18:02:17.419119       1 server.go:121] FLAG: --logtostderr="true"
I0530 18:02:17.419176       1 server.go:121] FLAG: --nameservers=""
I0530 18:02:17.419248       1 server.go:121] FLAG: --stderrthreshold="2"
I0530 18:02:17.419304       1 server.go:121] FLAG: --v="2"
I0530 18:02:17.419356       1 server.go:121] FLAG: --version="false"
I0530 18:02:17.419419       1 server.go:121] FLAG: --vmodule=""
I0530 18:02:17.419506       1 server.go:169] Starting SkyDNS server (0.0.0.0:10053)
I0530 18:02:17.419800       1 server.go:179] Skydns metrics enabled (/metrics:10055)
I0530 18:02:17.419872       1 dns.go:188] Starting endpointsController
I0530 18:02:17.419928       1 dns.go:191] Starting serviceController
I0530 18:02:17.420078       1 dns.go:184] Configuration updated: {TypeMeta:{Kind: APIVersion:} Federations:map[] StubDomains:map[] UpstreamNameservers:[]}
I0530 18:02:17.420478       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0530 18:02:17.420551       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0530 18:02:17.920341       1 dns.go:222] Initialized services and endpoints from apiserver
I0530 18:02:17.920367       1 server.go:137] Setting up Healthz Handler (/readiness)
I0530 18:02:17.920450       1 server.go:142] Setting up cache handler (/cache)
I0530 18:02:17.920836       1 server.go:128] Status HTTP port 8081

Please let me know what other info I can provide. Thanks!

@marekk1717

I've got the same issue.

@randyrue

I'm having a similar problem; at least it's also in the kube-dns steps.

"kubectl create -f https://storage.googleapis.com/kubernetes-the-hard-way/kube-dns.yaml" appears to succeed; the shell output matches yours, and all four objects seem to have been created properly.

The subsequent "kubectl get pods -l k8s-app=kube-dns -n kube-system" command returns "No resources found."

"kubectl get pods" and "kubectl get deployments" also return "No resources found."

"kubectl delete -f https://storage.googleapis.com/kubernetes-the-hard-way/kube-dns.yaml" successfully deletes the service, serviceaccount, anc configmap, but then after a wait, I get "error when stopping "kube-dns.yaml": timed out waiting for the condition"

If I then try to recreate the DNS add-on, the first three appear to succeed and then I get an error "Error from server (AlreadyExists): error when creating "https://storage.googleapis.com/kubernetes-the-hard-way/kube-dns.yaml": deployments.extensions "kube-dns" already exists"

My kubectl is version 1.10.4, cluster version 1.10.2.

Any guidance would be appreciated. - randy
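
A few generic kubectl checks usually narrow down whether this is a scheduling problem or a worker/kubelet registration problem (standard commands, not specific to this guide):

kubectl get nodes
kubectl -n kube-system describe deployment kube-dns
kubectl -n kube-system get events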

@randyrue

Since posting the above I've reviewed the first 11 sections and confirmed that every "verify" step returns the indicated output: I don't believe I've missed anything in the steps leading up to the kube-dns section.

Hope to hear from you.

@TRoetz

TRoetz commented Jun 14, 2018

Can you get to the Node OS? If so, I installed dnsmasq:

apt-get update
apt-get install dnsmasq
I had the Weave CNI, and restarting all the weave pods as well sorted my problems.

@randyrue

I can SSH to each node. I installed dnsmasq, then installed Weave as shown at https://www.weave.works/blog/weave-net-kubernetes-integration/

I appear to have some other problem that's keeping me from running anything, including kube-dns, weave, etc. No matter what I create, run, apply, or deploy, the command reports success, and then "get pods" or "get deployments" always shows "No resources found."

Again, I've reviewed the steps pretty carefully, confirmed that I can "verify" everywhere the steps include that option, and have also reviewed my bash history. If I don't hear otherwise soon, at some point I'll likely rip it all out and start again.

@ghost

ghost commented Jun 17, 2018

I tested using Debian Stretch, and while it wasn't working at first (same issue with any service IP: 10.32.0.10, 10.32.0.1, etc.), I only needed to change kube-proxy's mode to userspace instead of iptables.

Not really sure what the issue is though.

Regards
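
In the guide's layout, kube-proxy reads its mode from a config file on each worker, so the change is roughly the following (the file path and the mode field are as used by the guide's KubeProxyConfiguration; verify them against your own setup):

# on each worker node
sudo sed -i 's/mode: "iptables"/mode: "userspace"/' /var/lib/kube-proxy/kube-proxy-config.yaml
sudo systemctl restart kube-proxy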

@morphy77

morphy77 commented Jul 21, 2018

Also having problems with the DNS verification step, after following all the prerequisite steps and validations successfully.

kubectl exec -ti $POD_NAME -- nslookup kubernetes
;; connection timed out; no servers could be reached

command terminated with exit code 1

The kube-dns pod logs do not show errors initially, but after a couple of minutes the sidecar container logs errors; not sure if this is relevant:

 ❯ kubectl logs kube-dns-598d7bf7d4-w2xpt -n kube-system -c sidecar                                       [10:47:04]
I0721 08:41:31.563987       1 main.go:51] Version v1.14.6-3-gc36cb11
I0721 08:41:31.564420       1 server.go:45] Starting server (options {DnsMasqPort:53 DnsMasqAddr:127.0.0.1 DnsMasqPollIntervalMs:5000 Probes:[{Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:33} {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:33}] PrometheusAddr:0.0.0.0 PrometheusPort:10054 PrometheusPath:/metrics PrometheusNamespace:kubedns})
I0721 08:41:31.564530       1 dnsprobe.go:75] Starting dnsProbe {Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:33}
I0721 08:41:31.564698       1 dnsprobe.go:75] Starting dnsProbe {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:33}
W0721 08:44:08.597570       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:40718->127.0.0.1:53: i/o timeout
W0721 08:44:15.598114       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:33459->127.0.0.1:53: i/o timeout
W0721 08:44:22.598650       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:37114->127.0.0.1:53: i/o timeout
W0721 08:44:29.599084       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:33194->127.0.0.1:53: i/o timeout
W0721 08:44:38.578727       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:39539->127.0.0.1:53: i/o timeout
W0721 08:44:45.579229       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:58283->127.0.0.1:53: i/o timeout
W0721 08:44:52.579736       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:49092->127.0.0.1:53: i/o timeout
W0721 08:44:59.580290       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:58196->127.0.0.1:53: i/o timeout
W0721 08:45:06.580784       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:58502->127.0.0.1:53: i/o timeout
W0721 08:45:13.581239       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:49428->127.0.0.1:53: i/o timeout
W0721 08:45:20.581931       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:44698->127.0.0.1:53: i/o timeout
W0721 08:45:28.647300       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:59476->127.0.0.1:53: i/o timeout
 ❯ kubectl get pods -l k8s-app=kube-dns -n kube-system
NAME                        READY     STATUS    RESTARTS   AGE
kube-dns-598d7bf7d4-w2xpt   3/3       Running   1          6m

Am I missing something? Please advise.
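
It is also worth checking the dnsmasq container itself, since the sidecar timeouts only say that dnsmasq stopped answering on 127.0.0.1:53 (container names as defined in the stock kube-dns manifest):

kubectl logs kube-dns-598d7bf7d4-w2xpt -n kube-system -c dnsmasq
kubectl logs kube-dns-598d7bf7d4-w2xpt -n kube-system -c kubedns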

@tmhall99

Same issue as Maarthen.
I also ran through all the steps in https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/ and everything appears to be configured as it should be.

@rafaeltuelho

Same issue here! But it seems the DNS addon part is not a prerequisite for the step 13) smoke test :-\

@helcorin

helcorin commented Aug 1, 2018

Using an FQDN in the request, it is possible to get the right result and validate that the DNS query is working. I found a similar situation reported against kube-dns (kubernetes/dns#169), but I did not go deep enough to understand whether it is really related...
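
For anyone reproducing that check, the fully qualified form of the same lookup is:

kubectl exec -ti $POD_NAME -- nslookup kubernetes.default.svc.cluster.local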

@brianlund

Same issue with the timeout here as well:

$ kubectl exec -ti $POD_NAME -- nslookup kubernetes
;; connection timed out; no servers could be reached

command terminated with exit code 1

I tried both weave and starting from scratch in another region, no luck.

@rafaeltuelho

I tried replacing kube-dns with CoreDNS [1], but the timeout issue remains. I suspect something related to iptables rules is affecting kube-dns.

[1] https://github.com/coredns/deployment/tree/master/kubernetes
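
If iptables rules are the suspect, the NAT rules that kube-proxy programs for the DNS service can be inspected directly on a worker (KUBE-SERVICES is the chain kube-proxy creates in iptables mode):

sudo iptables -t nat -L KUBE-SERVICES -n | grep 10.32.0.10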

@gillarda

gillarda commented Aug 3, 2018

Same issue for me.
The kubedns container logs a continuous flow of errors:

$ kubectl logs kube-dns-598d7bf7d4-cnrtb -n kube-system -c kubedns
I0803 14:11:52.467482       1 logs.go:41] skydns: failure to forward request "read udp 127.0.0.1:38868->127.0.0.53:53: i/o timeout"
I0803 14:12:02.483275       1 logs.go:41] skydns: failure to forward request "read udp 127.0.0.1:34413->127.0.0.53:53: i/o timeout"
I0803 14:12:02.483387       1 logs.go:41] skydns: failure to forward request "read udp 127.0.0.1:34413->127.0.0.53:53: i/o timeout"
I0803 14:12:12.503366       1 logs.go:41] skydns: failure to forward request "read udp 127.0.0.1:40423->127.0.0.53:53: i/o timeout"
I0803 14:12:12.503459       1 logs.go:41] skydns: failure to forward request "read udp 127.0.0.1:40423->127.0.0.53:53: i/o timeout"
I0803 14:12:22.519478       1 logs.go:41] skydns: failure to forward request "read udp 127.0.0.1:55458->127.0.0.53:53: i/o timeout"
I0803 14:12:22.519599       1 logs.go:41] skydns: failure to forward request "read udp 127.0.0.1:55458->127.0.0.53:53: i/o timeout"
I0803 14:12:32.535359       1 logs.go:41] skydns: failure to forward request "read udp 127.0.0.1:42441->127.0.0.53:53: i/o timeout"
I0803 14:12:32.535456       1 logs.go:41] skydns: failure to forward request "read udp 127.0.0.1:42441->127.0.0.53:53: i/o timeout"
I0803 14:12:42.551380       1 logs.go:41] skydns: failure to forward request "read udp 127.0.0.1:50651->127.0.0.53:53: i/o timeout"
I0803 14:12:42.551473       1 logs.go:41] skydns: failure to forward request "read udp 127.0.0.1:50651->127.0.0.53:53: i/o timeout"
I0803 14:12:52.571296       1 logs.go:41] skydns: failure to forward request "read udp 127.0.0.1:41959->127.0.0.53:53: i/o timeout"
I0803 14:12:52.571436       1 logs.go:41] skydns: failure to forward request "read udp 127.0.0.1:41959->127.0.0.53:53: i/o timeout"
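
The forward target 127.0.0.53 in those errors is the node's systemd-resolved stub address, inherited from the node's /etc/resolv.conf because the kube-dns pod uses dnsPolicy: Default; inside the pod's network namespace nothing listens there. A quick way to confirm what resolv.conf the pod was given (assuming the kubedns image ships cat):

kubectl -n kube-system exec kube-dns-598d7bf7d4-cnrtb -c kubedns -- cat /etc/resolv.conf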
@apantin

apantin commented Aug 4, 2018

I had the same problem:

  1. The DNS name 'kubernetes' can't be resolved in busybox:
POD_NAME=$(kubectl get pods -l run=busybox -o jsonpath="{.items[0].metadata.name}")

kubectl exec -ti $POD_NAME -- nslookup kubernetes
Server:    10.32.0.10
Address 1: 10.32.0.10
nslookup: can't resolve 'kubernetes'
  2. The dnsmasq log:
    dnsmasq: Maximum number of concurrent DNS queries reached (max: 150)
  3. The sidecar log:
    Error getting metrics from dnsmasq: read udp 127.0.0.1:59476->127.0.0.1:53: i/o timeout
  4. The kube-dns pod restarts many times.

Problem solution

  1. Change the kubelet service configuration to add the parameter --resolv-conf=/run/systemd/resolve/resolv.conf
    (for background, see "Install on a system using systemd-resolved leads to broken DNS" kubernetes/kubeadm#273).
    This removes the errors in dnsmasq and the sidecar; a quick verification is shown after this list.
    The commands must be run on each worker instance: worker-0, worker-1, and worker-2. Log in to each worker instance using the gcloud command. Example:
    gcloud compute ssh worker-0
    Then change the kubelet service config:
cat <<EOF | sudo tee /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=containerd.service
Requires=containerd.service

[Service]
ExecStart=/usr/local/bin/kubelet \
  --config=/var/lib/kubelet/kubelet-config.yaml \
  --container-runtime=remote \
  --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock \
  --resolv-conf=/run/systemd/resolve/resolv.conf \
  --image-pull-progress-deadline=2m \
  --kubeconfig=/var/lib/kubelet/kubeconfig \
  --network-plugin=cni \
  --register-node=true \
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
{
    sudo systemctl daemon-reload
    sudo systemctl restart kubelet
}

result:

kubectl exec -ti $POD_NAME -- nslookup kubernetes
Server:  10.32.0.10
Address: 10.32.0.10:53

** server can't find kubernetes: NXDOMAIN
  2. Problem with nslookup in busybox:
    kube-dns can resolve kubernetes and kubernetes.default.svc.cluster.local, but not kubernetes.default (see kubernetes/kubernetes#45479).
    For testing, you need to use a different image, for example nginx:
kubectl run nginx --image=nginx
POD_NAME=$(kubectl get pods -l run=nginx -o jsonpath="{.items[0].metadata.name}")
kubectl exec -it $POD_NAME -- bash

Then run the following inside the pod:

apt-get update
apt-get install dnsutils
nslookup kubernetes
Server:         10.32.0.10
Address:        10.32.0.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.32.0.1

Success!
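
To verify the new kubelet flag actually took effect on a worker (a generic systemd check, nothing specific to this guide):

systemctl cat kubelet | grep resolv-conf
sudo systemctl status kubelet --no-pager | grep Active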

@indrayam

indrayam commented Aug 6, 2018

Wow! Just faced this problem a few minutes ago. Will try the solution shared by @apantin above

@indrayam

indrayam commented Aug 6, 2018

Yay!!! @apantin's solution above worked like a charm. The ONLY additional step I had to do was the following:

After restarting the Kubelet daemon on all the Nodes, I had to delete the kube-dns installation that I already had and reinstall it. Steps listed below:

  1. Delete the kube-dns install:
    kubectl delete -f https://storage.googleapis.com/kubernetes-the-hard-way/kube-dns.yaml

  2. Check that all kube-dns resources are truly deleted
    kubectl get pods -l k8s-app=kube-dns -n kube-system

  3. Re-install kube-dns:
    kubectl create -f https://storage.googleapis.com/kubernetes-the-hard-way/kube-dns.yaml

FWIW, I had deleted all busybox and nginx resources that I had running and re-installed them after the new kube-dns install.

Anyway, as I said, the smoke tests mentioned by @apantin worked exactly as described: nslookup in busybox still failed, but the nslookup command worked in nginx after installing dnsutils.

Thanks!
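
Since kube-dns is a Deployment, an equivalent shortcut to the full delete/recreate is to delete only the pods and let the Deployment recreate them; pods created after the kubelet restart pick up the new --resolv-conf (standard kubectl):

kubectl -n kube-system delete pod -l k8s-app=kube-dns
kubectl -n kube-system get pods -l k8s-app=kube-dns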

@pdecat

pdecat commented Aug 16, 2018

Pinning the busybox image to busybox:1.28 resolves the issue.

cf. docker-library/busybox#48

pdecat added a commit to pdecat/kubernetes-the-hard-way that referenced this issue Aug 17, 2018
@hokiegeek2

@apantin THANK YOU! I struggled for like 8-9 hours on Friday/Saturday to figure out why nslookup was failing on busybox...and the problem was with busybox itself. Again, many thanks for sharing your experiences on this issue--much, much appreciated.

@akutz

akutz commented Sep 1, 2018

Hi all,

I've literally been dealing with this for days. Maybe a week. I've been trying to prove I could stand up K8s on CentOS on vSphere with a custom VMX datasource for cloud-init and some turn-up code I've written. I just thought I was doing something wrong. Turns out that all along I wasn't failing busybox; busybox was failing me.

@hokiegeek2

@akutz Yup, I wasted the better part of an entire workday on this. Argh.

@akutz

akutz commented Sep 1, 2018

Hi @hokiegeek2,

Yep. I was trying to teach myself the basics of K8s. I learn by doing. So to heck with opinionated Linux distros like Container Linux and Photon. At least for the purpose of learning how the pieces of K8s fit together. I needed to do it from scratch. ./configure over containers :) Anyway, I got to the resolve portion of the guide using CentOS, and I was sure I was doing something wrong.

Hell, I got to this part using Container Linux, and it just dawned on me that it was probably working too! I was using BusyBox, and of course it was failing. I just loathed CL because it comes with etcd, containerd, etc., and trying to use your own versions of those requires a lot of hackery on your part. I figured it was something conflicting with the stock versions of the same components I was laying down for this guide.

Nope. Just BusyBox. Does anyone even know why this happens?

@akutz

akutz commented Sep 1, 2018

Hi @pdecat,

I can confirm it works with busybox:1.28:

$ kubectl run bb3 --image=busybox:1.28 --command -- sleep 3600
deployment.apps/bb3 created
$ POD_NAME=$(kubectl get pods -l run=bb3 -o jsonpath="{.items[0].metadata.name}")
$ kubectl exec -ti $POD_NAME -- nslookup kubernetes
Server:    10.32.0.10
Address 1: 10.32.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes
Address 1: 10.32.0.1 kubernetes.default.svc.cluster.local
@omarrayward mentioned this issue Sep 3, 2018
jonashackt added a commit to jonashackt/kubernetes-the-hard-way that referenced this issue Sep 4, 2018
Version `1.28.4` of busybox does the `nslookup` correctly as described in the tutorial; the `latest` tag does not, so the version needs to be set explicitly. Fixes kelseyhightower#356. Also see docker-library/busybox#48.
@onprema

onprema commented Sep 27, 2018

Thanks everyone for contributing to this issue!

Adding --resolv-conf=/run/systemd/resolve/resolv.conf, redeploying the kube-dns objects, and using busybox:1.28 worked for me.

Thanks @apantin @indrayam @pdecat

@onprema

onprema commented Sep 27, 2018

Question for @apantin -- How did you know to use the /run/systemd/resolve/resolv.conf path? The k8s docs say to use --resolv-conf=""...

@kelseyhightower
Owner

I've updated the kubelet in the latest guide to use the --resolv-conf flag to ensure DNS works with the system DNS resolver. I've also replaced KubeDNS with CoreDNS.

@vli63127

vli63127 commented Sep 7, 2020

I followed @apantin's steps, but after I deleted the CoreDNS artifacts I cannot reinstall them anymore. After I create them using the CoreDNS YAML file, no pods get spun up and it hangs at the deployment level. It also says: "kube-dns" is invalid: spec.clusterIP: Invalid value: "10.32.0.10": provided IP is not in the valid range. The range of valid IPs is 10.100.0.0/16. So I changed the cluster IP address.

What did I do wrong?
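
That error comes from the API server's --service-cluster-ip-range flag, not from CoreDNS itself: the service YAML asks for 10.32.0.10, but your API server is configured with 10.100.0.0/16. In this guide the API server runs as a systemd unit on the controllers, so one way to check is:

# on a controller node
sudo systemctl cat kube-apiserver | grep service-cluster-ip-range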
