
Calico node error - iptables-legacy-save command failed #8831

Closed
Farhanec07 opened this issue May 16, 2024 · 20 comments · Fixed by #9022

Comments

@Farhanec07

Expected Behavior

Current Behavior

calico-kube-controllers-6fb59668cc-k746l   0/1     CrashLoopBackOff   38 (4m22s ago)   120m
calico-node-lzwd5                          0/1     Running            0                117m
calico-node-m5pdk                          0/1     Running            11 (72s ago)     117m
calico-node-zngzj                          1/1     Running            1 (51m ago)      117m
calico-typha-5f49b8b8c4-hmspz              1/1     Running            0                120m
calico-typha-5f49b8b8c4-kksv4              1/1     Running            0                120m

I observed a panic: the node fails to save iptables rules, causing the pods to crash.
calico-node pod log:

panic: (*logrus.Entry) 0xc0005ed420
2024-05-15 17:52:12.344 [WARNING][61470] felix/table.go 840: iptables save failed error=exit status 127
2024-05-15 17:52:12.949 [PANIC][61470] felix/table.go 784: iptables-legacy-save command failed after retries ipVersion=0x4 table="raw"
panic: (*logrus.Entry) 0xc000720000

goroutine 172 [running]:
github.com/sirupsen/logrus.(*Entry).log(0xc0003ada40, 0x0, {0xc00034d640, 0x31}) ...


2024-05-15 17:52:14.307 [WARNING][61545] felix/table.go 840: iptables save failed error=exit status 127
2024-05-15 17:52:14.307 [WARNING][61545] felix/table.go 778: iptables-legacy-save command failed error=exit status 127 ipVersion=0x4 stderr="" table="raw"
2024-05-15 17:52:14.309 [WARNING][61545] felix/table.go 840: iptables save failed error=exit status 127
2024-05-15 17:52:14.310 [WARNING][61545] felix/table.go 778: iptables-legacy-save command failed error=exit status 127 ipVersion=0x4 stderr="" table="nat"
2024-05-15 17:52:14.312 [WARNING][61545] felix/table.go 840: iptables save failed error=exit status 127
2024-05-15 17:52:14.312 [WARNING][61545] felix/table.go 778: iptables-legacy-save command failed error=exit status 127 ipVersion=0x4 stderr="" table="mangle"
2024-05-15 17:52:14.314 [WARNING][61545] felix/table.go 840: iptables save failed error=exit status 127
2024-05-15 17:52:14.314 [WARNING][61545] felix/table.go 778: iptables-legacy-save command failed error=exit status 127 ipVersion=0x4 stderr="" table="filter"
2024-05-15 17:52:14.478 [INFO][61545] felix/health.go 294: Reporter is not ready: reporting non-ready. name="InternalDataplaneMainLoop"
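A side note on reading these logs: exit status 127 is the shell convention for "the command could not be executed at all", which is the same class of hard failure a dynamic-linker symbol lookup error produces. A quick illustration (nothing Calico-specific; any POSIX shell behaves this way):

```shell
# Exit status 127 means the command could not be run at all -- not that it ran
# and failed. A loader-level symbol lookup error surfaces the same way.
sh -c '/definitely-not-a-real-binary' 2>/dev/null
echo "exit status: $?"   # prints: exit status: 127
```

So "iptables save failed error=exit status 127" points at the binary failing to start, not at a bad rule set.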

I checked cni.log and could only see the errors below:

2024-05-13 15:13:19.864 [ERROR][4517] plugin.go 580: Final result of CNI DEL was an error. error=stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/

2024-05-13 15:13:58.392 [WARNING][7280] k8s.go 549: CNI_CONTAINERID does not match WorkloadEndpoint ConainerID, don't delete WEP. ContainerID="1a5404058443532a2fd04878af23e0d33b8be65f497d56f42d2f546310dcbc9b" WorkloadEndpoint=&v3.WorkloadEndpoint{TypeMeta:v1.TypeMeta{Kind:"WorkloadEndpoint", APIVersion:"projectcalico.org/v3"}, ObjectMeta:v1.ObjectMeta{Name:"ip--10--80--187--34.us--west--2.compute.internal-k8s-calico--kube--controllers--6fb59668cc--slzxw-eth0", GenerateName:"calico-kube-controllers-6fb59668cc-", Namespace:"kube-system", SelfLink:"", UID:"1a92bc05-cfae-436d-9ae0-8a5bbfd39918", ResourceVersion:"2657", Generation:0, CreationTimestamp:time.Date(2024, time.May, 13, 15, 9, 21, 0, time.Local), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"k8s-app":"calico-kube-controllers", "pod-template-hash":"6fb59668cc", "projectcalico.org/namespace":"kube-system", "projectcalico.org/orchestrator":"k8s", "projectcalico.org/serviceaccount":"calico-kube-controllers"}, Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Spec:v3.WorkloadEndpointSpec{Orchestrator:"k8s", Workload:"", Node:"ip-10-80-187-34.us-west-2.compute.internal", ContainerID:"5ce6b20118a30e7fb8568e63265afe2a3f51590d9ddc7e82de3ab11c5d4f52e5", Pod:"calico-kube-controllers-6fb59668cc-slzxw", Endpoint:"eth0", ServiceAccountName:"calico-kube-controllers", IPNetworks:[]string{"100.100.208.65/32"}, IPNATs:[]v3.IPNAT(nil), IPv4Gateway:"", IPv6Gateway:"", Profiles:[]string{"kns.kube-system", "ksa.kube-system.calico-kube-controllers"}, InterfaceName:"cali0551ebff1eb", MAC:"", Ports:[]v3.WorkloadEndpointPort(nil), AllowSpoofedSourcePrefixes:[]string(nil)}}
2024-05-13 15:13:58.392 [INFO][7280] k8s.go 585: Cleaning up netns ContainerID="1a5404058443532a2fd04878af23e0d33b8be65f497d56f42d2f546310dcbc9b"
2024-05-13 15:13:58.392 [INFO][7280] dataplane_linux.go 526: CleanUpNamespace called with no netns name, ignoring. ContainerID="1a5404058443532a2fd04878af23e0d33b8be65f497d56f42d2f546310dcbc9b" iface="eth0" netns=""

2024-05-13 15:13:58.583 [WARNING][7343] ipam_plugin.go 432: Asked to release address but it doesn't exist. Ignoring ContainerID="98c5fab830d6aad1c63b765bc11ad533d7160d9144989f232a9b5716415c804c" HandleID="k8s-pod-network.98c5fab830d6aad1c63b765bc11ad533d7160d9144989f232a9b5716415c804c" Workload="ip--10--80--187--34.us--west--2.compute.internal-k8s-kubed--864bd6d7f--jjfzv-eth0"

When exec'ing into the pod, the iptables command fails to run:

$ kubectl exec -it calico-node-25spv -n kube-system -- /bin/bash

[root@ip-10-80-175-186 /]# iptables-save
iptables-save: symbol lookup error: iptables-save: undefined symbol: xtables_strdup
[root@ip-10-80-175-186 /]# iptables --version
iptables: symbol lookup error: iptables: undefined symbol: xtables_strdup
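This kind of error means the iptables binary references a symbol that the libxtables shared object the loader resolved does not export. A sketch of how one could confirm that from inside the container (assumes binutils' nm is present; the library path is the one discussed later in this thread):

```shell
# has_symbol LIB SYMBOL: succeed if LIB's dynamic symbol table exports SYMBOL.
# Requires nm from binutils; -w matches the bare name even when the symbol
# carries a version suffix like "@@GLIBC_2.2.5".
has_symbol() {
  nm -D "$1" 2>/dev/null | grep -qw "$2"
}

# On an affected node this would report the missing symbol:
#   has_symbol /lib64/libxtables.so.12 xtables_strdup || echo "xtables_strdup missing"
```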

Possible Solution

Steps to Reproduce (for bugs)

Context

Your Environment

  • Calico version 3.27.3
  • Orchestrator version (e.g. kubernetes, mesos, rkt): k8s 1.28
  • Operating System and version: linux
  • Link to your project (optional):
@Farhanec07
Author

Please share your thoughts on this. We are currently blocked from upgrading to EKS 1.29 due to this issue.

@tomastigera
Contributor

What Linux distro/version do you use? Does it have (proper) support for iptables?

@Farhanec07
Author

Farhanec07 commented Jun 5, 2024

What Linux distro/version do you use? Does it have (proper) support for iptables?

We create the cluster from the amazon-linux-2-arm64 AMI, and that is where we hit the issue above.

@jonathan-hurley

jonathan-hurley commented Jun 14, 2024

The actual AMIs which are in question here are the Optimized EKS ones (such as amazon-eks-arm64-node-1.26 and amazon-eks-arm64-node-1.29).

All versions of these AMIs (even the x86/AMD64 ones) have the same version of iptables (v1.8.4):

rpm -q iptables nftables firewalld
iptables-1.8.4-10.amzn2.1.2.aarch64
package nftables is not installed
package firewalld is not installed

So I don't think this would be related to the version of iptables. The same commands work on much older 1.26 ARM instances (which work with earlier versions of Calico).

# iptables -A INPUT -s 1.2.3.4 -j DROP
# iptables-save
# Generated by iptables-save v1.8.4 on Fri Jun 14 19:59:06 2024
*filter
:INPUT ACCEPT [65:3736]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [34:2728]
-A INPUT -s 1.2.3.4/32 -j DROP
COMMIT
# Completed on Fri Jun 14 19:59:06 2024
@coutinhop
Contributor

@jonathan-hurley the function in question (xtables_strdup()) is present in iptables v1.8.8 (which is what calico v3.27.2+ uses): https://github.com/PKRoma/iptables/blob/v1.8.8/libxtables/xtables.c#L463, but it doesn't seem to be there in the version you mentioned (v1.8.4): https://github.com/PKRoma/iptables/blob/v1.8.4/libxtables/xtables.c

Would it be possible to upgrade iptables to v1.8.8 in your instances? Alternatively, calico pre-v3.27.2 should be using iptables v1.8.4, could you try that and see if the issue is resolved? (not ideal, but this would at least help diagnose this)

@jonathan-hurley

Amazon EKS optimized images have always used 1.8.4; we do not have the option to change this.

We must use the latest versions of Calico in order to resolve CVEs.

@lubronzhan
Contributor

lubronzhan commented Jul 8, 2024

@jonathan-hurley the function in question (xtables_strdup()) is present in iptables v1.8.8 (which is what calico v3.27.2+ uses): PKRoma/iptables@v1.8.8/libxtables/xtables.c#L463, but it doesn't seem to be there in the version you mentioned (v1.8.4): PKRoma/iptables@v1.8.4/libxtables/xtables.c

Would it be possible to upgrade iptables to v1.8.8 in your instances? Alternatively, calico pre-v3.27.2 should be using iptables v1.8.4, could you try that and see if the issue is resolved? (not ideal, but this would at least help diagnose this)

Hi @coutinhop, I got a segfault:

2024-05-23 10:22:44.982 [WARNING][77] felix/table.go 816: iptables save failed error=signal: segmentation fault (core dumped)
2024-05-23 10:22:44.982 [WARNING][77] felix/table.go 765: iptables-nft-save command failed error=signal: segmentation fault (core dumped) ipVersion=0x4 stderr="" table="filter"

when using Calico 3.26.3 on Photon OS with iptables 1.8.9. Do you mean the iptables version on the OS should also be 1.8.4 to avoid the issue? I wish Calico printed more than just the segfault; it looks like the hashes do contain stdout, so maybe it could be printed in debug mode. (Just found out it already is, in debug mode.)

root [ /home/capv ]# iptables --version
iptables v1.8.9 (nf_tables)

If I do iptables -L inside the calico pod, the output is empty, but I get valid output if I run it on the VM directly.
Is it because Calico uses iptables-nft to talk to the nftables API, while the iptables command in the calico pod only talks to the legacy API and therefore can't see the rules?
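For reference, the two iptables backends (legacy xtables vs. nftables) keep entirely separate rule sets, so rules programmed through one are invisible to the other. A rough diagnostic sketch for checking which backend actually owns the rules (assumes both iptables-legacy and iptables-nft are on PATH, as they are in calico-node images; this is only a heuristic, not how Calico itself decides):

```shell
# Print the backend holding more rules; where neither binary exists both
# counts are 0 and the function falls back to "legacy".
detect_backend() {
  legacy=$(iptables-legacy -S 2>/dev/null | wc -l)
  nft=$(iptables-nft -S 2>/dev/null | wc -l)
  if [ "$nft" -gt "$legacy" ]; then echo nft; else echo legacy; fi
}
detect_backend
```

If the host programs rules via nftables but you inspect them with a legacy-only iptables inside the pod, you will see an empty table exactly as described above.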

@jonathan-hurley

@coutinhop This change breaks Calico 3.27.2+ on every version of Amazon Linux 2 and every Optimized EKS AMI based on it. The latest versions of Amazon Linux 2 still only support iptables 1.8.4 and are not EOL for another entire year.

What is the possibility that the Felix change which caused this could be reverted so that xtables_strdup() is not used? It seems like it is just a wrapper function anyway.

@caseydavenport
Member

@coutinhop I don't think this has anything to do with the AMI / version of iptables shipped on the Amazon VM instances. Calico packages the necessary libraries for the binaries it uses inside the container, so it sounds like we have published ARM images that are missing a necessary lib (I'm not able to reproduce this on amd64)

If I had to guess, we updated the version of iptables to v1.8.8, which introduced a dependency on xtables_strdup, but somehow have not included that symbol in the shared libraries we package into the container.
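One way to test that hypothesis from inside the container is `ldd -r`, which performs both data and function relocations and reports any symbols the resolved libraries fail to provide (a sketch; the iptables path is assumed, and note that ldd executes the binary's loader, so only run it on trusted binaries):

```shell
# Report symbols the iptables binary cannot resolve against the libraries the
# dynamic linker would actually load inside this container.
ldd -r /usr/sbin/iptables 2>&1 | grep -i "undefined symbol" \
  && echo "packaged libs are missing symbols" \
  || echo "all symbols resolve"
```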

@caseydavenport
Member

This commit is likely to be the one that introduced the problem: c053d1c

@jonathan-hurley

jonathan-hurley commented Jul 17, 2024

so it sounds like we have published ARM images that are missing a necessary lib (I'm not able to reproduce this on amd64)

@caseydavenport - I can reproduce this on AMD64 as well using an m5.large instance type running Amazon Linux 2. It appears as though the architecture doesn't matter here.

@caseydavenport
Member

Well, that certainly changes things. Perhaps there is some interaction with the host packages that I wasn't expecting.

@jonathan-hurley

Here are the details of the VM which I tried:

# uname -p
x86_64
# cat /etc/os-release

NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
SUPPORT_END="2025-06-30"

And when I try to invoke iptables from the Calico pod on this machine, I get the same xtables_strdup message:

# iptables --version
iptables: symbol lookup error: iptables: undefined symbol: xtables_strdup
@coutinhop
Contributor

I managed to push a PR with a tentative fix, see here for more details: #9022

@jonathan-hurley or @Farhanec07, would any of you be able to test this with an image I built locally with the fix and let us know if that gets rid of the problem?

amd64: quay.io/coutinho/calico-node:v3.28.0-amd64
arm64: quay.io/coutinho/calico-node:v3.28.0-arm64

Thanks!

@jonathan-hurley

jonathan-hurley commented Jul 18, 2024

@coutinhop - that fixed it!

calico-node 2024-07-18 15:23:34.839 [INFO][66] felix/int_dataplane.go 1360: Linux interface addrs changed. addrs=set.Set{} ifaceName="calico_tmp_B"
calico-node 2024-07-18 15:23:34.839 [INFO][66] felix/int_dataplane.go 1316: Linux interface state changed. ifIndex=91563 ifaceName="calico_tmp_A" state="down"
calico-node 2024-07-18 15:23:34.839 [INFO][66] felix/int_dataplane.go 1360: Linux interface addrs changed. addrs=set.Set{} ifaceName="calico_tmp_A"
calico-node 2024-07-18 15:23:34.850 [INFO][66] felix/wireguard.go 1704: Trying to connect to linkClient ipVersion=0x4
install-cni     {
calico-node 2024-07-18 15:23:34.850 [INFO][66] felix/route_rule.go 189: Trying to connect to netlink
calico-node 2024-07-18 15:23:34.850 [INFO][66] felix/wireguard.go 635: Public key out of sync or updated ipVersion=0x4 ourPublicKey=AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
calico-node 2024-07-18 15:23:34.869 [INFO][66] felix/int_dataplane.go 1857: Completed first update to dataplane. secsSinceStart=0.459817401
calico-node 2024-07-18 15:23:34.872 [INFO][66] felix/health.go 206: Health of component changed name="InternalDataplaneMainLoop" newReport="live,ready" oldReport="live,non-ready"
install-cni       "type": "portmap",
install-cni       "snat": true,
install-cni       "capabilities": {"portMappings": true}
install-cni     },
install-cni     {
install-cni       "type": "bandwidth",
install-cni       "capabilities": {"bandwidth": true}
install-cni     }
install-cni   ]
calico-node 2024-07-18 15:23:34.872 [INFO][66] felix/int_dataplane.go 1957: Received interface update msg=&intdataplane.ifaceStateUpdate{Name:"calico_tmp_B", State:"down", Index:91562}
install-cni }
calico-node 2024-07-18 15:23:34.872 [INFO][66] felix/int_dataplane.go 1981: Received interface addresses update msg=&intdataplane.ifaceAddrsUpdate{Name:"calico_tmp_B", Addrs:set.Typed[string]{}}
calico-node 2024-07-18 15:23:34.872 [INFO][66] felix/hostip_mgr.go 84: Interface addrs changed. update=&intdataplane.ifaceAddrsUpdate{Name:"calico_tmp_B", Addrs:set.Typed[string]{}}
calico-node 2024-07-18 15:23:34.872 [INFO][66] felix/int_dataplane.go 1957: Received interface update msg=&intdataplane.ifaceStateUpdate{Name:"calico_tmp_A", State:"down", Index:91563}
calico-node 2024-07-18 15:23:34.872 [INFO][66] felix/int_dataplane.go 1981: Received interface addresses update msg=&intdataplane.ifaceAddrsUpdate{Name:"calico_tmp_A", Addrs:set.Typed[string]{}}
calico-node 2024-07-18 15:23:34.872 [INFO][66] felix/hostip_mgr.go 84: Interface addrs changed. update=&intdataplane.ifaceAddrsUpdate{Name:"calico_tmp_A", Addrs:set.Typed[string]{}}
calico-node 2024-07-18 15:23:34.914 [INFO][66] felix/int_dataplane.go 1316: Linux interface state changed. ifIndex=91563 ifaceName="calico_tmp_A" state=""
calico-node 2024-07-18 15:23:34.914 [INFO][66] felix/int_dataplane.go 1360: Linux interface addrs changed. addrs=<nil> ifaceName="calico_tmp_A"
calico-node 2024-07-18 15:23:34.914 [INFO][66] felix/int_dataplane.go 1957: Received interface update msg=&intdataplane.ifaceStateUpdate{Name:"calico_tmp_A", State:"", Index:91563}
calico-node 2024-07-18 15:23:34.914 [INFO][66] felix/int_dataplane.go 1981: Received interface addresses update msg=&intdataplane.ifaceAddrsUpdate{Name:"calico_tmp_A", Addrs:set.Set[string](nil)}
calico-node 2024-07-18 15:23:34.914 [INFO][66] felix/hostip_mgr.go 84: Interface addrs changed. update=&intdataplane.ifaceAddrsUpdate{Name:"calico_tmp_A", Addrs:set.Set[string](nil)}
calico-node 2024-07-18 15:23:34.915 [INFO][66] felix/int_dataplane.go 1316: Linux interface state changed. ifIndex=91562 ifaceName="calico_tmp_B" state=""
calico-node 2024-07-18 15:23:34.915 [INFO][66] felix/int_dataplane.go 1360: Linux interface addrs changed. addrs=<nil> ifaceName="calico_tmp_B"
calico-node 2024-07-18 15:23:34.915 [INFO][66] felix/int_dataplane.go 1957: Received interface update msg=&intdataplane.ifaceStateUpdate{Name:"calico_tmp_B", State:"", Index:91562}
calico-node 2024-07-18 15:23:34.915 [INFO][66] felix/int_dataplane.go 1981: Received interface addresses update msg=&intdataplane.ifaceAddrsUpdate{Name:"calico_tmp_B", Addrs:set.Set[string](nil)}
calico-node 2024-07-18 15:23:34.916 [INFO][66] felix/hostip_mgr.go 84: Interface addrs changed. update=&intdataplane.ifaceAddrsUpdate{Name:"calico_tmp_B", Addrs:set.Set[string](nil)}
calico-node 2024-07-18 15:23:34.916 [INFO][66] felix/int_dataplane.go 1867: Dataplane updates throttled
calico-node bird: device1: Initializing
calico-node bird: direct1: Initializing
calico-node bird: device1: Starting
calico-node bird: device1: Connected to table master
calico-node bird: device1: State changed to feed
calico-node bird: direct1: Starting
calico-node bird: direct1: Connected to table master
calico-node bird: direct1: State changed to feed
calico-node bird: Graceful restart started
calico-node bird: Graceful restart done
calico-node bird: Started
calico-node bird: device1: State changed to up
calico-node bird: direct1: State changed to up
calico-node bird: device1: Initializing
calico-node bird: direct1: Initializing
calico-node bird: Mesh_10_80_130_113: Initializing
calico-node bird: Mesh_10_80_160_114: Initializing
calico-node bird: device1: Starting
calico-node bird: device1: Connected to table master
calico-node bird: device1: State changed to feed
calico-node bird: direct1: Starting
calico-node bird: direct1: Connected to table master
calico-node bird: direct1: State changed to feed
calico-node bird: Mesh_10_80_130_113: Starting
calico-node bird: Mesh_10_80_130_113: State changed to start
calico-node bird: Mesh_10_80_160_114: Starting
calico-node bird: Mesh_10_80_160_114: State changed to start
calico-node bird: Graceful restart started
calico-node bird: Started
calico-node bird: device1: State changed to up
calico-node bird: direct1: State changed to up
calico-node bird: Mesh_10_80_160_114: Connected to table master
calico-node bird: Mesh_10_80_160_114: State changed to feed
calico-node bird: Mesh_10_80_160_114: State changed to up
calico-node 2024-07-18 15:23:39.866 [INFO][66] felix/int_dataplane.go 1826: Dataplane updates no longer throttled
Stream closed EOF for kube-system/calico-node-mbblk (install-cni)
Stream closed EOF for kube-system/calico-node-mbblk (mount-bpffs)
Stream closed EOF for kube-system/calico-node-mbblk (upgrade-ipam)
calico-node 2024-07-18 15:23:52.000 [INFO][66] felix/health.go 336: Overall health status changed: live=true ready=true
calico-node +---------------------------+---------+----------------+-----------------+--------+
calico-node |         COMPONENT         | TIMEOUT |    LIVENESS    |    READINESS    | DETAIL |
calico-node +---------------------------+---------+----------------+-----------------+--------+
calico-node | CalculationGraph          | 30s     | reporting live | reporting ready |        |
calico-node | FelixStartup              | -       | reporting live | reporting ready |        |
calico-node | InternalDataplaneMainLoop | 1m30s   | reporting live | reporting ready |        |
calico-node +---------------------------+---------+----------------+-----------------+--------+
Containers:
  calico-node:
    Container ID:   containerd://8a230a9241a8bf8c09926d423e7f17fd2c1eeef30b7e9416011e9201f36d3b39
    Image:          quay.io/coutinho/calico-node:v3.28.0-arm64
    Image ID:       quay.io/coutinho/calico-node@sha256:6f30a1f29edf83f7e04626d883a936259a76bec9cde60f6945ce7e3a6d251439
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Thu, 18 Jul 2024 11:23:33 -0400
    Ready:          True
@coutinhop
Contributor

@jonathan-hurley that is great news! I'm still having a bit of trouble reproducing the issue myself, could you share some more details about your setup? Like, inside a calico-node pod WITH the problem, could you run env and ls -l /lib64/libxtables.so.12*, cat /etc/ld.so.conf, cat /etc/ld.so.conf.d/*?
Also, it seems like it didn't happen consistently on all pods, did it? Anything in particular in your setup that could be a cause?
I've tried spinning up Amazon Linux 2 VMs and an EKS cluster, but to no avail: iptables and iptables-legacy-save don't fail with that error for me...
I'm glad the issue seems to be gone, but I'm trying to better understand what caused it; I appreciate your help. Thanks!

@jonathan-hurley

Yes, the problem happened consistently on all pods, on both AMD64/x86 and ARM64 architectures. We use the Optimized EKS AMI images (which are based off of Amazon Linux 2).

# ls -l /lib64/libxtables.so.12*
lrwxrwxrwx 1 root root    20 Nov 16  2023 /lib64/libxtables.so.12 -> libxtables.so.12.3.0
-rwxr-xr-x 1 root root 70200 Nov 16  2023 /lib64/libxtables.so.12.3.0
# env
KUBE_DNS_SERVICE_PORT=53
KUBE_DNS_PORT_53_TCP_PROTO=tcp
KUBED_SERVICE_PORT=443
CALICO_TYPHA_SERVICE_PORT_CALICO_TYPHA=5473
IP_AUTODETECTION_METHOD=first-found
FELIX_TYPHAK8SSERVICENAME=calico-typha
KUBED_SERVICE_PORT_API=443
SVDIR=/etc/service/enabled
KUBED_PORT_443_TCP=tcp://172.20.136.172:443
CALICO_IPV4POOL_CIDR=100.100.0.0/16
LANG=C.utf8
HOSTNAME=ip-10-80-1-1.us-west-2.compute.internal
KUBED_PORT_443_TCP_PORT=443
CALICO_TYPHA_SERVICE_PORT=5473
FELIX_WIREGUARDMTU=0
KUBE_DNS_PORT_53_UDP=udp://172.20.0.10:53
FELIX_VXLANMTU=0
CALICO_TYPHA_SERVICE_HOST=172.20.122.83
CALICO_IPV4POOL_VXLAN=Never
KUBE_DNS_PORT_53_UDP_PROTO=udp
FELIX_HEALTHENABLED=true
CALICO_DISABLE_FILE_LOGGING=true
which_declare=declare -f
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_PORT_443_TCP_ADDR=172.20.0.1
container=oci
FELIX_USAGEREPORTINGENABLED=false
CALICO_TYPHA_PORT=tcp://172.20.122.83:5473
KUBE_DNS_PORT_53_UDP_ADDR=172.20.0.10
KUBERNETES_PORT=tcp://172.20.0.1:443
PWD=/
HOME=/root
CALICO_TYPHA_PORT_5473_TCP_PROTO=tcp
KUBE_DNS_PORT_53_TCP_ADDR=172.20.0.10
NODENAME=ip-10-80-1-1.us-west-2.compute.internal
KUBE_DNS_PORT=udp://172.20.0.10:53
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_PORT_443_TCP_PORT=443
KUBED_PORT_443_TCP_PROTO=tcp
WAIT_FOR_DATASTORE=true
KUBED_PORT_443_TCP_ADDR=172.20.136.172
KUBERNETES_PORT_443_TCP=tcp://172.20.0.1:443
KUBE_DNS_PORT_53_TCP_PORT=53
CLUSTER_TYPE=k8s,bgp
CALICO_TYPHA_PORT_5473_TCP_ADDR=172.20.122.83
IP=autodetect
TERM=xterm
FELIX_DEFAULTENDPOINTTOHOSTACTION=ACCEPT
CALICO_IPV4POOL_IPIP=CrossSubnet
KUBE_DNS_PORT_53_TCP=tcp://172.20.0.10:53
CALICO_TYPHA_PORT_5473_TCP=tcp://172.20.122.83:5473
FELIX_IPINIPMTU=0
FELIX_LOGSEVERITYSCREEN=info
CALICO_IPV6POOL_VXLAN=Never
SHLVL=1
KUBERNETES_SERVICE_PORT=443
CALICO_NETWORKING_BACKEND=bird
KUBE_DNS_SERVICE_PORT_DNS=53
CALICO_TYPHA_PORT_5473_TCP_PORT=5473
KUBE_DNS_SERVICE_PORT_DNS_TCP=53
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
DATASTORE_TYPE=kubernetes
KUBED_SERVICE_HOST=172.20.136.172
FELIX_IPV6SUPPORT=false
KUBERNETES_SERVICE_HOST=172.20.0.1
KUBE_DNS_PORT_53_UDP_PORT=53
KUBE_DNS_SERVICE_HOST=172.20.0.10
KUBED_PORT=tcp://172.20.136.172:443
BASH_FUNC_which%%=() {  ( alias;
 eval ${which_declare} ) | /usr/bin/which --tty-only --read-alias --read-functions --show-tilde --show-dot $@
}
_=/usr/bin/env
# cat /etc/ld.so.conf
include ld.so.conf.d/*.conf
# ls -l /etc/ld.so.conf.d
total 0
@coutinhop
Contributor

Thank you @jonathan-hurley!

Just to clarify, is this in the calico-node pod? What image are you using?

This looks very weird:

# ls -l /lib64/libxtables.so.12*
lrwxrwxrwx 1 root root    20 Nov 16  2023 /lib64/libxtables.so.12 -> libxtables.so.12.3.0
-rwxr-xr-x 1 root root 70200 Nov 16  2023 /lib64/libxtables.so.12.3.0

As 'libxtables.so.12.3.0' is the "outdated" version of the lib which won't contain the xtables_strdup symbol... Are you building a custom calico-node image? If so, can you share the steps/modifications? If not, can you share the image url?

Would you be willing to hit me up in the Calico Users slack? We could make this conversation a lot more real-time if so: https://slack.projectcalico.org/

@jonathan-hurley

Yes, these commands were being run from the calico-node pod before it died due to Felix crashing. The image for this run was ami-0be1daad79c89dd0a with a build ID of amazon/amazon-eks-node-1.28-v20240703.

Sure, let me hop on slack ...

@fasaxc
Member

fasaxc commented Jul 19, 2024

Resolution was that the user was using an incorrectly built external image from Iron Bank. It had incorrect versions of the libraries.
