Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long namespace/ingress names cause collisions with auto-created resources on GKE #537

Closed
NickLavrov opened this issue Nov 5, 2018 · 13 comments · Fixed by #892
Closed

Long namespace/ingress names cause collisions with auto-created resources on GKE #537

NickLavrov opened this issue Nov 5, 2018 · 13 comments · Fixed by #892
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@NickLavrov
Copy link

NickLavrov commented Nov 5, 2018

I noticed this after creating one ingress that worked fine, then creating another ingress which caused the first ingress to behave incorrectly (default backend 404, etc).

As far as I can tell, at least these resources and annotations are added to the ingress by GKE:

  • ingress.kubernetes.io/forwarding-rule
  • ingress.kubernetes.io/target-proxy
  • ingress.kubernetes.io/url-map

If my namespace was called xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx (32 chars) and my ingress was named xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, I'd see resources with names like:
k8s-um-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxxxx0

This is fine if there's only one ingress, but since my other ingress also started with the same set of characters, there would be a collision. I've looked at the resources/annotations it creates when using shorter names, and I notice that it means to append some random characters at the end. Here, it looks like I'm hitting the 64 character limit before it gets to that random string.

Maybe removing the 64 character limit or adding documentation about how these names are generated would help with this. It took me a while to look into this annotation as the root of my issues.

NOTE: This is the file that contains the naming logic https://github.com/kubernetes/ingress-gce/blob/master/pkg/utils/namer.go

@freehan freehan added the kind/bug Categorizes issue or PR as related to a bug. label Nov 6, 2018
@freehan
Copy link
Contributor

freehan commented Nov 6, 2018

Thanks for the bug report. Will try to fix this problem. The key is to maintain backward compatibility.

@freehan freehan closed this as completed Nov 6, 2018
@freehan freehan reopened this Nov 6, 2018
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 7, 2019
@bowei
Copy link
Member

bowei commented Feb 7, 2019

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 7, 2019
@freehan
Copy link
Contributor

freehan commented Feb 27, 2019

Add toleration of long name to certain degree: #650

@NickLavrov
Copy link
Author

Closing the issue since it was addressed in the PR above!

@karlkfi
Copy link

karlkfi commented Aug 13, 2019

Changes in #650 do not actually solve this issue completely.

It is still possible for GCLB name and backend name (Network Endpoint Group) to collide if the namespace name + ingress name are too long, especially between two clusters with similar names, because the cluster name is at the end and gets truncated first.

LB: https://github.com/kubernetes/ingress-gce/blob/master/pkg/utils/namer.go#L340
NEG: https://github.com/kubernetes/ingress-gce/blob/master/pkg/utils/namer.go#L430

The UID also commonly gets truncated, which increases the chance of UID collisions.

I like that the cluster/namespace/ingress names are in these GCLB names, because it makes them easier to identify when debugging, but the whole name-generation scheme needs to be redesigned to better guarantee uniqueness.

@freehan freehan reopened this Aug 13, 2019
@freehan
Copy link
Contributor

freehan commented Aug 13, 2019

Let me summarize the current naming problem per GCE Loadbalancer resource:

  1. LB frontend resources: FowardingRule, TargetProxy, UrlMap. These resources have the following naming scheme.
k8s-{resource prefix}-{namespace}-{name}-{cluster_uid}

{resource prefix} is the short name of the resource. e.g. "um" is short for UrlMap
{namespace} is the namespace of ingress resource
{name} is the name of ingress resource

The name will be truncated to 63 characters (max length of GCE resource name). When the namespace and name long (55 chars). The GCE resource name will lose uniqueness across multiple ingresses within the same cluster or across multiple clusters. (e.g. if the namespace length is 63 characters, then everything after 63 characters will be truncated. If there is another ingress with the same namespace prefix in the same cluster or another cluster. There will be GCE resource naming collision) In addition, the current GC logic will leak resources with long name (because cluster uid is truncated, hence GC does not recognize the resource as managed by itself)

  1. Cluster UID.
    The current cluster uid is a 16 character generated hash. The cluster uid is currently stored in a configmap under kube-system. Since it is stored in config map, it is vulnerable to modifications.

  2. NEG and BackendService in NEG mode. The naming scheme is as follows:

k8s1-{short cluster uid}-{service namespace}-{service name}-{service port}-{hash}

{short cluster uid} is the 8 char prefix of the cluster uid

Naming collision may happen if short cluster uid, namespace, name and port are all the same across clusters. This is mostly due to the problem #2 when user modified the cluster uid and make the short cluster uid (8 character prefix of the cluster uid) the same.

  1. Firewalls, Instance Group and BackendService in IG mode
    These resources does not suffer the naming collision problem. This is mostly because they are short enough to avoid truncation. Their naming scheme are as follows:

Firewall

k8s-fw-l7-{cluster-uid}

Instance Group

k8s-ig-{cluster-uid}

Backend Service:

k8s-be-{node port}-{cluster-uid}

Requirements for fixes

  • Uniqueness
    Resource naming scheme should guarantee uniqueness across clusters.
  • Backward compatibility
    A. Keep LB in current state intact. Since there is no way to perform hitless migration for LBs, the controller has to keep supporting LBs configured using old naming scheme.
    B. GC need to handle all naming schemes to avoid resource leak.
    C. Test dimension multiplication. Existing e2e tests will have to run across different naming schemes as well as controller upgrades.

Standing plan

Fixing problem #1:
Introduce new naming scheme to guarantee uniqueness while keeping existing LBs in tact. Fix GC for both cases.

Fixing problem #2 & #3 :
Fixing #2 can fix #3 along the way. The current plan is to use uid of kube-system namespace to replace the cluster uid config map. This will avoid user modification and collision. At the same time keep supporting clusters running in the old mode (cluster uid in configmap).

@karlkfi
Copy link

karlkfi commented Nov 16, 2019

So @skmatti,
With the new naming pattern, what's the truncation strategy and how short do namespace and ingress names need to be to avoid overlap?

k8s{version-id}-{resource-prefix}-{kube-system-uid}-{namespace}-{name}-{8-char-hash}
@skmatti
Copy link
Contributor

skmatti commented Nov 16, 2019

There is no restriction on lengths of namespace and name. Namespace and Name are trimmed evenly until their combined length is 36 in case of overflows.

@karlkfi
Copy link

karlkfi commented Nov 16, 2019

The 8-char-hash is a hash of the full un-trimmed name, right?
Otherwise we still have a possible collision problem.

@skmatti
Copy link
Contributor

skmatti commented Nov 16, 2019

Yes, that's right.

@bowei
Copy link
Member

bowei commented Nov 20, 2019

The trimmed parts of the name are for human consumption only. The hash should be always unique (modulo actual hash collisions, which should be a low probability event).

@karlkfi
Copy link

karlkfi commented Nov 20, 2019

Perfect. That’s what we needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.
7 participants