NCCL benchmark error #352

Open
ultrons opened this issue Jan 3, 2024 · 0 comments · May be fixed by #353

ultrons commented Jan 3, 2024

The command snippet for GKE cluster creation in a3/README.md is missing the mount for the user's own credentials (the main README does this correctly for the MIG example).
It is also missing a working GKE version specification.
The NCCL benchmark example and the LIT GPT example both fail with cryptic errors until I specified the GKE version = "1.27.8-gke.1067000".
(Thanks to @samuelkarp for the recommendation.)
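
As a rough illustration of pinning the version, a pre-flight check could compare the configured version string against the one reported to work above. This is a hypothetical sketch: the helper names `parse_gke_version` and `matches_known_good` are made up for illustration and are not part of this repository or of any GKE tooling. The only value taken from this issue is the version string itself, and the `MAJOR.MINOR.PATCH-gke.BUILD` pattern is assumed from its shape.

```python
import re

# Version reported as working in this issue.
KNOWN_GOOD = "1.27.8-gke.1067000"


def parse_gke_version(version: str) -> tuple[int, ...]:
    """Split 'MAJOR.MINOR.PATCH-gke.BUILD' into a tuple of integers."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-gke\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized GKE version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def matches_known_good(version: str) -> bool:
    """Return True only when the version equals the one reported to work."""
    return parse_gke_version(version) == parse_gke_version(KNOWN_GOOD)
```

The check is an exact match on purpose: this issue only confirms that one version works, so nothing here assumes newer or older versions behave the same way.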