Multi-GPU training on same machine is getting stuck #378
Asked by nighting0le01 in Q&A · Answered by yoshitomo-matsubara
Answered by yoshitomo-matsubara on Jul 26, 2023
Replies: 1 comment · 24 replies
It seems you're using your own scripts. Please do not use screenshots to show logs or files; paste them as text instead, for a better search experience so that others can find this discussion when they face similar issues. Also, please complete the other discussions you opened first and do not leave them unattended. Respect my time for OSS as well.
Based on your results with train.py in torchvision, I think the problem is caused by your (Docker) environment, and I do not have the right answer for this.
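As a sanity check that separates torchdistill from the environment, one common approach (a sketch, not taken verbatim from this thread) is to launch torchvision's reference classification script with torchrun; the script path, GPU count, and dataset flags below are assumptions:

```shell
# Sketch: run torchvision's reference training script on 2 GPUs of one machine.
# The repository path and dataset location are assumptions for illustration.
torchrun --nproc_per_node=2 references/classification/train.py \
    --model resnet18 --data-path /path/to/imagenet
```

If this also hangs, the issue is likely in the environment (e.g. NCCL inside the container) rather than in any particular training framework.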
torchdistill no longer supports amp directly; it supports Hugging Face Accelerate instead. See #247
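With Accelerate, mixed precision is enabled through the launcher rather than through torch.cuda.amp in the training script; a minimal sketch, assuming two local GPUs and a script named train.py:

```shell
# Sketch: launch a training script with fp16 mixed precision via Accelerate.
# --num_processes should match the number of local GPUs (assumed here: 2).
accelerate launch --multi_gpu --num_processes 2 --mixed_precision fp16 train.py
```

Alternatively, `accelerate config` can record the same choices (distributed type, process count, mixed precision) in a config file once, so later runs only need `accelerate launch train.py`.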