Multi-GPU training on the same machine is getting stuck #378

Based on your results with train.py in torchvision, I think the problem is caused by your (docker) environment, and I do not have the right answer for this.
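To help isolate an environment issue like this, here is a minimal sketch (my own illustration, not from this thread; the file name `nccl_check.py` is made up) that only initializes `torch.distributed` and runs a single `all_reduce`. If even this hangs inside the container, the problem is in the (docker) environment / NCCL setup rather than in torchdistill:

```python
# nccl_check.py -- minimal NCCL sanity check, launch with:
#   torchrun --nproc_per_node=<num_gpus> nccl_check.py
import os

import torch
import torch.distributed as dist


def main():
    # torchrun sets LOCAL_RANK, RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # Each rank contributes its rank index; all_reduce should return the sum on every rank
    x = torch.full((1,), float(dist.get_rank()), device=f"cuda:{local_rank}")
    dist.all_reduce(x, op=dist.ReduceOp.SUM)
    print(f"rank {dist.get_rank()}/{dist.get_world_size()}: all_reduce result = {x.item()}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```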

torchdistill no longer supports amp because it supports Hugging Face accelerate instead. See #247
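For reference, this is roughly what mixed precision looks like when it goes through Hugging Face accelerate instead of `torch.cuda.amp`. It is an illustrative training loop, not torchdistill's actual code:

```python
# Illustrative sketch: mixed precision via Hugging Face accelerate (fp16 requires a GPU)
import torch
from accelerate import Accelerator
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# mixed_precision can also come from `accelerate config` / launch settings
accelerator = Accelerator(mixed_precision="fp16")

model = nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
loader = DataLoader(dataset, batch_size=8)
criterion = nn.CrossEntropyLoss()

# prepare() wraps the model/optimizer/dataloader for the configured device,
# distributed setup, and mixed-precision mode
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for inputs, targets in loader:
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    # accelerator.backward() applies gradient scaling when fp16 is enabled
    accelerator.backward(loss)
    optimizer.step()
```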

Replies: 1 comment, 24 replies between @nighting0le01 and @yoshitomo-matsubara

Answer selected by nighting0le01
Category: Q&A
Labels: None yet
2 participants