Evaluation error with PAF heads: ValueError: matrix contains invalid numeric entries
#2631
Closed
2 tasks done
Comments
Looks like an evaluation error, not a training error per se. Your config also seems to mix HRNet and DLCRNet; can you let us know if you modified this file?
Sorry, I pasted the wrong content of the pytorch_config.yaml file. I have edited the issue again.
@BBrianZhang thanks for reporting this! I'll look into it.
n-poulsen added a commit that referenced this issue on Jul 19, 2024
- Moved all metric computation code to a deeplabcut/core/metrics folder (as metrics are computed with numpy)
- Cleaned metric computation code so the prediction/ground truth matching always happens
- Refactored in a way such that no OOM errors should occur, even on very large datasets (>60k images)
- Multi-animal RMSE: only compute RMSE using (ground-truth, detection) matches with non-zero RMSE
- Add compute_detection_rmse to compute "detection" RMSE, matching the DeepLabCut 2.X implementation
- Fixed the bug for PAF models documented in #2631
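To illustrate what prediction/ground truth matching before an RMSE computation can look like, here is a minimal sketch. It is not the code merged in this commit: the function name `matched_rmse`, the array layout, and the use of Hungarian matching via `scipy.optimize.linear_sum_assignment` are all assumptions made for illustration, and it does not try to reproduce the "non-zero RMSE" filtering mentioned above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def matched_rmse(gt, pred, no_overlap_cost=1e6):
    """Hypothetical helper: RMSE over matched (ground truth, prediction) pairs.

    gt:   (n_gt, n_keypoints, 2) array, NaN for unlabeled keypoints
    pred: (n_pred, n_keypoints, 2) array, NaN for missing detections
    """
    # Per-keypoint Euclidean distance for every (gt, pred) pair.
    dist = np.linalg.norm(gt[:, None] - pred[None], axis=-1)
    valid = np.isfinite(dist)
    counts = valid.sum(axis=-1)
    sq_sum = np.where(valid, dist**2, 0.0).sum(axis=-1)

    # Pair cost: RMSE over shared keypoints. Pairs with no shared keypoint
    # get a large finite cost so the cost matrix never contains NaN.
    has_shared = counts > 0
    pair_rmse = np.sqrt(sq_sum / np.maximum(counts, 1))
    cost = np.where(has_shared, pair_rmse, no_overlap_cost)

    rows, cols = linear_sum_assignment(cost)
    keep = has_shared[rows, cols]  # drop matches made only via the filler cost
    errors = dist[rows[keep], cols[keep]]
    errors = errors[np.isfinite(errors)]
    if errors.size == 0:
        return float("nan")
    return float(np.sqrt(np.mean(errors**2)))


gt = np.array([
    [[0.0, 0.0], [10.0, 0.0]],
    [[100.0, 100.0], [np.nan, np.nan]],  # second keypoint unlabeled
])
pred = np.array([
    [[100.0, 103.0], [110.0, 100.0]],  # closest to the second individual
    [[3.0, 4.0], [10.0, 0.0]],         # closest to the first individual
])
result = matched_rmse(gt, pred)
print(result)
```

In this toy example the predictions are listed in the opposite order to the ground truth, so the matching step is what pairs each individual with the right detection before any error is averaged.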
Should have been fixed in #2679
Is there an existing issue for this?
Bug description
Hello,
while using DeepLabCut with the PyTorch engine, I encountered an issue with the model dlcrnet_stride16_ms5. During the training process, a ValueError occurred stating "matrix contains invalid numeric entries". No matter how much I reduce the learning rate or adjust the training batch size, I cannot resolve this problem.
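For context, this is the message that SciPy's `scipy.optimize.linear_sum_assignment` raises when its cost matrix contains NaN. Whether that call is the source here is an assumption, since no traceback is shown. The sketch below only reproduces the message and shows a hypothetical guard, `safe_assignment`, which is not part of DeepLabCut.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def safe_assignment(cost, fill=1e6):
    """Hypothetical guard: replace non-finite costs before matching."""
    cost = np.asarray(cost, dtype=float)
    finite = np.isfinite(cost)
    cleaned = np.where(finite, cost, fill)
    rows, cols = linear_sum_assignment(cleaned)
    # Drop pairs that were only matched through a filled-in entry.
    keep = finite[rows, cols]
    return rows[keep], cols[keep]


cost = np.array([[0.5, np.nan], [2.0, 1.0]])

# A NaN entry makes the unguarded call fail.
try:
    linear_sum_assignment(cost)
    raised = False
except ValueError as err:
    raised = True
    print(f"ValueError: {err}")

# With the guard, the finite entries are still matched.
rows, cols = safe_assignment(cost)
print(rows, cols)
```

If the error does come from a matching step like this, lowering the learning rate or changing the batch size would not be expected to help, because the NaN values would originate in the predictions or scores being matched rather than in the optimizer settings.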
Operating System
Ubuntu 18.04
DeepLabCut version
DLC 3.0.0rc1
DeepLabCut mode
multi animal
Device type
gpu
Steps To Reproduce
1. Create a training dataset
2. pytorch_config.yaml as follows
3. Run train_network
Relevant log output
Anything else?
No response
Code of Conduct