
graph4rec: memory slowly grows during training until it exceeds the limit and the process crashes #441

Open
zouhan6806504 opened this issue Jul 29, 2022 · 2 comments
zouhan6806504 commented Jul 29, 2022

AI Studio, 32 GB GPU, 32 GB RAM
paddle 2.2.2, pgl 2.1.5

My graph has about 100 million edges and 10 million nodes.
At first I grouped the nodes into 2 types (the raw data has 7 types), and generating data and training worked without major problems. Later I worried that this coarse grouping might lose information, so I regenerated the training data with the original 7 node types and defined 8 metapaths covering the paths that can actually occur. With this setup training runs into trouble: memory climbs slowly until it exceeds 32 GB and the program crashes.
I tried lowering walk_len, walk_num, and batch_node_size, but none of that helped.
In principle both setups have the same amount of training data; only the random walks differ. In the first setup memory consumption holds steady at about 17 GB. Where could the problem be?
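One hedged way to narrow down slow growth like this (assuming the epoch loop is driven from Python, as in Graph4Rec's training scripts) is to diff `tracemalloc` snapshots between epochs. Note that `tracemalloc` only sees Python-level allocations, so if its numbers stay flat while RSS keeps rising, the growth is on the native (C++) side, e.g. in walk/sample buffers. `train_one_epoch` below is a hypothetical stand-in for the actual loop:

```python
import tracemalloc

# Diagnostic sketch: report the top allocation-growth sites between
# consecutive training epochs. Only Python allocations are visible;
# flat output here with rising RSS implicates native code instead.
def find_python_leaks(train_one_epoch, epochs=3):
    tracemalloc.start()
    previous = tracemalloc.take_snapshot()
    for epoch in range(epochs):
        train_one_epoch()
        current = tracemalloc.take_snapshot()
        # Largest allocation growth since the previous epoch.
        for stat in current.compare_to(previous, "lineno")[:5]:
            print(f"epoch {epoch}: {stat}")
        previous = current
    tracemalloc.stop()
```

If every epoch reports growth at the same source line, that line (or the structure it appends to) is the likely leak.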

Yelrose commented Jul 29, 2022 via email

zouhan6806504 commented Jul 30, 2022

> Memory usage is related to the number of nodes and the embedding dimension; I suggest reducing the embedding size.

I cut the embedding size in half, to 64, and it behaves the same. What puzzles me is that both approaches use the same amount of training data: the first runs stably, while the second's memory consumption keeps creeping up.
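A back-of-envelope check (assuming float32 parameters and the ~10 million nodes from the issue) shows why halving the embedding size gives only a one-off saving and cannot explain *gradual* growth:

```python
# Rough embedding-table size: nodes x dim x 4 bytes (float32).
# Node count and the 128 -> 64 dimensions are taken from the issue.
def table_gib(num_nodes, dim, bytes_per_val=4):
    return num_nodes * dim * bytes_per_val / 2**30

nodes = 10_000_000
print(f"dim 128: {table_gib(nodes, 128):.1f} GiB")  # ~4.8 GiB
print(f"dim  64: {table_gib(nodes, 64):.1f} GiB")   # ~2.4 GiB
# Both tables are allocated once at startup. A fixed-size table cannot
# produce slowly rising memory; that pattern points at per-step
# accumulation (e.g. walk/sample buffers) rather than the embeddings.
```

This is consistent with the reported behavior: shrinking the table shifts the baseline but does not stop the climb.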

@Yelrose Yelrose self-assigned this Aug 1, 2022