PRECISION, RECALL, and F1 are always 0 when training #151

Open
WQR53 opened this issue May 4, 2022 · 1 comment

Comments

WQR53 commented May 4, 2022

I use astminer to generate C data to feed code2vec. The dataset comes from https://github.com/intel/neuro-vectorizer. However, precision, recall, and F1 are always zero during training. I ran train.sh with the command source train.sh and got the following output (partial).

2022-05-03 01:26:07,479 INFO     After 12 epochs -- top10_acc: [0.32272727 0.51818182 0.61363636 0.66818182 0.69090909 0.73636364
 0.75454545 0.76363636 0.76818182 0.79545455], precision: 0.0, recall: 0.0, F1: 0
2022-05-03 01:26:16,707 INFO     Average loss at batch 100: 0.008121,   throughput: 399 samples/sec
2022-05-03 01:26:26,561 INFO     Saved after 13 epochs in: models/try_c_large/saved_model_iter13
2022-05-03 01:26:26,628 INFO     Starting evaluation
2022-05-03 01:26:26,943 INFO     Done evaluating, epoch reached
2022-05-03 01:26:26,944 INFO     Evaluation time: 0H:0M:0S
2022-05-03 01:26:26,944 INFO     After 13 epochs -- top10_acc: [0.39545455 0.46363636 0.53636364 0.61363636 0.67272727 0.71818182
 0.74090909 0.75454545 0.77272727 0.79090909], precision: 0.0, recall: 0.0, F1: 0
2022-05-03 01:26:45,938 INFO     Saved after 14 epochs in: models/try_c_large/saved_model_iter14
2022-05-03 01:26:46,025 INFO     Starting evaluation
2022-05-03 01:26:46,338 INFO     Done evaluating, epoch reached
2022-05-03 01:26:46,338 INFO     Evaluation time: 0H:0M:0S
2022-05-03 01:26:46,339 INFO     After 14 epochs -- top10_acc: [0.5        0.60454545 0.62272727 0.67272727 0.71363636 0.75454545
 0.76818182 0.78636364 0.79090909 0.80909091], precision: 0.0, recall: 0.0, F1: 0
2022-05-03 01:27:04,274 INFO     Saved after 15 epochs in: models/try_c_large/saved_model_iter15
2022-05-03 01:27:04,381 INFO     Starting evaluation
2022-05-03 01:27:04,731 INFO     Done evaluating, epoch reached
2022-05-03 01:27:04,733 INFO     Evaluation time: 0H:0M:0S
2022-05-03 01:27:04,734 INFO     After 15 epochs -- top10_acc: [0.45       0.58181818 0.63636364 0.7        0.72272727 0.75454545
 0.76363636 0.78181818 0.81363636 0.83181818], precision: 0.0, recall: 0.0, F1: 0
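For reference, here is a minimal sketch of how subtoken-level precision, recall, and F1 can be computed for labels whose subtokens are separated by "|". It is a generic micro-averaged definition, not a copy of code2vec's evaluation code, and the function name subtoken_prf is hypothetical. It also shows one way all three metrics can be 0 while exact-match accuracy is not: if the predicted label is empty (for example, because it was discarded before scoring), nothing counts as a true positive. Whether that is what happens in this run is an assumption, not something the log confirms.

```python
def subtoken_prf(pairs, sep="|"):
    """Micro-averaged subtoken precision, recall, and F1.

    pairs: iterable of (true_label, predicted_label) strings.
    An empty predicted_label stands for "no usable prediction".
    This is a generic sketch, not code2vec's own implementation.
    """
    tp = fp = fn = 0
    for true_label, predicted_label in pairs:
        true_subtokens = true_label.split(sep)
        # "".split("|") would yield [""], so treat an empty prediction as no subtokens.
        predicted_subtokens = predicted_label.split(sep) if predicted_label else []
        for tok in predicted_subtokens:
            if tok in true_subtokens:
                tp += 1
            else:
                fp += 1
        for tok in true_subtokens:
            if tok not in predicted_subtokens:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# One partially correct and one fully correct prediction.
normal = subtoken_prf([("get|name", "get|value"), ("set|id", "set|id")])

# Every prediction is empty: all three metrics collapse to zero.
all_empty = subtoken_prf([("get|name", ""), ("set|id", "")])
```

Comparing a few true labels against the model's top prediction in this way can help tell a genuine modeling problem apart from a label-format problem.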

I used astminer to produce the path_contexts.c2s file and split it into three files: train.c2s, test.c2s, and val.c2s. Next, I modified preprocess.sh and obtained seven c2v files: xxxx.dict.c2v, xxxx.histo.ori.c2v, xxxx.histo.path.c2v, xxxx.histo.tgt.c2v, xxxx.test.c2v, xxxx.train.c2v, and xxxx.val.c2v. I then ran train.sh with the command source train.sh, but precision, recall, and F1 were all 0.
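The split described above can be sketched roughly as follows. This is only an illustration of one way to do it: the 80/10/10 proportions, the fixed seed, and the helper names split_lines and split_file are assumptions, not the exact procedure I used.

```python
import random
from pathlib import Path


def split_lines(lines, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle lines and split them into (train, val, test) lists.

    The fractions and seed are illustrative defaults.
    """
    lines = list(lines)
    random.Random(seed).shuffle(lines)  # deterministic shuffle for reproducibility
    n_val = int(len(lines) * val_frac)
    n_test = int(len(lines) * test_frac)
    val = lines[:n_val]
    test = lines[n_val:n_val + n_test]
    train = lines[n_val + n_test:]
    return train, val, test


def split_file(src="path_contexts.c2s", out_dir="."):
    """Write train.c2s, val.c2s, and test.c2s next to each other in out_dir."""
    with open(src, encoding="utf-8") as f:
        lines = [line for line in f if line.strip()]  # skip blank lines
    train, val, test = split_lines(lines)
    for name, part in (("train", train), ("val", val), ("test", test)):
        Path(out_dir, f"{name}.c2s").write_text("".join(part), encoding="utf-8")


# Small in-memory demonstration; split_file is not called here.
train, val, test = split_lines([f"label{i} ctx\n" for i in range(100)])
```

Each example lands in exactly one of the three files, so the three outputs together contain the same lines as the input.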

Collaborator

urialon commented May 5, 2022

Hi @WQR53,
Thank you for your interest in our work!

I don't know the reason, since astminer and neuro-vectorizer are not mine.

However, please check out this PolyCoder paper: https://arxiv.org/pdf/2202.13169.pdf
and code: https://github.com/VHellendoorn/Code-LMs
where we release a larger model that works for many languages.
Specifically, for C, PolyCoder achieves better results than OpenAI's Codex.

Best,
Uri
