postscript remoteshell loops for a long time #7430
Comments
I'm seeing the same or a similar issue and am trying to debug it now. For me it looks like a problem with getcredentials.awk, but I'm not getting much debug info. For now, I have edited the remoteshell script and changed MAX_RETRIES from 10 to 1, which cuts the delay to a more reasonable amount. Have you come up with a different workaround, or figured out what's going on?
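To illustrate why lowering MAX_RETRIES shortens the wait, here is a minimal sketch of a bounded retry loop that keeps fetching until the output file is non-empty. It is not the actual remoteshell code: the function name `fetch_with_retries`, the `RETRY_DELAY` variable, and the calling convention are all hypothetical and only model the behavior described in this thread.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the real xCAT remoteshell script.
# Usage: fetch_with_retries MAX_RETRIES OUTFILE CMD [ARGS...]
# Runs CMD with stdout redirected to OUTFILE until OUTFILE is non-empty
# or MAX_RETRIES attempts have been made. Returns 0 on success, 1 otherwise.
fetch_with_retries() {
    max_retries=$1
    outfile=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_retries" ]; do
        # An empty result still creates (or truncates) OUTFILE,
        # which matches the empty key file seen in /tmp.
        "$@" > "$outfile" 2>/dev/null
        if [ -s "$outfile" ]; then
            echo "attempt $attempt: got data"
            return 0
        fi
        echo "attempt $attempt: empty result"
        # Wait between attempts, but not after the last one.
        if [ "$attempt" -lt "$max_retries" ]; then
            sleep "${RETRY_DELAY:-1}"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

With a command that always returns nothing, the total wait grows with the retry count times the delay, so going from 10 retries to 1 only hides the symptom: the underlying empty response is still there.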
Would you add the output of the following to the initial post, for additional info:
Thank you for coming back to us, here is the requested output:
This fault is probably on my side, but I cannot figure out what the cause is. Hopefully you can point me in the right direction. When a node boots, it eventually reaches the state "xcat.deployment.postbootscript: postbootscript start..: remoteshell". When this script runs, it falls into a loop because it cannot retrieve the ssh keys from the xCAT server. I enabled xcatdebug to shed some light on what is going on. This is what I see in /var/log/xcat/xcat.log on the client compute node:
Meanwhile, the server logs:
Searching the web, I found that this command is what the remoteshell script runs. Unfortunately, running it manually gives an empty result:
While the loop runs, I can check the /tmp directory: the key file is there, but it is empty (probably because empty output was redirected to that file):
So I tried to pre-deploy the keys into the image via syncfiles. This worked, in that I can ssh into the node while it boots, but the loop still persists, so I guess the problem is not the key itself. The correct keys are in fact still there when the node finally finishes booting. I guess this is because a final syncfiles run at boot time overwrites the freshly generated keys left behind by the failing remoteshell script.
What additional information could I provide to help fix this issue?
Thank you in advance!