
Where does the experiment data get saved to? #48

Open
PorkPy opened this issue Mar 12, 2019 · 5 comments

@PorkPy

PorkPy commented Mar 12, 2019

Sorry for the silly question, but can you please tell me where the experiment results are saved?
Once the policy has been trained, where is that policy saved, if at all? Is it only the weights that are saved?

Thanks.

@nily-dti

@PorkPy that’s not a silly question. In my research group we’re attempting to reproduce Kindred’s work on the UR5, as you are. To my knowledge, neither the experiment data (reward, joint positions, etc.) nor the weights/model is saved anywhere in the publicised library code (this repo).

Have you managed to figure something out?

@PorkPy
Author

PorkPy commented May 10, 2019

Hi @nily-dti
I managed to find the pre-trained models: https://github.com/kindredresearch/SenseAct/tree/load_model/examples/advanced/pre_trained_models
I had this working OK on both a UR5 and a UR10. The problem was that I had to downgrade the UR software to get it to work, despite someone making a pull request for a later software version. The downgraded software meant I couldn’t communicate with the robotic gripper or F/T sensor through the robot, so I had to communicate via ROS.
When I last looked, I think I found that one should use ‘pickle’ to save the weight matrix, and perhaps other things like (s, a, s’) transitions if you’re using a replay buffer. I never really got to the bottom of how the replay buffer is initialised/updated. I need one to seed a policy with some kinaesthetic demonstrations.
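Something like this rough sketch is what I had in mind, just with plain pickle (nothing SenseAct-specific; the file name and the placeholder arrays below are made up):

```python
import pickle
import numpy as np

# Hypothetical placeholders: policy parameters as a dict of numpy arrays,
# plus a list of (s, a, r, s') transitions for seeding a replay buffer.
weights = {"W0": np.zeros((15, 64)), "b0": np.zeros(64)}
transitions = [(np.zeros(15), np.zeros(5), 0.0, np.zeros(15))]

# Save everything in one pickle file.
with open("policy_checkpoint.pkl", "wb") as f:
    pickle.dump({"weights": weights, "transitions": transitions}, f)

# Later, reload it to seed a new run.
with open("policy_checkpoint.pkl", "rb") as f:
    saved = pickle.load(f)
```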
My cunning plan now is to just use ur_modern_driver in ROS, which has a URScript API.
I can import TensorFlow into a ROS node and send joint commands directly to the robot.
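Roughly what I’m picturing (untested sketch; /ur_driver/URScript is the topic ur_modern_driver exposes, although the name may vary by driver version, and the fixed joint target is just a stand-in for the policy output):

```python
#!/usr/bin/env python
# Rough sketch of a ROS node that streams URScript servoj commands to the arm.
# In the real thing, the joint target q would come from the TensorFlow policy.
import rospy
from std_msgs.msg import String

def main():
    rospy.init_node("policy_commander")
    pub = rospy.Publisher("/ur_driver/URScript", String, queue_size=1)
    rate = rospy.Rate(125)  # UR control loop runs at 125 Hz
    while not rospy.is_shutdown():
        q = [0.0, -1.57, 1.57, -1.57, -1.57, 0.0]  # placeholder joint angles (rad)
        cmd = "servoj([%f, %f, %f, %f, %f, %f], t=0.008)" % tuple(q)
        pub.publish(String(cmd))
        rate.sleep()

if __name__ == "__main__":
    main()
```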
:)

@gauthamvasan
Collaborator

gauthamvasan commented May 15, 2019

Hi folks, the experiment data isn't saved anywhere locally in the examples listed. This was a deliberate choice. Adding logging and saving made the example scripts complicated and messy. We figured that different experiments would need different data logging and it should be up to the person running the experiment to do it.

Since we're using the OpenAI Baselines implementations of the different algorithms, we use the callback function they expose to obtain experiment data. For example, we have a kindred_callback in the UR reacher example.

The kindred_callback defined there already obtains the returns and episode lengths and sends them to the plotter. You could modify it to obtain other policy information, save the TensorFlow session/model, etc.
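As a rough illustration (not the exact kindred_callback; the local variable names below are the ones used in the old trpo_mpi/pposgd_simple code and may differ in other Baselines versions), a callback that logs returns and checkpoints the TensorFlow session could look like this:

```python
import os
import tensorflow as tf

def example_callback(locals_, globals_):
    # Baselines TRPO/PPO1 call this once per iteration with the learner's
    # local and global namespaces.
    it = locals_.get("iters_so_far", 0)
    returns = list(locals_.get("rewbuffer", []))
    lengths = list(locals_.get("lenbuffer", []))
    if returns:
        print("iter %d: mean return %.2f, mean episode length %.1f"
              % (it, sum(returns) / len(returns), sum(lengths) / len(lengths)))

    # Periodically save the TF 1.x session that holds the policy weights.
    if it > 0 and it % 10 == 0:
        os.makedirs("./checkpoints", exist_ok=True)
        saver = tf.train.Saver()
        saver.save(tf.get_default_session(), "./checkpoints/model", global_step=it)
```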

Hope this helps!

@nily-dti

Hi @gauthamvasan, I've heard that you couldn't share your experimental code due to legal concerns, which would be an acceptable answer. Maybe that's what you're referring to by "deliberate choice"? But does logging really make it look that 'messy'? And is that a valid reason for not sharing code? I don't think so...

To correct you: you are only using OpenAI Baselines for TRPO and PPO. For DDPG you're using the rllab implementation, and for Soft-Q the original author's. Neither of these implementations provides a callback option (as OpenAI Baselines does), so maybe you could share where you have made changes to those implementations, just to ease the work for those of us trying to use them?

BTW, it's not clear which version of PPO from OpenAI Baselines you're using (PPO1 or PPO2?).

@gauthamvasan
Collaborator

Hi @nily-dti, I should've phrased it better. Sorry about that. "Deliberate choice" referred to legal concerns, keeping the code simple, making the agent-environment interaction easy to understand, etc. Logging and saving are good things in general, but we wanted to work with off-the-shelf implementations of these algorithms, and saving episode data, models, and other experiment-relevant logs is hard to do without changing the original authors' code.

We do use rllab DDPG and Haarnoja's Soft-Q in the paper, but only Baselines TRPO and PPO are used in the examples in this repo. SenseAct provides implementations of real-world RL tasks with an OpenAI Gym-style interface. The examples highlight the fact that you can just plug your env into Baselines and you're good to go. I guess it'd be useful to have some examples with Soft-Q and DDPG; will keep you posted when that happens.

We're using PPO1; the code imports ppo1 right here. TRPO is used in our robot examples.
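For what it's worth, the "plug in your env" pattern with Baselines PPO1 boils down to something like this (a sketch only: a standard Gym env stands in for a SenseAct env, the hyperparameters are arbitrary, and argument names can differ slightly between Baselines versions):

```python
import gym
from baselines.common import tf_util as U
from baselines.ppo1 import mlp_policy, pposgd_simple

def policy_fn(name, ob_space, ac_space):
    # Standard two-hidden-layer MLP policy from baselines.ppo1
    return mlp_policy.MlpPolicy(name=name, ob_space=ob_space, ac_space=ac_space,
                                hid_size=64, num_hid_layers=2)

with U.make_session(num_cpu=1):
    # Any Gym-style env works here; a SenseAct env exposes the same interface.
    env = gym.make("Pendulum-v0")
    pposgd_simple.learn(env, policy_fn,
                        max_timesteps=100000,
                        timesteps_per_actorbatch=2048,
                        clip_param=0.2, entcoeff=0.0,
                        optim_epochs=10, optim_stepsize=3e-4, optim_batchsize=64,
                        gamma=0.99, lam=0.95,
                        callback=None)  # plug in a kindred_callback-style callback here
    env.close()
```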
