
Where does the experiment data get saved to? #48

Open
PorkPy opened this issue Mar 12, 2019 · 5 comments

@PorkPy

PorkPy commented Mar 12, 2019

Sorry for the silly question, but can you please tell me where the experiment results are saved?
Once the policy has been trained, where is that policy saved, if at all? Is it only the weights that are saved?

Thanks.

@nily-dti

@PorkPy that’s not a silly question. In my research group we’re attempting to reproduce Kindred’s work on the UR5, as you are. To my knowledge, neither the experiment data (reward, joint positions, etc.) nor the weights/model is saved anywhere in the publicised library code (this repo).

Have you managed to figure something out?

@PorkPy
Author

PorkPy commented May 10, 2019

Hi @nily-dti
I managed to find the pre-trained models: https://github.com/kindredresearch/SenseAct/tree/load_model/examples/advanced/pre_trained_models
I had this working OK on both a UR5 and a UR10. The problem was that I had to downgrade the UR software to get it to work, despite someone making a pull request for a later software version. The downgraded software meant I couldn’t communicate with the robotic gripper or F/T sensor through the robot, so I had to communicate via ROS.
When I last looked, I think I found that one should use ‘pickle’ to save the weight matrix, and perhaps other things like (s, a, s’) transitions if you’re using a replay buffer. I never really got to the bottom of how the replay buffer is initialised/updated. I need one to seed a policy with some kinaesthetic demonstrations.
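Something like this rough sketch is what I had in mind, just with plain pickle (nothing SenseAct-specific; the file name and the placeholder arrays below are made up):

```python
import pickle
import numpy as np

# Hypothetical placeholders: policy parameters as a dict of numpy arrays,
# plus a list of (s, a, r, s') transitions for seeding a replay buffer.
weights = {"W0": np.zeros((15, 64)), "b0": np.zeros(64)}
transitions = [(np.zeros(15), np.zeros(5), 0.0, np.zeros(15))]

# Save everything in one pickle file.
with open("policy_checkpoint.pkl", "wb") as f:
    pickle.dump({"weights": weights, "transitions": transitions}, f)

# Later, reload it to seed a new run.
with open("policy_checkpoint.pkl", "rb") as f:
    saved = pickle.load(f)
```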
My cunning plan now is to just use ur_modern_driver in ROS, which has a URScript API.
I can import TensorFlow into a ROS node and send joint commands directly to the robot.
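Roughly what I’m picturing (untested sketch; /ur_driver/URScript is the topic ur_modern_driver exposes, although the name may vary by driver version, and the fixed joint target is just a stand-in for the policy output):

```python
#!/usr/bin/env python
# Rough sketch of a ROS node that streams URScript servoj commands to the arm.
# In the real thing, the joint target q would come from the TensorFlow policy.
import rospy
from std_msgs.msg import String

def main():
    rospy.init_node("policy_commander")
    pub = rospy.Publisher("/ur_driver/URScript", String, queue_size=1)
    rate = rospy.Rate(125)  # UR control loop runs at 125 Hz
    while not rospy.is_shutdown():
        q = [0.0, -1.57, 1.57, -1.57, -1.57, 0.0]  # placeholder joint angles (rad)
        cmd = "servoj([%f, %f, %f, %f, %f, %f], t=0.008)" % tuple(q)
        pub.publish(String(cmd))
        rate.sleep()

if __name__ == "__main__":
    main()
```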
:)

@gauthamvasan
Collaborator

gauthamvasan commented May 15, 2019

Hi folks, the experiment data isn't saved anywhere locally in the examples listed. This was a deliberate choice. Adding logging and saving made the example scripts complicated and messy. We figured that different experiments would need different data logging and it should be up to the person running the experiment to do it.

Since we're using the OpenAI Baselines implementations of the different algorithms, we use the callback function they expose to obtain experiment data. For example, we have a kindred_callback in the UR reacher example.

The kindred_callback defined there already obtains the returns and episode lengths and sends them to the plotter. You could modify it to obtain other policy information, save the TensorFlow session/model, etc.
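As a rough illustration (not the exact kindred_callback; the local variable names below are the ones used in the old trpo_mpi/pposgd_simple code and may differ in other Baselines versions), a callback that logs returns and checkpoints the TensorFlow session could look like this:

```python
import os
import tensorflow as tf

def example_callback(locals_, globals_):
    # Baselines TRPO/PPO1 call this once per iteration with the learner's
    # local and global namespaces.
    it = locals_.get("iters_so_far", 0)
    returns = list(locals_.get("rewbuffer", []))
    lengths = list(locals_.get("lenbuffer", []))
    if returns:
        print("iter %d: mean return %.2f, mean episode length %.1f"
              % (it, sum(returns) / len(returns), sum(lengths) / len(lengths)))

    # Periodically save the TF 1.x session that holds the policy weights.
    if it > 0 and it % 10 == 0:
        os.makedirs("./checkpoints", exist_ok=True)
        saver = tf.train.Saver()
        saver.save(tf.get_default_session(), "./checkpoints/model", global_step=it)
```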

Hope this helps!

@nily-dti

Hi @gauthamvasan, I've heard that you couldn't share your experimental code due to legal concerns, which would be an acceptable answer. Maybe that's what you're referring to by "deliberate choice"? But does logging really make it look that 'messy'? And is that a valid reason for not sharing code? I don't think so...

To correct you: you are only using OpenAI Baselines for TRPO and PPO. For DDPG you're using the rllab implementation, and for Soft-Q the original author's. Neither of these implementations provides a callback option (as OpenAI Baselines does), so maybe you could share where you have made changes to those implementations, just to ease the work for those of us trying to use them?

BTW, it's not clear which version of PPO from OpenAI Baselines you're using (PPO1 or PPO2?).

@gauthamvasan
Collaborator

Hi @nily-dti, I should've phrased it better. Sorry about that. "Deliberate choice" referred to legal concerns, keeping the code simple, making the agent-environment interaction easy to understand, etc. Logging and saving are good things in general, but we wanted to work with off-the-shelf implementations of these algorithms, and saving episode data, models, and other experiment-relevant logs is hard to do without changing the original authors' code.

We do use rllab DDPG and Haarnoja's Soft-Q in the paper, but only Baselines TRPO and PPO are used in the examples in this repo. SenseAct provides implementations of real-world RL tasks with an OpenAI Gym-style interface. The examples highlight the fact that you can just plug your env into Baselines and you're good to go. I guess it'd be useful to have some examples with Soft-Q and DDPG; will keep you posted when that happens.

We're using PPO1; the code imports ppo1 right here. TRPO is used in our robot examples.
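For what it's worth, the "plug in your env" pattern with Baselines PPO1 boils down to something like this (a sketch only: a standard Gym env stands in for a SenseAct env, the hyperparameters are arbitrary, and argument names can differ slightly between Baselines versions):

```python
import gym
from baselines.common import tf_util as U
from baselines.ppo1 import mlp_policy, pposgd_simple

def policy_fn(name, ob_space, ac_space):
    # Standard two-hidden-layer MLP policy from baselines.ppo1
    return mlp_policy.MlpPolicy(name=name, ob_space=ob_space, ac_space=ac_space,
                                hid_size=64, num_hid_layers=2)

with U.make_session(num_cpu=1):
    # Any Gym-style env works here; a SenseAct env exposes the same interface.
    env = gym.make("Pendulum-v0")
    pposgd_simple.learn(env, policy_fn,
                        max_timesteps=100000,
                        timesteps_per_actorbatch=2048,
                        clip_param=0.2, entcoeff=0.0,
                        optim_epochs=10, optim_stepsize=3e-4, optim_batchsize=64,
                        gamma=0.99, lam=0.95,
                        callback=None)  # plug in a kindred_callback-style callback here
    env.close()
```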
