Exploration access to environment for forward simulation #237

redknightlois · 2019-03-04T02:18:13Z

Hi,

I stumbled upon a potential improvement. I am hacking around it right now, but it would be great to have a proper solution. MCTS and other forward-simulation techniques need access to clones of the environment to execute rollouts. Currently there is no way to pass the instantiated environment to the exploration policies so that they can perform the forward search.

For the purpose of illustration, this is the hack:

graph_manager.verify_graph_was_created()
env = graph_manager.environments[0]
graph_manager.top_level_manager.agents['agent'].exploration_policy.set_environment(env)

Being able to pass the instantiated environment, as suggested in #212, would be a potential workaround, although not a solution.
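
To make the request concrete, here is a minimal sketch of what a forward-simulating exploration policy would do once it holds a reference to the environment. Everything in it is an assumption for illustration: `ToyEnv` and `rollout_value` are hypothetical names, not Coach classes, and `copy.deepcopy` stands in for whatever cloning mechanism a real environment would need (many emulator-backed environments cannot simply be deep-copied).

```python
import copy
import random


class ToyEnv:
    """Hypothetical stand-in for an environment; not a Coach class."""

    def __init__(self, goal=5, max_steps=20):
        self.goal = goal
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def step(self, action):
        # action is -1 or +1; reward is 1.0 only on the step that reaches the goal
        self.position += action
        self.steps += 1
        reached = self.position == self.goal
        done = reached or self.steps >= self.max_steps
        return self.position, (1.0 if reached else 0.0), done


def rollout_value(env, first_action, n_rollouts=50, rng=None):
    """Estimate the value of first_action with random rollouts on clones of env."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(n_rollouts):
        sim = copy.deepcopy(env)  # clone, so the live environment is never stepped
        _, reward, done = sim.step(first_action)
        episode_return = reward
        while not done:
            _, reward, done = sim.step(rng.choice((-1, 1)))
            episode_return += reward
        total += episode_return
    return total / n_rollouts
```

An exploration policy could then compare `rollout_value(env, -1)` with `rollout_value(env, 1)` before choosing an action. The point is only that the rollouts run on copies, which is why the policy needs some handle on the instantiated environment.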

gal-leibovich · 2019-03-04T07:15:33Z

@guyk1971

gal-leibovich · 2019-03-10T11:35:00Z

We have purposefully encapsulated the environment and have hidden it from the agent. All the interaction between the two is managed through the level manager. The goal was to allow for more complex scenarios than standard RL, such as Hierarchical Reinforcement Learning, self-play, multi-agent RL, etc.

@guyk1971 is also looking into adding MCTS support to Coach and might have more insights to share here. If we can limit the agent's access to the environment, that might be preferable, both from a software encapsulation perspective and in order to increase the robustness of the framework.
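
One way to picture "limited access" is a handle that only ever hands out copies of the environment, so a planner can simulate forward without being able to step the live instance. The sketch below is purely illustrative: `SimulationHandle`, `fork`, and `CounterEnv` are hypothetical names, not part of Coach, and it assumes the environment can be deep-copied.

```python
import copy


class CounterEnv:
    """Hypothetical toy environment whose whole state is a step counter."""

    def __init__(self):
        self.steps = 0

    def step(self, action):
        self.steps += 1
        return self.steps


class SimulationHandle:
    """Hypothetical wrapper: exposes clones for planning, not the live env."""

    def __init__(self, live_env):
        self._live_env = live_env  # kept private; callers only receive copies

    def fork(self):
        # Each call returns an independent deep copy that can be stepped freely.
        return copy.deepcopy(self._live_env)


live = CounterEnv()
handle = SimulationHandle(live)

sim = handle.fork()
sim.step(0)
sim.step(0)
```

After this runs, `sim.steps` is 2 while `live.steps` is still 0. Under this design the owner of the live environment keeps sole control over real interaction, and an agent or exploration policy would be given only the handle.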

I think that in #212, passing the instantiated environment refers to initializing the environment outside of Coach, so it still wouldn't be available to the agent or to the exploration policy. But I might be wrong.

redknightlois · 2019-03-22T12:15:19Z

@galleibo-intel @guyk1971 I stumbled across the Go-Explore paper (https://arxiv.org/pdf/1901.10995.pdf). You should seriously take a look at it in the context of supporting entirely different scenarios. The workflow is so different from anything available in Coach that if you devise a way to make it work, everything else should be fairly easy to implement on top of it.

EDIT: In the same direction, the POET paper may give some further hints about operators on training workflows. https://arxiv.org/abs/1901.01753

gal-leibovich · 2019-03-24T11:17:39Z

Thanks @redknightlois. At the moment we do not have plans for large-scale architectural changes to the framework.

galnov added this to Requires Grooming in Coach Dev via automation Mar 4, 2019

galnov added the priority/p1 broken basics or large value add enhancements (highest priority) label Mar 4, 2019
