This repository has been archived by the owner on Dec 11, 2022. It is now read-only.

Exploration access to environment for forward simulation #237

Open
redknightlois opened this issue Mar 4, 2019 · 4 comments
Labels
priority/p1 broken basics or large value add enhancements (highest priority)

Comments

@redknightlois
Contributor

redknightlois commented Mar 4, 2019

Hi,

I stumbled upon the following potential improvement. I am hacking around it right now, but it would be great to have a proper solution. MCTS and other forward-simulation techniques need access to clones of the environment to execute rollouts, but there is currently no way to pass the actual instantiated environment to the exploration policies so they can perform the forward search.

For the purpose of illustration, this is the hack:

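```python
graph_manager.verify_graph_was_created()  # check that the graph (and its environments) exists
env = graph_manager.environments[0]
# set_environment() is a custom method added by this hack, not a stock Coach method
graph_manager.top_level_manager.agents['agent'].exploration_policy.set_environment(env)
```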
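To make the requirement concrete, here is a minimal sketch of what an exploration policy with a `set_environment()` hook might do with the environment once it has it: estimate each action's value from random rollouts run on `copy.deepcopy` clones, so the live environment is never advanced. `RolloutExplorationPolicy`, `ChainEnv` and the simplified `get_action()` signature are hypothetical and do not subclass Coach's exploration policy classes. This is flat Monte Carlo rollout search rather than full MCTS, which would additionally grow a search tree.

```python
import copy
import random


class ChainEnv:
    """Hypothetical toy environment: positions 0..length-1, reward 1.0 at the right end."""

    def __init__(self, length=6):
        self.length = length
        self.pos = 0

    def step(self, action):
        # Action 1 moves right, any other action moves left; position is clamped to the chain.
        self.pos = max(0, min(self.length - 1, self.pos + (1 if action == 1 else -1)))
        done = self.pos == self.length - 1
        return self.pos, (1.0 if done else 0.0), done


class RolloutExplorationPolicy:
    """Hypothetical policy with a set_environment() hook; not a Coach ExplorationPolicy subclass."""

    def __init__(self, actions, num_rollouts=20, horizon=10, gamma=0.9, seed=0):
        self.actions = list(actions)
        self.num_rollouts = num_rollouts
        self.horizon = horizon
        self.gamma = gamma
        self.rng = random.Random(seed)
        self.env = None

    def set_environment(self, env):
        # Keep a reference only; all simulation happens on clones.
        self.env = env

    def _rollout_return(self, first_action):
        sim = copy.deepcopy(self.env)  # clone so the live environment is never advanced
        _, total, done = sim.step(first_action)
        discount = 1.0
        for _ in range(self.horizon):
            if done:
                break
            discount *= self.gamma
            _, reward, done = sim.step(self.rng.choice(self.actions))
            total += discount * reward
        return total

    def get_action(self):
        if self.env is None:
            # No forward model attached: fall back to uniform random exploration.
            return self.rng.choice(self.actions)
        # Average discounted return of random rollouts that start with each candidate action.
        scores = {
            a: sum(self._rollout_return(a) for _ in range(self.num_rollouts)) / self.num_rollouts
            for a in self.actions
        }
        return max(scores, key=scores.get)
```

Deep-copying is the simplest way to clone, but environments that wrap external simulators or processes may not support it and would need an explicit save/restore-state mechanism instead.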

Being able to pass an instantiated environment, as suggested in #212, would be a possible workaround, although not a real solution.


galnov added this to Requires Grooming in Coach Dev via automation on Mar 4, 2019
galnov added the priority/p1 broken basics or large value add enhancements (highest priority) label on Mar 4, 2019
@gal-leibovich
Contributor

We have purposely encapsulated the environment and hidden it from the agent; all interaction between the two is managed by the level manager. The goal was to support scenarios more complex than standard RL, such as hierarchical reinforcement learning, self-play, and multi-agent RL.

@guyk1971 is also looking into adding MCTS support to Coach and might have more insights to share here. If we can limit the agent's access to the environment, that would be preferable, both from a software-encapsulation perspective and for the robustness of the framework.
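One way to reconcile forward search with this encapsulation, sketched here purely as an illustration, is for the level manager to keep sole ownership of the environment and hand the exploration policy only a narrow `simulate` callable that replays action sequences on clones of the current state. `MediatingLevelManager`, `make_simulator`, `act` and `ToyEnv` are hypothetical names, not Coach classes, and the sketch assumes the environment can be cloned with `copy.deepcopy`.

```python
import copy
from typing import Callable, List, Tuple


class ToyEnv:
    """Hypothetical toy environment: the state is a counter, reward is the new counter value."""

    def __init__(self):
        self.state = 0

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.state += action
        return self.state, float(self.state), self.state >= 5


class MediatingLevelManager:
    """Hypothetical level manager that owns the real environment and exposes only a simulator."""

    def __init__(self, env):
        self._env = env  # the policy never receives this reference

    def make_simulator(self) -> Callable[[List[int]], float]:
        def simulate(actions: List[int]) -> float:
            # Replay an action sequence on a fresh clone of the current real state.
            clone = copy.deepcopy(self._env)
            total = 0.0
            for action in actions:
                _, reward, done = clone.step(action)
                total += reward
                if done:
                    break
            return total

        return simulate

    def act(self, action: int) -> Tuple[int, float, bool]:
        # Only the manager advances the real environment.
        return self._env.step(action)
```

A search-based policy would receive only `simulate` and could rank candidate action sequences with it (for example `max(plans, key=simulate)`) without being able to step the real environment. Python does not enforce the leading underscore, so this is a convention rather than a hard guarantee.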

I think that in #212, passing an instantiated environment refers to initializing the environment outside of Coach, so it still wouldn't be available to the agent or to the exploration policy. But I might be wrong.

@redknightlois
Contributor Author

redknightlois commented Mar 22, 2019

@galleibo-intel @guyk1971 I stumbled across the Go-Explore paper (https://arxiv.org/pdf/1901.10995.pdf). You should seriously take a look at it in the context of supporting entirely different scenarios. Its workflow is so different from anything available in Coach that if you devise a way to make it work, everything else will be fairly easy to implement on top of it.

EDIT: Along the same lines, the POET paper could give further hints about operators that act on training workflows: https://arxiv.org/abs/1901.01753
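For reference, the core loop of Go-Explore's exploration phase (phase 1) can be sketched in a few lines: keep an archive of downscaled "cells", pick a cell, return to it by restoring a saved environment state, explore randomly from there, and add or update cells when a new or better trajectory is found. This is a simplified sketch on a hypothetical `ChainEnv`, assuming the environment state can be saved and restored with `copy.deepcopy`; the `1 / sqrt(visits + 1)` selection weight stands in for the paper's more elaborate cell-selection heuristics.

```python
import copy
import math
import random
from dataclasses import dataclass


class ChainEnv:
    """Hypothetical sparse-reward environment: positions 0..length-1, reward 1.0 only at the right end."""

    def __init__(self, length=10):
        self.length = length
        self.pos = 0

    def step(self, action):
        self.pos = max(0, min(self.length - 1, self.pos + (1 if action == 1 else -1)))
        done = self.pos == self.length - 1
        return self.pos, (1.0 if done else 0.0), done


@dataclass
class CellEntry:
    snapshot: ChainEnv  # saved environment state used to "go" back to this cell
    score: float        # cumulative reward of the best trajectory found to this cell
    length: int         # number of steps in that trajectory
    done: bool          # terminal cells are not selected for further exploration
    visits: int = 0


def go_explore(env, cell_fn, iterations=300, explore_steps=5, seed=0):
    rng = random.Random(seed)
    archive = {cell_fn(env.pos): CellEntry(copy.deepcopy(env), 0.0, 0, False)}
    for _ in range(iterations):
        # Select a non-terminal cell, favouring rarely chosen ones.
        candidates = [c for c, e in archive.items() if not e.done]
        weights = [1.0 / math.sqrt(archive[c].visits + 1) for c in candidates]
        entry = archive[rng.choices(candidates, weights=weights)[0]]
        entry.visits += 1
        # "Go": restore the saved state instead of replaying the episode from the start.
        sim = copy.deepcopy(entry.snapshot)
        score, length = entry.score, entry.length
        # "Explore": take random actions from the restored state.
        for _ in range(explore_steps):
            pos, reward, done = sim.step(rng.choice([0, 1]))
            score += reward
            length += 1
            cell = cell_fn(pos)
            old = archive.get(cell)
            # Add new cells; replace existing ones with higher-scoring or shorter trajectories.
            if old is None or score > old.score or (score == old.score and length < old.length):
                visits = 0 if old is None else old.visits
                archive[cell] = CellEntry(copy.deepcopy(sim), score, length, done, visits)
            if done:
                break
    return archive
```

For example, `go_explore(ChainEnv(length=10), cell_fn=lambda pos: pos // 2)` groups pairs of positions into one cell. The "go" step is where the environment access requested in this issue comes in: without a way to clone or restore the environment, the algorithm would have to replay the stored trajectory from a reset.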

@gal-leibovich
Contributor

Thanks @redknightlois. At the moment we do not have plans for big scale architectural framework changes.

3 participants