Skip to content
This repository has been archived by the owner on Dec 11, 2022. It is now read-only.

remove method GraphManager.emulate_act_on_trainer #178

Open
zach-nervana opened this issue Jan 3, 2019 · 4 comments
Open

remove method GraphManager.emulate_act_on_trainer #178

zach-nervana opened this issue Jan 3, 2019 · 4 comments
Assignees
Labels
priority/p1 broken basics or large value add enhancements (highest priority)
Projects

Comments

@zach-nervana
Copy link
Contributor

zach-nervana commented Jan 3, 2019

A comment from the code:

# TODO-remove - this is a temporary flow, used by the trainer worker, duplicated from observe() - need to create
#               an external trainer flow reusing the existing flow and methods [e.g. observe(), step(), act()]

It appears that this code is related to, if not the cause of, quite a bit of complexity and duplicate code in the training control flow and steps counting.

@scttl scttl added this to To do in Coach Dev Jan 10, 2019
@galnov galnov moved this from To do to P2 in Coach Dev Jan 13, 2019
@scttl scttl added the priority/p2 questions needing answered or medium impact bugs label Jan 16, 2019
@scttl scttl moved this from P2 to Groomed but Not Started in Coach Dev Jan 16, 2019
@galnov galnov added priority/p1 broken basics or large value add enhancements (highest priority) and removed priority/p2 questions needing answered or medium impact bugs labels Jan 16, 2019
@balajismaniam
Copy link
Contributor

There are some subtle differences between the emulate_act_on_trainer() and act(). Same goes for emulate_act_on_observe() and observe(). We can't remove it completely.

We could try to extract out the common code between these funcs. However, it might become a set of common funcs that are too smal (~1-5 lines of code in each). @zach-nervana is that what you want?

@zach-nervana
Copy link
Contributor Author

I agree that cleaning this up is not going to be as simple as extracting duplicate code as functions.

Do you know what the original intent of this comment is?

@zach-nervana
Copy link
Contributor Author

Notes to myself/possible methods of refactoring

  • Agent.observe and Agent.emulate_observe_on_trainer could be split into 2 functions: observe(env_respons) -> transition and record(transition)
    • observe would call record
  • Agent.act and Agent.emulate_act_on_trainer can be similarly refactored
  • Agent.emulate_act_on_trainer can likely be reduced to self.total_steps_counter += 1; self.current_episode_steps_counter += 1
  • current bug?: emulated rewards during emulation do not distinguish between shaped rewards and unshaved rewards
  • use TotalStepsCounter in Agent instead of three separately named properties: self.training_iteration, self.total_steps_counter, self.current_episode
    • this will simplify space methods _should_update, _should_train and _should_update_online_weights_to_target
@gal-leibovich
Copy link
Contributor

I agree that cleaning this up is not going to be as simple as extracting duplicate code as functions.

Do you know what the original intent of this comment is?

Yeah, the comment as written is a bit confusing. As far as I remember, the original intent was to just reduce code duplication as much as possible (not just reuse existing methods as-is), as there are common shared parts in both flows, which should be reused.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
priority/p1 broken basics or large value add enhancements (highest priority)
6 participants