Global goal point #127
Comments
Hi, this repo is simply for training a robot motion policy through DRL. There is no global exploration in this repository. Global planning and exploration are described in the GDAE repository; POI generation and selection are also part of GDAE, not this repository.
Thanks for your prompt reply. I had read your GDAE work before this and asked you questions there as well. May I ask: with PyTorch + Ubuntu 20.04 + ROS Noetic, can I run your GDAE.py? What environment would you recommend configuring for simulation and eventual generalization to a real cart? My cart is equipped with an on-board computer that has only a CPU!
2. The second question: what do you think are the advantages of reinforcement learning compared to other autonomous exploration algorithms, such as active SLAM? The only one I can think of is reducing the amount of computation on the on-board computer through an end-to-end approach. What other major advantages do you see? Looking forward to your reply!
Yes, that setup would be able to run GDAM, but you would have to update the code accordingly. Training is the part that requires the most resources; the trained model is fairly lightweight, so there are no heavy hardware requirements for deployment. I have deployed GDAE on an Intel NUC with an i3 CPU, so it is possible to run it with only on-board CPU support. I am not sure what you mean by environment. If you mean the simulated world, then the one used in this repo works quite well.
Dear Reinis Cimurs, I'd like to continue to ask you why you add
Hi, essentially we add it there because of inertia. We want to give the model an idea of whether it is already moving or not. If a robot is already moving, the same new action might carry it farther than if the robot had stopped. While it did not bring a huge benefit, experimentally it showed better performance than leaving this information out.
Thanks for the reply, but I still don't understand what you're saying about that:
Let's say your robot's current velocity is 1 m/s. Now you apply an action that commands 1 m/s for 1 second. Since the velocity is constant, after this 1 second your robot will have moved 1 meter. On the other hand, assume a different scenario where your robot's current velocity is 0 m/s and you apply an action of 1 m/s for 1 second. Since your robot has some mass, it will not be able to move at 1 m/s right away; there will be a ramp-up period until it reaches this speed. So after 1 second you will not have moved 1 m, but some distance below that. Even though the rest of the state was the same, and even the action was the same, you get a slightly different outcome, because the internal robot state differs depending on whether the robot is in motion or not.
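The ramp-up effect described above can be sketched numerically. This is a minimal illustration with a hypothetical first-order velocity model (the function name, time constant `tau`, and dynamics are assumptions for the example, not taken from the repository): the robot's velocity approaches the commanded velocity gradually, so the same 1 m/s command covers different distances depending on the starting velocity.

```python
def distance_covered(v0, cmd=1.0, tau=0.3, dt=0.01, duration=1.0):
    """Euler-integrate v' = (cmd - v) / tau and accumulate displacement.

    v0:  initial velocity [m/s]
    cmd: commanded velocity [m/s]
    tau: assumed ramp-up time constant [s]
    """
    v, x = v0, 0.0
    for _ in range(int(duration / dt)):
        v += (cmd - v) / tau * dt  # velocity lags behind the command
        x += v * dt                # accumulate displacement
    return x

d_moving = distance_covered(v0=1.0)   # already at 1 m/s: covers 1 m
d_stopped = distance_covered(v0=0.0)  # ramps up: covers less than 1 m
```

With identical actions, the robot that starts from rest ends up short of the robot that was already moving, which is exactly the information the extra state input carries.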
Thank you for your patience! I seem to understand what you mean: the robot's own state should also be fed into the neural network as part of the state it learns from. But can it really learn this? The previous comparison had
I do not understand the question. The state is represented by laser_state + robot_state. This is what the model is trained on and used in deployment. |
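The laser_state + robot_state composition can be sketched as below. This is a minimal sketch under assumptions (20 laser bins, and a robot state of distance to goal, angle to goal, and the two components of the previous action; the function name and exact dimensions are illustrative, not the repository's actual code):

```python
import numpy as np

def build_state(laser_scan, dist_to_goal, angle_to_goal, last_action):
    """Concatenate the environment observation with the robot's own state.

    laser_scan:    1-D array of range readings (20 bins assumed here)
    dist_to_goal:  polar distance to the current goal
    angle_to_goal: heading error towards the goal
    last_action:   (linear, angular) velocities applied at the previous step
    """
    laser_state = np.asarray(laser_scan, dtype=np.float32)
    robot_state = np.array(
        [dist_to_goal, angle_to_goal, last_action[0], last_action[1]],
        dtype=np.float32,
    )
    # the full state fed to the policy network
    return np.concatenate([laser_state, robot_state])

state = build_state(np.ones(20), 3.2, 0.4, (0.5, -0.1))
```

The previous action appears here as part of robot_state, which is how the network receives the inertia information discussed above.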
Dear Reinis Cimurs,
I recently read your paper titled "Goal-Driven Autonomous Exploration Through Deep Reinforcement Learning". I think it is fantastic, and having watched your videos on YouTube, I can't wait to implement it.
My problem now is that I want to use my cart for real-world experiments, but I have not found where in your code the global goal point is set. I also don't quite understand how exploration of the environment is realized by going to a global goal point, instead of navigating from A to B in one go. Is the purpose of exploring as much of the environment as possible achieved by visiting enough of the POIs generated in your code?
I am very touched to see your serious and patient replies to other questions. I look forward to hearing from you and wish you a happy life!