Trained RL Model - Environment with Colliders
Robot arms typically use inverse kinematics (IK) to calculate the joint rotations needed to move the arm to a desired location. Although IK offers the potential for path planning for 3D printing with a 6-axis robot arm, it only allows checking for collisions of the robot with itself.
Machine learning strategies, especially reinforcement learning (RL), make it possible to train the robot arm's inverse kinematics while avoiding collisions with itself, the surrounding (possibly unknown) environment, the ground plane, and so on. This ensures the robot extruder reaches the plane along the print path from a safe position and orientation, and it makes the process adaptable to unknown scenarios: once trained in a varying environment, the agent model can act accordingly in new environments.
Reinforcement Learning Task
The RL task is to make the agent's (robot arm's) end effector move from Plane A to Plane B while avoiding the obstacles along the path. The agent is trained on paths with varying obstacles in order to prepare it for any unknown environment it may encounter.
Reinforcement Learning Framework
The RL framework is based on the Markov Decision Process (MDP). An MDP is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. (Wikipedia)
For an RL problem, the MDP works in the following manner (a minimal sketch of this loop is given after the list):
1. The agent observes a state in the environment
2. The agent takes an action
3. The agent receives a reward for the action
4. Repeat from step 1
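The loop below is a minimal sketch of this interaction, assuming a generic `env`/`agent` interface; the method names (`reset`, `act`, `step`, `observe`) are illustrative assumptions, not the project's actual code.

```python
# Minimal sketch of the MDP interaction loop (hypothetical env/agent interface).
def run_episode(env, agent, max_steps=500):
    state = env.reset()                                # 1. agent observes a state
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)                      # 2. agent takes an action
        next_state, reward, done = env.step(action)    # 3. agent receives a reward
        agent.observe(state, action, reward, next_state, done)
        total_reward += reward
        state = next_state                             # 4. repeat from step 1
        if done:
            break
    return total_reward
```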
Strategies of Initializing RL Agent and Environment
The environment and the goal for the agent (robot arm) are changed in every episode to train the robot for different scenarios, which ensures the agent learns the problem as a whole and remains adaptable to any environment.
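A sketch of such per-episode randomization is shown below; the helper names (`randomize_obstacles`, `sample_goal_plane`, `reset_arm_to_home`) and the obstacle count are assumptions for illustration, not the project's actual setup.

```python
import random

def reset_episode(env):
    """Randomize the scene at the start of every episode (illustrative only)."""
    # Scatter an assumed random number of collider obstacles between the planes.
    num_obstacles = random.randint(1, 5)
    env.randomize_obstacles(num_obstacles)
    # Sample a new goal pose (position and orientation of the target plane).
    goal = env.sample_goal_plane()
    # Return the arm to a safe home configuration before the episode starts.
    env.reset_arm_to_home()
    return env.observe(), goal
```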
Strategies of Encoding RL Actions
The actions for the agent are the rotations of the 6 robot axes. The action space for the agent is continuous, where each robot axis can rotate between -180 and 180 degrees. Every episode the agent chooses a rotation value for each of the 6 axes depending on the current state and the reward from the previous episode.
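One plausible way to encode these continuous actions, assuming the policy outputs a value in [-1, 1] per axis (a common convention, not confirmed by the source), is to scale it to a joint angle:

```python
import numpy as np

JOINT_LIMIT_DEG = 180.0  # each of the 6 axes may rotate between -180 and 180 degrees

def actions_to_joint_angles(raw_actions):
    """Map a policy output in [-1, 1] per axis to joint angles in degrees."""
    raw_actions = np.clip(np.asarray(raw_actions, dtype=np.float32), -1.0, 1.0)
    assert raw_actions.shape == (6,), "expected one continuous action per robot axis"
    return raw_actions * JOINT_LIMIT_DEG

# Example: a policy output of 0.5 on the first axis maps to a 90-degree rotation.
print(actions_to_joint_angles([0.5, -0.25, 0.0, 1.0, -1.0, 0.1]))
```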
Strategies of Encoding RL Rewards
A set of rewards is used to guide the agent's actions. Positive rewards are given to motivate the agent, i.e. to increase the probability of repeating the current action, and negative rewards are given to demotivate the agent, i.e. to decrease the probability of repeating the current action.
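The function below is a hedged sketch of such reward shaping; the specific terms and magnitudes (progress bonus, collision penalty, goal bonus) are illustrative assumptions, not the project's actual reward values.

```python
def compute_reward(prev_distance, distance_to_goal, in_collision, reached_goal):
    """Illustrative reward shaping: reward progress toward the goal, penalize collisions."""
    reward = 0.0
    reward += 0.1 * (prev_distance - distance_to_goal)  # positive when moving closer
    reward -= 0.01                                       # small per-step time penalty
    if in_collision:
        reward -= 1.0                                    # discourage hitting obstacles
    if reached_goal:
        reward += 1.0                                    # encourage reaching the target plane
    return reward
```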
Reinforcement Learning Training - Environment with Colliders
Reinforcement Learning Training
The agent was trained for a total of 6 million steps. Care was taken to ensure the agent explores the action space before converging to a policy for the task, so that it does not get stuck in a local optimum and converge on a sub-optimal policy.
After training, the model has learnt a policy for the task and exploits that knowledge to take only the optimal actions.
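One common way to balance this exploration and exploitation in a continuous action space is to add decaying noise to the policy's actions; the schedule below is a sketch with assumed values, not the project's actual training configuration.

```python
import numpy as np

TOTAL_STEPS = 6_000_000  # total training steps, as stated above

def exploration_noise_std(step, start_std=0.5, end_std=0.05):
    """Linearly decay action noise so the agent explores early and exploits later."""
    frac = min(step / TOTAL_STEPS, 1.0)
    return start_std + frac * (end_std - start_std)

def noisy_action(policy_action, step, rng=np.random.default_rng(0)):
    """Perturb the policy's action with decaying Gaussian noise (illustrative only)."""
    noise = rng.normal(0.0, exploration_noise_std(step), size=np.shape(policy_action))
    return np.clip(np.asarray(policy_action) + noise, -1.0, 1.0)
```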
RL Training Statistics
The following graphs depict the statistics for the various factors of the RL algorithm. The increase in cumulative reward and the decrease in value loss over time indicate the success of the process. The agent learnt the optimal actions over time, thereby maximizing the rewards and minimizing the value loss between the actual values of the states and the values predicted by the neural network.
Trained RL Model - Environment with Colliders