Deep Reinforcement Learning

Reinforcement learning (RL) is a framework in which an agent learns to take actions in an environment so as to maximize a reward. In effect, it trains an AI through trial and error: the agent learns from every mistake and finds the correct path without any labels. The two main components are the agent and the environment.

Deep reinforcement learning (DRL) combines RL with deep learning, which makes it even more powerful. AlphaGo is a well-known application of deep reinforcement learning.

Source: Sutton & Barto, Reinforcement Learning: An Introduction, http://incompleteideas.net/book/bookdraft2017nov5.pdf

The framework is composed of an agent, actions, states, rewards, and the environment.

In reinforcement learning, the agent and the environment continuously interact with each other: after each action, the agent receives a reward Rt+1 and the next state St+1. The goal is to improve the policy so as to maximize the sum of rewards (the return).
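For intuition, the return is just the (optionally discounted) sum of the rewards collected after time t. A minimal sketch in Python (an illustration, not code from the original post):

# Illustration only: the return is the discounted sum of future rewards.
def discounted_return(rewards, gamma=0.99):
    g = 0.0
    for r in reversed(rewards):  # accumulate from the last reward backwards
        g = r + gamma * g
    return g

print(discounted_return([1, 0, 0, 1], gamma=0.9))  # 1 + 0.9**3 = 1.729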

What is Deep Reinforcement Learning (DRL)?

In classic Q-learning, a table (the Q-table) records the value of every action executed in every state; looking this table up gives the best action to take. In DRL, the Q-table is replaced by a neural network: through learning across its layers, the network can extract rich features from the environment instead of enumerating every state.
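To make the tabular version concrete, here is a minimal Q-learning update sketch (hypothetical names such as q_update and alpha; this is not the post's code):

# Minimal tabular Q-learning sketch (hypothetical example).
from collections import defaultdict

Q = defaultdict(float)      # Q[(state, action)] -> estimated value
alpha, gamma = 0.1, 0.99    # learning rate and discount factor

def q_update(state, action, reward, next_state, actions):
    # Move Q(s, a) toward the Bellman target: r + gamma * max_a' Q(s', a')
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

A neural network becomes necessary when the state space is continuous, as in MountainCar below, because a table cannot enumerate every state.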

Gym MountainCar

The Gym environment library released by OpenAI includes the MountainCar game environment.

# show game
import gym
from gym import wrappers

env = gym.make('MountainCar-v0')
print(env.action_space.n)          # 3 discrete actions
print(env.observation_space)       # Box(2): position and velocity
print(env.observation_space.high)  # [0.6  0.07]
print(env.observation_space.low)   # [-1.2 -0.07]
env = wrappers.Monitor(env, "./gym-results", force=True)  # record episode videos
env.reset()
for _ in range(1000):
    action = env.action_space.sample()  # take a random action
    observation, reward, done, info = env.step(action)
    if done:
        break
env.close()

Actions:
    Type: Discrete(3)
    Num    Action
    0      Accelerate to the Left
    1      Don't accelerate
    2      Accelerate to the Right

Observation:
    Type: Box(2)
    Num    Observation               Min            Max
    0      Car Position              -1.2           0.6
    1      Car Velocity              -0.07          0.07

Because the observation is only two numbers (position and velocity), we use a simple fully connected neural network.

import tensorflow as tf

active_n = env.action_space.n  # number of actions (3 in MountainCar)

class DQNetwork(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense1 = tf.keras.layers.Dense(units=64, activation=tf.nn.relu)
        self.dense2 = tf.keras.layers.Dense(units=16, activation=tf.nn.relu)
        self.dense3 = tf.keras.layers.Dense(units=active_n)  # one Q-value per action

    def call(self, inputs):
        x = self.dense1(inputs)
        x = self.dense2(x)
        x = self.dense3(x)
        return x

    def predict(self, inputs):
        # Greedy policy: return the index of the largest Q-value
        q_values = self(inputs)
        return tf.argmax(q_values, axis=-1)
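Using the network looks something like this (a sketch under the classic Gym API used above; the linked notebook wires it into the full training loop):

# Sketch: pick a greedy action for a single observation.
import numpy as np

env = gym.make('MountainCar-v0')
model = DQNetwork()
state = np.asarray(env.reset(), dtype=np.float32)  # [position, velocity]
state = state[np.newaxis, :]                       # add a batch dimension
action = int(model.predict(state)[0])              # greedy action: 0, 1, or 2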

Implement Q-learning

Q-learning algorithm source: https://www.cse.unsw.edu.au/~cs9417ml/RL1/algorithms.html

In this game, we set gamma to 1.0, which means future rewards are not discounted at all.
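The full training loop is in the notebook linked at the end; below is a minimal sketch of one DQN update step consistent with that idea. Names such as optimizer and train_step, and the assumption that transitions arrive as batched float32/int32 tensors (for example, sampled from a replay buffer), are mine, not necessarily the notebook's:

# Sketch of one DQN update step (assumed names; see the notebook for the real code).
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
gamma = 1.0  # no discounting, as discussed above

def train_step(model, states, actions, rewards, next_states, dones):
    # Bellman target: r + gamma * max_a' Q(s', a'); no future value once done
    next_q = model(next_states)
    targets = rewards + gamma * tf.reduce_max(next_q, axis=1) * (1.0 - dones)
    with tf.GradientTape() as tape:
        q_values = model(states)
        # Q-value of the action actually taken in each transition
        q_taken = tf.reduce_sum(q_values * tf.one_hot(actions, active_n), axis=1)
        loss = tf.reduce_mean(tf.square(targets - q_taken))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss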

After 20 minutes of training…

Done!!!!

Code: https://github.com/KbWen/tf2test/blob/gym_MountainCar/gym_MountainCar-v0.ipynb
