`exarl.agents.agent_vault.dqn`

Module Contents

Classes

`LossHistory`	Loss history for training
`DQN`	Multi-Learner Discrete Double Deep Q-Network with Prioritized Experience Replay

Attributes

`multiLearner`
`logger`

exarl.agents.agent_vault.dqn.multiLearner = True

exarl.agents.agent_vault.dqn.logger

class exarl.agents.agent_vault.dqn.LossHistory

Bases: tensorflow.keras.callbacks.Callback

Loss history for training

on_train_begin(self, logs={})

on_batch_end(self, batch, logs={})

class exarl.agents.agent_vault.dqn.DQN(env, is_learner)

Bases: exarl.ExaAgent

Multi-Learner Discrete Double Deep Q-Network with Prioritized Experience Replay

DQN Constructor

Parameters

env (OpenAI Gym environment object) – The RL environment
is_learner (bool) – Whether the agent is a learner (as opposed to an actor)

_get_device(self)

Get device type (CPU/GPU)

Returns: string – device type

_build_model(self)

Build NN model based on parameters provided in the config file

Returns: [type] – [description]

set_learner(self)

remember(self, state, action, reward, next_state, done)

Add experience to replay buffer

Parameters

state (list or array) – Current state of the system
action (list or array) – Action taken
reward (list or array) – Environment reward
next_state (list or array) – Next state of the system
done (bool) – Indicates episode completion
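The storage pattern behind `remember` can be sketched with a fixed-size buffer. This is a minimal illustration, not the ExaRL implementation: the `ReplayBuffer` class, its `capacity` argument, and the `sample` method are hypothetical names, and uniform sampling stands in for the prioritized replay the class actually uses.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size buffer; not the ExaRL implementation."""

    def __init__(self, capacity, batch_size):
        self.memory = deque(maxlen=capacity)  # oldest entries drop off first
        self.batch_size = batch_size

    def remember(self, state, action, reward, next_state, done):
        # Store one transition as a tuple.
        self.memory.append((state, action, reward, next_state, done))

    def has_data(self):
        # Same check as documented for has_data: length >= batch_size.
        return len(self.memory) >= self.batch_size

    def sample(self, rng=random):
        # Uniform sampling without replacement (assumption: no priorities).
        return rng.sample(list(self.memory), self.batch_size)


buffer = ReplayBuffer(capacity=4, batch_size=2)
buffer.remember([0.0], 1, 0.5, [0.1], False)
ready_after_one = buffer.has_data()
buffer.remember([0.1], 0, 1.0, [0.2], True)
ready_after_two = buffer.has_data()
```

With a batch size of 2, the buffer reports no data after one transition and becomes ready after the second.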

get_action(self, state)

Use epsilon-greedy approach to generate actions

Parameters: state (list or array) – Current state of the system
Returns: (list or array) – Action to take
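For readers unfamiliar with the epsilon-greedy approach, a minimal sketch follows. The function name `epsilon_greedy` and its arguments are assumptions made for illustration, and the Q-values are a plain list rather than a model prediction. The second return value borrows the random (0) / inference (1) convention documented for `action`.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return (action_index, policy): policy is 0 for random, 1 for greedy."""
    if rng.random() < epsilon:
        # Explore: pick any action index uniformly at random.
        return rng.randrange(len(q_values)), 0
    # Exploit: pick the index with the largest Q-value.
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    return best, 1


greedy_choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
random_choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=1.0)
```

With `epsilon=0.0` the call always exploits, and with `epsilon=1.0` it always explores.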

action(self, state)

Discretizes 1D continuous actions to work with DQN

Parameters: state (list or array) – Current state of the system
Returns

action (list or array) – Action to take
policy (int) – random (0) or inference (1)
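One common way to discretize a 1D continuous action range is an evenly spaced grid, sketched below. Whether ExaRL uses this exact mapping is not stated here; `make_action_grid`, `to_continuous`, and the bounds are hypothetical.

```python
def make_action_grid(low, high, n_actions):
    """Evenly spaced continuous values covering [low, high]."""
    if n_actions < 2:
        raise ValueError("need at least two discrete actions")
    step = (high - low) / (n_actions - 1)
    return [low + i * step for i in range(n_actions)]


def to_continuous(index, grid):
    # Look up the continuous value for a discrete action index.
    return grid[index]


grid = make_action_grid(-1.0, 1.0, 5)
chosen = to_continuous(3, grid)
```

The network then only needs one output per grid point, and the chosen index is mapped back to a continuous value before it is sent to the environment.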

calc_target_f(self, exp)

Bellman equation calculations

Parameters: exp (list of experience) – contains state, action, reward, next state, done
Returns: target Q value (array)
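Since the class is described as a Double Deep Q-Network, the Bellman target presumably follows the double-DQN form: the online network selects the next action and the target network evaluates it. The sketch below shows that rule for a single transition. The function name, argument names, and the use of plain lists are assumptions; the real method operates on model predictions.

```python
def double_dqn_target(q_online_next, q_target_next, reward, done, gamma):
    """Scalar target for one transition.

    q_online_next / q_target_next: Q-values for next_state from the
    online and target networks (plain lists in this sketch).
    """
    if done:
        return reward  # no bootstrapping past a terminal state
    # The online network chooses the action...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...and the target network supplies its value.
    return reward + gamma * q_target_next[best_action]


target = double_dqn_target(
    q_online_next=[0.2, 0.8],
    q_target_next=[1.0, 0.5],
    reward=1.0,
    done=False,
    gamma=0.9,
)
```

Here the online network prefers action 1, so the target uses the target network's value 0.5 for that action instead of its own maximum of 1.0.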

has_data(self)

Indicates if the buffer has data of size batch_size or more

Returns: bool – True if replay_buffer length >= self.batch_size

generate_data(self)

Unpack and yield training data

Yields

batch_states (numpy array) – training input
batch_target (numpy array) – training labels

With PER, additionally:

indices (numpy array) – data indices
importance (numpy array) – importance weights

train(self, batch)

Train the NN

Parameters

batch (list) – sampled batch of experiences

Returns

With PER:
indices (numpy array) – data indices
loss – training loss

Without PER: None

training_step(self, batch)

Training step for multi-learner using Horovod

Parameters: batch (list) – sampled batch of experiences
Returns: loss_value – loss value per training step for multi-learner

set_priorities(self, indices, loss)

Set priorities for training data

Parameters

indices (array) – data indices
loss (array) – Losses
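The relationship between losses, priorities, and importance weights in prioritized experience replay can be sketched as follows. The formulas follow the usual PER conventions; the function names and the `alpha`, `beta`, and `eps` defaults are illustrative assumptions and are not taken from the ExaRL configuration.

```python
def priorities_from_losses(losses, alpha=0.6, eps=1e-6):
    """Priority p_i = (|loss_i| + eps) ** alpha."""
    return [(abs(loss) + eps) ** alpha for loss in losses]


def sampling_probabilities(priorities):
    """P(i) = p_i / sum of all priorities."""
    total = sum(priorities)
    return [p / total for p in priorities]


def importance_weights(probabilities, beta=0.4):
    """w_i = (N * P(i)) ** -beta, normalized by the largest weight."""
    n = len(probabilities)
    raw = [(n * p) ** -beta for p in probabilities]
    largest = max(raw)
    return [w / largest for w in raw]


prios = priorities_from_losses([0.5, 2.0, 0.1])
probs = sampling_probabilities(prios)
weights = importance_weights(probs)
```

Transitions with larger loss are sampled more often, and the importance weights shrink their gradient contribution to compensate for that bias.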

get_weights(self)

Get weights from target model

Returns: weights (list) – target model weights

set_weights(self, weights)

Set model weights

Parameters: weights (list) – model weights

target_train(self)

Update target model
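The documentation does not say whether the target model is updated by a hard copy or a soft blend. The sketch below shows the soft variant, which reduces to a hard copy when `tau` is 1. The function name and the `tau` parameter are assumptions, and weights are flat lists of floats rather than layer tensors.

```python
def soft_update(online_weights, target_weights, tau):
    """Blend each target weight toward the online weight by a factor tau."""
    return [
        tau * w_online + (1.0 - tau) * w_target
        for w_online, w_target in zip(online_weights, target_weights)
    ]


new_target = soft_update([1.0, 2.0], [0.0, 0.0], tau=0.5)
hard_copy = soft_update([1.0, 2.0], [0.0, 0.0], tau=1.0)
```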

epsilon_adj(self)

Update epsilon value
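A typical epsilon schedule is multiplicative decay with a floor, sketched below. The exact rule used by `epsilon_adj` is not documented here, so `decay_epsilon`, `epsilon_min`, and `epsilon_decay` are hypothetical names chosen for the illustration.

```python
def decay_epsilon(epsilon, epsilon_min, epsilon_decay):
    """Multiplicative decay, clipped at epsilon_min."""
    return max(epsilon_min, epsilon * epsilon_decay)


eps = 1.0
history = []
for _ in range(5):
    eps = decay_epsilon(eps, epsilon_min=0.2, epsilon_decay=0.5)
    history.append(eps)
```

Exploration shrinks on every call until it reaches the floor, after which the agent keeps a small constant chance of acting randomly.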

load(self, filename)

Load model weights from pickle file

Parameters: filename (string) – full path of model file

save(self, filename)

Save model weights to pickle file

Parameters: filename (string) – full path of model file
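The pickle round trip described by `load` and `save` can be sketched with the standard library alone. The helper names are hypothetical and nested lists stand in for the model's weight arrays.

```python
import os
import pickle
import tempfile


def save_weights(weights, filename):
    # Serialize the weight list to a binary pickle file.
    with open(filename, "wb") as f:
        pickle.dump(weights, f)


def load_weights(filename):
    # Only unpickle files you trust: pickle can execute arbitrary code.
    with open(filename, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weights.pkl")
    original = [[0.1, 0.2], [0.3]]
    save_weights(original, path)
    restored = load_weights(path)
```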

update(self)

monitor(self)
