exarl.agents.agent_vault.dqn

Module Contents

Classes

LossHistory

Loss history for training

DQN

Multi-Learner Discrete Double Deep Q-Network with Prioritized Experience Replay

Attributes

multiLearner

logger

exarl.agents.agent_vault.dqn.multiLearner = True

exarl.agents.agent_vault.dqn.logger

class exarl.agents.agent_vault.dqn.LossHistory

Bases: tensorflow.keras.callbacks.Callback

Loss history for training

on_train_begin(self, logs={})

on_batch_end(self, batch, logs={})

class exarl.agents.agent_vault.dqn.DQN(env, is_learner)

Bases: exarl.ExaAgent

Multi-Learner Discrete Double Deep Q-Network with Prioritized Experience Replay

DQN Constructor

Parameters
  • env (OpenAI Gym environment object) – The RL environment the agent interacts with

  • is_learner (bool) – Indicates whether the agent is a learner or an actor

_get_device(self)

Get device type (CPU/GPU)

Returns

string – device type

_build_model(self)

Build NN model based on parameters provided in the config file

Returns

model – the NN model built from the config file parameters

set_learner(self)
remember(self, state, action, reward, next_state, done)

Add experience to replay buffer

Parameters
  • state (list or array) – Current state of the system

  • action (list or array) – Action to take

  • reward (list or array) – Environment reward

  • next_state (list or array) – Next state of the system

  • done (bool) – Indicates episode completion
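
The storage step can be sketched as follows. This is a minimal illustration that assumes a bounded FIFO buffer; the `ReplayBuffer` class and its `maxlen` argument are hypothetical and are not taken from EXARL, which additionally supports prioritized replay. The `has_data` check mirrors the documented condition (buffer length >= batch_size).

```python
from collections import deque


class ReplayBuffer:
    """Hypothetical bounded FIFO buffer; not EXARL's actual class."""

    def __init__(self, maxlen, batch_size):
        self.memory = deque(maxlen=maxlen)  # oldest experiences are dropped first
        self.batch_size = batch_size

    def remember(self, state, action, reward, next_state, done):
        # Store one transition as a single tuple.
        self.memory.append((state, action, reward, next_state, done))

    def has_data(self):
        # True once at least one full batch can be drawn.
        return len(self.memory) >= self.batch_size


buf = ReplayBuffer(maxlen=3, batch_size=2)
buf.remember([0.0], 1, 0.5, [0.1], False)
ready_after_one = buf.has_data()
buf.remember([0.1], 0, 1.0, [0.2], True)
ready_after_two = buf.has_data()
```
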

get_action(self, state)

Use epsilon-greedy approach to generate actions

Parameters

state (list or array) – Current state of the system

Returns

(list or array) – Action to take
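
Epsilon-greedy selection can be sketched as below. The function name `epsilon_greedy` and the plain-list Q-values are assumptions made for illustration; in the agent, the Q-values would come from the NN model. The returned policy flag follows the convention documented for `action` (0 for random, 1 for inference).

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return (action_index, policy); policy is 0 for random, 1 for greedy."""
    if rng.random() < epsilon:
        # Explore: pick any action index uniformly at random.
        return rng.randrange(len(q_values)), 0
    # Exploit: pick the index with the largest estimated Q-value.
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    return best, 1


q = [0.1, 0.7, 0.2]
greedy_action, greedy_policy = epsilon_greedy(q, epsilon=0.0)  # never explores
random_action, random_policy = epsilon_greedy(q, epsilon=1.0)  # always explores
```
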

action(self, state)

Discretizes 1D continuous actions to work with DQN

Parameters

state (list or array) – Current state of the system

Returns

  • action (list or array) – Action to take

  • policy (int) – random (0) or inference (1)
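
One way to discretize a 1D continuous action range is an evenly spaced grid, so that each DQN output index maps to one continuous value. The sketch below assumes this scheme; the helper `discretize` and its arguments are hypothetical, and the actual mapping used by the agent may differ.

```python
def discretize(low, high, n_actions):
    """Evenly spaced action values covering [low, high], endpoints included."""
    if n_actions < 2:
        raise ValueError("need at least two discrete actions")
    step = (high - low) / (n_actions - 1)
    return [low + i * step for i in range(n_actions)]


grid = discretize(-1.0, 1.0, 5)
# A DQN output index selects one value from the grid.
chosen = grid[3]
```
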

calc_target_f(self, exp)

Bellman equation calculations

Parameters

exp (list of experience) – contains state, action, reward, next state, done

Returns

target Q value (array) – Q-value targets computed from the Bellman equation for the given experience
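
For reference, the standard Double DQN target for a single transition is sketched below: the online network chooses the next action and the target network evaluates it. Whether `calc_target_f` follows exactly this form is an assumption; the function name, the argument names, and the plain-list Q-values are illustrative only.

```python
def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Bellman target for one transition under the Double DQN rule."""
    if done:
        # No bootstrap term at the end of an episode.
        return reward
    # The online network selects the next action ...
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ... and the target network supplies its value.
    return reward + gamma * q_target_next[best]


target = double_dqn_target(1.0, False, 0.5, [0.2, 0.9], [4.0, 2.0])
terminal = double_dqn_target(1.0, True, 0.5, [0.2, 0.9], [4.0, 2.0])
```

Here the online network prefers action 1, so the target uses the target network's value for action 1 (2.0) rather than its own maximum (4.0).
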

has_data(self)

Indicates if the buffer has data of size batch_size or more

Returns

bool – True if replay_buffer length >= self.batch_size

generate_data(self)

Unpack and yield training data

Yields

  • batch_states (numpy array) – training input

  • batch_target (numpy array) – training labels

With PER, additionally:

  • indices (numpy array) – data indices

  • importance (numpy array) – importance weights

train(self, batch)

Train the NN

Parameters

batch (list) – sampled batch of experiences

Returns

With PER:

  • indices (numpy array) – data indices

  • loss – training loss

Otherwise: None

training_step(self, batch)

Training step for multi-learner using Horovod

Parameters

batch (list) – sampled batch of experiences

Returns

loss_value – loss value per training step for multi-learner

set_priorities(self, indices, loss)

Set priorities for training data

Parameters
  • indices (array) – data indices

  • loss (array) – Losses
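
In prioritized experience replay, priorities are commonly derived from the per-sample loss as (|loss| + eps) ** alpha and then normalized into sampling probabilities. The sketch below assumes that common scheme; `updated_priorities`, `sampling_probabilities`, `eps`, and `alpha` are hypothetical names and are not confirmed by this reference.

```python
def updated_priorities(priorities, indices, losses, eps=1e-6, alpha=0.6):
    """Return a copy of priorities with the given indices refreshed."""
    new = list(priorities)
    for i, loss in zip(indices, losses):
        # eps keeps zero-loss samples from never being drawn again.
        new[i] = (abs(loss) + eps) ** alpha
    return new


def sampling_probabilities(priorities):
    """Normalize priorities so they sum to one."""
    total = sum(priorities)
    return [p / total for p in priorities]


prios = updated_priorities([1.0, 1.0, 1.0], indices=[0, 2],
                           losses=[0.0, 2.0], eps=0.5, alpha=1.0)
probs = sampling_probabilities(prios)
```
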

get_weights(self)

Get weights from target model

Returns

weights (list) – target model weights

set_weights(self, weights)

Set model weights

Parameters

weights (list) – model weights

target_train(self)

Update target model
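
A common way to update a target model is a soft (Polyak) update that blends the online weights into the target weights. The sketch below assumes that scheme with a hypothetical mixing factor `tau` and flat lists of floats standing in for weight tensors; the agent's actual update rule is not stated in this reference.

```python
def soft_update(online, target, tau):
    """Blend online weights into target weights; tau=1.0 is a hard copy."""
    return [tau * w + (1.0 - tau) * t for w, t in zip(online, target)]


new_target = soft_update([1.0, 2.0], [0.0, 0.0], tau=0.5)
hard_copy = soft_update([1.0, 2.0], [0.0, 0.0], tau=1.0)
```
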

epsilon_adj(self)

Update epsilon value
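
A typical epsilon schedule multiplies epsilon by a decay factor until it reaches a floor. The sketch below assumes this multiplicative scheme; `decay_epsilon`, `epsilon_min`, and `epsilon_decay` are illustrative names, and the values used are arbitrary.

```python
def decay_epsilon(epsilon, epsilon_min, epsilon_decay):
    """Shrink epsilon by a constant factor, never going below epsilon_min."""
    if epsilon > epsilon_min:
        epsilon = max(epsilon_min, epsilon * epsilon_decay)
    return epsilon


eps = 1.0
for _ in range(3):
    eps = decay_epsilon(eps, epsilon_min=0.3, epsilon_decay=0.5)
```
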

load(self, filename)

Load model weights from pickle file

Parameters

filename (string) – full path of model file

save(self, filename)

Save model weights to pickle file

Parameters

filename (string) – full path of model file
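
A round trip through a pickle file can be sketched as follows. The helpers `save_weights` and `load_weights` are hypothetical stand-ins that operate on a plain nested list; the agent's methods would pass model weights instead. Only unpickle files from trusted sources, since `pickle.load` can execute arbitrary code.

```python
import os
import pickle
import tempfile


def save_weights(weights, filename):
    """Serialize weights to a pickle file."""
    with open(filename, "wb") as f:
        pickle.dump(weights, f)


def load_weights(filename):
    """Read weights back from a pickle file."""
    with open(filename, "rb") as f:
        return pickle.load(f)


weights = [[0.1, 0.2], [0.3]]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weights.pkl")
    save_weights(weights, path)
    restored = load_weights(path)
```
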

update(self)
monitor(self)