exarl.agents.agent_vault.ddpg

Module Contents

Classes

DDPG

Deep deterministic policy gradient agent.

Functions

update_target(target_weights, weights, tau)

Attributes

logger

exarl.agents.agent_vault.ddpg.logger
exarl.agents.agent_vault.ddpg.update_target(target_weights, weights, tau)
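
No description is given for update_target. Its signature matches the soft (Polyak) target update that DDPG conventionally uses, where each target weight moves a fraction tau toward the corresponding online weight. The sketch below illustrates that rule under this assumption. It uses plain Python lists in place of weight tensors, and soft_update is a hypothetical name, not the module's implementation.

```python
def soft_update(target_weights, weights, tau):
    """Return target weights moved a fraction `tau` toward `weights`.

    Assumption: each weight tensor is represented as a flat list of floats.
    """
    return [
        [tau * w + (1.0 - tau) * t for t, w in zip(target_layer, layer)]
        for target_layer, layer in zip(target_weights, weights)
    ]


target = [[0.0, 0.0], [1.0]]
online = [[1.0, 2.0], [3.0]]
updated = soft_update(target, online, tau=0.5)
```

With tau = 1 the target becomes a copy of the online weights; with tau = 0 it never changes. Small values of tau make the target track the online network slowly.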
class exarl.agents.agent_vault.ddpg.DDPG(env, is_learner)

Bases: exarl.ExaAgent

Deep deterministic policy gradient (DDPG) agent. Inherits from the ExaAgent base class.

DDPG constructor

Parameters
  • env (OpenAI Gym environment object) – The RL environment the agent interacts with

  • is_learner (bool) – True if the agent is a learner, False if it is an actor

is_learner :bool
remember(self, state, action, reward, next_state, done)

Add experience to replay buffer

Parameters
  • state (list or array) – Current state of the system

  • action (list or array) – Action taken in the current state

  • reward (list or array) – Environment reward

  • next_state (list or array) – Next state of the system

  • done (bool) – Indicates episode completion
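
The storage behind remember is not described here. As a minimal sketch, assuming a fixed-capacity buffer that discards the oldest experience first, the five arguments can be kept together as one tuple. ReplayBuffer, capacity, and sample are hypothetical names introduced for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity experience store (hypothetical helper, not ExaRL's class)."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest entry once capacity is reached.
        self._memory = deque(maxlen=capacity)

    def remember(self, state, action, reward, next_state, done):
        # Keep the whole transition together as a single tuple.
        self._memory.append((state, action, reward, next_state, done))

    def has_data(self):
        return len(self._memory) > 0

    def sample(self, batch_size):
        # Sample without replacement, capped at the number of stored items.
        k = min(batch_size, len(self._memory))
        return random.sample(list(self._memory), k)


buffer = ReplayBuffer(capacity=2)
buffer.remember([0.0], [0.1], 1.0, [0.2], False)
buffer.remember([0.2], [0.3], 0.5, [0.4], False)
buffer.remember([0.4], [0.5], 0.0, [0.6], True)  # evicts the first entry
```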

update_grad(self, state_batch, action_batch, reward_batch, next_state_batch)

Update gradients for one training step

Parameters
  • state_batch (list) – list of states

  • action_batch (list) – list of actions

  • reward_batch (list) – list of rewards

  • next_state_batch (list) – list of next states
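
In the standard DDPG formulation, the critic is regressed onto the bootstrapped target y = r + gamma * Q'(s', mu'(s')), where mu' and Q' are the target actor and target critic. The sketch below computes those targets in plain Python. Because the signature takes no done batch, terminal transitions are not masked here. The names critic_targets, target_actor, target_critic, and gamma are assumptions for illustration, and whether the module follows exactly this form is not stated in this reference.

```python
def critic_targets(reward_batch, next_state_batch, target_actor, target_critic, gamma):
    """Compute y_i = r_i + gamma * Q'(s'_i, mu'(s'_i)) for each transition."""
    targets = []
    for reward, next_state in zip(reward_batch, next_state_batch):
        # The target actor proposes the action for the next state.
        next_action = target_actor(next_state)
        # The target critic scores that (next_state, next_action) pair.
        targets.append(reward + gamma * target_critic(next_state, next_action))
    return targets


# Toy stand-ins for the target networks (assumptions for illustration only).
toy_actor = lambda state: -state
toy_critic = lambda state, action: state + action + 1.0

y = critic_targets([1.0, 0.0], [2.0, 3.0], toy_actor, toy_critic, gamma=0.5)
```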

get_actor(self)

Define actor network

Returns

model – actor model

get_critic(self)

Define critic network

Returns

model – critic network

has_data(self)

Indicates if the buffer has data

Returns

bool – True if buffer has data

generate_data(self)

Generate data for training

Yields

  • state_batch (list) – list of states

  • action_batch (list) – list of actions

  • reward_batch (list) – list of rewards

  • next_state_batch (list) – list of next states
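
A generator with this shape can be sketched as follows, assuming experiences are stored as (state, action, reward, next_state, done) tuples and sampled uniformly. The memory and batch_size arguments are hypothetical; the documented method takes only self.

```python
import random


def generate_data(memory, batch_size):
    """Yield one (state, action, reward, next_state) tuple of batches.

    Assumption: `memory` is a list of
    (state, action, reward, next_state, done) tuples.
    """
    sample = random.sample(memory, min(batch_size, len(memory)))
    # Split the sampled transitions into one list per field.
    state_batch = [exp[0] for exp in sample]
    action_batch = [exp[1] for exp in sample]
    reward_batch = [exp[2] for exp in sample]
    next_state_batch = [exp[3] for exp in sample]
    yield state_batch, action_batch, reward_batch, next_state_batch


memory = [
    ([0.0], [0.1], 1.0, [0.2], False),
    ([0.2], [0.3], 0.5, [0.4], False),
    ([0.4], [0.5], 0.0, [0.6], True),
]
batches = list(generate_data(memory, batch_size=2))
state_batch, action_batch, reward_batch, next_state_batch = batches[0]
```

The four yielded lists line up with the four arguments of update_grad.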

train(self, batch)

Train the neural network on a sampled batch of experiences

Parameters

batch (list) – sampled batch of experiences

target_train(self)

Update target model

action(self, state)

Returns sampled action with added noise

Parameters

state (list or array) – Current state of the system

Returns

  • action (list or array) – Action to take

  • policy (int) – random (0) or inference (1)
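
How the method chooses between a random and an inferred action is not specified here. One plausible arrangement, sketched below as an assumption, draws a uniform random action with probability epsilon and otherwise perturbs the actor output with Gaussian noise and clips it to the action bounds. All names (select_action, actor, epsilon, low, high, noise_std, rng) are hypothetical.

```python
import random


def select_action(state, actor, epsilon, low, high, noise_std, rng):
    """Return (action, policy) where policy is 0 (random) or 1 (inference)."""
    greedy = actor(state)
    if rng.random() < epsilon:
        # Exploration: ignore the actor and draw uniformly within the bounds.
        return [rng.uniform(low, high) for _ in greedy], 0
    # Inference: perturb the actor output with Gaussian noise, then clip.
    noisy = [a + rng.gauss(0.0, noise_std) for a in greedy]
    return [min(max(a, low), high) for a in noisy], 1


rng = random.Random(0)
toy_actor = lambda state: [0.5 * x for x in state]  # stand-in for the actor network
explore, explore_policy = select_action([1.0, -1.0], toy_actor, 1.0, -1.0, 1.0, 0.1, rng)
exploit, exploit_policy = select_action([1.0, -1.0], toy_actor, 0.0, -1.0, 1.0, 0.0, rng)
```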

get_weights(self)

Get weights from target model

Returns

weights (list) – target model weights

set_weights(self, weights)

Set model weights

Parameters

weights (list) – model weights

update(self)
load(self, filename)

Load model weights from pickle file

Parameters

filename (string) – full path of model file

save(self, filename)

Save model weights to pickle file

Parameters

filename (string) – full path of model file
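
Since load and save are documented as using a pickle file, a round trip can be sketched with the standard pickle module. The helper names save_weights and load_weights, the file name, and the list-of-lists weight layout are assumptions for illustration.

```python
import os
import pickle
import tempfile


def save_weights(weights, filename):
    """Serialize a list of weights to `filename` with pickle."""
    with open(filename, "wb") as f:
        pickle.dump(weights, f)


def load_weights(filename):
    """Read back a list of weights written by save_weights."""
    with open(filename, "rb") as f:
        return pickle.load(f)


weights = [[0.1, 0.2], [0.3]]
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "ddpg_weights.pkl")  # hypothetical file name
    save_weights(weights, path)
    restored = load_weights(path)
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.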

monitor(self)
set_agent(self)
epsilon_adj(self)

Update epsilon value
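
The update rule is not documented here. A common choice, shown below purely as an assumption, is multiplicative decay with a lower bound. The parameters epsilon_min and epsilon_decay are hypothetical, and the documented method takes only self.

```python
def epsilon_adj(epsilon, epsilon_min, epsilon_decay):
    """Decay epsilon multiplicatively, never dropping below epsilon_min."""
    return max(epsilon_min, epsilon * epsilon_decay)


epsilon = 1.0
history = []
for _ in range(3):
    epsilon = epsilon_adj(epsilon, epsilon_min=0.3, epsilon_decay=0.5)
    history.append(epsilon)
```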