exarl.agents.agent_vault.dqn
Module Contents
Classes
- LossHistory – Loss history for training
- DQN – Multi-Learner Discrete Double Deep Q-Network with Prioritized Experience Replay
Attributes
- exarl.agents.agent_vault.dqn.multiLearner = True
- exarl.agents.agent_vault.dqn.logger
- class exarl.agents.agent_vault.dqn.LossHistory
Bases: tensorflow.keras.callbacks.Callback
Loss history for training
- on_train_begin(self, logs={})
- on_batch_end(self, batch, logs={})
- class exarl.agents.agent_vault.dqn.DQN(env, is_learner)
Bases: exarl.ExaAgent
Multi-Learner Discrete Double Deep Q-Network with Prioritized Experience Replay
DQN Constructor
- Parameters
env (OpenAI Gym environment object) – The RL environment
is_learner (bool) – Indicates whether the agent is a learner or an actor
- _get_device(self)
Get device type (CPU/GPU)
- Returns
string – device type
- _build_model(self)
Build NN model based on parameters provided in the config file
- Returns
[type] – [description]
- set_learner(self)
- remember(self, state, action, reward, next_state, done)
Add experience to replay buffer
- Parameters
state (list or array) – Current state of the system
action (list or array) – Action to take
reward (list or array) – Environment reward
next_state (list or array) – Next state of the system
done (bool) – Indicates episode completion
- get_action(self, state)
Use epsilon-greedy approach to generate actions
- Parameters
state (list or array) – Current state of the system
- Returns
(list or array) – Action to take
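For illustration, the epsilon-greedy rule can be sketched as below. This is a minimal standalone sketch, not the body of get_action: the function name epsilon_greedy, the q_values list, and the rng parameter are hypothetical, and the returned policy flag follows the random (0) / inference (1) convention documented for action().

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of Q-values.

    Returns (action, policy), where policy is 0 for a random
    (exploratory) choice and 1 for a greedy (inference) choice.
    """
    if rng.random() < epsilon:
        # Explore: choose uniformly among all actions.
        return rng.randrange(len(q_values)), 0
    # Exploit: choose the action with the largest Q-value.
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    return best, 1
```

With epsilon set to 0.0 the sketch always takes the greedy branch, and with epsilon set to 1.0 it always explores.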
- action(self, state)
Discretizes 1D continuous actions to work with DQN
- Parameters
state (list or array) – Current state of the system
- Returns
action (list or array) – Action to take
policy (int) – random (0) or inference (1)
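One common way to discretize a 1D continuous action range is to place evenly spaced values between its bounds and let the network choose an index. The sketch below assumes that scheme; whether DQN.action uses exactly this mapping is not stated here, and the names discretize_actions, low, high, and n_bins are hypothetical.

```python
def discretize_actions(low, high, n_bins):
    """Return n_bins evenly spaced action values covering [low, high]."""
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    step = (high - low) / (n_bins - 1)
    return [low + i * step for i in range(n_bins)]


def index_to_action(index, low, high, n_bins):
    """Map a discrete action index back to its continuous value."""
    return discretize_actions(low, high, n_bins)[index]
```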
- calc_target_f(self, exp)
Bellman equation calculations
- Parameters
exp (list of experience) – contains state, action, reward, next state, done
- Returns
target Q value (array) – [description]
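As a rough guide to what a Double DQN Bellman target looks like, the sketch below computes the target for a single experience. It assumes the standard Double DQN rule (the online network selects the next action, the target network evaluates it); the exact calculation in calc_target_f is not shown in this reference, and the function and argument names are hypothetical.

```python
def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Target value for the action taken in one experience.

    q_online_next: Q-values of the next state from the online model.
    q_target_next: Q-values of the next state from the target model.
    """
    if done:
        # Terminal transition: no bootstrapped future value.
        return reward
    # The online model selects the action, the target model scores it.
    best = max(range(len(q_online_next)), key=lambda i: q_online_next[i])
    return reward + gamma * q_target_next[best]
```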
- has_data(self)
Indicates whether the replay buffer holds at least batch_size experiences
- Returns
bool – True if replay_buffer length >= self.batch_size
- generate_data(self)
Unpack and yield training data
- Yields
batch_states (numpy array) – training input
batch_target (numpy array) – training labels
With PER:
indices (numpy array) – data indices
importance (numpy array) – importance weights
- train(self, batch)
Train the NN
- Parameters
batch (list) – sampled batch of experiences
- Returns
With PER:
indices (numpy array) – data indices
loss – training loss
Without PER:
None
- training_step(self, batch)
Training step for multi-learner using Horovod
- Parameters
batch (list) – sampled batch of experiences
- Returns
loss_value – loss value per training step for multi-learner
- set_priorities(self, indices, loss)
Set priorities for training data
- Parameters
indices (array) – data indices
loss (array) – Losses
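In prioritized experience replay, priorities are usually derived from the magnitude of each sample's loss. The sketch below assumes the common proportional rule, priority = (|loss| + eps) ** alpha; the eps and alpha parameters and the function name are hypothetical and may not match what set_priorities does internally.

```python
def updated_priorities(priorities, indices, losses, eps=1e-6, alpha=0.6):
    """Return a copy of priorities with the sampled entries refreshed.

    eps keeps every priority strictly positive; alpha controls how
    strongly large losses are favoured when sampling.
    """
    new = list(priorities)
    for i, loss in zip(indices, losses):
        new[i] = (abs(loss) + eps) ** alpha
    return new
```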
- get_weights(self)
Get weights from target model
- Returns
weights (list) – target model weights
- set_weights(self, weights)
Set model weights
- Parameters
weights (list) – model weights
- target_train(self)
Update target model
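Target networks are typically refreshed either by copying the online weights outright or by blending them in slowly. The sketch below shows the blended (soft) variant on flat lists of numbers; which variant target_train uses is not stated here, and tau and the function name are hypothetical.

```python
def soft_update(model_weights, target_weights, tau):
    """Blend online weights into target weights.

    tau = 1.0 reduces to a hard copy; small tau gives a slow-moving target.
    """
    return [
        tau * w + (1.0 - tau) * tw
        for w, tw in zip(model_weights, target_weights)
    ]
```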
- epsilon_adj(self)
Update epsilon value
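A typical epsilon schedule multiplies epsilon by a decay factor until it reaches a floor. The sketch below assumes that multiplicative scheme; the names decay_epsilon, epsilon_decay, and epsilon_min are hypothetical and the schedule actually used by epsilon_adj may differ.

```python
def decay_epsilon(epsilon, epsilon_decay, epsilon_min):
    """Return the next epsilon, never dropping below epsilon_min."""
    return max(epsilon_min, epsilon * epsilon_decay)
```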
- load(self, filename)
Load model weights from pickle file
- Parameters
filename (string) – full path of model file
- save(self, filename)
Save model weights to pickle file
- Parameters
filename (string) – full path of model file
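Saving and loading weights through pickle can be sketched with the standard library alone. The helpers below are hypothetical stand-ins that operate on plain nested lists instead of a Keras model, so they illustrate the file handling only, not the actual load and save methods.

```python
import os
import pickle
import tempfile


def save_weights(weights, filename):
    """Serialize a weights object to a pickle file."""
    with open(filename, "wb") as f:
        pickle.dump(weights, f)


def load_weights(filename):
    """Read a weights object back from a pickle file."""
    with open(filename, "rb") as f:
        return pickle.load(f)


def round_trip(weights):
    """Save to a temporary file and load again, for demonstration."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.pkl")
        save_weights(weights, path)
        return load_weights(path)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.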
- update(self)
- monitor(self)