ddql_optimal_execution.agent.DDQL

class ddql_optimal_execution.agent.DDQL(state_dict: Optional[dict] = None, greedy_decay_rate: float = 0.95, target_update_rate: int = 15, initial_greediness: float = 1, mode: str = 'train', lr: float = 0.001, state_size: int = 5, initial_budget: int = 100, horizon: int = 100, gamma: float = 0.99, quadratic_penalty_coefficient: float = 0.01, verbose: bool = False)
The DDQL class inherits from the Agent class and implements a Double Deep Q-Learning algorithm.
Parameters
state_dict (dict, optional) – A dictionary containing the state of the agent, by default None
greedy_decay_rate (float, optional) – The greedy decay rate, by default 0.95
target_update_rate (int, optional) – The target update rate, by default 15
initial_greediness (float, optional) – The initial greediness, by default 1
mode (str, optional) – The mode, by default “train”
lr (float, optional) – The learning rate, by default 1e-3
state_size (int, optional) – The state size, by default 5
initial_budget (int, optional) – The initial budget, by default 100
horizon (int, optional) – The horizon, by default 100
gamma (float, optional) – The gamma parameter used in the Q-Learning algorithm, by default 0.99
quadratic_penalty_coefficient (float, optional) – The quadratic penalty coefficient used to penalize the agent for selling big quantities of stocks, by default 0.01
verbose (bool, optional) – Whether to print additional information while the agent runs, by default False
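A minimal construction sketch, assuming the class is importable from the path shown in the heading; the keyword values simply restate the documented defaults:

    from ddql_optimal_execution.agent import DDQL

    # Instantiate an agent with the documented defaults made explicit.
    agent = DDQL(
        initial_budget=100,                  # initial budget (default 100)
        horizon=100,                         # horizon of the execution episode (default 100)
        gamma=0.99,                          # discount factor of the Q-Learning update
        lr=1e-3,                             # learning rate of the optimizer
        quadratic_penalty_coefficient=0.01,  # penalty for selling big quantities
        mode="train",                        # "train" enables exploration (see greediness)
    )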
device
The device used to run the agent
Type: torch.device

main_net
The main neural network used to predict the Q-values of the state-action pairs
Type: QNet

target_net
The target neural network used to predict the Q-values of the state-action pairs
Type: QNet

state_size
The state size
Type: int

greedy_decay_rate
The greedy decay rate
Type: float

target_update_rate
The target update rate
Type: int

initial_greediness
The initial greediness of the agent. It is used to determine the probability of the agent taking a random action.
Type: float

greediness
The current greediness of the agent.
Type: float
mode
The mode of the agent. It can be either “train” or “eval”.
Type: str
lr
The learning rate used to update the weights of the neural network.
Type: float

gamma
The gamma parameter used in the Q-Learning algorithm.
Type: float

quadratic_penalty_coefficient
The quadratic penalty coefficient used to penalize the agent for selling big quantities of stocks.
Type: float

optimizer
The optimizer used to update the weights of the neural network.
Type: torch.optim

loss_fn
The loss function used to calculate the loss between the predicted Q-values and the target Q-values.
Type: torch.nn
__init__(state_dict: Optional[dict] = None, greedy_decay_rate: float = 0.95, target_update_rate: int = 15, initial_greediness: float = 1, mode: str = 'train', lr: float = 0.001, state_size: int = 5, initial_budget: int = 100, horizon: int = 100, gamma: float = 0.99, quadratic_penalty_coefficient: float = 0.01, verbose: bool = False) → None
Methods

__init__([state_dict, greedy_decay_rate, ...])

eval()
This function sets the mode to "eval" and puts the main network in evaluation mode.

get_action(state)
This function returns an action that is either sampled from a binomial distribution or the index of the maximum value in the output of a neural network, depending on certain conditions.

learn(experience_batch)
This function trains a neural network using a batch of experiences and updates the target network periodically.

train()
This function sets the mode to "train" and trains the main neural network.
__complete_target(experience_batch: numpy.ndarray) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
This function takes in a batch of experiences and returns the corresponding targets, actions, and states for training a reinforcement learning agent.

Parameters
experience_batch (np.ndarray) – A numpy array containing a batch of experiences. Each experience is a dictionary containing information about a single step taken by the agent in the environment. The dictionary contains keys such as "state", "action", "reward", "next_state", and "dist2Horizon".

Returns
targets, actions, and states
Return type
Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
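The body of __complete_target is not reproduced on this page. As a rough sketch, a Double Deep Q-Learning target is usually built by letting the main network choose the greedy next action and the target network value it; the function and variable names below, as well as the terminal-state handling derived from dist2Horizon, are assumptions for illustration only:

    import torch

    def double_dqn_targets(rewards, next_states, not_terminal, gamma, main_net, target_net):
        with torch.no_grad():
            # Greedy next action according to the online (main) network.
            best_actions = main_net(next_states).argmax(dim=1, keepdim=True)
            # Value of that action according to the frozen target network.
            next_q = target_net(next_states).gather(1, best_actions).squeeze(1)
        # Bootstrapped target; not_terminal masks out states at the horizon.
        return rewards + gamma * not_terminal * next_q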
__update_target_net() → None
This function updates the target network by loading the state dictionary of the main network.
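Concretely, such a hard update amounts to a single state-dict copy; a self-contained illustration with plain nn.Linear modules standing in for the two QNet instances:

    import torch.nn as nn

    main_net = nn.Linear(5, 3)    # stand-in for the main QNet
    target_net = nn.Linear(5, 3)  # stand-in for the target QNet

    # Copy the main network's parameters into the target network.
    target_net.load_state_dict(main_net.state_dict())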
eval() → None
This function sets the mode to “eval” and puts the main network in evaluation mode.
get_action(state: ddql_optimal_execution.state._state.State) → int
This function returns an action that is either sampled from a binomial distribution or the index of the maximum value in the output of a neural network, depending on certain conditions.

Parameters
state (State) – The state parameter is an instance of the State class, which contains information about the current state of the environment in which the agent is operating. This information typically includes things like the agent's current position, the state of the game board, and any other relevant information that the agent needs.

Returns
An integer that represents the action to be taken based on the given state. If the greediness parameter is set and the mode is “train”, a random binomial distribution is generated using the state’s inventory as the number of trials and the probability of success as 1/inventory. Otherwise, the action is determined by the main neural network’s output, which is the index of the maximum value in the output Q-values tensor.
Return type
int
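An illustrative sketch of this exploration rule; the attribute accesses and the way the inventory is passed in are assumptions, not the library's exact code:

    import numpy as np
    import torch

    def sketch_get_action(agent, state_tensor, inventory):
        # With probability `greediness`, explore: sample the quantity to trade
        # from a Binomial(inventory, 1/inventory) distribution.
        if agent.mode == "train" and np.random.rand() < agent.greediness:
            return int(np.random.binomial(inventory, 1.0 / max(inventory, 1)))
        # Otherwise exploit: take the action with the highest predicted Q-value.
        with torch.no_grad():
            q_values = agent.main_net(state_tensor)
        return int(torch.argmax(q_values).item())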
learn(experience_batch: numpy.ndarray) → None
This function trains a neural network using a batch of experiences and updates the target network periodically.

Parameters
experience_batch (np.ndarray) – The experience_batch parameter is a numpy array containing a batch of experiences, where each experience is a tuple of (state, action, reward, next_state, dist2Horizon). This batch is used to update the neural network's weights through backpropagation.
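The learning step itself is not reproduced on this page; the sketch below outlines the usual Double-DQL update the description implies (mean-squared-error regression onto bootstrapped targets, followed by a periodic hard update of the target network). The step counter and its bookkeeping are hypothetical:

    import torch

    def sketch_learn(agent, states, actions, targets, step_counter):
        # Q-values predicted by the main network for the actions actually taken.
        q_pred = agent.main_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

        # Regress the predictions onto the targets built from the target network.
        loss = agent.loss_fn(q_pred, targets)
        agent.optimizer.zero_grad()
        loss.backward()
        agent.optimizer.step()

        # Every `target_update_rate` learning steps, synchronise the target network
        # (see __update_target_net above).
        if step_counter % agent.target_update_rate == 0:
            agent.target_net.load_state_dict(agent.main_net.state_dict())

        # The exploration rate is presumably decayed elsewhere, e.g.
        # agent.greediness *= agent.greedy_decay_rate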