ddql_optimal_execution.agent.DDQL

class ddql_optimal_execution.agent.DDQL(state_dict: Optional[dict] = None, greedy_decay_rate: float = 0.95, target_update_rate: int = 15, initial_greediness: float = 1, mode: str = 'train', lr: float = 0.001, state_size: int = 5, initial_budget: int = 100, horizon: int = 100, gamma: float = 0.99, quadratic_penalty_coefficient: float = 0.01, verbose: bool = False)[source]

The DDQL class inherits from the Agent class. It implements the Double Deep Q-Learning algorithm.

Parameters
  • state_dict (dict, optional) – A dictionary containing the state of the agent, by default None

  • greedy_decay_rate (float, optional) – The greedy decay rate, by default 0.95

  • target_update_rate (int, optional) – The target update rate, by default 15

  • initial_greediness (float, optional) – The initial greediness, by default 1

  • mode (str, optional) – The mode of the agent, either “train” or “eval”, by default “train”

  • lr (float, optional) – The learning rate, by default 1e-3

  • state_size (int, optional) – The state size, by default 5

  • initial_budget (int, optional) – The initial budget, by default 100

  • horizon (int, optional) – The horizon, by default 100

  • gamma (float, optional) – The gamma parameter used in the Q-Learning algorithm, by default 0.99

  • quadratic_penalty_coefficient (float, optional) – The quadratic penalty coefficient used to penalize the agent for selling large quantities of stock, by default 0.01

  • verbose (bool, optional) – Verbosity flag, by default False
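
Example (a minimal instantiation sketch, not taken from the package's own documentation; the import path follows the qualified name ddql_optimal_execution.agent.DDQL shown above, and all argument values are illustrative):

    from ddql_optimal_execution.agent import DDQL

    # Create an agent for an execution problem with a 5-dimensional state,
    # an initial inventory of 100 and a horizon of 100 periods.
    agent = DDQL(
        state_size=5,
        initial_budget=100,
        horizon=100,
        lr=1e-3,
        gamma=0.99,
        verbose=False,
    )
    agent.train()  # put the agent in "train" mode before collecting experiences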

device

The device used to run the agent

Type

torch.device

main_net

The main neural network used to predict the Q-values of the state-action pairs

Type

QNet

target_net

The target neural network used to predict the Q-values of the state-action pairs

Type

QNet

state_size

The state size

Type

int

greedy_decay_rate

The greedy decay rate

Type

float

target_update_rate

The target update rate

Type

int

initial_greediness

The initial greediness of the agent. It is used to determine the probability of the agent taking a random action.

Type

float

greediness

The current greediness of the agent.

Type

float

mode

The mode of the agent. It can be either “train” or “eval”.

Type

str

lr

The learning rate used to update the weights of the neural network.

Type

float

gamma

The gamma parameter used in the Q-Learning algorithm.

Type

float

quadratic_penalty_coefficient

The quadratic penalty coefficient used to penalize the agent for selling large quantities of stock.

Type

float

optimizer

The optimizer used to update the weights of the neural network.

Type

torch.optim

loss_fn

The loss function used to calculate the loss between the predicted Q-values and the target Q-values.

Type

torch.nn

__init__(state_dict: Optional[dict] = None, greedy_decay_rate: float = 0.95, target_update_rate: int = 15, initial_greediness: float = 1, mode: str = 'train', lr: float = 0.001, state_size: int = 5, initial_budget: int = 100, horizon: int = 100, gamma: float = 0.99, quadratic_penalty_coefficient: float = 0.01, verbose: bool = False) → None[source]

Methods

__init__([state_dict, greedy_decay_rate, ...])

eval()

This function sets the mode to "eval" and puts the main network in evaluation mode.

get_action(state)

This function returns a tensor that is either a random binomial distribution or the index of the maximum value in the output of a neural network, depending on certain conditions.

learn(experience_batch)

This function trains a neural network using a batch of experiences and updates the target network periodically.

train()

This function sets the mode to "train" and puts the main network in training mode.

__complete_target(experience_batch: numpy.ndarray) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor]

This function takes in a batch of experiences and returns the corresponding targets, actions, and states for training a reinforcement learning agent.

Parameters
  • experience_batch (np.ndarray) – A numpy array containing a batch of experiences. Each experience is a dictionary containing information about a single step taken by the agent in the environment, with keys such as "state", "action", "reward", "next_state", and "dist2Horizon".

Returns

the targets, actions, and states tensors

Return type

Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
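
The package's internal target computation is not reproduced here; the following is a minimal sketch of how a Double DQN target is typically formed, in which main_net, target_net and the batch tensors (rewards, next_states, dones) are illustrative assumptions rather than names from this package:

    import torch

    def double_dqn_targets(main_net, target_net, rewards, next_states, dones, gamma=0.99):
        with torch.no_grad():
            # The main network selects the greedy next action...
            next_actions = main_net(next_states).argmax(dim=1, keepdim=True)
            # ...and the target network evaluates that selected action.
            next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        # Terminal transitions contribute only the immediate reward.
        return rewards + gamma * next_q * (1.0 - dones)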

__update_target_net() → None

This function updates the target network by loading the state dictionary of the main network.
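
In PyTorch this kind of hard update is usually a single call; a sketch, assuming main_net and target_net are the QNet modules documented above:

    # Copy every weight of the main network into the target network.
    target_net.load_state_dict(main_net.state_dict())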

eval() → None[source]

This function sets the mode to “eval” and puts the main network in evaluation mode.

get_action(state: ddql_optimal_execution.state._state.State) → int[source]

This function returns a tensor that is either a random binomial distribution or the index of the maximum value in the output of a neural network, depending on certain conditions.

Parameters
  • state (State) – An instance of the State class, which contains information about the current state of the environment in which the agent is operating. This typically includes things like the agent's current position, the state of the game board, and any other relevant information the agent needs.

Returns

an integer representing the action to be taken given the state. If the greediness condition is met and the mode is “train”, the action is drawn from a binomial distribution using the state’s inventory as the number of trials and 1/inventory as the probability of success. Otherwise, the action is determined by the main neural network’s output, namely the index of the maximum value in the output Q-values tensor.

Return type

int
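
A sketch of the selection rule described above; it mirrors the documented behaviour rather than the package's source, and main_net, state_tensor, inventory and greediness are assumed inputs:

    import numpy as np
    import torch

    def epsilon_greedy_action(main_net, state_tensor, inventory, greediness, mode="train"):
        # Exploration: a binomial draw with `inventory` trials and success
        # probability 1 / inventory, used only while training.
        if mode == "train" and np.random.rand() < greediness and inventory > 0:
            return int(np.random.binomial(inventory, 1.0 / inventory))
        # Exploitation: index of the largest predicted Q-value.
        with torch.no_grad():
            q_values = main_net(state_tensor)
        return int(torch.argmax(q_values).item())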

learn(experience_batch: numpy.ndarray) → None[source]

This function trains a neural network using a batch of experiences and updates the target network periodically.

Parameters
  • experience_batch (np.ndarray) – A numpy array containing a batch of experiences, where each experience is a tuple of (state, action, reward, next_state, dist2Horizon). This batch is used to update the neural network's weights through backpropagation.
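
A sketch of how learn might be driven from a replay buffer; the buffer and batch construction are assumptions, while agent is the DDQL instance and learn the method documented on this page:

    import numpy as np

    batch_size = 32
    # `replay_buffer` is assumed to be a list of experience tuples
    # (state, action, reward, next_state, dist2Horizon) collected elsewhere.
    if len(replay_buffer) >= batch_size:
        idx = np.random.choice(len(replay_buffer), size=batch_size, replace=False)
        experience_batch = np.array([replay_buffer[i] for i in idx], dtype=object)
        agent.learn(experience_batch)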

train() → None[source]

This function sets the mode to “train” and puts the main network in training mode.
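
Taken together, train() and eval() switch the agent between exploratory learning and greedy evaluation; a short sketch, where the state variable is an assumed State instance built elsewhere:

    agent.train()  # exploration enabled; feed collected batches to agent.learn(...)
    # ... training episodes ...
    agent.eval()   # exploration disabled; get_action() now always exploits
    action = agent.get_action(state)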