ddql_optimal_execution.experience_replay.ExperienceReplay
class ddql_optimal_execution.experience_replay.ExperienceReplay(capacity: int = 10000)
The ExperienceReplay class is a memory buffer that stores and retrieves experiences for reinforcement learning agents.
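A minimal usage sketch (not taken from the package itself): it constructs the buffer and reads the attributes documented on this page. The printed values assume a freshly created, still-empty buffer, and the import path mirrors the module path shown above.

    from ddql_optimal_execution.experience_replay import ExperienceReplay

    # Create a replay buffer; capacity defaults to 10000 experiences.
    buffer = ExperienceReplay(capacity=10_000)

    print(buffer.capacity)  # 10000
    print(buffer.is_empty)  # expected True for a freshly created buffer
    print(buffer.is_full)   # expected False until `capacity` experiences are stored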
capacity
    The maximum number of experiences that can be stored in the memory buffer.
    Type: int
memory
    The numpy array that holds the stored experiences.
    Type: np.ndarray
position
    The current position in the memory buffer.
    Type: int
__make_room()
    Randomly deletes a row from the first half of the memory and shifts the remaining rows up by one position.
push(state: State, action: int, reward: float, next_state: State, dist2Horizon: int)
    Adds an experience tuple to the memory buffer, which has a fixed capacity.
sample(batch_size: int)
    Samples a batch of experiences from the memory buffer.
Methods

    __init__([capacity])
    get_sample([batch_size]) – Returns a random sample of a specified batch size from the memory.
    push(state, action, reward, next_state, ...) – Adds an experience tuple to the memory buffer, which has a fixed capacity.

Attributes
__make_room()

    Randomly deletes a row from the first half of the memory and shifts the remaining rows up by one position.
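The sketch below illustrates this eviction behaviour on a plain NumPy array. It is a standalone illustration of the technique as described, not the package's actual implementation.

    import numpy as np

    def make_room(memory: np.ndarray, position: int) -> int:
        """Drop a random row from the first (older) half of the buffer and
        shift the remaining rows up by one, freeing the last slot."""
        idx = np.random.randint(0, len(memory) // 2)  # victim row, chosen among the older half
        memory[idx:-1] = memory[idx + 1:]             # shift later rows up by one position
        return position - 1                           # the write position moves back one slot

    mem = np.arange(10, dtype=float).reshape(5, 2)    # toy 5-row memory
    pos = make_room(mem, position=5)                  # frees the last row for the next write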
get_sample(batch_size: int = 128)

    Returns a random sample of a specified batch size from the memory.

    Parameters
        batch_size (int, optional) – The number of samples that will be randomly selected from the memory buffer to be used for training or inference. The default is 128, which means that 128 samples will be randomly selected from the memory buffer.

    Returns
        A batch of randomly selected samples from the memory; the size of the batch is determined by the batch_size parameter. Each sample is represented as a tuple of (state, action, reward, next_state, done) values.
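A short usage fragment, assuming buffer is an ExperienceReplay instance that has already been populated via push(); the unpacking follows the tuple layout described above.

    batch = buffer.get_sample(batch_size=128)  # 128 is also the default
    for state, action, reward, next_state, done in batch:
        # feed each transition to the learning update (e.g. a DDQL target computation)
        ...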
property is_empty

property is_full
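A small guard pattern built on these properties (a sketch; the boolean semantics are assumed from the property names):

    if buffer.is_full:
        # the next push() will presumably need __make_room() to free a slot
        pass

    if not buffer.is_empty:
        batch = buffer.get_sample(batch_size=64)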
push(state: ddql_optimal_execution.state._state.State, action: int, reward: float, next_state: ddql_optimal_execution.state._state.State, dist2Horizon: int) → None

    Adds an experience tuple to the memory buffer, which has a fixed capacity.

    Parameters
        state (State) – The current state of the agent, usually represented as a vector or an array of values that describe the environment.
        action (int) – The action taken by the agent in the given state.
        reward (float) – The reward received by the agent for taking the action in the given state. It is used to update the Q-values of the state-action pairs in the reinforcement learning algorithm.
        next_state (State) – The state that the agent transitions to after taking the action in the current state. It is represented as an object of the State class.
        dist2Horizon (int) – The distance to the horizon, i.e. the maximum number of steps the agent can take before the episode ends. It is used to keep track of how many steps are left in the episode when storing experiences in the replay buffer.
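A call sketch for push() with keyword arguments mirroring the signature above. current_state and following_state are hypothetical placeholders for State instances produced elsewhere in the trading loop; their construction is environment-specific and not shown here, and the numeric values are purely illustrative.

    # `current_state` and `following_state` are hypothetical placeholders for
    # ddql_optimal_execution State objects obtained from the environment.
    replay = ExperienceReplay(capacity=10_000)
    replay.push(
        state=current_state,         # state observed before acting
        action=5,                    # illustrative integer action
        reward=-0.03,                # illustrative float reward for this transition
        next_state=following_state,  # state reached after the action
        dist2Horizon=12,             # steps remaining before the execution horizon
    )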