ddql_optimal_execution.environnement.MarketEnvironnement¶

class ddql_optimal_execution.environnement.MarketEnvironnement(initial_inventory: int = 100, data_path: str = '../data', n_periods: int = 5, quadratic_penalty_coefficient: float = 0.01, multi_episodes: bool = False, **kwargs)[source]¶

This class represents the environment in which the agent is operating. It contains information such as the current time step, the agent’s current position, and any other relevant information about the environment.

Parameters

initial_inventory (int, optional) – The initial_inventory parameter is an integer that represents the initial inventory of the agent. The default value is 100.
data_path (str, optional) – The data_path parameter is a string that represents the path to the directory containing the data files. The default value is “../data”.
n_periods (int, optional) – The n_periods parameter is an integer that represents the number of periods in the data files. The default value is 5.
quadratic_penalty_coefficient (float, optional) – The quadratic_penalty_coefficient parameter is a float that represents the coefficient of the quadratic penalty term in the reward function. The default value is 0.01.
multi_episodes (bool, optional) – The multi_episodes parameter is a boolean that indicates whether the agent is operating in multi-episode mode. The default value is False.

initial_inventory¶

The initial_inventory attribute is an integer that represents the initial inventory of the agent.

Type: int

n_periods¶

The n_periods attribute is an integer that represents the number of periods in the data files.

Type: int

current_episode¶

The current_episode attribute is an integer that represents the current episode number.

Type: int

multi_episodes¶

The multi_episodes attribute is a boolean that indicates whether the agent is operating in multi-episode mode.

Type: bool

quadratic_penalty_coefficient¶

The quadratic_penalty_coefficient attribute is a float that represents the coefficient of the quadratic penalty term in the reward function.

Type: float

horizon¶

The horizon attribute is an integer that represents the number of periods in the data files.

Type: int

preprocessor¶

The preprocessor attribute is an instance of the Preprocessor class, which is used to preprocess the data files.

Type: Preprocessor

historical_data_series¶

The historical_data_series attribute is a list of strings that represents the paths to the data files.

Type: List[str]

historical_data¶

The historical_data attribute is a pandas DataFrame that contains the historical data.

Type: pd.DataFrame

state¶

The state attribute is an instance of the State class, which represents the current state of the environment in which the agent is operating. It contains information such as the current time step, the agent’s current position, and any other relevant information about the environment.

Type: State

done¶

The done attribute is a boolean that indicates whether the episode is over.

Type: bool

__init__(initial_inventory: int = 100, data_path: str = '../data', n_periods: int = 5, quadratic_penalty_coefficient: float = 0.01, multi_episodes: bool = False, **kwargs) → None[source]¶

Methods

`__init__`([initial_inventory, data_path, ...])
`get_current_raw_price`()	This function returns the current raw price.
`get_state`([copy])	This function returns a copy of the current state of an object if the copy parameter is True, otherwise it returns the current state itself.
`reset`()	The "reset" function initializes the state and sets the "done" flag to False.
`step`(action)	This function executes one time step within the environment, raises an error if the action is invalid, executes the action, updates the state, and returns the reward.
`swap_episode`(episode)	This function swaps the current episode in a time series environment and updates the historical data, periods, and state accordingly.

__compute_reward(action: int) → float¶

This function computes the reward for a given action based on the current inventory and historical price data.

Parameters

action (int) – The input parameter action is an integer representing the number of shares to sell at each time step.

Returns: a float value, which is the reward calculated based on the given action and the current state of the environment.

Return type: float
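To make the role of quadratic_penalty_coefficient more concrete, here is a minimal sketch of a reward with a quadratic penalty term. The formula, the function name `quadratic_penalty_reward`, and the explicit `price` argument are assumptions made for illustration; this page does not state the exact expression used by __compute_reward.

```python
def quadratic_penalty_reward(
    action: int, price: float, penalty_coefficient: float = 0.01
) -> float:
    """Hypothetical reward: sale proceeds minus a quadratic penalty on trade size.

    This is an illustrative assumption, not the library's actual formula.
    """
    proceeds = action * price
    penalty = penalty_coefficient * action ** 2
    return proceeds - penalty


# The penalty grows with the square of the trade size, so a single large
# sale earns less per share than a small one at the same price.
small = quadratic_penalty_reward(action=10, price=2.0)
large = quadratic_penalty_reward(action=100, price=2.0)
```

Under this assumed form, a larger coefficient pushes the agent toward spreading its sales over more periods.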

__initialize_state() → None¶: This function initializes the state of an object with historical data and an initial inventory.

__load_episode() → None¶

This function loads an episode by preprocessing historical data, initializing the state, and setting the “done” flag to False.

Parameters

df (pd.DataFrame) – The parameter df is a pandas DataFrame that is being passed as an argument to the __load_episode method. However, it is not being used in the method and seems to be unnecessary.

__update_state(action: int) → None¶

This function updates the state of an inventory management environment based on a given action and historical data.

Parameters

action (int) – The parameter action is an integer representing the amount of inventory to be subtracted from the current inventory level in the self.state dictionary.

get_current_raw_price() → pandas.Series[source]¶: This function returns the current raw price.

get_state(copy=True) → ddql_optimal_execution.state._state.State[source]¶

This function returns a copy of the current state of an object if the copy parameter is True, otherwise it returns the current state itself.

Parameters

copy (bool, optional) – A boolean parameter that determines whether a copy of the state should be returned or the original state object. If copy is True, a copy of the state object is returned; otherwise the original state object is returned. The default value is True.

Returns: a copy of the current state of the object if copy is set to True, otherwise the current state object itself.

Return type: State
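The difference between the two modes of get_state matters when the caller modifies the returned object. The sketch below uses hypothetical stand-ins (`ToyState`, `ToyHolder`, and their fields are assumptions, not the library's classes) and assumes the copy is a deep copy, which this page does not confirm.

```python
from copy import deepcopy
from dataclasses import dataclass


@dataclass
class ToyState:
    """Hypothetical stand-in for the State class (fields are assumptions)."""
    period: int = 0
    inventory: int = 100


class ToyHolder:
    """Hypothetical object exposing a get_state(copy=...) accessor."""

    def __init__(self) -> None:
        self.state = ToyState()

    def get_state(self, copy: bool = True) -> ToyState:
        # Return an independent copy, or the live state object itself.
        return deepcopy(self.state) if copy else self.state


holder = ToyHolder()
snapshot = holder.get_state()        # independent copy
live = holder.get_state(copy=False)  # the same object as holder.state

snapshot.inventory = 0  # changes only the copy
live.period = 3         # changes the holder's own state
```

Requesting a copy is the safer default for code that stores past states, for example in a replay buffer, since later steps cannot alter what was stored.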

reset() → None[source]¶: The “reset” function initializes the state and sets the “done” flag to False.

step(action: int) → float[source]¶

This function executes one time step within the environment, raises an error if the action is invalid, executes the action, updates the state, and returns the reward.

Parameters: action (int) – an integer representing the action taken by the agent in the environment. In this case, it is assumed that the agent is making a decision about how much of a certain item to sell, and the action parameter represents the quantity of that item to sell.

Returns: a float value, which is the reward obtained after executing the action in the environment.
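A typical interaction loop calls reset once and then step until done is True. The sketch below runs such a loop against a hypothetical stand-in, `ToyEnvironment`, so that it is self-contained. The constant price, the reward formula, the `ValueError`, and the rule for setting done are all assumptions; the real class reads prices from historical data and may differ on each of these points.

```python
class ToyEnvironment:
    """Hypothetical stand-in with the documented reset/step/done interface."""

    def __init__(self, initial_inventory: int = 100, n_periods: int = 5,
                 quadratic_penalty_coefficient: float = 0.01) -> None:
        self.initial_inventory = initial_inventory
        self.n_periods = n_periods
        self.quadratic_penalty_coefficient = quadratic_penalty_coefficient
        self.reset()

    def reset(self) -> None:
        self.inventory = self.initial_inventory
        self.period = 0
        self.done = False

    def step(self, action: int) -> float:
        # Reject actions that sell a negative amount or more than is held.
        if action < 0 or action > self.inventory:
            raise ValueError(f"invalid action: {action}")
        price = 1.0  # assumed constant price, for illustration only
        reward = action * price - self.quadratic_penalty_coefficient * action ** 2
        self.inventory -= action
        self.period += 1
        self.done = self.period >= self.n_periods or self.inventory == 0
        return reward


env = ToyEnvironment()
env.reset()
total_reward = 0.0
while not env.done:
    # Sell an equal slice of the remaining inventory in each remaining period.
    periods_left = env.n_periods - env.period
    action = env.inventory // periods_left
    total_reward += env.step(action)
```

The equal-slice policy here is only a baseline; a trained agent would pick the action from the current state instead.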

swap_episode(episode: int) → None[source]¶

This function swaps the current episode in a time series environment and updates the historical data, periods, and state accordingly.

Parameters: episode (int) – The episode parameter is an integer that represents the episode number to be swapped to.
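The effect of swapping episodes can be sketched with a hypothetical stand-in, `ToyMultiEpisodeEnvironment`. In this sketch each episode is an in-memory list of prices, whereas the documented historical_data_series attribute holds file paths; the range check and the way n_periods is derived from the data are also assumptions.

```python
from typing import List


class ToyMultiEpisodeEnvironment:
    """Hypothetical stand-in: each episode is one in-memory price series."""

    def __init__(self, series: List[List[float]], initial_inventory: int = 100) -> None:
        self.historical_data_series = series
        self.initial_inventory = initial_inventory
        self.swap_episode(0)

    def swap_episode(self, episode: int) -> None:
        if not 0 <= episode < len(self.historical_data_series):
            raise IndexError(f"episode {episode} is out of range")
        self.current_episode = episode
        # Load the data for the new episode and derive the period count from it.
        self.historical_data = self.historical_data_series[episode]
        self.n_periods = len(self.historical_data)
        # Start the new episode from a fresh state.
        self.inventory = self.initial_inventory
        self.period = 0
        self.done = False


env = ToyMultiEpisodeEnvironment(series=[[10.0, 10.1, 9.9], [20.0, 20.2]])
env.inventory = 40   # pretend some shares were sold during episode 0
env.swap_episode(1)  # switch data and start over with the full inventory
```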