I am still fairly new to the field, so forgive me if the whole post and my questions sound stupid.
A bit of explanation first.
I have a trading strategy, an extension of the Avellaneda-Stoikov model, which I thoroughly tested in a backtest simulating an HFT environment; it is also trading live on crypto exchanges with decent results. Sometimes it loses money due to sudden volatility spikes, which I manage by detecting the spikes and pausing order submission during them. But I also see that when the market is less volatile, the strategy would have made more money had it quoted bid/ask orders more aggressively, and that during strong price trends it tends to be less profitable. In other words, by switching between more and less aggressive trading modes based on the market regime, the strategy could have earned more, or lost less, by being more (or less) risk averse.
So I had an idea: run a series of backtests (which, by the way, align quite nicely with live trading in terms of how live and backtest PnL match) with different strategy parameters; collect features describing the market state and the equity earned during each step; and save the data, along with the parameters, into a dataset. I chose a step length of 5 seconds, during which roughly 10 to 100 orders are placed; features are collected and equity earned is calculated over each step. I decided to try 3 combinations of two parameters, namely the gamma and delta of the Avellaneda-Stoikov model: the combination that had shown good performance initially, plus more and less risk-averse combinations. A series of such backtests thus produces a series of datasets, which are then concatenated into a single dataset.
So we have a large dataset of backtest results, with columns describing the state of the market, a step reward (equity earned), and the gamma/delta values. This dataset is then used to train a RandomForestRegressor model. The target is step_reward, shifted by one position, so that we can predict which feature/parameter combinations result in which step_reward.
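To make the setup concrete, here is a minimal sketch of the training step. The column names and the synthetic data are purely illustrative stand-ins for my real backtest dataset:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Toy stand-in for the concatenated backtest dataset: market-state
# features plus the gamma/delta used during each 5-second step.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "volatility": rng.random(n),
    "book_imbalance": rng.random(n) - 0.5,
    "gamma": rng.choice([0.1, 0.5, 1.0], size=n),
    "delta": rng.choice([0.5, 1.0, 2.0], size=n),
    "step_reward": rng.normal(0.0, 1.0, size=n),
})

# Shift the target one step back, so the features/parameters observed
# at step t are paired with the equity earned over the following step.
df["target"] = df["step_reward"].shift(-1)
df = df.dropna(subset=["target"])

X = df[["volatility", "book_imbalance", "gamma", "delta"]]
y = df["target"]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
```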
Before training, features are discretized into deciles, and an array of all possible feature combinations is created. Every feature combination is then tried with every parameter combination, and the parameter combination that predicts the highest step_reward is saved into a lookup table. This lookup table is then used in the backtester (or could be used in a live trading environment) to answer the question: which gamma and delta values are going to be most profitable given this set of discretized features? The lookup table is effectively a typed dict where the set of feature buckets is encoded as a base-10 value and used as the key, and gamma/delta are stored as a list in the value. This dict is then passed to the backtester.
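A rough sketch of how I build the lookup table. The base-10 key encoding works because each feature has exactly 10 buckets; `bucket_midpoints` and the specific parameter pairs are illustrative assumptions, and `model` is any fitted regressor with a `predict` method:

```python
import itertools
import numpy as np

N_BUCKETS = 10
FEATURE_NAMES = ["volatility", "book_imbalance"]  # illustrative subset
PARAM_COMBOS = [(0.1, 0.5), (0.5, 1.0), (1.0, 2.0)]  # (gamma, delta) candidates

def encode_key(buckets):
    """Pack a tuple of bucket indices into one base-10 integer key,
    e.g. (3, 7) -> 37 when each feature has 10 buckets."""
    key = 0
    for b in buckets:
        key = key * N_BUCKETS + b
    return key

def build_lookup_table(model, bucket_midpoints):
    """For every feature-bucket combination, ask the model which parameter
    pair predicts the highest step_reward and store it under the encoded key.
    bucket_midpoints[f][b] is a representative real value for bucket b of feature f."""
    table = {}
    for buckets in itertools.product(range(N_BUCKETS), repeat=len(FEATURE_NAMES)):
        feats = [bucket_midpoints[f][b] for f, b in zip(FEATURE_NAMES, buckets)]
        best_params, best_reward = None, -np.inf
        for gamma, delta in PARAM_COMBOS:
            pred = model.predict(np.array([feats + [gamma, delta]]))[0]
            if pred > best_reward:
                best_reward, best_params = pred, [gamma, delta]
        table[encode_key(buckets)] = best_params
    return table
```

Note the combinatorics: with k features at 10 buckets each, the table has 10^k keys, which is exactly why it blows up past a handful of features.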
Along with the lookup table, I also create a bucket_ranges table that maps real feature values into the discretized buckets. Discretization was performed on the whole dataset, so discretizing on the fly would produce different bucket boundaries. This is why, as a quick workaround, the bucket_ranges table simply converts a real feature value to a bucket number, which is then used as part of the key when the optimal gamma/delta are requested.
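The bucket_ranges workaround could look something like this: compute the decile edges once on the full dataset, then reuse those fixed edges at query time (function names are mine, not from a library):

```python
import numpy as np
import pandas as pd

def fit_bucket_ranges(df, feature_names, n_buckets=10):
    """Compute decile edges per feature on the full training dataset,
    so that live values are bucketed with the *same* boundaries
    instead of being re-discretized on the fly."""
    ranges = {}
    for f in feature_names:
        _, edges = pd.qcut(df[f], q=n_buckets, retbins=True, duplicates="drop")
        ranges[f] = edges
    return ranges

def to_bucket(value, edges):
    """Map a real feature value to its bucket index; values outside the
    training range are clipped into the edge buckets."""
    idx = np.searchsorted(edges, value, side="right") - 1
    return int(np.clip(idx, 0, len(edges) - 2))
```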
Instead of predicting the highest step_reward, I also tried predicting the sign of step_reward.
This idea is loosely based on an article in which the authors trained a reinforcement learning model to predict the sign of the equity earned during a step.
When I train a model to predict the sign of the equity, I can reach an accuracy of up to 80% with RandomForestClassifier, but this requires too many buckets and too many features, which makes building the lookup table impractical: its length grows to a few hundred million records. With only 5-6 features discretized into buckets, accuracy drops to about 60% over the period I tested.
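The sign-prediction variant is the same pipeline with a classifier and a sign target. A minimal sketch on synthetic data (the reward here is constructed to be learnable so the example runs end-to-end; the real pipeline uses the backtest dataset and the shifted target):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a learnable feature/reward relationship.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "volatility": rng.random(n),
    "book_imbalance": rng.random(n) - 0.5,
    "gamma": rng.choice([0.1, 0.5, 1.0], size=n),
})
df["step_reward"] = df["book_imbalance"] * df["gamma"] + rng.normal(0.0, 0.1, size=n)

X = df[["volatility", "book_imbalance", "gamma"]]
y = np.sign(df["step_reward"]).astype(int)  # classify the sign, not the value

# Time-ordered split (no shuffling), to avoid leaking future data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
```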
When I use the generated lookup table in the backtester, the predicted gamma/delta values produce a significant increase in Sharpe ratio and a drop in maximum drawdown. However, the improvement only holds for the period the model was trained on, which is no surprise. When the period is shifted, performance drops even below what I get with the single, empirically found parameter combination.
Features I used (all of them, or different subsets): RSI, volume, VWAP, mean position, number of fills during a step, volatility (long and short windows), and mean book imbalance during a step (at the BBO level, 5 levels deep, and 2.5% deep from the midprice).
So my question is: is it common practice to enrich a trading strategy with a regression model that predicts not the next price move, but the combination of parameters with a high probability of yielding higher profit? Or is this an outlandish idea that just doesn't stand up to criticism? I do use Bayesian optimization when searching for optimal parameters while backtesting, but those parameters are set once and never changed, so I thought I could 'teach' the model to be more flexible during market regime changes.
TL;DR: Is it a bad idea to use a regression model to predict optimal trading strategy parameters from collected data of features describing the market state and the profit earned during that time?