a Physicist in Wall Street: Genetic optimization for Trading Strategies using Rapidminer and R

December 5, 2010

Genetic optimization for Trading Strategies using Rapidminer and R

This is the second tutorial on RapidMiner and its R extension for trading, and the first one on video. In the previous example, the ROC obtained was not good enough to make money in this business, so here we will try to optimize the trading strategy. Many optimization methods and objective functions for trading are studied in the literature; we will use a single-objective (non-multiobjective) genetic algorithm to optimize our simple strategy.

The simple strategy defined is the following:

The symbol used is “IBM” (you can use any other symbol)
An SVM (Support Vector Machine) predicts the next day's close value; when the predicted value is greater than the previous day's, we obtain a buy signal, and otherwise a sell signal.
The training data are historical prices (close, high, volume) from 2006 to 2009.
Validation is done with historical data from 2010.
The following indicators are calculated: RSI, EMA 7, EMA 50, EMA 200, MACD, and ADX.
A two-day lag window is created for all historical values.
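
As a rough illustration of the inputs described above, here is a minimal Python sketch (the tutorial itself uses RapidMiner and R). It builds an EMA, a two-day lag window, and the buy/sell rule. The names `ema`, `build_rows`, and `signal` are hypothetical, the lag window is assumed to hold the two previous closes, and the SVM itself is left out; only the close series is used to keep the sketch short.

```python
from typing import Dict, List


def ema(values: List[float], span: int) -> List[float]:
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def build_rows(close: List[float], lags: int = 2) -> List[Dict[str, float]]:
    """One row per day t: today's close, EMA 7, the previous `lags` closes,
    and the next day's close as the value to predict (assumed layout)."""
    ema7 = ema(close, 7)
    rows = []
    # Start at `lags` so every lag exists; stop one day early so the label exists.
    for t in range(lags, len(close) - 1):
        row = {"close": close[t], "ema7": ema7[t]}
        for k in range(1, lags + 1):
            row[f"close_lag{k}"] = close[t - k]
        row["next_close"] = close[t + 1]
        rows.append(row)
    return rows


def signal(predicted_next_close: float, close_today: float) -> str:
    """Buy when the predicted close is above today's close, otherwise sell."""
    return "buy" if predicted_next_close > close_today else "sell"
```

In a full version, each row would also carry the other indicators (RSI, EMA 50, EMA 200, MACD, ADX) and a regression model would be trained to predict `next_close`.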

A genetic algorithm is used to optimize the strategy. It modifies the input data by removing some of the inputs (for example, indicators) in order to maximize the ROC of the strategy. You can see the generated model in the video:
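
To make the idea of evolving the set of inputs more concrete, here is a minimal Python sketch of genetic feature selection over an on/off mask. It is not the RapidMiner operator used in the video: the function `genetic_feature_selection`, its parameters, and `toy_fitness` are all hypothetical, and the toy fitness is only a stand-in for the real objective (the ROC obtained by validating the SVM on the selected inputs).

```python
import random
from typing import Callable, List, Tuple

Mask = Tuple[bool, ...]  # True = keep the input, False = drop it


def genetic_feature_selection(
    n_features: int,
    fitness: Callable[[Mask], float],
    generations: int = 40,
    population_size: int = 20,
    mutation_rate: float = 0.1,
    seed: int = 0,
) -> Mask:
    """Evolve a feature mask that maximizes `fitness` (needs n_features >= 2)."""
    rng = random.Random(seed)

    def pick(population: List[Mask]) -> Mask:
        # Tournament selection: the better of two random individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    population = [
        tuple(rng.random() < 0.5 for _ in range(n_features))
        for _ in range(population_size)
    ]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best]  # elitism: the best mask always survives
        while len(children) < population_size:
            p1, p2 = pick(population), pick(population)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each gene with a small probability.
            child = tuple((not g) if rng.random() < mutation_rate else g for g in child)
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best


FEATURES = ["close", "high", "volume", "RSI", "EMA7", "EMA50", "EMA200", "MACD", "ADX"]


def toy_fitness(mask: Mask) -> float:
    """Placeholder objective; the 'useful' set is arbitrary, for illustration only."""
    useful = {"close", "RSI", "EMA7"}
    kept = {name for name, keep in zip(FEATURES, mask) if keep}
    return len(kept & useful) - 0.5 * len(kept - useful)


best = genetic_feature_selection(len(FEATURES), toy_fitness)
selected = [name for name, keep in zip(FEATURES, best) if keep]
```

Replacing `toy_fitness` with a function that trains and validates the model on the masked inputs gives the kind of search described here, at the cost of one model fit per evaluated mask.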

The results are as follows. Initial ROC from the previous tutorial:

The % of winning trades in the previous strategy:

After evolving the feature selection for 40 generations, the final ROC performance improves.

The improved ROC function is the following:

The % of winning trades also improves.

It is possible to select another kind of optimization algorithm, and to maximize or minimize another value, such as the drawdown, or other ratios such as the Kelly or Sharpe ratio. In the next tutorial, I will improve the trading operation to make it as realistic as possible and move the symbols into XML configuration files.
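
For reference, the alternative objectives mentioned above can be written down in a few lines. This is a minimal Python sketch using the standard textbook definitions; the function names are hypothetical, the Sharpe ratio is per period (not annualized), and the Kelly formula assumes a simple win/lose bet.

```python
import math
from typing import List


def max_drawdown(equity: List[float]) -> float:
    """Largest peak-to-trough decline of an equity curve, as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def sharpe_ratio(returns: List[float], risk_free: float = 0.0) -> float:
    """Mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    variance = sum((r - mean) ** 2 for r in excess) / (len(excess) - 1)
    return mean / math.sqrt(variance)


def kelly_fraction(win_probability: float, win_loss_ratio: float) -> float:
    """Kelly criterion f = p - (1 - p) / b, where b is the average win / average loss."""
    return win_probability - (1.0 - win_probability) / win_loss_ratio
```

Any of these could replace the ROC as the value the optimizer maximizes (or, for drawdown, minimizes).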


13 comments:

Tal Galili, December 5, 2010 at 5:21 AM
Hello there,
The e-mail you provided me was not working.
Please send me an e-mail so I'd know that you got the message.

Thank you for joining R-bloggers.com, I've just added your feed to the site, welcome!

If I may ask, please consider (but don't feel obligated in any way) to mention R-bloggers in one of your future posts (or to even link back from your blogroll), any link/mention will be of great help :)

All the best,
Tal
a Physicist, December 5, 2010 at 5:24 AM
Thanks, Tal. I will contact you.
Quant, December 5, 2010 at 7:26 AM
I always wonder whether this kind of strategy really works in practice. First, I am sure different maximization methods can lead to totally different results, which does not make much sense for stable trading. Second, do we really understand what is going on inside the black box? The genetic algorithm gives us a result, good, but how do we interpret it? I would feel insecure using it. Any tip or story on how it is applied in practice is appreciated.
a Physicist, December 6, 2010 at 1:49 AM
Hi Quant!!
I agree, but not 100%. My PhD was related to extracting knowledge from black boxes such as neural networks and neuro-fuzzy algorithms. There is always a trade-off between the accuracy and the interpretability or understandability of a model. If you want high accuracy, the model behaves like a black box (the system finds nonlinear relations that are difficult to explain); if you want understandability, you lose accuracy (a simple model gives simple accuracy). In my work you have to look for an equilibrium between both concepts. You always need to check whether the solution makes sense, in this case in finance or trading terms. Genetic algorithms are a simple search method: they look for the input that maximizes a function (in this case the ROC). You only need to verify that the input obtained from the GA makes sense; you need rules to check that the GA is working properly. I will do this work in the coming days.
Unknown, December 8, 2010 at 12:36 PM
Great posts so far! I have a question. Why did you not use the Sliding Window Validation operator in this example as you had done in your previous example? Both tutorials deal with the same time series data, so I'm guessing that the Sliding Window Validation operator would also be valid here.

Thank you!
Tony
a Physicist, December 8, 2010 at 1:54 PM
Hi Tony

The most correct way to validate would be to use the sliding windows, as in the previous post. However, the time required inside a genetic search would be too long, so this is a compromise between the most correct validation method and the time required to run it.

Best Regards
Rafael
Unknown, December 9, 2010 at 1:09 PM
Ok, I understand and agree. Thank you!
Unknown, December 14, 2010 at 12:27 PM
Dear Physicist,
It looks to me like in your process you're using 2006 to 2009 to optimize your system and 2010 to validate it.
If this is the case, your results can't help but be good, but they are not reproducible. You should use something like 2006 to 2008 to optimize, 2009 to validate, and 2010 to test your results.
The way you do it, you just show the best possible result; although it is an interesting exercise, it has nothing to do with real life. Please correct me if I'm wrong.
a Physicist, December 15, 2010 at 12:08 PM
Hi Palma

Thanks for your comment. You're right, the best solution for testing is training + validation + testing. Time series are a headache; the best solution would be a one-day sliding window with training + one day of validation + one day of testing, moved through the whole time interval.
Using 2009 for validation and 2010 for testing covers too much time; in my opinion we need a sliding window, because if a new pattern appears in 2009 you couldn't learn it.

It is only an example, but you can modify it to do that. If you need help modifying the example, don't hesitate to contact me.
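
The sliding-window scheme discussed in this exchange can be sketched roughly as follows. This is a minimal Python illustration, not part of the RapidMiner process: the function `walk_forward_splits` and its parameters are hypothetical, and it assumes the window advances one day at a time with one day each for validation and testing.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_days: int,
    train_size: int,
    validation_size: int = 1,
    test_size: int = 1,
    step: int = 1,
) -> Iterator[Tuple[range, range, range]]:
    """Yield (train, validation, test) day-index ranges, sliding forward in time."""
    start = 0
    while start + train_size + validation_size + test_size <= n_days:
        train_end = start + train_size
        validation_end = train_end + validation_size
        test_end = validation_end + test_size
        yield (
            range(start, train_end),
            range(train_end, validation_end),
            range(validation_end, test_end),
        )
        start += step


# Example: 6 days of data with a 3-day training window.
splits = list(walk_forward_splits(n_days=6, train_size=3))
```

Each test day always lies after the data used to train and validate, which is the property the year-based split (2006 to 2008, 2009, 2010) also aims for.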
Anonymous, January 9, 2011 at 4:40 AM
You use the full data set to optimize the selection, and then, after you have optimized the features, you divide the set into in-sample and out-of-sample data. Could you try to train the model (including Optimize Selection) on the in-sample data? It seems this is not possible in RM, as Optimize Selection does not have the model as an output...
a Physicist, January 10, 2011 at 12:28 PM
For the optimization I use 1000 days for training and the rest for evaluation (this is configured inside the validation block, inside the optimization process). However, from the optimization process I only obtain the data with the best indicators, so I need to train the model again with the 1000 days, obtain the model from RM (the output of the SVM algorithm), and validate the model again.
Yakito, October 16, 2011 at 10:49 PM
Thanks for sharing this, it's amazing. I am new to RapidMiner, but I love to see the amazing stuff others are doing.
blogger, August 12, 2013 at 12:13 AM
I tried the link to download the source, but it does not seem to work. Can you point me to where I can access the source? Thanks.