The simple strategy is defined as follows:
- The symbol used is "IBM" (you can use any other symbol).
- An SVM (Support Vector Machine) predicts the next day's close value; when the predicted value is higher than the previous day's close, we obtain a buy signal, otherwise a sell signal.
- The training data are historical prices (close, high, volume) from 2006 to 2009.
- Validation is done with historical data from 2010.
- The following indicators are calculated: RSI, EMA 7, EMA 50, EMA 200, MACD, and ADX.
- A two-day lag window is created for all historical values.
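The prediction-to-signal idea above can be sketched in a few lines. This is a simplified illustration, not the RapidMiner process: it uses only a two-day lag of a synthetic close series (the real setup also feeds in high, volume, and the listed indicators), and the SVM hyperparameters are arbitrary choices of mine.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Synthetic close series standing in for the real IBM prices.
close = 100 + np.cumsum(rng.normal(0, 1, 300))

def lagged_features(series, lags=2):
    """Build rows [x[t-1], x[t-2], ...] used to predict x[t] (two-day lag window)."""
    X = np.column_stack([series[lags - 1 - k: len(series) - 1 - k]
                         for k in range(lags)])
    y = series[lags:]
    return X, y

X, y = lagged_features(close, lags=2)
split = 200                                   # first part for training, rest for validation
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X[:split], y[:split])

pred = model.predict(X[split:])
today = X[split:, 0]                          # most recent close in each lag window
# Buy when the predicted next close is above today's close, else sell.
signal = np.where(pred > today, "buy", "sell")
print(signal[:5])
```

The same logic applies whatever the feature set is; only the columns of `X` change when indicators are added.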
The results are as follows. Initial ROC from the previous tutorial:
The % winning trades of the previous strategy:
After evolving the feature selection for 40 generations, the final ROC performance is improved.
The improved ROC curve is the following:
The % winning trades is also improved.
It is possible to select other optimization algorithms, and to maximize or minimize other values such as drawdown or other ratios like the Kelly criterion or the Sharpe ratio. In the next tutorial I will improve the trading operation to make it as realistic as possible and move the symbols into XML configuration files.
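As an illustration of what those alternative objectives look like, here are minimal fitness functions for the Sharpe ratio and maximum drawdown. These are my own sketches (the function names and the toy data are invented), not operators from the RapidMiner process:

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio of a series of per-period returns."""
    returns = np.asarray(returns, dtype=float)
    sd = returns.std(ddof=1)
    if sd == 0:
        return 0.0
    return float(np.sqrt(periods_per_year) * returns.mean() / sd)

def max_drawdown(equity):
    """Largest peak-to-trough fraction lost along an equity curve."""
    equity = np.asarray(equity, dtype=float)
    running_peak = np.maximum.accumulate(equity)
    return float(np.max((running_peak - equity) / running_peak))

rets = np.array([0.01, -0.02, 0.015, 0.003, -0.005])
equity = np.cumprod(1 + rets)
print(sharpe_ratio(rets), max_drawdown(equity))
```

Either function can serve as the value the optimizer maximizes (Sharpe) or minimizes (drawdown) in place of the ROC.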
DOWNLOAD FILES ($2) TO SUPPORT THE BLOG
Hello there,
The e-mail you provided me was not working.
Please send me an e-mail so I know that you got the message.
Thank you for joining R-bloggers.com, I've just added your feed to the site, welcome!
If I may ask, please consider (but don't feel obligated in any way) to mention R-bloggers in one of your future posts (or to even link back from your blogroll), any link/mention will be of great help :)
All the best,
Tal
Thanks Tal, I will contact you.
I always wonder whether this kind of strategy really works in practice. First, I am sure that different maximization methods can lead to totally different results, which does not make much sense for a typical stable trading system. Second, do we really understand what's going on inside the black box? The genetic algorithm gives us a result, good, but how do we interpret it? I would feel insecure using it. Any tips or stories on how it is applied in practice are appreciated.
Hi Quant!
I agree, but not 100%. My PhD was related to extracting knowledge from black boxes such as neural networks and neuro-fuzzy algorithms, and there is always a trade-off between the accuracy and the interpretability (or understandability) of the model. If you want high accuracy, the model behaves like a black box (the system finds nonlinear relations that are difficult to explain); if you want understandability, you lose accuracy (a simple model gives simple accuracy). In this kind of work you have to look for an equilibrium between both concepts, and you always need to check whether the solution makes sense, in this case financial or trading sense. A genetic algorithm is a simple search method: it looks for the input that maximizes a function (in this case the ROC). You only need to verify that the input obtained from the GA makes sense, i.e., that the GA is working properly. I will do this work in the following days.
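The "simple search method" point in the reply above can be made concrete with a toy genetic search over binary feature masks. This is purely illustrative and not the RapidMiner GA: the fitness function is invented (a stand-in for the ROC objective), and the population and mutation settings are arbitrary.

```python
import random

random.seed(42)
N_FEATURES = 6
TARGET = [1, 0, 1, 1, 0, 0]            # pretend these are the genuinely useful features

def fitness(mask):
    # Toy objective: count how many positions agree with the "useful" set.
    return sum(1 for m, t in zip(mask, TARGET) if m == t)

def mutate(mask, rate=0.1):
    return [1 - m if random.random() < rate else m for m in mask]

def crossover(a, b):
    cut = random.randrange(1, N_FEATURES)
    return a[:cut] + b[cut:]

# Random initial population of feature masks.
pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(20)]
for generation in range(40):           # 40 generations, as in the post
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                 # keep the fitter half (elitism)
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(10)]
    pop = parents + children

best = max(pop, key=fitness)
print(best, fitness(best))
```

The GA never explains *why* a mask is good; it only reports the mask that scored highest, which is exactly the interpretability gap discussed above.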
Great posts so far! I have a question. Why did you not use the Sliding Window Validation operator in this example as you had done in your previous example? Both tutorials deal with the same time series data, so I'm guessing that the Sliding Window Validation operator would also be valid here.
Thank you!
Tony
Hi Tony
The most correct way to validate would be to use the sliding window, as in the previous post. However, the time required by a genetic search would be too long; it is a compromise between the most correct validation method and the time required to run it.
Best Regards
Rafael
Ok, I understand and agree. Thank you!
Dear Physicist,
It looks to me like in your process you're using 2006 to 2009 to optimize your system and 2010 to validate it.
If this is the case, your results can't help but be good, but they won't be reproducible. You should use something like 2006 to 2008 to optimize, 2009 to validate, and 2010 to test your results.
The way you do it, you just show the best possible result; although it is an interesting exercise, it has nothing to do with real life. Please correct me if I'm wrong.
Hi Palma
Thanks for your comment. You're right: the best setup is training + validation + testing. Time series are a headache; the best solution would be a sliding window that advances one day at a time, with a training segment, one day of validation, and one day of testing, moving the window through the whole time interval.
If we use all of 2009 for validation and 2010 for testing, that is too much time in one block; in my opinion we need a sliding window, because if a new pattern appears in 2009 you couldn't learn it.
This is just an example, but you can modify it to do that; if you need help modifying the example, don't hesitate to contact me.
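The walk-forward scheme described in the reply above can be sketched as an index generator. This is an assumption of mine about how to express it, not code from the tutorial; the window length of 1000 days echoes the training length mentioned later in the comments:

```python
def walk_forward_splits(n, train=1000):
    """Yield (train_indices, val_index, test_index) tuples, advancing one day at a time.

    Each window trains on `train` consecutive days, validates on the next
    day, and tests on the day after that.
    """
    t = train
    while t + 1 < n:
        yield range(t - train, t), t, t + 1
        t += 1

# With 1005 data points and a 1000-day training window, four windows fit.
splits = list(walk_forward_splits(1005, train=1000))
print(len(splits))
```

Each tuple says which rows to train on and which single days to validate and test, so no future data ever leaks into training.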
You use the full data set to optimize the selection, and then, after you have the optimized features, you divide the set into in-sample and out-of-sample data. Could you try to train the model (including the optimized selection) on the in-sample data only? It seems this is not possible in RM, since Optimize Selection does not have a model output...
For the optimization I use 1000 days for training and the rest for evaluation (that is configured inside the validation block, inside the optimization process). However, from the optimization process I only obtain the data with the best indicators, so I need to train the model again on the 1000 days, obtain the model from RM (the output of the SVM algorithm), and validate the model again.
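The retraining step described in that reply can be sketched outside RapidMiner as well. This is a hedged illustration with synthetic data: the six candidate columns, the selected indices, and the 1000-day split are stand-ins for the real indicators and the mask the optimizer returns:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.normal(size=(1200, 6))                  # 6 candidate indicator columns
# Synthetic target that actually depends on columns 0 and 2.
y = X[:, 0] * 2 + X[:, 2] + rng.normal(scale=0.1, size=1200)

selected = [0, 2]                               # mask returned by the feature optimizer
train = slice(0, 1000)                          # 1000 days for training
valid = slice(1000, None)                       # the rest for validation

# Retrain the SVM on the training days using only the selected columns,
# then validate on the held-out days.
model = SVR(kernel="rbf", C=10.0).fit(X[train][:, selected], y[train])
score = model.score(X[valid][:, selected], y[valid])
print(round(score, 3))
```

The point is the workflow, not the numbers: the optimizer yields a column subset, and a fresh model must be fitted on that subset before the final validation.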
Thanks for sharing this, it's amazing. I am new to RapidMiner, but I love to see the amazing stuff others are doing.
I tried the link to download the source, but it does not seem to work. Can you point me to where I can access it? Thanks.