2:48 PM | By a Physicist
Learning a new software language for a technical activity is always difficult, and good documentation usually helps. Here are five free books on R, for beginners and for experts:
· “Introduction to the R Project for Statistical Computing for Use at the ITC” by David Rossiter (PDF, 2010-11-21).
· “R for Beginners” by Emmanuel Paradis (PDF, 10 pages).
· A Little Book of R for Multivariate Analysis (pdf, 49 pages) is a simple introduction to multivariate analysis using the R statistics software. It covers topics such as reading and plotting multivariate data, principal components analysis, and linear discriminant analysis.
· A Little Book of R for Biomedical Statistics (pdf, 33 pages) is a simple introduction to biomedical statistics using the R statistics software, with sections on relative risks and odds ratios, dose-response analysis, clinical trial design and meta-analysis.
· A Little Book of R for Time Series (pdf, 71 pages) is a simple introduction to time series analysis using the R statistics software (have you spotted the pattern yet?). It includes instruction on how to read and plot time series, time series decomposition, forecasting, and ARIMA models.
All books are free to use, share and remix under a Creative Commons license, and are available:
UPDATE: I updated the title, since these are only five free books on R that I find interesting. There is also another one that I forgot, “Matlab for R programmer”. I have used Matlab from university until now (for me Matlab is always easier, but it is not free). Both languages are similar, but you always need some help (small tips).
via: Revolutions
12:36 PM | By a Physicist
An interesting video in which Kevin Slavin argues that we're living in a world designed for, and increasingly controlled by, algorithms. I agree: algorithms executed on computers are now changing the world, but that is only the tip of the iceberg.
11:18 AM | By a Physicist
It has been a long time since I updated the blog, for lack of time (again) due to new professional and personal challenges. Continuing with the Black Box strategy, and given the recommendations made by several readers and the lack of time to write a good tutorial of the model, I am making the file available with a new version, hoping for new interesting and rewarding comments that help to improve the model. Currently the main problems in the strategy (noted by users) are:
- The overlap between test and evaluation (solution: avoid this overlap by reducing the test time)
- The function maximization is based on profits (other options are possible; working on it)
- Risk management (related)
- The stock name cannot be selected easily (working on it)
- No portfolio management
- Low success in the prediction
The archive can be downloaded for free from this link. Please send me your comments, modifications and suggestions …
11:01 AM | By a Physicist
This is my first post of 2011. It has cost me a bit more than usual, but I hope it meets expectations. The aim of this tutorial is to build an algorithm based on black-box trading, with all the elements needed to evaluate it. This is the first of several posts that explore the problems and features of this strategy.
In recent posts we have explored the performance assessment of a simple strategy. To improve this kind of evaluation and to extend what R and RapidMiner can do for trading, we have added the blotter library to the model evaluation (which was not simple, due to the limitations of the R extension). Blotter is an R package that tracks the P&L of your trading systems (or simulations), even if your portfolio spans many security types and/or currencies. This post uses blotter to track the black-box trading strategy, which will allow us to manage different profiles with RapidMiner.
The basic strategy was built for GSPC and is described in the following diagram.
The trading strategy is divided into model generation, prediction, and filtering of spurious signals. The first step is model generation: I tested different regression models, neural networks, SVM, etc. for the prediction model. To filter spurious signals we used a binary classification (bad prediction or correct prediction), done with different rule extraction algorithms that take only technical indicators as input.
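The target of that binary classification can be sketched in a few lines of base R. This is only a minimal illustration, assuming a prediction counts as correct when the predicted and the realised next-day changes have the same sign; the helper name `label_predictions` and the toy vectors are hypothetical and are not taken from the RapidMiner process.

```r
# Hypothetical helper: TRUE when the predicted move and the realised move
# point in the same direction, FALSE otherwise (a zero prediction is
# treated as "not correct" in this sketch).
label_predictions <- function(predicted_change, actual_change) {
  sign(predicted_change) == sign(actual_change) & predicted_change != 0
}

predicted <- c(0.4, -0.2, 0.1, -0.3)   # toy predicted daily changes
actual    <- c(0.5,  0.1, 0.2, -0.6)   # toy realised daily changes
correct_prediction <- label_predictions(predicted, actual)
print(correct_prediction)
```

A rule learner would then be trained to predict this TRUE/FALSE column from the technical indicators only.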
The second step is the evaluation of the strategy using the models generated in the previous step.
To implement this strategy we used RapidMiner and its R plugin; you can see the complexity of the algorithm in the following picture. The main requirement is blotter; however, it is not yet in the CRAN repository, so you need to install it from R-Forge:
install.packages("blotter", repos="http://R-Forge.R-project.org")
You can find more information at https://r-forge.r-project.org/R/?group_id=316
To use the latest version of blotter you need to update R to the latest release, which at the time of writing is version 2.12.
I will describe the most important R modules, and in a future post I will describe, in a video, all the elements of the architecture.
The main elements are:
- Obtain Technical Data for Training module
- Evaluation Strategy
Obtain Technical Data for Training module
library(quantmod)
library(TTR)
library(PerformanceAnalytics)

# Parameters
SymbolName="^GSPC"
initDate="2003-01-01"
secondDate="2006-01-01::"

# Pull stock data from Yahoo Finance
symbol<-getSymbols(SymbolName,from=initDate)
stock<-xts(get(symbol))

# Remove the stock name from the column names
names(stock)[names(stock)==paste(symbol,'.Open',sep="")] <- 'Open'
names(stock)[names(stock)==paste(symbol,'.Close',sep="")] <- 'Close'
names(stock)[names(stock)==paste(symbol,'.Volume',sep="")] <- 'Volume'
names(stock)[names(stock)==paste(symbol,'.Adjusted',sep="")] <- 'Adjusted'
names(stock)[names(stock)==paste(symbol,'.High',sep="")] <- 'High'
names(stock)[names(stock)==paste(symbol,'.Low',sep="")] <- 'Low'

# Introduce RSI indicator
stock$RSI2 = RSI(Cl(stock), 2)
# Introduce Exponential Moving Average indicator
stock$EMA7=EMA(Cl(stock), n=7, wilder=FALSE, ratio=NULL)
stock$EMA50=EMA(Cl(stock), n=50, wilder=FALSE, ratio=NULL)
stock$EMA200=EMA(Cl(stock), n=200, wilder=FALSE, ratio=NULL)
# Introduce MACD indicator
stock$MACD26=MACD(Cl(stock), nFast=12, nSlow=26, nSig=9)
# Introduce ADX indicator
stock$ADX14=ADX(stock, n=14)
# Introduce AROON indicator
stock$AROON = aroon(Cl(stock), n=20 )
# Introduce ATR indicator
stock$ATR=ATR(stock, n=14)
# Introduce BOLLINGER indicator
stock$bbands = BBands(Cl(stock))
# Introduce Commodity Channel Index indicator
stock$Commodity = CCI(Cl(stock))
# Introduce Chaikin Accumulation Distribution indicator
#stock$chaikin = chaikinAD(stock, stock$Volume)
# Introduce Chaikin Volatility indicator
stock$chaikinVolatility = chaikinVolatility(stock)
# Introduce Close Location Value indicator
#stock$clv = CLV(stock) # infinite values
# Introduce Chaikin Money Flow indicator
#stock$cmf = CMF(stock, stock$Volume)
# LAG
stock$label=lag(stock$Adjusted,-1)
#
# Remove 2003, 2004 and 2005 in order to avoid NaN from the EMA indicators
# To keep the time it is necessary to convert it to text
results <-data.frame(stock[secondDate],TIME=as.character(index(stock[secondDate])))
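The `lag(stock$Adjusted,-1)` line in the module above is what creates the training target: as I read it, with a negative lag on an xts series each row receives the next day's adjusted close as its `label`. A base-R sketch of the same alignment on a toy vector (the helper `lead_one` is a hypothetical name used only here):

```r
# Hypothetical helper: shift a vector one step into the future, so that
# position t holds the value observed at t + 1. The last row has no
# "tomorrow", so it becomes NA.
lead_one <- function(x) c(x[-1], NA)

adjusted <- c(100, 101, 99, 102)      # toy adjusted closes
training <- data.frame(Adjusted = adjusted, label = lead_one(adjusted))
print(training)
```

The last row has no label, so it cannot be used for training.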
Evaluation Strategy
This module defines, using blotter, the strategy described above.
# Load required libraries
library(quantmod)
library(TTR)
library(xts)
library(FinancialInstrument)
library(zoo)
library(blotter)
library(PerformanceAnalytics)

# Try to clean up in case the demo was run previously
try(rm("account.rapidminer","portfolio.rapidminer",pos=.blotter),silent=TRUE)
try(rm("ltaccount","ltportfolio","ClosePrice","CurrentDate","equity","Symbol","i","initDate","initEq","Posn","UnitSize","verbose"),silent=TRUE)

# Set initial values; you can change them to reduce the test period
initDate='2009-01-27'
initEq=100000
stopLoss=0.005

# It is important to include this statement to work with blotter
Sys.setenv(TZ="GMT")

# Load data with quantmod
print("Loading data")
currency("USD")
stock("Symbol",currency="USD",multiplier=1)

# Set up a portfolio object and an account object in blotter
print("Initializing portfolio and account structure")
ltportfolio='rapidminer'
ltaccount='rapidminer'
initPortf(ltportfolio,'Symbol', initDate=initDate)
initAcct(ltaccount,portfolios='rapidminer', initDate=initDate, initEq=initEq)
verbose=TRUE

# `data` is the example set passed in by RapidMiner
datos <- data # for output data

dates <- as.Date(data$TIME)
data$TIME=NULL # remove string information
data$correct_prediction=NULL # remove information from data
Symbol <- xts(data,order.by=as.POSIXct(dates),index.class=c("POSIXt","POSIXct"))

PREDICTION = data$prediction
close_ROC <- ROC(data$label)
prediction_ROC <-ROC(data$prediction)
close_ROC[1] <- 0
prediction_ROC[1] <- 0

# Generate signals from prediction values
sigup <- ifelse(prediction_ROC > 0, 1, 0)
sigdn <- ifelse(prediction_ROC < 0, -1, 0)

# Replace missing signals with no position
# (generally just at beginning of series)
sigup[is.na(sigup)] <- 0
sigdn[is.na(sigdn)] <- 0
sig <- sigup + sigdn

# Create trades
for( i in 2:NROW(prediction_ROC) ) {
  CurrentDate=dates[i]
  cat(".")
  equity = getEndEq(ltaccount, CurrentDate)
  ClosePrice = as.numeric(Symbol$Adjusted[i])
  OpenPrice = as.numeric(Symbol$Open[i])
  LowPrice = as.numeric(Symbol$Low[i])
  HighPrice = as.numeric(Symbol$High[i])
  filter<-data$filter[i-1]
  # filter<-1 # if you want to remove the filter, uncomment this line
  Posn = getPosQty(ltportfolio, Symbol='Symbol', Date=CurrentDate)
  UnitSize = as.numeric(trunc(equity/ClosePrice))
  #print(paste("UnitSize",UnitSize))

  # Position entry (filled at the open price)
  if( Posn == 0 ) {
    # No position, so start Long position
    if( prediction_ROC[i-1] >0 && filter==1 ) {
      StopPrice = OpenPrice *(1-stopLoss)
      # Store trade with blotter
      addTxn(ltportfolio, Symbol='Symbol', TxnDate=CurrentDate, TxnPrice=OpenPrice, TxnQty = UnitSize , TxnFees=0)
      if (LowPrice<StopPrice){
        # Exit
        print(paste("Exit",CurrentDate))
        addTxn(ltportfolio, Symbol='Symbol', TxnDate=CurrentDate, TxnPrice=StopPrice, TxnQty = -UnitSize , TxnFees=0)
      }
    }else{
      if( prediction_ROC[i-1] <0 && filter==1) {
        #print("Prediction_ROC is SELL")
        StopPrice = OpenPrice *(1+stopLoss)
        # Store trade with blotter
        addTxn(ltportfolio, Symbol='Symbol', TxnDate=CurrentDate, TxnPrice=OpenPrice, TxnQty = -UnitSize , TxnFees=0)
        if (HighPrice>StopPrice){
          # Exit
          print(paste("Exit",CurrentDate))
          addTxn(ltportfolio, Symbol='Symbol', TxnDate=CurrentDate, TxnPrice=StopPrice, TxnQty = UnitSize , TxnFees=0)
        }
      }else{
        #print("Prediction_ROC is 0")
      }
    }
  }else{
    if(Posn >0) {
      # Have a long position, so check exit
      if( prediction_ROC[i-1] < 0 && filter==1) {
        print("Prediction_ROC is SELL")
        StopPrice = OpenPrice *(1+stopLoss)
        # Store trade with blotter: reverse from long to short
        print("close position")
        addTxn(ltportfolio, Symbol='Symbol', TxnDate=CurrentDate, TxnPrice=OpenPrice, TxnQty = -2*abs(Posn) , TxnFees=0)
        if (HighPrice>StopPrice){
          # Exit
          # Posn was read before the reversal, so 2*abs(Posn) restores
          # the original long position instead of going flat
          print(paste("Exit",CurrentDate))
          addTxn(ltportfolio, Symbol='Symbol', TxnDate=CurrentDate, TxnPrice=StopPrice, TxnQty = 2*abs(Posn) , TxnFees=0)
        }
      }else{
        # We maintain the long position unless the stop is hit
        if (LowPrice<StopPrice){
          # Exit
          print(paste("Exit",CurrentDate))
          addTxn(ltportfolio, Symbol='Symbol', TxnDate=CurrentDate, TxnPrice=StopPrice, TxnQty = -abs(Posn) , TxnFees=0)
        }
      }
    }else{
      if( prediction_ROC[i-1] > 0 && filter==1) {
        StopPrice = OpenPrice *(1-stopLoss)
        # Store trade with blotter: reverse from short to long
        #print("close position")
        addTxn(ltportfolio, Symbol='Symbol', TxnDate=CurrentDate, TxnPrice=OpenPrice, TxnQty = 2*abs(Posn) , TxnFees=0)
        if (LowPrice<StopPrice){
          # Exit
          # Posn was read before the reversal, so -2*abs(Posn) restores
          # the original short position instead of going flat
          print(paste("Exit",CurrentDate))
          addTxn(ltportfolio, Symbol='Symbol', TxnDate=CurrentDate, TxnPrice=StopPrice, TxnQty = -2*abs(Posn) , TxnFees=0)
        }
      }else{
        # We maintain the short position unless the stop is hit
        if (HighPrice>StopPrice){
          # Exit
          print(paste("Exit",CurrentDate))
          addTxn(ltportfolio, Symbol='Symbol', TxnDate=CurrentDate, TxnPrice=StopPrice, TxnQty = abs(Posn) , TxnFees=0)
        }
      }
    }
  }
  # Calculate P&L and resulting equity with blotter
  updatePortf(ltportfolio, Dates = CurrentDate,Prices= Symbol$Adjusted)
  updateAcct(ltaccount, Dates = CurrentDate)
  updateEndEq(ltaccount, Dates = CurrentDate)
} # End dates loop
cat('\n')
transactions=getTxns(Portfolio="rapidminer", Symbol="Symbol")

# Copy the results into the local environment
print("Retrieving resulting portfolio")
ltportfolio = getPortfolio("rapidminer")
print("Retrieving resulting account")
ltaccount = getAccount("rapidminer")
results <-data.frame(ltaccount$summary)

# Generate buy and hold
buyhold = ROC(Symbol$Adjusted)

# Plot Strategy Summary
png(filename="20110109_blotter_strategy.png", 1000, 1000, units = "px", pointsize = 12, bg = "white", res = 200, restoreConsole = TRUE)
Eq<-ROC(getAccount('rapidminer')$summary$End.Eq)
#names(stock)[names(stock)==paste(symbol,'.Low',sep="")] <- 'Low'
buyhold$Blackbox <- Eq
charts.PerformanceSummary(buyhold,colorset=rich6equal, lwd=2, ylog=TRUE)
AnnualizedReturns <- table.AnnualizedReturns(buyhold)
Stats <- table.Stats(buyhold)
DownsideRisk <- table.DownsideRisk(buyhold)
dev.off()

Three time intervals were defined: one for training (800 days), one for the evaluation of the genetic space search (300 days), and finally the rest of the days (more than 200) for testing.
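The three intervals can be sketched as consecutive, non-overlapping index ranges. This is only an illustration in base R: the helper `split_periods` and the series length of 1320 days are hypothetical; only the 800 and 300 day sizes come from the text.

```r
# Hypothetical helper: split n observations into consecutive,
# non-overlapping training / evaluation / test index ranges.
split_periods <- function(n, n_train = 800, n_eval = 300) {
  stopifnot(n > n_train + n_eval)
  list(
    train = seq_len(n_train),
    eval  = seq(n_train + 1, n_train + n_eval),
    test  = seq(n_train + n_eval + 1, n)
  )
}

periods <- split_periods(1320)         # toy series length
print(sapply(periods, length))
```

Because each range starts where the previous one ends, no day is used in more than one period.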
Several tests have been done using neural networks and SVM, with diverse but always positive results. You can see the results obtained in the following image.
Using a neural network we improved the performance of the strategy on the training and evaluation data: after 7 hours and 100 generations we obtained the following improvement (figure). We use the same technique as in the previous post. It is not possible to use blotter for this evaluation because the package is too time-consuming.
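A fitness function that is cheap enough to call thousands of times can be written with plain vector operations. The following is a rough sketch and not the function used in the RapidMiner process: it assumes close-to-close returns and a position equal to the sign of the previous day's predicted change, and it ignores the stop loss, the filter and fees. The name `fast_equity` and the toy data are hypothetical.

```r
# Hypothetical fast fitness: equity curve of a long/short strategy that
# holds sign(prediction) made on the previous day.
fast_equity <- function(price, predicted_change) {
  n <- length(price)
  daily_return <- c(0, diff(price) / price[-n])   # simple daily returns
  position <- c(0, sign(predicted_change[-n]))    # yesterday's signal
  cumprod(1 + position * daily_return)
}

price     <- c(100, 102, 101, 103)   # toy closing prices
predicted <- c(1, -1, 1, 1)          # toy predicted next-day changes
equity <- fast_equity(price, predicted)
print(round(equity, 4))
```

The last value of the curve (or a risk-adjusted statistic computed from it) could then serve as the value to maximise, leaving blotter for the final report only.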
The results obtained with this strategy, without the filter, can be seen in the following graph; the comparison is with a buy & hold strategy. Although backtesting is no guarantee of future performance, it gives the trader confidence that the strategy has worked in the past. If the strategy is not over-optimized, data-mined, or based on random coincidences, it might have a good chance of working in the future.
The Sharpe Ratio (Rf=0%) of this strategy is 2.46.
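For reference, a common way to compute an annualised Sharpe ratio with a zero risk-free rate is sketched below. It assumes daily returns and 252 trading days per year; PerformanceAnalytics may annualise differently, so this sketch is not expected to reproduce the figure above exactly. The function name and the toy returns are hypothetical.

```r
# Annualised Sharpe ratio with a zero risk-free rate, assuming
# `returns` holds daily returns and 252 trading days per year.
sharpe_ratio <- function(returns, periods_per_year = 252) {
  sqrt(periods_per_year) * mean(returns) / sd(returns)
}

daily_returns <- c(0.01, -0.005, 0.007, 0.002, -0.003)   # toy data
print(sharpe_ratio(daily_returns))
```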
On the other hand, if we use the filter:
Using the filter, we reduce the peak drawdown; however, we obtain a lower return and a Sharpe Ratio of 2.18. I'm currently running several tests with different filters; however, the Sharpe Ratio of the prediction algorithm without the filter is already very high for this index.
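The drawdown mentioned here can be measured directly from an equity curve. A minimal base-R sketch, with a hypothetical function name and toy numbers:

```r
# Maximum drawdown of an equity curve: the largest relative fall
# from a running peak.
max_drawdown <- function(equity) {
  running_peak <- cummax(equity)
  max((running_peak - equity) / running_peak)
}

equity <- c(100, 110, 99, 105, 120, 108)   # toy equity curve
print(max_drawdown(equity))
```

Comparing this number for the filtered and unfiltered equity curves gives a direct measure of how much the filter reduces the worst loss.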
You can also modify the type of filter; for example, below is the filter obtained with the J48 algorithm. The idea is to auto-generate and understand filters using technical indicators.
J48 pruned tree
------------------
RSI2-0 <= 44.156798: false (409.0/168.0)
RSI2-0 > 44.156798
| DX-0 <= 5.179328
| | DX-0 <= 2.667342
| | | tr-1 <= 4.81: false (10.33/2.33)
| | | tr-1 > 4.81: true (20.67/3.0)
| | DX-0 > 2.667342: true (24.0/1.0)
| DX-0 > 5.179328: true (536.0/255.0)
Number of Leaves : 5
Size of the tree : 9
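To make the tree easier to read, it can be written as a plain R function. This is a direct transcription of the branches above; the argument names stand for the attributes RSI2-0, DX-0 and tr-1, and I am assuming that the -0 and -1 suffixes refer to the current and the previous day. The function name `j48_filter` is hypothetical.

```r
# The pruned J48 tree above as a function. Returns TRUE when the
# prediction is expected to be correct, so the signal may be traded.
j48_filter <- function(rsi2_0, dx_0, tr_1) {
  if (rsi2_0 <= 44.156798) {
    return(FALSE)
  }
  if (dx_0 <= 5.179328) {
    if (dx_0 <= 2.667342) {
      return(tr_1 > 4.81)
    }
    return(TRUE)
  }
  TRUE
}

print(j48_filter(rsi2_0 = 50, dx_0 = 2, tr_1 = 4))
```

Read this way, the filter rejects every signal when RSI2 is low, and when RSI2 is high it rejects only the signals where both DX and the true range are small.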
The current problems of the model are:
- The accuracy of the long and short position prediction is not good and should be improved.
- My objective is to reduce the risk of the strategy, so the filter should be improved in order to reduce false signals.
- Reduce the number of variables to adjust for the strategy…
In the next post, I will describe the model in a video, with all the elements of the architecture, and I will try to discuss the problems discovered in the strategy.

We do not guarantee that these strategies will be profitable or free of mistakes. It is important to remember that each trading situation is unique. You can never copy a strategy and think that it will always work. There are a lot of factors to take into consideration too, for example how the game looks and how the odds are moving. This blog is a research activity to share knowledge related to data mining and trading algorithms. All code described in this blog is GPL v3. We try to publish all the information about the model and how to generate the results obtained, and if you have any doubt about the model you will receive an answer by email or in the blog comments. If you don't have the patience and want to obtain the files, you can get them by making a small donation to improve the website.