To build the model you need RapidMiner with the Weka extension, the time series extension and the R extension. The R extension in turn requires an R installation with the quantmod, TTR and PerformanceAnalytics packages. There is a thread to solve any problem here.
To make my results reproducible, I will describe each of the modules in the following figure:

1. R Process.
The objective is to process data from Yahoo Finance and add the most common technical indicators to the series. The indicators were chosen following this article: a paper written by an engineering student at UC Berkeley, which uses a support vector machine together with 10 simple technical indicators to predict the SPX index, purportedly with 60% accuracy.
The content of the process is detailed here:
***********************************************
library(quantmod)
library(TTR)
library(PerformanceAnalytics)
# pull IBM data from Yahoo Finance
getSymbols("IBM",from="2003-01-01")
# Introduce RSI Indicator
IBM$RSI2 = RSI(Cl(IBM), 2)
# Introduce Exponential Moving Average indicators
IBM$EMA7=EMA(Cl(IBM), n=7, wilder=FALSE, ratio=NULL)
IBM$EMA50=EMA(Cl(IBM), n=50, wilder=FALSE, ratio=NULL)
IBM$EMA200=EMA(Cl(IBM), n=200, wilder=FALSE, ratio=NULL)
#Introduce MACD indicator
IBM$MACD26=MACD(Cl(IBM), nFast=12, nSlow=26, nSig=9)
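# Introduce ADX indicator
# ADX reads its input columns by position as High-Low-Close, so pass HLC(IBM) rather than the full OHLC object
IBM$ADX14=ADX(HLC(IBM), n=14)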
#results <-transform(IBM,RSI.IBM=RSI(Cl(IBM), 2),RETURN=ret ,TIME=as.character(index(IBM)))
# Drop 2003, 2004 and 2005 to avoid the NaN values at the start of the EMA indicators
# The time index must be converted to text to survive the transfer to RapidMiner
results <-transform(IBM["2006-01-01::2009-01-01"],TIME=as.character(index(IBM["2006-01-01::2009-01-01"])))
***********************************************
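Why the first years are dropped can be seen in a minimal base-R sketch. It assumes the usual convention of seeding the average with the simple mean of the first n values and using ratio = 2/(n+1); the helper name ema_sketch and the toy prices are hypothetical and are not part of TTR.

```r
# Hypothetical helper: a plain-R exponential moving average.
# Assumptions: ratio = 2/(n+1) and a seed equal to the mean of the
# first n values, so the first n-1 outputs are NA.
ema_sketch <- function(x, n) {
  stopifnot(length(x) >= n)
  ratio <- 2 / (n + 1)
  out <- rep(NA_real_, length(x))
  out[n] <- mean(x[1:n])            # seed value at position n
  if (length(x) > n) {
    for (i in (n + 1):length(x)) {
      out[i] <- ratio * x[i] + (1 - ratio) * out[i - 1]
    }
  }
  out
}

prices <- c(10, 11, 12, 13, 14, 15)  # toy closing prices, not real IBM data
ema3 <- ema_sketch(prices, n = 3)
print(ema3)
```

Under these assumptions a 200-period average has no value for its first 199 rows, which is why the script only keeps rows well after the start of the download.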
The output of the system is:

2. String to Time (Nominal to Date)
We convert the date string to a Date.

3. Close adjusted to Label
We set the IBM adjusted close value as the label, in order to predict it one day in advance.

4. Set Time to ID (Set Role)
We use TIME as the ID for the time series data.
5. Windowing
We shift the variable to predict one day into the future and add 2 new columns with lagged values, using a time window of 2 days.
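The windowing step can be sketched in base R as follows. This is only an illustration of the idea: the helper make_windows, the column names lag_1, lag_0 and label, and the toy prices are assumptions, and RapidMiner names its generated attributes differently.

```r
# Hypothetical helper: for each row keep `width` consecutive values and
# take the label `horizon` rows after the last value of the window.
make_windows <- function(x, width = 2, horizon = 1) {
  n_rows <- length(x) - width - horizon + 1
  stopifnot(n_rows >= 1)
  rows <- lapply(seq_len(n_rows), function(i) {
    window <- x[i:(i + width - 1)]
    c(window, x[i + width - 1 + horizon])
  })
  out <- as.data.frame(do.call(rbind, rows))
  # lag_1 is the older value, lag_0 the most recent one (assumed names)
  names(out) <- c(paste0("lag_", (width - 1):0), "label")
  out
}

close <- c(100, 101, 103, 102, 105)   # toy prices, not real IBM data
windows <- make_windows(close)
print(windows)
```

Each row then holds the two most recent values as inputs and the next day's value as the target, so the learner is trained to predict one day ahead.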

6. Sliding Window Validation

Time series validation

We use the Weka implementation of the Support Vector Machine.

You can improve the accuracy of the prediction algorithm using any parameter optimizer or attribute selection.
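As a rough illustration of how a sliding window validation walks through a series, the following base-R sketch lists the training and test row ranges. The helper sliding_windows and its boundary handling are assumptions made for this example and may differ from what the RapidMiner operator does internally.

```r
# Hypothetical helper: list the train/test row ranges of a sliding
# window validation. Each test window starts right after its training
# window, so the model is never evaluated on rows it was trained on.
sliding_windows <- function(n, train_width, test_width, step = 1) {
  last_start <- n - train_width - test_width + 1
  if (last_start < 1) return(list())
  lapply(seq(1, last_start, by = step), function(s) {
    list(train = s:(s + train_width - 1),
         test  = (s + train_width):(s + train_width + test_width - 1))
  })
}

folds <- sliding_windows(n = 10, train_width = 4, test_width = 2, step = 2)
for (f in folds) {
  cat("train", range(f$train), "| test", range(f$test), "\n")
}
```

In this sketch the number of iterations is floor((n - train_width - test_width) / step) + 1, so the two windows have to be small compared with the length of the series to get many iterations.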
Now the validation process
7. Obtain Technical Test data
This module is similar to the first one, except that it uses evaluation data from the last year.
***********************************************
library(quantmod)
library(TTR)
library(PerformanceAnalytics)
# pull IBM data from Yahoo Finance
getSymbols("IBM",from="2009-01-01")
# Introduce RSI Indicator
IBM$RSI2 = RSI(Cl(IBM), 2)
# Introduce Exponential Moving Average indicators
IBM$EMA7=EMA(Cl(IBM), n=7, wilder=FALSE, ratio=NULL)
IBM$EMA50=EMA(Cl(IBM), n=50, wilder=FALSE, ratio=NULL)
IBM$EMA200=EMA(Cl(IBM), n=200, wilder=FALSE, ratio=NULL)
#Introduce MACD indicator
IBM$MACD26=MACD(Cl(IBM), nFast=12, nSlow=26, nSig=9)
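# Introduce ADX indicator
# ADX reads its input columns by position as High-Low-Close, so pass HLC(IBM) rather than the full OHLC object
IBM$ADX14=ADX(HLC(IBM), n=14)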
#results <-transform(IBM,RSI.IBM=RSI(Cl(IBM), 2),RETURN=ret ,TIME=as.character(index(IBM)))
# Drop 2009 to avoid the NaN values at the start of the EMA indicators; evaluate on 2010 onwards
# The time index must be converted to text to survive the transfer to RapidMiner
results <-transform(IBM["2010-01-01::"],TIME=as.character(index(IBM["2010-01-01::"])))
***********************************************
We use a similar pre-processing flow.

11. Apply Model
We apply the model obtained before.
And finally we analyze the results of the trading strategy.
12. Prediction Label as Regular (Set Role)
The role of the predicted label is changed to regular so it can be used inside the R process.
13. Date to Nominal
The date is converted to nominal so it can be used in the R process.
14. Set TIME as Regular (Set Role)
The role of the TIME attribute is changed to regular so it can be used in the R process.
15. Set TIME as Regular (Set Role)
This script is inspired by the FOSS Trading code.
***********************************************
library(quantmod)
library(TTR)
library(PerformanceAnalytics)
# Column 31: predicted close (label)
# Column 33: actual close
close_ROC <- ROC(data[33])
dates = as.Date(data$TIME)
prediction_ROC <-ROC(data[31])
close_ROC[1] <- 0
prediction_ROC[1] <- 0
#generate signals from prediction values
sigup <- ifelse(prediction_ROC > 0, 1, 0)
sigdn <- ifelse(prediction_ROC < 0, -1, 0)
# Replace missing signals with no position
# (generally just at beginning of series)
sigup[is.na(sigup)] <- 0
sigdn[is.na(sigdn)] <- 0
sig <- sigup + sigdn
# Calculate equity curves
eq_up <- cumprod(1+close_ROC*sigup)
eq_dn <- cumprod(1+close_ROC*sigdn)
eq_all <- cumprod(1+close_ROC*sig)
# obtain result
result <-transform(data,sig=sig ,ret=close_ROC, eq_up=eq_up, eq_dn=eq_dn, eq_all=eq_all)
# This function gives us some standard summary
# statistics for our trades.
tradeStats <- function(signals, returns) {
# Inputs:
# signals : trading signals
# returns : returns corresponding to signals
# Combine data and convert to data.frame
sysRet <- signals * returns * 100
posRet <- sysRet > 0 # Positive rule returns
negRet <- sysRet < 0 # Negative rule returns
dat <- cbind(signals,posRet*100,sysRet[posRet],sysRet[negRet],1)
dat <- as.data.frame(dat)
# Aggregate data for summary statistics
means <- aggregate(dat[,2:4], by=list(dat[,1]), mean, na.rm=TRUE)
medians <- aggregate(dat[,3:4], by=list(dat[,1]), median, na.rm=TRUE)
sums <- aggregate(dat[,5], by=list(dat[,1]), sum)
colnames(means) <- c("Signal","% Win","Mean Win","Mean Loss")
colnames(medians) <- c("Signal","Median Win","Median Loss")
colnames(sums) <- c("Signal","# Trades")
all <- merge(sums,means)
all <- merge(all,medians)
wl <- cbind( abs(all[,"Mean Win"]/all[,"Mean Loss"]),
abs(all[,"Median Win"]/all[,"Median Loss"]) )
colnames(wl) <- c("Mean W/L","Median W/L")
all <- cbind(all,wl)
return(all)
}
# trade stats
stats<- as.data.frame(tradeStats(sig,close_ROC))
ret_all<-close_ROC
xts.ts <- xts(ret_all,dates)
drawdownrport = table.Drawdowns(xts.ts)
***********************************************
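The signal and equity curve logic of the script can be followed on a toy series in base R. The prices below are invented, and roc is a hypothetical stand-in that assumes the log (continuous) rate of change, which is the default type of TTR's ROC.

```r
# Toy series (assumed values, not real IBM data)
close      <- c(100, 102, 101, 104, 103)   # actual closes
prediction <- c(100, 101, 102, 103, 101)   # predicted closes

# Hypothetical stand-in for ROC: log rate of change, first value set to 0
roc <- function(x) c(0, diff(log(x)))
close_ROC      <- roc(close)
prediction_ROC <- roc(prediction)

# Long (+1) when the prediction rises, short (-1) when it falls,
# flat (0) otherwise; same result as sigup + sigdn in the script
sig <- sign(prediction_ROC)

# Equity curve: compound the returns earned while holding each position
eq_all <- cumprod(1 + close_ROC * sig)
print(round(data.frame(close_ROC, sig, eq_all), 4))
```

Since the returns here are log returns, cumprod(1 + r) is an approximation of the compounded equity; with discrete returns the product would be exact.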
In the following graph you can see the poor ROC of this strategy
Return obtained during buy and sell signals

This strategy is a simplification and should be understood as a proof of concept.
All information is in this tutorial, however if you want to

What a fantastic application of RapidMiner and R, awesome job. I'll have to try this myself.
Regards,
Tom
Thanks, if you have any questions please contact me.
Thanks to Jazef in the neuralmarkettrends comments I have found a bug. In the last script I used columns 13 and 15 to compute the ROC, but the correct columns are 31 and 33. The files and the post are now corrected.
The correct code is
# Column 31: predicted close (label)
# Column 33: actual close
close_ROC <- ROC(data[33])
dates = as.Date(data$TIME)
prediction_ROC <-ROC(data[31])
I will publish the modification on the blog.
Very nice post! I'm glad you've found quantmod, TTR, xts, PerformanceAnalytics, et al to be useful. Please don't hesitate to let us know if you have suggestions for improvements.
I'm also happy you found my tradeStats() function helpful. I hope to put it into one of the above packages in the future.
Thanks Joshua for your comment. quantmod, TTR and the rest are awesome, and I hope I can help you improve them in the future.
Hi
I believe it's incorrect to introduce ADX this way:
IBM$ADX14=ADX(IBM, n=14)
ADX calculates its value from columns selected by index and relies on the correct High-Low-Close order, while IBM has Open-High-Low-Close, so you have to reorder a bit:
IBM$ADX14 = ADX(IBM[,2:4], n=14)
I'm not sure, I have doubts now. The ADX documentation says:
"HLC Object that is coercible to xts or matrix and contains High-Low-Close prices."
In that case, since I use an xts object, the function should find the correct values in it... I will make a small example to see if it is correct.
Hi everyone, I'm really really new at this, so forgive me if this seems like a dumb question. I just downloaded RapidMiner 5.1, and in the updates it allows me to install the Weka Extension, Series Processing, the R Extension and Web Mining. I was going to install them, but when I was reading the licenses, they indicate that the term Free applies to Freedom and not Price. Does this cost any money?
Hi, RapidMiner is open source, but it is not really free for commercial applications (there is a commercial version). It uses the AGPL license, so you can use it for free if you share your code (or your model). For example, if you use RapidMiner to build a service (a web application) for trading and you sell the service, you have to give your clients all the generated code. Weka is similar; it uses the GPL v2 license, which is less restrictive for services in the cloud. I'm not a lawyer. In summary, both systems are free to use, and you have to study the license if you want to develop an application using Weka or RapidMiner.
Weka license (GPL): http://www.gnu.org/licenses/old-licenses/gpl-2.0.html
RapidMiner license (AGPL): http://es.wikipedia.org/wiki/GNU_Affero_General_Public_License
hello Physicist, great blog post! I am trying to replicate your post, but the transform command to convert time into text at the end of the first R script doesn't create any TIME object for me. Is this a common issue? Thanks
Oliver
Hello Olivier
the error is in
results <-transform(IBM["2010-01-01::"],TIME=as.character(index(IBM["2010-01-01::"])))
Which version of R are you using? Time objects are not correctly converted in RapidMiner, so I have to convert them to strings in R and then transform the strings to TIME in RapidMiner.
Hi Physicist :)
I ran into the problem with the time conversion. I tried: results <-transform(IBM["2010-01-01::"],TIME=as.character(index(IBM["2010-01-01::"]))) and got Process Failed.
I was running R 2.14.1 and RapidMiner 5. Thank you.
I see the bug, but I'm not sure what the problem is. I tested it with R 2.13 and R 2.13.1 and it works, but with R 2.14 it does not. I'm looking for a workaround.
Hi,
I tried to see if I could use the 'uploadnsell' thing to get the source, however it doesn't seem to be available. Is it permanently broken?
Hi,
I am using a 1000 example set, 500 training window and 500 testing window (step 1, horizon 1). Why does the validation only run 3 examples and stop?
Shouldn't it run 500 times?