November 18, 2010

RapidMiner + R Example for Trading

RapidMiner + R is an advanced tool that can be used to analyze trading strategies. To check its power I built a simple example: a support vector machine predicts the next day's price, and buy and sell signals are generated from that prediction. The example integrates quantitative indicators and an SVM, and finally evaluates the strategy.
The requirements for building the model are, of course, RapidMiner with the Weka extension, the time series extension and the R extension. The R extension in turn requires an R installation with the quantmod, TTR and PerformanceAnalytics packages. There is a thread for solving any installation problems here.
To make my results reproducible, I will detail each of the modules in the following figure:
[Figure: process flow]
1. R Process.
The objective is to pull data from Yahoo Finance and add the most common technical indicators to the series. The indicators were chosen following this article: a paper written by an engineering student at UC Berkeley which uses a support vector machine together with 10 simple technical indicators to predict the SPX index, purportedly with 60% accuracy.
The content of the process is detailed here:
 
***********************************************
library(quantmod)
library(TTR)
library(PerformanceAnalytics)
# pull IBM data from Yahoo Finance
getSymbols("IBM",from="2003-01-01")
# Introduce RSI indicator
IBM$RSI2 = RSI(Cl(IBM), 2)
# Introduce Exponential Moving Average indicators
IBM$EMA7=EMA(Cl(IBM), n=7, wilder=FALSE, ratio=NULL)
IBM$EMA50=EMA(Cl(IBM), n=50, wilder=FALSE, ratio=NULL)
IBM$EMA200=EMA(Cl(IBM), n=200, wilder=FALSE, ratio=NULL)
# Introduce MACD indicator
IBM$MACD26=MACD(Cl(IBM), nFast=12, nSlow=26, nSig=9)
# Introduce ADX indicator
# note: ADX() reads High, Low, Close by column position, so ADX(HLC(IBM), n=14) is the explicit form
IBM$ADX14=ADX(IBM, n=14)
#results <-transform(IBM,RSI.IBM=RSI(Cl(IBM), 2),RETURN=ret ,TIME=as.character(index(IBM)))
# remove 2003, 2004 and 2005 in order to avoid NaN from the EMA indicators
# To keep the time index it has to be converted to text
results <-transform(IBM["2006-01-01::2009-01-01"],TIME=as.character(index(IBM["2006-01-01::2009-01-01"])))
 
***********************************************
 
The output of the system is:
[Figure: output of the R process]
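The script drops the first years because moving averages need a warm-up period before they produce values. As a rough illustration in base R, the sketch below uses a hypothetical helper `ema_sketch` (not part of TTR) and assumes the common smoothing ratio 2/(n+1) seeded with a simple average of the first n prices. It only approximates what `EMA` computes, but it shows where the leading missing values come from:

```r
# Hypothetical helper for illustration only; TTR::EMA is what the process uses.
# Assumption: ratio = 2 / (n + 1), seeded with the mean of the first n values.
ema_sketch <- function(x, n) {
  ratio <- 2 / (n + 1)
  out <- rep(NA_real_, length(x))      # warm-up rows stay missing
  if (length(x) < n) return(out)
  out[n] <- mean(x[1:n])               # seed value
  if (length(x) > n) {
    for (i in (n + 1):length(x)) {
      out[i] <- ratio * x[i] + (1 - ratio) * out[i - 1]
    }
  }
  out
}

prices <- c(10, 11, 12, 13, 14, 15)    # made-up prices
ema3 <- ema_sketch(prices, 3)
print(ema3)
```

With n = 200 the first 199 rows would be missing, which is why the training window starts well after the first downloaded date.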
2. String to Time (Nominal to Date)
We convert the date string to a Date.
[Figure: stringtotime]
3. Adjusted Close to Label
We set the IBM adjusted close value as the label, so that it can be predicted one day in advance.
[Figure: ibmlabel]
4. Set TIME to ID (Set Role)
We use TIME as the ID for the time series data.
5. Windowing
We shift the variable to predict one day into the future and add 2 new columns with lagged values over a time window of 2 days.
[Figure: windowing]
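To make the windowing step concrete, here is a minimal base-R sketch of the same idea. The function `window_series`, its parameters and the column names `lag_1`, `lag_0` and `label` are assumptions made for illustration; the RapidMiner operator names its output attributes differently.

```r
# Illustrative sketch only: turn a series into rows of lagged values
# plus the value `horizon` steps ahead as the label.
window_series <- function(x, window = 2, horizon = 1) {
  rows <- length(x) - window - horizon + 1
  stopifnot(rows > 0)
  out <- data.frame(row.names = seq_len(rows))
  for (k in (window - 1):0) {
    start <- window - k                 # lag_0 is the most recent value in the window
    out[[paste0("lag_", k)]] <- x[start:(start + rows - 1)]
  }
  first_label <- window + horizon
  out$label <- x[first_label:(first_label + rows - 1)]
  out
}

closes <- c(100, 101, 103, 102, 105)    # made-up closing prices
windowed <- window_series(closes, window = 2, horizon = 1)
print(windowed)
```

Each row holds two consecutive closes and, as the label, the close of the following day, so the learner never sees the value it has to predict.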
6. Sliding Window Validation
[Figure: sliding]
Time series validation:
[Figure: validation]
We use the Weka implementation of the support vector machine.
[Figure: validation2]
You can improve the accuracy of the prediction algorithm using any parameter optimizer or attribute selection.
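The idea behind sliding window validation can be sketched in a few lines of base R. The function `sliding_windows` and its arguments are hypothetical names chosen for this illustration, and the way windows are counted here is an assumption that may not match the RapidMiner operator exactly:

```r
# Illustrative sketch: each fold trains on `train` consecutive rows and
# tests on the `test` rows that follow; the window then moves by `step`.
sliding_windows <- function(n, train, test, step = 1) {
  stopifnot(n >= train + test)
  starts <- seq(1, n - train - test + 1, by = step)
  lapply(starts, function(s) {
    list(train = s:(s + train - 1),
         test  = (s + train):(s + train + test - 1))
  })
}

folds <- sliding_windows(n = 10, train = 4, test = 2, step = 2)
for (f in folds) {
  cat("train:", range(f$train), " test:", range(f$test), "\n")
}
```

Because the test rows always come after the training rows, no future information leaks into the model, which is the reason for using this scheme instead of ordinary cross-validation on time series.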
Now for the validation process.
7. Obtain Technical Test Data
This module is similar to the first one, except that we use evaluation data from the last year.
 
 
***********************************************
library(quantmod)
library(TTR)
library(PerformanceAnalytics)
# pull IBM data from Yahoo Finance
getSymbols("IBM",from="2009-01-01")
# Introduce RSI indicator
IBM$RSI2 = RSI(Cl(IBM), 2)
# Introduce Exponential Moving Average indicators
IBM$EMA7=EMA(Cl(IBM), n=7, wilder=FALSE, ratio=NULL)
IBM$EMA50=EMA(Cl(IBM), n=50, wilder=FALSE, ratio=NULL)
IBM$EMA200=EMA(Cl(IBM), n=200, wilder=FALSE, ratio=NULL)
# Introduce MACD indicator
IBM$MACD26=MACD(Cl(IBM), nFast=12, nSlow=26, nSig=9)
# Introduce ADX indicator
# note: ADX() reads High, Low, Close by column position, so ADX(HLC(IBM), n=14) is the explicit form
IBM$ADX14=ADX(IBM, n=14)
#results <-transform(IBM,RSI.IBM=RSI(Cl(IBM), 2),RETURN=ret ,TIME=as.character(index(IBM)))
# remove 2009 in order to avoid NaN from the EMA indicators; 2010 is used for evaluation
# To keep the time index it has to be converted to text
results <-transform(IBM["2010-01-01::"],TIME=as.character(index(IBM["2010-01-01::"])))
 
***********************************************

We use a similar preprocessing flow.
[Figure: preprocessing flow]
11. Apply Model
We apply the model obtained before.
Finally, we analyze the results of the trading strategy.
12. Prediction Label as Regular (Set Role)
The predicted label is changed to a regular attribute so that it can be used inside the R process.
13. Date to Nominal
The date is converted to nominal so that it can be used in the R process.
14. Set TIME as Regular (Set Role)
The TIME attribute is set as a regular attribute so that it can be used in the R process.
15. R Process
This script is inspired by the FOSS Trading code.
 
 
***********************************************
library(quantmod)
library(TTR)
library(PerformanceAnalytics)
# column 31: predicted label (predicted close)
# column 33: label (actual close)
close_ROC <- ROC(data[33])
dates = as.Date(data$TIME)
prediction_ROC <-ROC(data[31])
close_ROC[1] <- 0
prediction_ROC[1] <- 0
#generate signals from prediction values
sigup <- ifelse(prediction_ROC > 0, 1, 0)
sigdn <- ifelse(prediction_ROC < 0, -1, 0)
# Replace missing signals with no position
# (generally just at beginning of series)
sigup[is.na(sigup)] <- 0
sigdn[is.na(sigdn)] <- 0
sig <- sigup + sigdn
# Calculate equity curves
eq_up <- cumprod(1+close_ROC*sigup)
eq_dn <- cumprod(1+close_ROC*sigdn)
eq_all <- cumprod(1+close_ROC*sig)
# obtain result
result <-transform(data,sig=sig ,ret=close_ROC, eq_up=eq_up, eq_dn=eq_dn, eq_all=eq_all)
# This function gives us some standard summary
# statistics for our trades.
tradeStats <- function(signals, returns) {
  # Inputs:
  # signals : trading signals
  # returns : returns corresponding to signals
  # Combine data and convert to data.frame
  sysRet <- signals * returns * 100
  posRet <- sysRet > 0 # Positive rule returns
  negRet <- sysRet < 0 # Negative rule returns
  dat <- cbind(signals,posRet*100,sysRet[posRet],sysRet[negRet],1)
  dat <- as.data.frame(dat)
  # Aggregate data for summary statistics
  means <- aggregate(dat[,2:4], by=list(dat[,1]), mean, na.rm=TRUE)
  medians <- aggregate(dat[,3:4], by=list(dat[,1]), median, na.rm=TRUE)
  sums <- aggregate(dat[,5], by=list(dat[,1]), sum)
  colnames(means) <- c("Signal","% Win","Mean Win","Mean Loss")
  colnames(medians) <- c("Signal","Median Win","Median Loss")
  colnames(sums) <- c("Signal","# Trades")
  all <- merge(sums,means)
  all <- merge(all,medians)
  wl <- cbind( abs(all[,"Mean Win"]/all[,"Mean Loss"]),
               abs(all[,"Median Win"]/all[,"Median Loss"]) )
  colnames(wl) <- c("Mean W/L","Median W/L")
  all <- cbind(all,wl)
  return(all)
}
# trade stats
stats<- as.data.frame(tradeStats(sig,close_ROC))
ret_all<-close_ROC
xts.ts <- xts(ret_all,dates)
drawdownrport = table.Drawdowns(xts.ts)
 
***********************************************
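The core of the script, turning predictions into signals and signals into an equity curve, can be tried in isolation. The sketch below uses made-up prices rather than the RapidMiner output, and the helper `log_roc` is a hypothetical stand-in that assumes log returns, which is what `ROC` from TTR computes by default:

```r
# Toy series (made-up numbers, not IBM data)
close      <- c(100, 102, 101, 104)   # actual closes
prediction <- c(100, 103, 100, 105)   # predicted closes

# Log rate of change, with the first value set to 0 as in the script above
log_roc <- function(x) c(0, diff(log(x)))
close_roc      <- log_roc(close)
prediction_roc <- log_roc(prediction)

# Long when the prediction rises, short when it falls, flat otherwise
sigup <- ifelse(prediction_roc > 0, 1, 0)
sigdn <- ifelse(prediction_roc < 0, -1, 0)
sig   <- sigup + sigdn

# Equity curve: compound the return earned by following the signal
eq_all <- cumprod(1 + close_roc * sig)
print(data.frame(close, prediction, sig, eq_all))
```

In this toy case every signal has the same sign as the realised move, so the equity curve rises at each step; with real predictions the curve falls whenever the signal and the return disagree.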

The following graph shows that the ROC of this strategy is not good.

[Figure: returns obtained on buy and sell signals]
This strategy is a simplification and should be understood as a proof of concept.
All the information is in this tutorial; however, if you want to contribute a small amount of money to improve this site, you can obtain the files here.

16 comments:

  1. What a fantastic application of Rapidminer and R, awesome job. I'll have to try this myself.

    Regards,
    Tom

  2. Thanks. If you have any questions, please contact me.

  3. Thanks to Jazef's comments on neuralmarkettrends I have found a bug. In the last script I used indexes 13 and 15 for the ROC; the correct ones are 31 and 33. The files and the post are now corrected.

    The correct code is
    # column 31: predicted label (predicted close)
    # column 33: label (actual close)
    close_ROC <- ROC(data[33])
    dates = as.Date(data$TIME)
    prediction_ROC <-ROC(data[31])
    I will publish in the blog the modification

  4. Very nice post! I'm glad you've found quantmod, TTR, xts, PerformanceAnalytics, et al to be useful. Please don't hesitate to let us know if you have suggestions for improvements.

    I'm also happy you found my tradeStats() function helpful. I hope to put it into one of the above packages in the future.

  5. Thanks for your comment, Joshua. quantmod, TTR... are awesome; I hope I can help you improve them in the future.

  6. Hi

    I believe it's incorrect to introduce ADX this way:

    IBM$ADX14=ADX(IBM, n=14)

    ADX calculates its value based on column index and relies on the order Hi-Lo-Cl, while IBM has Op-Hi-Lo-Cl, so you have to reorder a bit:
    IBM$ADX14 = ADX(IBM[,2:4], n=14)

  7. I'm not sure, I have doubts now. If you look at the TTR documentation:

    "HLC Object that is coercible to xts or matrix and contains High-Low-Close prices."

    In this case I pass an xts object, so the function may find the correct values in it... I will make a small example to see whether it is correct.

  8. Hi everyone, I'm really, really new at this, so forgive me if this seems like a dumb question. I just downloaded RapidMiner 5.1 and in the updates it allows me to install the Weka Extension, Series Processing, the R Extension and Web Mining. I was going to install them, but when I read the licenses they indicate that the term Free applies to freedom and not price. Does this cost any money?

  9. Hi. RapidMiner is open source, but it is not really free for commercial applications (there is a commercial version). It uses the AGPL license, so you can use it for free if you release your code (or your model). For example, if you use RapidMiner to build a service (a web application) for trading and you sell that service, you have to give your clients all the code generated. Weka is similar: it uses a similar type of license, in Weka's case GPL v2.0 (less restrictive for services in the cloud). I'm not a lawyer. In summary, both systems are free to use, and you have to study the license if you want to develop an application using WEKA or RapidMiner.

    WEKA license (GPL): http://www.gnu.org/licenses/old-licenses/gpl-2.0.html
    RapidMiner license (AGPL): http://es.wikipedia.org/wiki/GNU_Affero_General_Public_License

  10. hello Physicist, great blog post! I am trying to replicate your post, but the transform command to convert time into text at the end of the first R script doesn't create any TIME object for me. Is this a common issue? Thanks

    Oliver

  11. Hello Oliver

    the error is in

    results <-transform(IBM["2010-01-01::"],TIME=as.character(index(IBM["2010-01-01::"])))

    What version of R are you using? Time objects are not converted correctly in RapidMiner, so I have to convert them to strings in R and then transform the strings back to TIME in RapidMiner.

  12. This comment has been removed by the author.

  13. Hi Physicist :)

    I ran into the problem with the Time conversion. I tried: results <-transform(IBM["2010-01-01::"],TIME=as.character(index(IBM["2010-01-01::"]))) and got a Process Failed error.

    I was running R 2.14.1 and Rapidminer 5. Thank you.

  14. I see the bug, but I'm not sure what the problem is. I tested it with R 2.13 and R 2.13.1 and it works; however, in R 2.14 it does not. I'm looking for a workaround.

  15. Hi,

    I tried to see if I could use the 'uploadnsell' thing to get the source, however it doesn't seem to be available. Is it permanently broken?

  16. Hi,

    I am using a 1000 example set, 500 training window and 500 testing window (step 1, horizon 1). Why does the validation only run 3 examples and stop?

    Shouldn't it run 500 times?
