
Designing a Trading System: The Backtester and the Data


The Environment

Developing a trading system nowadays requires some kind of development platform able to backtest the strategy at hand and optimise its parameters.

There are three kinds of strategy testers:

  • Dedicated testers that come with the trading platform
  • General-purpose, custom-made software solutions created in a high-level programming environment such as Python, R, or MATLAB
  • Online solutions offering specific programming packages and data, such as Quantiacs.com


MetaTrader is the most popular platform for the FX markets. It uses a C++ variant called MetaQuotes Language (MQL). The current version is MT5, although MT4 is still the most widely used.

The following are the key features of the MQL5 language:

  • C++ syntax
  • Operating speed close to that achievable with C++
  • Wide range of built-in features for creating technical indicators
  • OpenCL support for fast parallel execution of optimisation tasks without the need to write parallel code
  • A wide variety of free code for indicators and strategies, backed by a strong MQL4 and MQL5 community


Python is a terrific high-level programming language with tons of features and a huge number of libraries for anything you could imagine: databases, data science, statistics, and machine learning, from logistic regression to neural networks.

There are Python libraries to backtest and optimise MT4 indicators and expert advisors.

Python can also be bridged to MetaTrader using ZeroMQ, a free distributed-messaging library (http://zeromq.org/intro:read-the-manual).
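As a rough illustration of such a bridge, here is a minimal pyzmq request/reply sketch. The address, port, and the "QUOTE" wire format are purely illustrative assumptions; the MetaTrader side would need a matching ZeroMQ-enabled expert advisor to answer these requests.

```python
# Minimal sketch of the Python side of a ZeroMQ bridge to MetaTrader.
# The address and message format below are illustrative assumptions.
import zmq


def request_quote(symbol: str, address: str = "tcp://localhost:5555") -> str:
    """Send a quote request over a REQ socket and return the raw reply."""
    context = zmq.Context.instance()
    socket = context.socket(zmq.REQ)
    socket.connect(address)
    socket.send_string(f"QUOTE {symbol}")  # hypothetical wire format
    reply = socket.recv_string()
    socket.close()
    return reply
```

The REQ/REP pattern keeps the protocol simple: each request blocks until the MetaTrader side replies, which is usually acceptable for signal polling at bar resolution.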



Fig 2 – A Jupyter Notebook

Python enables the implementation of strategies beyond those developed with typical technical analysis, although there are also technical-analysis libraries in Python for indicator-based strategies. For example, Python makes it easy to create statistics-based pivot points, or strategies that combine volatility and price extension to compute statistically relevant, dynamically updated price targets and stops.

Implementing a Monte Carlo backtest takes a few lines of code. There are also machine learning packages, including GPU-enabled neural network libraries such as TensorFlow and PyTorch.
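One common flavour of Monte Carlo backtesting is to reshuffle the order of a system's per-trade returns many times and look at the resulting distribution of drawdowns (the final equity is unchanged by reordering, but the path, and hence the drawdown, is not). A minimal sketch, with made-up trade returns:

```python
# Monte Carlo resampling of a trade list: shuffle the trade order many
# times and collect each replay's maximum drawdown. The returns used in
# the usage example are made-up illustrative numbers.
import random


def max_drawdown(equity_curve):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak, worst = equity_curve[0], 0.0
    for e in equity_curve:
        peak = max(peak, e)
        worst = max(worst, (peak - e) / peak)
    return worst


def monte_carlo_drawdowns(trade_returns, runs=1000, start=10_000, seed=42):
    """Replay the trades `runs` times in random order; return the drawdowns."""
    rng = random.Random(seed)
    drawdowns = []
    for _ in range(runs):
        order = trade_returns[:]          # copy, then shuffle the order
        rng.shuffle(order)
        curve, equity = [start], start
        for r in order:
            equity *= 1 + r               # compound each trade's return
            curve.append(equity)
        drawdowns.append(max_drawdown(curve))
    return drawdowns
```

Sorting the collected drawdowns then gives percentile estimates, e.g. the drawdown you would expect to exceed in only 5% of alternative trade orderings.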

There is a large community of quants developing stats-based strategies in Python, and lots of free code to start from, so the ZeroMQ route is a serious alternative to designing directly in MQL.

One of the most widespread Python distributions is Anaconda. Anaconda incorporates Jupyter Notebook, an interactive web-based environment that allows the development of Python applications as if you were writing a notebook (Fig. 2). Anaconda also includes an integrated development environment (IDE) called Spyder (Fig. 3).

Anaconda distribution may be found at https://www.anaconda.com/download/

Fig 3 – The Spyder IDE


Forex historical data can be acquired for free from the broker for major pairs, crosses, and exotic pairs, with more than five years of minute bars, via MetaTrader's historical data centre (F2). Starting from the minute bars, other timeframes can be recreated easily.
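Rebuilding higher timeframes from minute bars is a one-liner with pandas' resampling. A sketch, assuming the bars sit in a DataFrame indexed by timestamp with the column names shown (adapt them to your data source):

```python
# Rebuild higher timeframes from minute OHLCV bars with pandas.
# The column names ("open", "high", ...) are assumptions about the input.
import pandas as pd


def resample_ohlc(minute_bars: pd.DataFrame, rule: str = "60min") -> pd.DataFrame:
    """Aggregate minute bars to the given pandas offset rule (e.g. '60min')."""
    agg = {
        "open": "first",   # first minute's open becomes the bar's open
        "high": "max",     # highest high within the window
        "low": "min",      # lowest low within the window
        "close": "last",   # last minute's close becomes the bar's close
        "volume": "sum",   # volumes add up
    }
    return minute_bars.resample(rule).agg(agg).dropna()
```

The same function produces 5-minute, hourly, or daily bars simply by changing the `rule` argument.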

Tick and sub-minute data aren't available for free, so we should consider whether the idea really needs data resolutions finer than one minute.

Forex traders are fortunate in that their data is already continuous through time. Futures trading needs to combine several contracts into a continuous contract because futures contracts expire, typically every three months.

The problem is that the prices of the expiring contract differ slightly from the starting prices of the new contract: futures prices incorporate the cost of carry, which depends on interest rates, and this cost shrinks as the contract approaches expiration.

If you need to build your own continuous contract, there are several ways, but, in my opinion, the best method is to convert prices to ratios using the formula:

Price change = (Close[i] − Close[i−1]) / Close[i−1]

Then go back to the first price of the current contract and, working backwards from there, rebuild the historical data series from these ratios.
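This ratio method can be sketched in a few lines. The function below keeps only the percentage changes computed inside each contract (so the roll gap carries no spurious return), anchors the series at the first price of the current contract, and rebuilds the older bars backwards. The prices in the test are illustrative:

```python
# Ratio back-adjustment sketch for a continuous futures contract.
# Each element of `contracts` is the list of closes for one contract,
# oldest contract first. The current contract keeps its real prices.

def ratio_adjust(contracts):
    """Return one continuous close series with the roll gaps removed."""
    adjusted = list(contracts[-1])       # current contract stays as-is
    anchor = adjusted[0]                 # first price of current contract
    for closes in reversed(contracts[:-1]):
        # Pin the older contract's last bar to the anchor (zero roll gap),
        # then walk backwards using within-contract percentage changes.
        seg = [anchor] * len(closes)
        for i in range(len(closes) - 1, 0, -1):
            r = (closes[i] - closes[i - 1]) / closes[i - 1]
            seg[i - 1] = seg[i] / (1 + r)
        adjusted = seg + adjusted
        anchor = seg[0]                  # next older contract joins here
    return adjusted
```

With this convention, every within-contract return is preserved exactly, and the roll itself contributes a zero return instead of an artificial price jump.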

Data Cleansing

Historical data are rarely perfect. Spurious prices are more common than most people think. Also, it may be desirable to get rid of price spikes that, although correct, are so unusual that our system should ignore them. Therefore, it may be helpful to include a data cleansing routine that takes away unwanted large spikes.

One way to do it is to scan each bar in the price history and mark it as erroneous if the ratio of its close to the previous close is less than a specified fraction, or greater than that fraction's reciprocal. Bars marked as erroneous should not be used to compute any variable, indicator, or target, or they can be replaced with the mean of the two neighbouring bars.
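A minimal version of this routine, using the replace-with-neighbour-mean option. The 0.75 threshold (and its reciprocal, about 1.33) is an illustrative choice, not a recommendation:

```python
# Spike-cleansing sketch: flag a bar whose close/previous-close ratio
# falls outside [min_ratio, 1/min_ratio] and replace it with the mean
# of its two neighbours. The 0.75 threshold is an illustrative choice.

def clean_spikes(closes, min_ratio=0.75):
    """Return a copy of `closes` with out-of-range bars smoothed out."""
    max_ratio = 1.0 / min_ratio
    cleaned = list(closes)
    for i in range(1, len(cleaned) - 1):   # endpoints have no two neighbours
        ratio = cleaned[i] / cleaned[i - 1]
        if ratio < min_ratio or ratio > max_ratio:
            cleaned[i] = (cleaned[i - 1] + cleaned[i + 1]) / 2
    return cleaned
```

For the other option mentioned above (excluding flagged bars from indicator computation), the same test would simply record the flagged indices instead of overwriting prices.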

Data Normalisation

The typical way to design a trading system does not involve any price normalisation or adjustment, besides what is needed to create a continuous contract in the futures markets.

There are two kinds of technical studies, those whose actual value at one bar is of crucial importance in and of itself, and those whose influence is based on their current value relative to recent values. As examples of the first category, we may consider PSAR, ATR, Moving averages, linear regression, and pivot points.  Examples of the second category are stochastics, Williams %R, MACD, and RSI.

The reason to adjust the current value of an indicator relative to its recent values is to force the maximum possible degree of stationarity on it. Stationarity is a very desirable statistical property that improves the predictive accuracy of a technical study.

There are two types of price adjustment: centring and scaling. Centring subtracts the historical median from the indicator. Scaling divides the indicator by its interquartile range.


There are several ways to achieve centring. One is to apply a detrending filter to the historical price series. Another is to subtract the median computed over some number of recent bars, for example 100 to 200 bars.
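The rolling-median variant is trivial to sketch. The 150-bar window below is an arbitrary choice inside the 100–200 bar range mentioned above:

```python
# Centring sketch: subtract the median of a trailing window from each
# indicator value. The 150-bar window is an illustrative choice.
import statistics


def centre(values, window=150):
    """Return the values with the trailing-window median subtracted."""
    centred = []
    for i, v in enumerate(values):
        hist = values[max(0, i - window + 1): i + 1]   # trailing window
        centred.append(v - statistics.median(hist))
    return centred
```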


Sometimes centring a variable may destroy important information, but we may still want to compensate for shifting volatility. A volatility value that is "large" in one period may be "medium" or even "small" in another period, so we may want to divide the value of the variable by a measure of its recent value range. The interquartile range is an ideal measure of variation because it is not affected by outliers the way the classical standard deviation is.

The formula to do that is:

Scaled_value = 100 × CDF[ 0.25 × X / (P75 − P25) ] − 50

where CDF is the standard normal cumulative distribution function,

X is the unscaled current value

P75 and P25 are, respectively, the 75th and 25th percentiles of the historical values of the indicator.
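The formula maps any raw indicator value into roughly the range −50 to +50. A direct implementation, using the standard library's `math.erf` for the normal CDF and `statistics.quantiles` for the percentiles:

```python
# The scaling formula above: 100 * CDF[0.25 * X / (P75 - P25)] - 50,
# with CDF the standard normal cumulative distribution function.
import math
import statistics


def scale_value(x, history):
    """Scale the unscaled value `x` using the indicator's own history."""
    q1, _, q3 = statistics.quantiles(history, n=4)   # P25, median, P75
    iqr = q3 - q1
    z = 0.25 * x / iqr
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    return 100.0 * cdf - 50.0
```

Values near zero stay near zero, while extreme values are squashed towards ±50, which is exactly the bounded, quasi-stationary behaviour the transformation is meant to produce.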





