Friday, February 02, 2018

FX Order Flow as a Predictor

Order flow is signed trade size, and it has long been known to be predictive of future price changes. (See Lyons, 2001, or Chan, 2017.) The problem, however, is that it is often quite difficult or expensive to obtain such data, whether historical or live. This is especially true for foreign exchange transactions which occur over-the-counter. Recognizing the profit potential of such data, most FX market operators guard them as their crown jewels, never to be revealed to customers. But recently FXCM, a FX broker, has kindly provided me with their proprietary data, and I have made use of that to test a simple trading strategy using order flow on EURUSD.

First, let us examine some general characteristics of the data. It captures all trades transacted on FXCM occurring in 2017, time stamped in milliseconds, and with their trade prices and signed trade sizes. The sign of a trade is positive if it is the result of a buy market order, and negative if it is the result of a sell. If we take the absolute value of these trade sizes and sum them over hourly intervals, we obtain the usual hourly volumes (click to enlarge) aggregated over the 1 year data set:



















It is not surprising that the highest volume occurs between 16:00-17:00 London time, as 16:00 is when the benchmark rate (the "fix") is determined. The secondary peak at 9:00-10:00 is of course the start of the business day in London.

Next, I compute the daily total order flow of EURUSD (with the end of day at New York's midnight), and I establish a histogram of the last 20 days' daily order flow. I then determine the average next-day return of each daily order flow quintile. (I.e. I bin a next-day return based on which quintile the prior day's order flow fell into, and then take the average of the returns in each bin.) The result is satisfying:


We can see that the average next-day returns are almost monotonically increasing with the previous day's order flow. The spread between the top and bottom quintiles is about 12 bps, which annualizes to about 30%. This doesn't mean we will generate 30% annualized returns, since we won't be able to arbitrage between today's return (if the order flow is in the top or bottom quintile) with some previous day's return when its order flow was in the opposite extreme. Nevertheless, it is encouraging. Also, this is an illustration that even though order flow must be computed on a tick-by-tick basis (I am not a fan of the bulk volume classification technique), it can be used in low-frequency trading strategies.

(One may be tempted to also regress future returns against past order flows, but the result is statistically insignificant. Apparently only the top and bottom quintiles of order flow are predictive. This situation is actually quite common in finance, which is why linear regression isn't used more often in trading strategies.)

Finally, one more sanity check before backtesting. I want to see if the buy trades (trades resulting from buy market orders) are filled above the bid price, and the sell trades are filled below the ask price. Here is the plot for one day (times are in New York):



















We can see that by and large, the relationship between trade and quote prices is satisfied. We can't really expect that this relationship holds 100%, due to rare occasions that the quote has moved in the sub-millisecond after the trade occurred and the change is reported as synchronous with the trade, or when there is a delay in the reporting of either a trade or a quote change.

So now we are ready to construct a simple trading strategy that uses order flow as a predictor. We can simply buy EURUSD at the end of day when the daily flow is in the top quintile among its last 20 days' values, and hold for one day, and short it when it is in the bottom quintile. Since our daily flow was measured at midnight New York time, we also define the end of day at that time. (Similar results are obtained if we use London or Zurich's midnight, which suggests we can stagger our positions.) In my backtest, I have subtracted 0.20 bps commissions (based on Interactive Brokers), and I assume I buy at the ask and sell at the bid using market orders. The equity curve is shown below:



















The CAGR is 13.7%, with a Sharpe ratio of 1.6. Not bad for a single factor model!

Acknowledgement:  I thank Zachary David for his review and comments on an earlier draft of this post, and of course FXCM for providing their data for this research.

===

Industry update

1) Qcaid is a cloud-based platform that provides traders with backtesting, execution, and simulation facilities. They also provide servers and data feed.

2) How Cadre Uses Machine Learning to Target Real Estate Markets.

3) Check out Quantopian's new tutorial on getting started in quantitative finance.

4) A new Matlab-based backtest and live trading platform for download here.

5) A nice resource page for open source algorithmic trading tools at QuantNews.

My Upcoming Workshops

February 24 and March 3: Algorithmic Options Strategies

This online course focuses on backtesting intraday and portfolio option strategies. No pesky options pricing theories will be discussed, as the emphasis is on arbitrage trading.

June 4-8: London workshops

These intense 8-16 hours workshops cover Algorithmic Options Strategies, Quantitative Momentum Strategies, and Intraday Trading and Market Microstructure. Typical class size is under 10. They may qualify for CFA Institute continuing education credits. (Bonus: nice view of the Thames, and lots of free food.)


Thursday, January 04, 2018

A novel capital booster: Sports Arbitrage


By Stephen Hope

As traders, we of course need money to make money, but not everyone has 10-50k of capital lying around to start one's trading journey. Perhaps the starting capital is only 1k or less. This article describes how one can take a small amount of capital and multiply it as much as 10 fold in one year by taking advantage of large market inefficiencies (leading to arbitrage opportunities) in the sports asset class. However, impressive returns such as this are difficult to achieve with significantly larger seed capital, as discussed later.

Arbitrage is the perfect trade if you can get your hands on one, but clearly this is exceptionally difficult in the financial markets. In contrast, the sports markets are very inefficient due to the general lack of trading APIs and patchy liquidity etc. Arbitrages can persist for minutes (or even hours at a time).

Consider a very simple example of sports arbitrage; Team A vs Team B and three bookmakers quoting the odds shown in the table below. When the odds are expressed in decimal form we can calculate the implied probability of the event  e occurring as quoted by bookmaker i as  P(i,e) = 1/Odds(i,e)  (shown in brackets in the table).

Three Way Market
Bookmaker B1
Bookmaker B2
Bookmaker B3
Team A win
1.4 (71.4%)
1.2 (83.3%)
1.2 (83.3%)
Team A lose
8.8 (11.4%)
9.5 (10.5%)
9.1 (11.0%)
Draw
5.8 (17.2%)
6.0 (16.7%)
6.8 (14.7%)

In the Three Way Market, there are only 3 possible outcomes; Team A wins, Team A loses or it's a draw. Therefore the sum of the probabilities of these 3 events should equal 100% (in a fair market). However, we can see that the market is not efficient and the combination of odds shown in red give; 




This is an arbitrage opportunity in the Three Way market with 3 legs;

1_2_X and Odds = (1.4, 9.5, 6.8)

where

1 = Three Way Market (home team to win)
2 = Three Way Market (away team to win)
X = Three Way Market (a draw)

The size of the arbitrage is given by 





and in order to realise this arbitrage we need to bet the following percentage stakes against our notional








The above example is a 'simple' arbitrage. However, the majority of football arbitrage opportunities are 'complex' arbitrages. Complex in the sense that the bet legs are not mutually exclusive and more than one leg can pay out over some overlapping subset of possible outcomes. The calculation then becomes more complex. 

For example, consider the following 3 leg complex arbitrage;

AH2(-0.25)_X1_1 and Odds = (1.69, 2.1, 5.25);

where

AH2(-0.25) = Asian Handicap Market (away team to win, handicap -0.25) 
X1 = Double Chance Market (home team to win or draw)
1 = Three Way Market (home team to win)

We can construct a payoff matrix to more easily visualise the outcome dependent payoffs of the 3 bet legs.

Payoff Matrix
Away Team Wins
Draw
Home Team Wins
AH2(-0.25)
0.69
-0.5
-1
X1
-1
1.1
1.1
1
-1
-1
4.25


Matrix Element Meanings
0.69 –> win 0.69 * stake 1 (+ stake 1 returned)
1.1 –> win 1.1 * stake 2 (+ stake 2 returned)
4.25 –> win 4.25 * stake 3 (+ stake 3 returned)
-0.5 –> lose -0.5 * stake 1 (get half of stake 1 back)
-1 –> lose -1 * stake i (lose your full stake)

The structure of the Payoff Matrix reveals a 'potential' arbitrage because there exists no column (event outcome) that contains only negative cash flows. It is a potential 'complex arbitrage' because in the event of a draw or home team win, there exists two bet legs that can give rise to a positive cash flow for the same outcome (remember, -0.5 means half of the stake is returned so is still positive). However, whether or not the arbitrage can be 'realised' depends on whether or not we can find a solution for the stake percentages for each leg that gives a positive net profit for every outcome. So how do we do this ?

Constructed as a dynamic programming optimisation we have;

















where

x = ( x1 , x2 , x3 ... ) are the bet leg stakes
C is a payoff matrix column chosen to maximise
A is the constraints matrix (e.g sum of stakes = 1, stake (i) >= 0 etc)

Solving the optimisation for the AH2(-0.25)_X1_1 example above gives;

Payoff Matrix
Away Team Wins
Draw
Home Team Wins
Stake %
AH2(-0.25)
101.70%
30.10%
0
60.20%
X1
0
71.60%
71.60%
34.10%
1
0
0
30.10%
5.70%
Net Profit
1.70%
1.70%
1.70%
100.00%

We can see that the arbitrage does indeed have a solution with the stake percentages (60.2%, 34.1%, 5.7%) giving an arbitrage of 1.7% for every possible outcome. There are many thousands of these arbitrage opportunities appearing each day in the sports markets ranging in size from 0.1% - 7%+.

What returns are possible? Consider, starting with a seed capital of £1k and a trading frequency of 3 times per week with an average arbitrage size of 1.6%. Initially we compound our winnings but there are limits to how much you can stake with a given bookmaker. Assume that we cannot increase our notional beyond £5000 across any multi-leg arbitrage trade. In that case, the initial £1k can grow to approximately £9,500 in one year. Not bad for a few minutes of effort per trade. 

So what's the catch? 

There are really only two pitfalls. 
1)  Scaling: You cannot easily compound your returns as with the financial markets.
2) Limit Risk: Bookmakers don't want you to win and can be inclined to significantly reduce your allowed stake notional if you win too much. Avoiding this requires careful management.

Although sports arbitrage does not easily scale, it is a great way of boosting trading capital by a few thousand pounds per year with very small time effort; capital which could be put to use in the financial or crypto markets.

===
About the author: Stephen Hope is Co-Founder of Machina Trading, a proprietary crypto & sports trading firm that provides an arbitrage tool called rational bet. He is former Head of Quantitative Trading Strategies at BNP Paribas and received his PhD in Physics from the University of Cambridge.

===
Upcoming Workshops by Dr. Ernie Chan

February 24 and March 3: Algorithmic Options Strategies

This online course focuses on backtesting intraday and portfolio option strategies. No pesky options pricing theories will be discussed, as the emphasis is on arbitrage trading.

June 4-8London workshops

These intense 8-16 hours workshops cover Algorithmic Options Strategies, Quantitative Momentum Strategies, and Intraday Trading and Market Microstructure. Typical class size is under 10. They may qualify for CFA Institute continuing education credits.