Oct 6, 2022

Predicting the Future, by Just a Few Milliseconds

Recently, researchers from Princeton University released a working paper[1] through the National Bureau of Economic Research that studied the predictability of U.S. equities price returns at short timescales. Amongst other conclusions, they found that:

  1. At short time horizons, “large amounts of predictability exist universally in every stock (...) consistently over time.”
  2. Their results are robust across a variety of measures and formulations of the prediction problem.
  3. Predictability is highly time-sensitive: “predictability of returns and trading directions vanishes quickly (...) the majority of predictability lies in the most up-to-date few milliseconds.”

These results are academically interesting because they contrast with the same prediction problem considered at longer timescales, where market efficiency translates to “low signal-to-noise ratios, weak and persistent predictors, and instability of the predictive relations.”

At IEX Exchange, we are no strangers to the problem of predicting price changes at short timescales. Our single most popular order type, D-Peg, is powered by the IEX Signal, a proprietary statistical model which aims to do exactly this.

When the IEX Signal predicts[2] imminent adverse price changes, D-Peg orders are restricted from executing at more aggressive prices, in an attempt to avoid adverse selection.[3] The Signal has been predicting short-term price changes since 2014, and much of what we’ve learned over the years matches the methodology and findings of the Princeton researchers. The current IEX Signal model is our fifth version and was built using standard logistic regression. Like the authors, we’ve found that many common machine learning methods can be used to obtain similar predictive results. We’ve also seen the importance of features that the authors call out as especially valuable: order book imbalances, transaction imbalances, and price movements in short trailing windows.
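To make these ingredients concrete, here is a minimal sketch of how a logistic model turns a feature such as order book imbalance into a probability. This is not the IEX Signal. The imbalance definition is one common convention, and the function names, weight, and intercept are hypothetical values chosen purely for illustration.

```python
import math


def book_imbalance(bid_size: float, ask_size: float) -> float:
    """One common imbalance convention (an assumption here): ranges from -1 to +1."""
    total = bid_size + ask_size
    return 0.0 if total == 0 else (bid_size - ask_size) / total


def logistic_probability(features, weights, intercept):
    """Standard logistic regression scoring: sigmoid of a weighted sum of features."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, NOT fitted to any real data.
WEIGHTS = [2.0]
INTERCEPT = 0.0

imb = book_imbalance(bid_size=300, ask_size=100)
p_up = logistic_probability([imb], WEIGHTS, INTERCEPT)
print(f"imbalance={imb:.2f}, P(next move is up)={p_up:.3f}")
```

A production model would fit its coefficients to historical data and use many more features. The point of the sketch is only that a handful of simple, fast-to-compute inputs can feed a probabilistic prediction.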

The “Why”

In their paper, the researchers emphasize that their study is focused on quantifying the level of predictability and does not aim to answer the question of “why” the predictability exists. Here, we might be able to add some color.

In our view, a large amount of the performance we’re able to achieve is due to the fact that price changes are not atomic events. In particular, price changes comprise clusters of activity spread across geographically distinct market venues. Information about these events takes time to propagate (bounded, at best, by the speed of light), so an unavoidable structural pattern is imposed upon the data that any single observer collects. Consider the following map of U.S. equity exchange data center locations, along with estimated microsecond values for latency due to the speed of light:


IEX’s matching engine is located in Weehawken, NJ. Meanwhile, the Cboe family of exchanges, as well as MEMX, MIAX, and LTSE, are located in Secaucus, NJ; the Nasdaq family of exchanges is located 16 miles south in Carteret; and the NYSE family of exchanges is 21 miles north in Mahwah. What happens when the quotes of multiple exchanges change in close succession? The distinct geographic locations and corresponding latency differences between the various exchange families mean that we at IEX will consistently observe the changes that occur at the Secaucus venues ahead of the changes from the venues at Carteret, and any action from the venues at Mahwah will be seen last.

This ordering generates predictable patterns from our perspective, and the persistence of these patterns is not surprising – it is a corollary of the laws of physics. As much as we improve processing technology, we simply cannot get information from the other market venues faster than the speed of light!
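As a rough illustration of the physical floor involved, the sketch below converts the distances quoted above into minimum one-way latencies. It assumes straight-line paths and light travelling in a vacuum, and it treats the quoted mileages as straight-line distances to the observer, which the text does not state explicitly. The function and variable names are ours.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre
METERS_PER_MILE = 1609.344


def min_one_way_latency_us(distance_miles: float) -> float:
    """Lower bound on one-way latency in microseconds (vacuum, straight line)."""
    return distance_miles * METERS_PER_MILE / SPEED_OF_LIGHT_M_PER_S * 1e6


# Distances quoted in the text, treated as straight-line distances (an assumption).
distances_miles = {"Carteret": 16, "Mahwah": 21}

# Nearer sites are necessarily heard from first, so sort by distance.
for site, miles in sorted(distances_miles.items(), key=lambda kv: kv[1]):
    print(f"{site}: at least {min_one_way_latency_us(miles):.1f} microseconds")
```

Real-world latencies are higher than these bounds: signals in optical fiber travel at roughly two-thirds of the vacuum speed, and physical routes are never perfectly straight. The ordering of arrivals, however, is driven by the same geometry.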

The IEX Signal

You can see this geographical structure emerge in our model itself, which is described in Rule 11.190(g). The current IEX Signal has a heavy dependence on a feature we call “Delta” in our Rule Book. Essentially, the Delta feature assigns a significant weight to quote changes we see at the geographically closest large maker-taker venues: BATS, EDGX, and Nasdaq. This weighting exists precisely to take advantage of the aforementioned structural pattern induced by our specific geographical location. It is also why it wouldn’t make sense to copy our order type and associated model for use at a venue in a location with an entirely different latency profile.

In addition to our modeling, we gain even more predictive power at IEX by forcing all incoming orders to pass through a 350-microsecond “speed bump.” This affords us extra time relative to exchange participants that we can use to collect the most up-to-date market data and protect resting orders on IEX. In fact, 71% of all aggressive orders[4] that arrive within 2ms of a Signal prediction do so in the first 350 microseconds after the prediction.[5] That is, without our 350-microsecond head start, resting orders on IEX would lose the majority of the protection they are afforded here.
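The statistic above can be read as a simple conditional share. The sketch below shows one way such a number could be computed from order arrival delays. The helper name and the sample delays are made up for illustration and are not IEX data; only the 350-microsecond and 2-millisecond constants come from the text.

```python
SPEED_BUMP_US = 350   # speed bump length, from the text
HORIZON_US = 2_000    # 2 ms observation window, from the text


def share_within_speed_bump(delays_us, bump_us=SPEED_BUMP_US, horizon_us=HORIZON_US):
    """Of orders arriving within the horizon after a prediction,
    return the share that arrived within the speed bump (None if no orders)."""
    in_horizon = [d for d in delays_us if 0 <= d <= horizon_us]
    if not in_horizon:
        return None
    return sum(d <= bump_us for d in in_horizon) / len(in_horizon)


# Made-up arrival delays in microseconds after a prediction, for illustration only.
sample_delays_us = [40, 120, 310, 349, 900, 1_800, 2_500]
share = share_within_speed_bump(sample_delays_us)
print(f"{share:.0%} of in-window orders arrived inside the speed bump")
```

Orders arriving after the 2 ms horizon (the last sample value) are excluded from both the numerator and the denominator, mirroring the "within 2ms of a Signal prediction" condition.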

A corollary is that other venues attempting to incorporate an IEX-like “signal” would not be nearly as effective due to the relative speed disadvantages those venues have vs. their fastest participants. The benefit we’ve experienced from our delay aligns with the Princeton researchers’ findings of the “extreme value of the timeliness of the data on a scale of a few milliseconds,” and the speed-bump-based protection we’re able to provide is unique to IEX amongst U.S. equities exchanges.

Building Exchange Products with Predictive Analytics

The Princeton researchers’ conclusion that the prices of virtually all stocks have large amounts of predictability at ultra-short time horizons raises natural questions: who is advantaged or disadvantaged by this aspect of our market structure, and can we structure venues to affect this predictability? The reality is that advantageous trading within these predictable price changes is dominated by proprietary trading firms that have invested in the speed technology required to compete at these timescales.

Source: H1 2022 IEX market data.

Notionally weighted average trade-to-mid markouts for non-Signal-enabled IEX order types. Note that markouts are computed from the remover’s perspective, and a positive markout for the remover corresponds to a negative markout for its resting counterparty (adverse selection).
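For readers unfamiliar with the metric, the sketch below shows one conventional way to compute a notionally weighted trade-to-mid markout from the remover's perspective. The basis-point units, the function names, and the sample trades are assumptions for illustration. The markout horizon is left to the caller, who supplies the later midpoint.

```python
def markout_bps(side: int, trade_price: float, mid_later: float) -> float:
    """Trade-to-mid markout in basis points from the remover's perspective.
    side is +1 if the remover bought, -1 if the remover sold."""
    return side * (mid_later - trade_price) / trade_price * 1e4


def notional_weighted_markout_bps(trades) -> float:
    """trades: iterable of (side, price, shares, mid_later) tuples."""
    total_notional = 0.0
    weighted_sum = 0.0
    for side, price, shares, mid_later in trades:
        notional = price * shares
        total_notional += notional
        weighted_sum += notional * markout_bps(side, price, mid_later)
    if total_notional == 0:
        raise ValueError("no notional traded")
    return weighted_sum / total_notional


# Made-up trades, for illustration only.
trades = [
    (+1, 10.00, 300, 10.01),  # remover bought, mid rose: favorable to the remover
    (-1, 20.00, 50, 20.01),   # remover sold, mid rose: unfavorable to the remover
]
avg_bps = notional_weighted_markout_bps(trades)
print(f"notionally weighted markout: {avg_bps:.2f} bps")
```

The resting counterparty's markout is the same figure with the sign flipped, which is why a positive remover markout indicates adverse selection of the resting order.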

Source: H1 2022 IEX market data/IEX member classifications.

Percent volume removed by IEX classification of counterparty when Signal is Off/On.

At IEX, we’ve built products that have helped democratize these technologies and deemphasize the speed race at these microscopic levels. D-Peg, our Signal-enabled midpoint order type, has driven our growth to be the exchange with the most stable[6] midpoint volume:

Source: NYSE, TAQ.

Most recently, we’ve released a groundbreaking new displayed limit order type built upon the Signal: D-Limit. These products are available to all market participants, not just those with a speed advantage.

As the market continues to evolve, IEX will continue to innovate to build a market that works for all market participants. But to do so, we (and other innovators in the space) need transparency and insight into the dynamics that are emerging and developing. Research like this work from the Princeton academics is a critical part of the puzzle. By bringing empirical data and rigor to their work, they are able to open up discussion of dynamics that need to be brought into the light. We hope to see much more from them and their colleagues in the future.

[1] How and When are High-Frequency Stock Returns Predictable?, Yacine Aït-Sahalia, Jianqing Fan, Lirong Xue, and Yifeng Zhou, NBER Working Paper No. 30366, August 2022, JEL No. C45, C53, C58, G12, G14, G17.

[2] As a probabilistic model, the IEX Signal can make both "correct" and "incorrect" predictions. In March 2022, 78% of Signal predictions correctly predicted the direction of the next change to the National Best Bid and Offer (NBBO) for the volume-weighted average symbol.

[3] Adverse selection refers to executions that take place right before the best available market prices change, indicating an information asymmetry between a resting order and its counterparty.

[4] Aggressive orders are defined as those with limit prices at or past the far side of the National Best Bid/Offer.

[5] Source: H1 2022 IEX market data.

[6] Stable volume here is defined as executions that do not see the National Best Bid/Offer move in the following 2 milliseconds. In other words, they were not adversely selected.