Financial Feature Engineering: How to research Alpha Factors

Algorithmic trading strategies are driven by signals that indicate when to buy or sell assets to generate superior returns relative to a benchmark such as an index. The portion of an asset's return that is not explained by exposure to this benchmark is called alpha, and hence the signals that aim to produce such uncorrelated returns are also called alpha factors.
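
To make the definition of alpha concrete, the following minimal sketch regresses an asset's returns on benchmark returns; the intercept of that regression is the alpha and the slope is the benchmark exposure (beta). The return series are synthetic and all names and parameter values (`true_alpha`, `true_beta`, the noise levels) are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic daily returns (illustrative assumptions, not market data)
n_days = 2500
benchmark = rng.normal(0.0004, 0.01, n_days)
true_alpha, true_beta = 0.0002, 1.2
asset = true_alpha + true_beta * benchmark + rng.normal(0, 0.005, n_days)

# OLS of asset returns on a constant and the benchmark returns:
#   asset_t = alpha + beta * benchmark_t + noise_t
X = np.column_stack([np.ones(n_days), benchmark])
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, asset, rcond=None)

print(f"estimated daily alpha: {alpha_hat:.5f}, estimated beta: {beta_hat:.2f}")
```

Because the alpha is small relative to the noise in daily returns, its estimate is far less precise than the beta estimate, which is one reason why factor research relies on long histories and many assets.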

If you are already familiar with ML, you may know that feature engineering is a key ingredient for successful predictions. This is no different in trading. Investing, however, can build on decades of research into how markets work and which features explain or predict price movements better than others. This chapter provides an overview as a starting point for your own search for alpha factors.

This chapter also presents key tools that facilitate computing and testing alpha factors. We will highlight how the NumPy, pandas, and TA-Lib libraries facilitate the manipulation of data, and present popular smoothing techniques like wavelets and the Kalman filter that help reduce noise in data.

We also preview how you can use the trading simulator Zipline to evaluate the predictive performance of (traditional) alpha factors. We discuss key alpha factor metrics like the information coefficient and factor turnover. An in-depth introduction to backtesting trading strategies that use machine learning follows in Chapter 8, which covers the ML4T workflow that we will use throughout the book to evaluate trading strategies.

Please see the Appendix - Alpha Factor Library for additional material on this topic, including numerous code examples that compute a broad range of alpha factors.

Alpha Factors in practice: from data to signals

Alpha factors are transformations of market, fundamental, and alternative data that contain predictive signals. They are designed to capture risks that drive asset returns. One set of factors describes fundamental, economy-wide variables such as growth, inflation, volatility, productivity, and demographic risk. Another set consists of tradeable investment styles such as the market portfolio, value-growth investing, and momentum investing.

There are also factors that explain price movements based on the economics or institutional setting of financial markets, or on investor behavior, including known biases of this behavior. The economic theory behind factors can be rational, where the factors have high returns over the long run to compensate for their low returns during bad times, or behavioral, where factor risk premiums result from the possibly biased or not entirely rational behavior of agents that is not arbitraged away.

Building on Decades of Factor Research

In an idealized world, categories of risk factors should be independent of each other (orthogonal), yield positive risk premia, and form a complete set that spans all dimensions of risk and explains the systematic risks for assets in a given class. In practice, these requirements will hold only approximately.

References

Dissecting Anomalies by Eugene Fama and Ken French (2008)
Explaining Stock Returns: A Literature Review by James L. Davis (2001)
Market Efficiency, Long-Term Returns, and Behavioral Finance by Eugene Fama (1997)
The Efficient Market Hypothesis and Its Critics by Burton Malkiel (2003)
The New Palgrave Dictionary of Economics (2008) by Steven Durlauf and Lawrence Blume, 2nd ed.
Anomalies and Market Efficiency by G. William Schwert (Ch. 15 in Handbook of the Economics of Finance, by Constantinides, Harris, and Stulz, 2003)
Investor Psychology and Asset Pricing, by David Hirshleifer (2001)

Engineering alpha factors that predict returns

Based on a conceptual understanding of key factor categories, their rationale, and popular metrics, a key task is to identify new factors that may better capture the risks embodied by the return drivers laid out previously, or that capture new sources of return. In either case, it will be important to compare the performance of innovative factors to that of known factors to identify incremental signal gains.

Code Example: How to engineer factors using pandas and NumPy

The notebook feature_engineering.ipynb in the data directory illustrates how to engineer basic factors.
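
As a rough illustration of the kind of computation involved, the sketch below derives a simple momentum factor from monthly prices with pandas and normalizes it across assets. It is not taken from the notebook: the prices are synthetic, and the tickers, window lengths, and variable names are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic month-start prices for three hypothetical tickers (illustration only)
dates = pd.date_range("2015-01-01", periods=36, freq="MS")
prices = pd.DataFrame(
    100 * np.exp(rng.normal(0.005, 0.05, size=(36, 3)).cumsum(axis=0)),
    index=dates,
    columns=["AAA", "BBB", "CCC"],
)

# Returns over the last month and the last 12 months
ret_1m = prices / prices.shift(1) - 1
ret_12m = prices / prices.shift(12) - 1

# Momentum: 12-month return excluding the most recent month,
# i.e. the return from month t-12 to month t-1
momentum = (1 + ret_12m) / (1 + ret_1m) - 1

# Cross-sectional z-score: compare each asset to its peers on the same date
zscore = momentum.sub(momentum.mean(axis=1), axis=0).div(momentum.std(axis=1), axis=0)

print(zscore.dropna().tail(3).round(2))
```

Normalizing within each date makes factor values comparable over time, so a value of 1 always means "one standard deviation above the cross-sectional average on that date".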

Code Example: How to use TA-Lib to create technical alpha factors

The notebook how_to_use_talib illustrates the usage of TA-Lib, which includes a broad range of common technical indicators. These indicators have in common that they only use market data, i.e., price and volume information.
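
To show what such an indicator computes from price data alone, here is a pandas-only sketch of the relative strength index (RSI), one of the indicators TA-Lib provides. It deliberately does not call TA-Lib, so the function name `rsi`, the default window of 14, and the way the averages are seeded are assumptions; values from TA-Lib itself may differ, especially early in the series.

```python
import numpy as np
import pandas as pd


def rsi(close: pd.Series, window: int = 14) -> pd.Series:
    """Relative strength index based on Wilder-style exponential averages."""
    delta = close.diff()
    gain = delta.clip(lower=0)        # positive price changes, else 0
    loss = (-delta).clip(lower=0)     # magnitude of negative changes, else 0
    # Wilder's smoothing corresponds to an exponential average with alpha = 1 / window
    avg_gain = gain.ewm(alpha=1 / window, min_periods=window, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / window, min_periods=window, adjust=False).mean()
    relative_strength = avg_gain / avg_loss
    return 100 - 100 / (1 + relative_strength)


# Synthetic closing prices (illustration only)
rng = np.random.default_rng(3)
close = pd.Series(100 * np.exp(rng.normal(0, 0.01, 250).cumsum()))
rsi_values = rsi(close)

print(rsi_values.describe().round(2))
```

The indicator is bounded between 0 and 100; values near the extremes are conventionally read as overbought or oversold conditions.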

The notebook common_alpha_factors in the appendix contains dozens of additional examples.

Code Example: How to denoise your Alpha Factors with the Kalman Filter

The notebook kalman_filter_and_wavelets demonstrates the use of the Kalman filter using the PyKalman package for smoothing; we will also use it in Chapter 9 when we develop a pairs trading strategy.
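
The idea behind Kalman smoothing can be sketched without any library. The example below hand-codes a one-dimensional filter for a random-walk-plus-noise model in NumPy; it is not the PyKalman API, and the function name `kalman_local_level` as well as the variance settings are assumptions chosen for the synthetic data.

```python
import numpy as np


def kalman_local_level(observations, process_var=1e-3, obs_var=1.0):
    """Filter a 1-D series assuming a hidden random walk observed with noise."""
    x = np.asarray(observations, dtype=float)
    state_mean, state_var = x[0], 1.0   # initial guess for the hidden state
    filtered = np.empty_like(x)
    for t, obs in enumerate(x):
        # Predict: a random walk keeps the mean and adds process variance
        prior_var = state_var + process_var
        # Update: weight the surprise (obs - prediction) by the Kalman gain
        gain = prior_var / (prior_var + obs_var)
        state_mean = state_mean + gain * (obs - state_mean)
        state_var = (1 - gain) * prior_var
        filtered[t] = state_mean
    return filtered


# Synthetic example: slow-moving signal plus heavy observation noise
rng = np.random.default_rng(5)
truth = rng.normal(0, 0.05, 500).cumsum()
noisy = truth + rng.normal(0, 1.0, 500)
filtered = kalman_local_level(noisy, process_var=0.05**2, obs_var=1.0)

print("MSE noisy:   ", np.mean((noisy - truth) ** 2).round(3))
print("MSE filtered:", np.mean((filtered - truth) ** 2).round(3))
```

The ratio of `process_var` to `obs_var` controls the trade-off: a smaller ratio yields a smoother but slower-reacting estimate. The filter only uses past and current observations, which matters when the output feeds a trading signal.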

Code Example: How to preprocess your noisy signals using Wavelets

The notebook kalman_filter_and_wavelets also demonstrates how to work with wavelets using the PyWavelets package.
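
Wavelet denoising follows a decompose, threshold, reconstruct pattern. The sketch below illustrates it with a hand-written Haar transform in NumPy rather than PyWavelets, so that no library signature is assumed. The function names, the number of levels, and the threshold rule (the so-called universal threshold with a noise estimate from the finest detail coefficients) are assumptions, and the input length must be divisible by 2 raised to the number of levels.

```python
import numpy as np


def haar_decompose(signal, levels):
    """Return the coarse approximation and a list of detail coefficients (finest first)."""
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))   # local differences
        approx = (even + odd) / np.sqrt(2)          # local averages
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose."""
    for detail in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / np.sqrt(2)
        out[1::2] = (approx - detail) / np.sqrt(2)
        approx = out
    return approx


def haar_denoise(signal, levels=4):
    approx, details = haar_decompose(signal, levels)
    # Estimate the noise scale from the finest details (median absolute deviation)
    sigma = np.median(np.abs(details[0])) / 0.6745
    threshold = sigma * np.sqrt(2 * np.log(len(signal)))
    # Soft thresholding: shrink coefficients toward zero, zeroing the small ones
    shrunk = [np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0) for d in details]
    return haar_reconstruct(approx, shrunk)


# Synthetic example: smooth signal plus noise
rng = np.random.default_rng(11)
t = np.linspace(0, 1, 512, endpoint=False)
clean = np.sin(2 * np.pi * t)
noisy = clean + rng.normal(0, 0.3, t.size)
denoised = haar_denoise(noisy, levels=4)

print("MSE noisy:   ", np.mean((noisy - clean) ** 2).round(4))
print("MSE denoised:", np.mean((denoised - clean) ** 2).round(4))
```

Unlike the Kalman filter, this transform uses the whole series at once, so applying it to a full price history before building features would leak future information into past values.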

Resources

Fama French Data Library
numpy website
Quickstart Tutorial
pandas website
User Guide
10 minutes to pandas
Python Pandas Tutorial: A Complete Introduction for Beginners
alphatools - Quantitative finance research tools in Python
mlfinlab - Package based on the work of Dr Marcos Lopez de Prado regarding his research with respect to Advances in Financial Machine Learning
PyKalman documentation
Tutorial: The Kalman Filter
Understanding and Applying Kalman Filtering
How a Kalman filter works, in pictures
PyWavelets - Wavelet Transforms in Python
An Introduction to Wavelets
The Wavelet Tutorial
Wavelets for Kids
The Barra Equity Risk Model Handbook
Active Portfolio Management: A Quantitative Approach for Producing Superior Returns and Controlling Risk by Richard Grinold and Ronald Kahn, 1999
Modern Investment Management: An Equilibrium Approach by Bob Litterman, 2003
Quantitative Equity Portfolio Management: Modern Techniques and Applications by Edward Qian, Ronald Hua, and Eric Sorensen
Spearman Rank Correlation

From signals to trades: backtesting with `Zipline`

The open-source zipline library is an event-driven backtesting system maintained and used in production by the crowd-sourced quantitative investment fund Quantopian to facilitate algorithm development and live trading. It automates the algorithm's reaction to trade events and provides it with current and historical point-in-time data that avoids look-ahead bias.

Chapter 8 contains a more comprehensive introduction to Zipline.
Please follow the instructions in the installation folder.

Code Example: How to use Zipline to backtest a single-factor strategy

The notebook single_factor_zipline develops and tests a simple mean-reversion factor that measures how much recent performance has deviated from the historical average. Short-term reversal is a common strategy that takes advantage of the weakly predictive pattern that stock price increases are likely to mean-revert back down over horizons from less than a minute to one month.
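
Outside of Zipline, the logic of such a factor can be sketched in plain pandas: compare the latest one-month return to its own average over the past year, scaled by its volatility. The window lengths (21 and 252 trading days), the synthetic prices, and all names below are assumptions rather than the notebook's code.

```python
import numpy as np
import pandas as pd

MONTH, YEAR = 21, 252  # assumed window lengths in trading days

# Synthetic daily prices for four hypothetical stocks (illustration only)
rng = np.random.default_rng(1)
dates = pd.bdate_range("2018-01-01", periods=600)
prices = pd.DataFrame(
    100 * np.exp(rng.normal(0, 0.01, size=(600, 4)).cumsum(axis=0)),
    index=dates,
    columns=list("ABCD"),
)

# One-month return, recomputed every day
monthly_return = prices / prices.shift(MONTH) - 1

# z-score of the latest monthly return relative to its one-year mean and std
rolling = monthly_return.rolling(YEAR)
factor = (monthly_return - rolling.mean()) / rolling.std()

# Reversal logic: the most negative deviation receives the highest rank
signal = (-factor).rank(axis=1)

# Lag by one day so that positions on day t only use data through day t-1
positions = signal.shift(1)

print(positions.dropna().tail(3))
```

The final shift mirrors what an event-driven backtester enforces automatically: a signal computed from today's close can only be traded on afterwards.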

Code Example: Combining factors from diverse data sources on the Quantopian platform

The Quantopian research environment is tailored to the rapid testing of predictive alpha factors. The process is very similar because it builds on zipline, but offers much richer access to data sources.

The notebook multiple_factors_quantopian_research illustrates how to compute alpha factors not only from market data as previously but also from fundamental and alternative data.

Code Example: Separating signal and noise – how to use alphalens

The notebook performance_eval_alphalens introduces the alphalens library for the performance analysis of predictive (alpha) factors, open-sourced by Quantopian. It demonstrates how it integrates with the backtesting library zipline and the portfolio performance and risk analysis library pyfolio that we will explore in the next chapter.

alphalens facilitates the analysis of the predictive power of alpha factors concerning the:
- Correlation of the signals with subsequent returns
- Profitability of an equal or factor-weighted portfolio based on (a subset of) the signals
- Turnover of factors to indicate the potential trading costs
- Factor-performance during specific events
- Breakdowns of the preceding by sector
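
Two of these metrics can be sketched in plain pandas to show what they measure: the information coefficient as the daily rank correlation between factor values and subsequent returns, and turnover as the share of top-quintile names that were not in the top quintile the day before. This is not the alphalens API; the data are synthetic and the names and the quintile cutoff are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n_days, n_assets = 250, 50
dates = pd.bdate_range("2020-01-01", periods=n_days)
assets = [f"asset_{i:02d}" for i in range(n_assets)]

# Synthetic factor values and forward returns that depend weakly on the factor
factor = pd.DataFrame(rng.normal(size=(n_days, n_assets)), index=dates, columns=assets)
forward_returns = 0.002 * factor + rng.normal(0, 0.02, size=factor.shape)

# Information coefficient: Spearman correlation per date,
# computed as the Pearson correlation of cross-sectional ranks
daily_ic = factor.rank(axis=1).corrwith(forward_returns.rank(axis=1), axis=1)
mean_ic = daily_ic.mean()

# Turnover: fraction of today's top-quintile names that are new since yesterday
top_quintile = factor.rank(axis=1, pct=True).gt(0.8)
new_names = top_quintile & ~top_quintile.shift(1, fill_value=False)
turnover = (new_names.sum(axis=1) / top_quintile.sum(axis=1)).iloc[1:]

print(f"mean IC: {mean_ic:.3f}, mean top-quintile turnover: {turnover.mean():.2f}")
```

Because the synthetic factor is redrawn independently every day, turnover here is very high; a real factor with the same IC but lower turnover would be cheaper to trade.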

The analysis can be conducted using tearsheets or individual computations and plots. The tearsheets are illustrated in the online repo to save some space.

Quantopian also provides a detailed alphalens tutorial.

Alternative Algorithmic Trading Libraries and Platforms

QuantConnect
Alpha Trading Labs (no longer active)
WorldQuant
Python Algorithmic Trading Library PyAlgoTrade
pybacktest
Trading with Python
Interactive Brokers