Resources
This page collects machine learning, big data and asset pricing resources that I have found helpful. I have compiled this evolving list in the hope that it will help others as well. You will find resources for both the TensorFlow and PyTorch implementations of neural networks in Python, although I have increasingly leaned towards TensorFlow. I have also gathered a consolidated (but by no means comprehensive) list of anomaly factors used in the published asset pricing literature, to make these factors easier to use in new research.
- PyTorch
Tutorials from the developers
General documentation from the developers
PyTorch implementation of the Dive into Deep Learning online book
Example code annotated at each line so you can see what the code is doing
- TensorFlow
Implemented models - a collection of fully implemented models using TensorFlow
General documentation from the developers
- Data Resources
- Sentiment Analysis
- Miscellaneous
Stanford's computer science department has various open source resources
Heebum Lee's GitHub repository, where he has compiled his own list of useful machine learning resources
Apache Spark resources for distributed computing
Maximilian Kasy's website, where he has compiled his own list of useful machine learning and computational resources
- Machine Learning Theory
- Textbooks
Machine Learning - A Probabilistic Perspective: Kevin Murphy (2012)
Pattern Recognition and Machine Learning: Christopher M. Bishop (2006)
The Elements of Statistical Learning: Hastie, Tibshirani and Friedman (2017)
- Anomaly Factors
Fama-French 3- and 5-factors: https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html
q5 factors of Hou, Mo, Xue and Zhang (2021): http://global-q.org/factors.html
Big data factors of Pelger (and associated code): https://mpelger.people.stanford.edu/data-and-code
Shrinking the Cross-Section factors (Kozak, Nagel and Santosh, 2020): https://www.serhiykozak.com/data
Empirical Asset Pricing via Machine Learning (Gu, Kelly and Xiu, 2020) factors - raw data for generating the factors: https://dachxiu.chicagobooth.edu/
Amit Goyal macro factors: http://www.hec.unil.ch/agoyal/
205 factors recreated in Chen and Zimmermann (2021): https://www.openassetpricing.com/data/
153 factors from 93 countries from "Is There a Replication Crisis in Finance?" (Jensen, Kelly and Pedersen) - code for recreating the factors can be found in their GitHub repository: https://www.bryankellyacademic.org/
Downside risk bond factor (Bai, Bali and Wen, 2019): https://sites.google.com/a/georgetown.edu/turan-bali/data-working-papers?authuser=0
Option implied factors and associated Python code: https://www.vilkov.net/codedata.html
AQR Capital factors: https://www.aqr.com/Insights/Datasets
McCracken and Ng (2015) macro factors: https://research.stlouisfed.org/econ/mccracken/fred-databases/