A SHORT COURSE IN DATA MINING WITH APPLICATIONS TO PUBLIC POLICY
Institute for Capacity Development – International Monetary Fund
July 6, 2018
Program
Presentations
- Data mining: the scientific and industrial revolution.
- Foundations of Machine Learning: models, concepts, fundamental results, prediction and causality.
- Basic techniques: k-NN, linear models, regularization, trees, random forests, etc.
- Application: Public health.
- Special techniques: Cross validation, bagging, bootstrapping, ensembles.
- Text mining: term vectorization, LDA, Markov Chain methods. Application: Effects of FOMC Communications and Economic Policy Uncertainty.
- Unsupervised learning: Clustering, K-means, association rules, kernel density estimation.
- Application: Crime prediction. Application: Fraud detection.
- Advanced topics: Neural networks, deep learning. Application: Forecasting inflation, unemployment and poverty characterization.
Readings
The first set of references is the minimum needed, from the perspective of this course, to get a well-founded idea of what data mining, big data, and machine learning are about. They are mostly non-technical.
The absolute minimum
Prediction Policy Problems. Jon Kleinberg, Jens Ludwig, Sendhil Mullainathan, Ziad Obermeyer. American Economic Review. Vol. 105, No. 5, May 2015 (pp. 491-95).
McKinsey Global Institute: The age of analytics executive summary
Big Data: New Tricks for Econometrics. Hal R. Varian. Journal of Economic Perspectives, Volume 28, Number 2, Spring 2014, Pages 3–28.
Statistical Modeling: The Two Cultures. Leo Breiman. Statistical Science, Vol. 16, No. 3 (Aug., 2001), pp. 199-215.
Economics in the age of big data. Liran Einav and Jonathan Levin. Science 346 (2014).
Presentations references
Theory
[B]: Bishop. Pattern Recognition and Machine Learning. Springer.
[LS]: Luxburg, U., B. Scholkopf. 2008. Statistical Learning Theory: Models, Concepts and Results.
http://arxiv.org/abs/0810.4752
[JWHT]: Introduction to Statistical Learning with Applications in R.
http://www-bcf.usc.edu/~gareth/ISL/
[HTF]: Hastie, T., Tibshirani, R. and J. Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference and Prediction. Second Edition. Springer.
Applications
Andrés Azqueta-Gavaldón. Developing news-based Economic Policy Uncertainty index with unsupervised machine learning. Economics Letters 158 (2017) 47–50.
Bayesian Variable Selection for Nowcasting Economic Time Series. Steven L. Scott, Hal R. Varian. http://www.nber.org/chapters/c12995
Predicting the Present with Bayesian Structural Time Series. Steven L. Scott, Hal Varian. 2013.
Predicting the Present with Google Trends. Hyunyoung Choi, Hal Varian. December 18, 2011.
Predicting Initial Claims for Unemployment Benefits. Hyunyoung Choi, Hal Varian. July 5, 2009.
The Billion Prices Project: Using Online Prices for Measurement and Research. Alberto Cavallo and Roberto Rigobon. Journal of Economic Perspectives, Volume 30, Number 2, Spring 2016, Pages 151–178.
Combining satellite imagery and machine learning to predict poverty. Neal Jean, Marshall Burke, Michael Xie, W. Matthew Davis, David B. Lobell, Stefano Ermon. Science 2016. Vol. 353, Issue 6301.
The Effects of the Content of FOMC Communications on US Treasury Rates. Christopher Rohlfs, Sunandan Chakraborty, Lakshminarayanan Subramanian.
Machine Learning: An Applied Econometric Approach. Sendhil Mullainathan and Jann Spiess. Journal of Economic Perspectives, Volume 31, Number 2, Spring 2017, Pages 87–106.
Machine Learning Methods for Demand Estimation. Patrick Bajari, Denis Nekipelov, Stephen P. Ryan, and Miaoyu Yang. American Economic Review: Papers & Proceedings 2015, 105(5): 481–485.