Machine Learning and Algorithmics Seminar 2020
Proposed seminar topics
Session 1: Introduction
- Participants
- Overview and practicalities
- Assigning the topics
Session 2: Overview
- Introductory lecture: contemporary themes in machine learning
- Discussion
- Finalizing the schedule
Session 3: Position papers
Replication, Communication, and the Population Dynamics of Scientific Discovery. R. McElreath and P. E. Smaldino (2015).
Machine behaviour. I. Rahwan, M. Cebrian, N. Obradovich, J. Bongard, J. Bonnefon, C. Breazeal, et al. (2019).
Beyond subjective and objective in statistics (with discussion and rejoinder). Andrew Gelman and Christian Hennig (2017). Journal of the Royal Statistical Society A, 180:967–1033.
Session 4: Hypothesis testing and significance
Abandon statistical significance. Blakeley B. McShane, David Gal, Andrew Gelman, Christian Robert, and Jennifer L. Tackett (2019). The American Statistician, 73(S1):235–245.
The ASA Statement on p-Values: Context, Process, and Purpose. Ronald L. Wasserstein and Nicole A. Lazar (2016). The American Statistician, 70(2):129–133. https://doi.org/10.1080/00031305.2016.1154108
Bayesian statistics. Jorge López Puga, Martin Krzywinski, and Naomi Altman (2015). Nature Methods, 12:377–378.
Confidence Intervals vs Bayesian Intervals. E. T. Jaynes (1976). In W. L. Harper and C. A. Hooker (eds.), Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science, D. Reidel, Dordrecht, p. 175. https://bayes.wustl.edu/etj/articles/confidence.pdf
Session 5: Prior information
The prior can often only be understood in the context of the likelihood. Andrew Gelman, Daniel Simpson, and Michael Betancourt (2017). Entropy, 19:555.
The experiment is just as important as the likelihood in understanding the prior: A cautionary note on robust cognitive modelling. Lauren Kennedy, Daniel Simpson, and Andrew Gelman (2019). Computational Brain and Behavior.
Sparsity information and regularization in the horseshoe and other shrinkage priors. Juho Piironen and Aki Vehtari (2017). Electronic Journal of Statistics, 11(2):5018–5051. arXiv:1707.01694. Since the article was published, the regularized horseshoe prior has been implemented in rstanarm and brms (without conditioning on sigma).
Session 6: Visualization
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. Leland McInnes, John Healy, James Melville (2018)
Probabilistic principal component analysis. M. E. Tipping and C. M. Bishop (1999). Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3):611–622.
Bayesian Unidimensional Scaling for visualizing uncertainty in high dimensional datasets with latent ordering of observations. Lan Huong Nguyen and Susan Holmes (2017). BMC Bioinformatics, August 2017.
Session 7: Feature selection
XGBoost: A Scalable Tree Boosting System. Tianqi Chen and Carlos Guestrin (2016). In 22nd SIGKDD Conference on Knowledge Discovery and Data Mining. https://arxiv.org/abs/1603.02754
Projective inference in high-dimensional problems: prediction and feature selection. Juho Piironen, Markus Paasiniemi, and Aki Vehtari (2018). arXiv:1810.02406.
Bayesian inference for spatio-temporal spike and slab priors. Michael Riis Andersen, Aki Vehtari, Ole Winther, and Lars Kai Hansen (2017). Journal of Machine Learning Research, 18(139):1–58. arXiv:1509.04752.
Session 8: Model selection
Bayesian model selection for complex dynamic systems. Christoph Mark, Claus Metzner, Lena Lautscham, Pamela L. Strissel, Reiner Strick, and Ben Fabry (2018). Nature Communications, 9:1803.
Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Aki Vehtari, Andrew Gelman, and Jonah Gabry (2017). Statistics and Computing, 27(5):1413–1432. doi:10.1007/s11222-016-9696-4. arXiv:1507.04544.
Comparison of Bayesian predictive methods for model selection. Juho Piironen and Aki Vehtari (2017). Statistics and Computing, 27(3):711–735. doi:10.1007/s11222-016-9649-y. arXiv:1503.08650.
Session 9: Scalable inference
Automatic variational inference in Stan. Alp Kucukelbir, Rajesh Ranganath, Andrew Gelman, and David Blei (2015). Neural Information Processing Systems.
Fundamentals and recent developments in approximate Bayesian computation. Jarno Lintusaari, Michael U. Gutmann, Ritabrata Dutta, Samuel Kaski, and Jukka Corander (2017). Systematic Biology, 66(1):e66–e82. doi:10.1093/sysbio/syw077.
Bayesian Computing with INLA: A Review. Håvard Rue, Andrea Riebler, Sigrunn H. Sørbye, Janine B. Illian, Daniel P. Simpson, and Finn K. Lindgren. https://arxiv.org/abs/1604.00860
Expectation propagation as a way of life: A framework for Bayesian inference on partitioned data. Aki Vehtari, Andrew Gelman, Tuomas Sivula, Pasi Jylänki, Dustin Tran, Swupnil Sahai, Paul Blomstedt, John P. Cunningham, David Schiminovich, and Christian P. Robert (2020). Journal of Machine Learning Research, 21:1–53.
A Conceptual Introduction to Hamiltonian Monte Carlo. Michael Betancourt. arXiv:1701.02434
Session 10: Time series and survival analysis
LonGP: an additive Gaussian process regression model for longitudinal study designs. Lu Cheng, Siddharth Ramchandran, Tommi Vatanen, Niina Lietzen, Riitta Lahesmaa, Aki Vehtari, and Harri Lähdesmäki (2019). Nature Communications, 10:1798.
An interpretable probabilistic machine learning method for heterogeneous longitudinal studies. Juho Timonen, Henrik Mannerström, Aki Vehtari, and Harri Lähdesmäki (2019). arXiv:1912.03549.
Bayesian Survival Analysis Using the rstanarm R Package. Samuel L. Brilleman, Eren M. Elci, Jacqueline Buros Novik, and Rory Wolfe.
Probabilistic Solutions To Ordinary Differential Equations As Non-Linear Bayesian Filtering: A New Perspective. Filip Tronarp, Hans Kersting, Simo Särkkä, and Philipp Hennig (2019). Statistics and Computing.
A hierarchical Ornstein-Uhlenbeck model for stochastic time series analysis. Ville Laitinen and Leo Lahti (2018). In Advances in Intelligent Data Analysis XVII, Lecture Notes in Computer Science 11191, Springer, India. Conference proceedings. https://openresearchlabs.github.io/publications/papers/2018-Laitinen-IDA.pdf
Session 11: Gaussian processes
Additive multivariate Gaussian processes for joint species distribution modeling with heterogeneous data. J. Vanhatalo, M. Hartmann, and L. Veneranta (2019). Bayesian Analysis.
Avoiding tipping points in fisheries management through Gaussian process dynamic programming. C. Boettiger, M. Mangel, and S. Munch (2015). Proceedings of the Royal Society B: Biological Sciences, 282(1801):20141631. doi:10.1098/rspb.2014.1631
Gaussian process modeling in approximate Bayesian computation to estimate horizontal gene transfer in bacteria. Marko Järvenpää, Michael Gutmann, Aki Vehtari and Pekka Marttinen (2018). The Annals of Applied Statistics, 12(4):2228-2251. arXiv:1610.06462.
Deep Gaussian Processes. Andreas C. Damianou and Neil D. Lawrence. https://arxiv.org/abs/1211.0358
Session 12: Bayesian workflow
Visualization in Bayesian workflow. Jonah Gabry, Daniel Simpson, Aki Vehtari, Michael Betancourt, and Andrew Gelman (2019). Journal of the Royal Statistical Society Series A, 182(2):389–402. arXiv:1709.01449.
Toward a principled Bayesian workflow in cognitive science. Daniel J. Schad, Michael Betancourt, and Shravan Vasishth. https://arxiv.org/abs/1904.12765
Validating Bayesian Inference Algorithms with Simulation-Based Calibration. Sean Talts, Michael Betancourt, Daniel Simpson, Aki Vehtari, and Andrew Gelman. https://arxiv.org/pdf/1804.06788.pdf
ELFI: Engine for Likelihood Free Inference. Jarno Lintusaari, Henri Vuollekoski, Antti Kangasrääsiö, Kusti Skytén, Marko Järvenpää, Michael Gutmann, Aki Vehtari, Jukka Corander, and Samuel Kaski (2018). Journal of Machine Learning Research, 19(16):1–7. arXiv:1708.00707
Session 13: Deep learning, neural nets, and autoencoders
Learning representations of microbe–metabolite interactions. J. T. Morton, A. A. Aksenov, L. F. Nothias, J. R. Foulds, R. A. Quinn, M. H. Badri, et al. (2019). Nature Methods, 16(12):1306–1314.
An Introduction to Variational Autoencoders. Diederik P. Kingma and Max Welling. https://arxiv.org/abs/1906.02691
Bayesian GAN. Yunus Saatchi and Andrew Gordon Wilson. https://arxiv.org/abs/1705.09558
Deep Learning: A Bayesian Perspective. Nicholas Polson and Vadim Sokolov. https://arxiv.org/abs/1706.00473
Towards Bayesian Deep Learning: A Survey. Hao Wang and Dit-Yan Yeung. https://arxiv.org/abs/1604.01662
Session 14: Applications
Multidomain analyses of a longitudinal human microbiome intestinal cleanout perturbation experiment. Julia Fukuyama, Laurie Rumker, Kris Sankaran, Pratheepa Jeganathan, Les Dethlefsen, David A. Relman, and Susan P. Holmes (2017). PLOS Computational Biology, August 2017.
Fast hierarchical Bayesian analysis of population structure. Gerry Tonkin-Hill, John A. Lees, Stephen D. Bentley, Simon D. W. Frost, and Jukka Corander (2019). Nucleic Acids Research, 47(11):5539–5549. doi:10.1093/nar/gkz361
Additive multivariate Gaussian processes for joint species distribution modeling with heterogeneous data. Jarno Vanhatalo, Marcelo Hartmann, and Lari Veneranta (2019). Bayesian Analysis. https://arxiv.org/pdf/1809.02432.pdf
Session 15: Wrap-up
Other sources
You can also consider topics from the following books:
- Statistical Rethinking (Richard McElreath; CRC Press)
- Bayesian Data Analysis (Gelman et al.; Chapman & Hall/CRC)
- Pattern Recognition and Machine Learning (Bishop; Springer)
- The Elements of Statistical Learning (Hastie et al.; Springer)
- Networks, Crowds, and Markets (Easley & Kleinberg; Cambridge University Press)