Machine Learning and Algorithmics Seminar 2020

Proposed seminar topics

Session 1: Introduction

Participants
Overview and practicalities
Assigning the topics

Session 2: Overview

Introduction lecture: contemporary themes in machine learning
Discussion
Finalizing the schedule

Session 3: Position papers

Replication, Communication, and the Population Dynamics of Scientific Discovery. McElreath, R., & Smaldino, P. E. (2015).

Machine behaviour. Rahwan, I., Cebrian, M., Obradovich, N., Bongard, J., Bonnefon, J., Breazeal, C., et al. (2019).

Beyond subjective and objective in statistics (with discussion and rejoinder). Journal of the Royal Statistical Society A 180, 967–1033. Andrew Gelman and Christian Hennig (2017).

Session 4: Hypothesis testing and significance

Abandon statistical significance. American Statistician 73(S1):235–245. Blakeley B. McShane, David Gal, Andrew Gelman, Christian Robert, and Jennifer L. Tackett, 2019.

The ASA Statement on p-Values: Context, Process, and Purpose. Ronald L. Wasserstein & Nicole A. Lazar. Pages 129-133. https://doi.org/10.1080/00031305.2016.1154108

Bayesian statistics. Jorge López Puga, Martin Krzywinski & Naomi Altman. Nature Methods 12:377–378, 2015.

Jaynes, E. T., 1976. 'Confidence Intervals vs Bayesian Intervals,' in Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science, W. L. Harper and C. A. Hooker (eds.), D. Reidel, Dordrecht, p. 175; https://bayes.wustl.edu/etj/articles/confidence.pdf

Session 5: Prior information

The prior can often only be understood in the context of the likelihood. Entropy 19:555, 2017. Andrew Gelman, Daniel Simpson, and Michael Betancourt.

The experiment is just as important as the likelihood in understanding the prior: A cautionary note on robust cognitive modelling. Computational Brain and Behavior. Lauren Kennedy, Daniel Simpson, and Andrew Gelman, 2019.

Sparsity information and regularization in the horseshoe and other shrinkage priors. In Electronic Journal of Statistics, 11(2):5018-5051. Juho Piironen and Aki Vehtari (2017). arXiv:1707.01694. Since the article was published, the regularized horseshoe prior has been implemented in rstanarm and brms (without conditioning on sigma).

Session 6: Visualization

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. Leland McInnes, John Healy, James Melville (2018)

Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3), 611–622. Tipping, M. E., & Bishop, C. M. (1999).

Bayesian Unidimensional Scaling for visualizing uncertainty in high dimensional datasets with latent ordering of observations. BMC Bioinformatics, August, 2017. Lan Huong Nguyen and Susan Holmes (2017)

Session 7: Feature selection

XGBoost: A Scalable Tree Boosting System. In 22nd SIGKDD Conference on Knowledge Discovery and Data Mining, 2016. Tianqi Chen and Carlos Guestrin. https://arxiv.org/abs/1603.02754

Projective inference in high-dimensional problems: prediction and feature selection. Juho Piironen, Markus Paasiniemi, and Aki Vehtari (2018). arXiv:1810.02406.

Bayesian inference for spatio-temporal spike and slab priors. In Journal of Machine Learning Research, 18(139):1-58. Michael Riis Andersen, Aki Vehtari, Ole Winther and Lars Kai Hansen (2017). arXiv:1509.04752.

Session 8: Model selection

Bayesian model selection for complex dynamic systems. Christoph Mark, Claus Metzner, Lena Lautscham, Pamela L. Strissel, Reiner Strick, Ben Fabry. Nature Communications 9:1803, 2018.

Aki Vehtari, Andrew Gelman and Jonah Gabry (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. In Statistics and Computing, 27(5):1413–1432. doi:10.1007/s11222-016-9696-4. arXiv:1507.04544.

Comparison of Bayesian predictive methods for model selection. Statistics and Computing, 27(3):711-735. doi:10.1007/s11222-016-9649-y. Juho Piironen and Aki Vehtari (2017). arXiv:1503.08650.

Session 9: Scalable inference

Automatic variational inference in Stan. Neural Information Processing Systems. Alp Kucukelbir, Rajesh Ranganath, Andrew Gelman, and David Blei. 2015

Fundamentals and recent developments in approximate Bayesian computation. Systematic Biology, 66(1), e66-e82. doi: 10.1093/sysbio/syw077. Lintusaari, Jarno; Gutmann, Michael U.; Dutta, Ritabrata; Kaski, Samuel; Corander, Jukka (2017).

Bayesian Computing with INLA: A Review. Håvard Rue, Andrea Riebler, Sigrunn H. Sørbye, Janine B. Illian, Daniel P. Simpson, Finn K. Lindgren. https://arxiv.org/abs/1604.00860

Expectation propagation as a way of life: A framework for Bayesian inference on partitioned data. Journal of Machine Learning Research. 21, 1–53. Aki Vehtari, Andrew Gelman, Tuomas Sivula, Pasi Jylanki, Dustin Tran, Swupnil Sahai, Paul Blomstedt, John P. Cunningham, David Schiminovich, and Christian P. Robert, 2020

A Conceptual Introduction to Hamiltonian Monte Carlo. Michael Betancourt. arXiv:1701.02434

Session 10: Time series and survival analysis

LonGP: an additive Gaussian process regression model for longitudinal study designs. Nature Communications, 10:1798. Lu Cheng, Siddharth Ramchandran, Tommi Vatanen, Niina Lietzen, Riitta Lahesmaa, Aki Vehtari, and Harri Lähdesmäki (2019).

An interpretable probabilistic machine learning method for heterogeneous longitudinal studies. Juho Timonen, Henrik Mannerström, Aki Vehtari, Harri Lähdesmäki (2019). arXiv:1912.03549.

Bayesian Survival Analysis Using the rstanarm R Package. Samuel L. Brilleman, Eren M. Elci, Jacqueline Buros Novik, Rory Wolfe

Probabilistic Solutions To Ordinary Differential Equations As Non-Linear Bayesian Filtering: A New Perspective. Filip Tronarp, Hans Kersting, Simo Särkkä, Philipp Hennig (2019). Statistics and Computing.

A hierarchical Ornstein-Uhlenbeck model for stochastic time series analysis. Ville Laitinen and Leo Lahti. Advances in Intelligent Data Analysis XVII. Lecture Notes in Computer Science 11191, Springer, India, 2018. Conference proceedings. https://openresearchlabs.github.io/publications/papers/2018-Laitinen-IDA.pdf

Session 11: Gaussian processes

Additive multivariate Gaussian processes for joint species distribution modeling with heterogeneous data. J Vanhatalo, M Hartmann, L Veneranta. Bayesian Analysis

Boettiger C, Mangel M, Munch S. Avoiding tipping points in fisheries management through Gaussian process dynamic programming. Proc Biol Sci. 2015;282(1801):20141631. doi:10.1098/rspb.2014.1631

Gaussian process modeling in approximate Bayesian computation to estimate horizontal gene transfer in bacteria. Marko Järvenpää, Michael Gutmann, Aki Vehtari and Pekka Marttinen (2018). The Annals of Applied Statistics, 12(4):2228-2251. arXiv:1610.06462.

Deep Gaussian Processes. Andreas C. Damianou, Neil D. Lawrence. https://arxiv.org/abs/1211.0358

Session 12: Bayesian workflow

Visualization in Bayesian workflow. Journal of the Royal Statistical Society Series A, 182(2):389-402. Jonah Gabry, Daniel Simpson, Aki Vehtari, Michael Betancourt, and Andrew Gelman (2019). arXiv:1709.01449.

Toward a principled Bayesian workflow in cognitive science. Daniel J. Schad, Michael Betancourt, Shravan Vasishth. https://arxiv.org/abs/1904.12765

Validating Bayesian Inference Algorithms with Simulation-Based Calibration. Sean Talts, Michael Betancourt, Daniel Simpson, Aki Vehtari, Andrew Gelman. https://arxiv.org/pdf/1804.06788.pdf

Jarno Lintusaari, Henri Vuollekoski, Antti Kangasrääsiö, Kusti Skytén, Marko Järvenpää, Michael Gutmann, Aki Vehtari, Jukka Corander, and Samuel Kaski (2018). ELFI: Engine for Likelihood Free Inference. In Journal of Machine Learning Research, 19(16):1-7, 2018. arXiv:1708.00707

Session 13: Deep learning, neural nets, and autoencoders

Learning representations of microbe–metabolite interactions. JT Morton, AA Aksenov, LF Nothias, JR Foulds, RA Quinn, MH Badri, et al. Nature Methods 16(12), 1306-1314.

An Introduction to Variational Autoencoders. Diederik P. Kingma, Max Welling. https://arxiv.org/abs/1906.02691

Bayesian GAN. Yunus Saatchi, Andrew Gordon Wilson. https://arxiv.org/abs/1705.09558

Deep Learning: A Bayesian Perspective. Nicholas Polson, Vadim Sokolov. https://arxiv.org/abs/1706.00473

Towards Bayesian Deep Learning: A Survey. Hao Wang, Dit-Yan Yeung. https://arxiv.org/abs/1604.01662

Session 14: Applications

Multidomain analyses of a longitudinal human microbiome intestinal cleanout perturbation experiment. Julia Fukuyama, Laurie Rumker, Kris Sankaran, Pratheepa Jeganathan, Les Dethlefsen, David A. Relman, Susan P. Holmes (2017) PLOS Computational Biology, August 2017.

Fast hierarchical Bayesian analysis of population structure. Tonkin-Hill, Gerry; Lees, John A; Bentley, Stephen D; Frost, Simon DW & Corander, Jukka (2019). Nucleic Acids Research, 47(11), 5539-5549. doi: 10.1093/nar/gkz361

Additive multivariate Gaussian processes for joint species distribution modeling with heterogeneous data. Jarno Vanhatalo, Marcelo Hartmann, and Lari Veneranta. Bayesian Analysis, 2019. https://arxiv.org/pdf/1809.02432.pdf

Session 15: Wrap-up

Other sources

You can also consider topics from the following books:

Statistical Rethinking (Richard McElreath; CRC Press)
Bayesian Data Analysis (Gelman et al.; Chapman & Hall/CRC)
Pattern Recognition and Machine Learning (Bishop; Springer)
The Elements of Statistical Learning (Hastie et al.; Springer)
Networks, Crowds, and Markets (Easley & Kleinberg; Cambridge University Press)

TKO_3121: Topics