Universiteit Leiden


Statistical Learning and Prediction

Statistical learning encompasses an extensive collection of statistical techniques for making predictions (supervised context) or finding structure hidden in data (unsupervised context). These techniques are relevant, for example, for (1) predicting disease status and treatment outcome, (2) selecting personnel or students, (3) inferring groups of homogeneous subjects and (4) dimension reduction (e.g. PCA). This course discusses several techniques of both kinds.

Target group
Postdoctoral researcher
PhD candidate
Tom Wilderjans (Associate professor)
Marjolein Fokkema (Associate professor)
Training course

Registration deadline: before Tuesday 9 January 2024

Statistical learning refers to a vast set of tools for understanding data. Two classes of such tools can be distinguished: “supervised” and “unsupervised”. Supervised statistical learning involves building a statistical model for predicting an output (response, dependent) variable based on one or more input (predictor) variables.

There are many areas of psychology where such a predictive question is of interest. For example, finding early markers for Alzheimer’s or other diseases, selection studies for personnel or education, or prediction of treatment outcomes. In unsupervised statistical learning, there are only input variables but no supervising output (dependent) variable; nevertheless, we can learn relationships and structures from such data using cluster analysis and methods for dimension reduction.
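As a rough illustration of the unsupervised setting described above, the following R sketch clusters observations and reduces their dimension without using any output variable. It is not course material: the built-in iris data set, the choice of three clusters and the variable names are assumptions made purely for this example.

```r
# Unsupervised sketch in base R (illustrative assumptions: iris data, 3 clusters).
# Only the four numeric input variables are used; the Species column is ignored
# during fitting, so there is no supervising output variable.
x <- scale(iris[, 1:4])                    # standardise the input variables

set.seed(1)                                # k-means uses random starting centres
km <- kmeans(x, centers = 3, nstart = 20)  # cluster analysis with k = 3
table(km$cluster)                          # number of observations per cluster

pca <- prcomp(x)                           # dimension reduction via PCA
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 2)                    # proportion of variance per component
```

Comparing `km$cluster` with `iris$Species` afterwards shows how well the recovered groups match a known grouping, which is only possible here because the example data happen to include labels.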

In this course we aim to give students a firm theoretical basis for understanding and evaluating statistical learning techniques, and to teach them the skills to apply these techniques in empirical research. Techniques often referred to as ‘machine learning’ (though we prefer the name ‘statistical learning’ because the basic problems of sampling and probability have not changed with the newer algorithms) will be discussed, including cross-validation, elastic net regression, clustering (k-means, hierarchical), partial least squares, k-nearest neighbours, decision trees, gradient boosting, random forests, smoothing splines and support vector machines.
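To give an idea of what evaluating a predictive model can look like, here is a minimal sketch of k-fold cross-validation in base R. It is an illustration under stated assumptions, not the course's own code: the built-in mtcars data set, the predictors wt and hp, the linear model and the choice of five folds are all picked only for the example.

```r
# Minimal sketch: 5-fold cross-validation of a linear regression in base R.
# Data set, predictors and number of folds are illustrative assumptions.
set.seed(1)                                      # reproducible fold assignment
k <- 5
n <- nrow(mtcars)
fold <- sample(rep(seq_len(k), length.out = n))  # random fold label for each row

cv_mse <- numeric(k)
for (i in seq_len(k)) {
  train <- mtcars[fold != i, ]                   # all folds except fold i
  test  <- mtcars[fold == i, ]                   # held-out fold i
  fit   <- lm(mpg ~ wt + hp, data = train)       # fit on the training folds only
  pred  <- predict(fit, newdata = test)          # predict the held-out fold
  cv_mse[i] <- mean((test$mpg - pred)^2)         # held-out mean squared error
}

mean(cv_mse)                                     # cross-validated prediction error
```

Because every observation is predicted by a model that never saw it during fitting, the averaged error estimates how the model would perform on new data rather than how well it fits the sample at hand.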

Mode of instruction

  • The course will be on campus
  • Interactive lectures (theory and illustrative examples in R) combined with practical sessions with exercises in R (students work on the exercises and solutions are discussed at the end)
  • Students can use (and should bring) their own laptop with R (and RStudio) installed
  • Students receive a list of online videos and the associated parts and exercises of the book by James et al. (2021; see below), which should be prepared before the course starts. Preparation takes about 20 to 25 hours, depending on current statistical knowledge and R skills. Instructions will follow at the beginning of January 2024 (for those enrolled in the course).

Reading list

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An introduction to statistical learning: with applications in R (Second Edition). New York: Springer. A free copy is available online at https://www.statlearning.com/

Additional suggested material (not required)

  • Berk, R. A. (2008). Statistical learning from a regression perspective. New York: Springer. (freely accessible within the Leiden network)
  • Kuhn, M., & Johnson, K. (2013). Applied predictive modeling. New York: Springer. (freely accessible within the Leiden network)

Entry requirement

Open to doctoral students and staff members of the Faculty of Social and Behavioral Sciences, ASCL and ICLON. Externals can also participate in the course.

Some basic knowledge of R is required (a list of materials for reaching the required R level can be obtained from the coordinator upon request).


Target group                              One day   Two days   Three days
PhD candidates FSW/FGGA
Staff FSW
Other Leiden University PhD candidates

*Externals are PhD candidates related to staff members of FSW (buitenpromovendi) and/or staff members of other Leiden University Faculties.
