
Machine Learning and Data Analytics 2018-2019

See also the official syllabus.

Goals

Knowledge and understanding:
  • Know the main kinds of problems that can be tackled with ML, DM, and EC, including those concerning text, natural language, and recommendation.
  • Know the main ML and DM techniques; know the high-level working scheme of EAs.
  • Know the design, development, and assessment phases of a ML system; know the main assessment metrics and procedures suitable for a ML system.
Applying knowledge and understanding:
  • Formulate a formal problem statement for simple practical problems in order to tackle them with ML, DM, or EC techniques.
  • Develop simple end-to-end ML or DM systems.
  • Experimentally assess a simple end-to-end ML or DM system.
Making judgements:
  • Judge the technical soundness of a ML or DM system.
  • Judge the technical soundness of the assessment of a ML or DM system.
Communication skills:
  • Describe, both in written and oral form, the motivations behind choices in the design, development, and assessment of a ML or DM system, possibly exploiting simple plots.
Learning skills:
  • Retrieve information from scientific publications about ML, DM or EC techniques not explicitly presented in this course.

Requirements

Basics of statistics: basic graphical tools of data exploration; summary measures of variable distribution (mean, variance, quantiles); fundamentals of probability and of univariate and multivariate distribution of random variables; basics of linear regression analysis.
Basics of linear algebra: vectors, matrices, matrix operations; diagonalization and singular value decomposition.
Basics of programming and data structures: algorithm, data types, loops, recursion, parallel execution, tree.

Detailed program

First chunk (3 CFU, by prof. Matilde Trevisani)

(This chunk is part of the 12 CFU version of the course (mainly DSSC), not of the 9 CFU version)
  • Introduction to data science; data analytics, machine learning, and statistical learning approaches: common and distinctive aspects (increasingly different in name only).
  • Recap. of main concepts and tools of probability and statistical inference.
  • Elements of statistical learning; regression function; assessing model accuracy and the bias-variance trade-off; cross-validation methods.
  • Supervised learning and linear models; model validation and selection; hints to regularization and extensions.
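The cross-validation methods listed above can be illustrated with a minimal sketch (this is not course material; the function name `kfold_cv_mse` and the toy data are hypothetical). It estimates the test MSE of a linear model by averaging held-out errors over k folds:

```python
import numpy as np

def kfold_cv_mse(X, y, k=5, seed=0):
    """Estimate test MSE of an OLS linear model by k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))          # shuffle, then split into k folds
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # ordinary least squares on the training fold (with intercept column)
        Xtr = np.column_stack([np.ones(len(train)), X[train]])
        Xte = np.column_stack([np.ones(len(test)), X[test]])
        beta, *_ = np.linalg.lstsq(Xtr, y[train], rcond=None)
        # squared error on the held-out fold
        errors.append(np.mean((y[test] - Xte @ beta) ** 2))
    return float(np.mean(errors))

# hypothetical toy data: y = 2x + 1 + Gaussian noise with variance 0.25
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=200)
y = 2 * X + 1 + rng.normal(0, 0.5, size=200)
cv_mse = kfold_cv_mse(X, y)  # should be close to the noise variance here
```

For a well-specified linear model, the CV estimate approaches the irreducible error, which connects directly to the bias-variance trade-off in the same bullet list.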

Second chunk (3 CFU, by prof. Eric Medvet)

  • Definitions of Machine Learning and Data Mining; why ML and DM are hot topics; examples of applications of ML; phases of design, development, and assessment of a ML system; terminology.
  • Elements of data visualization.
  • Supervised learning.
    • Tree-based methods.
      • Decision and regression trees: learning and prediction; role of the parameter and overfitting.
      • Trees aggregation: bagging, Random Forest, boosting.
      • Supervised learning system assessment: k-fold cross-validation; accuracy and other metrics; metrics for binary classification (FPR, FNR, EER, AUC) and the ROC curve.
    • Support Vector Machines (SVM).
      • Separating hyperplane: maximal margin classifier; support vectors; learning as an optimization problem; maximal margin classifier limitations.
      • Soft margin classifier: learning, role of the parameter C.
      • Non-linearly separable problems; kernels: brief background and main options (linear, polynomial, radial); intuition behind the radial kernel.
      • Multiclass classification with SVM.
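The binary-classification metrics named in the assessment bullet above (FPR, FNR, AUC) can be computed in a few lines. The following is an illustrative sketch, not course code; the function names `roc_auc` and `fpr_nr` below are hypothetical, and AUC is obtained as the rank statistic (probability that a randomly chosen positive scores above a randomly chosen negative):

```python
def roc_auc(labels, scores):
    """AUC as P(score of random positive > score of random negative), ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fpr_fnr(labels, scores, threshold):
    """FPR and FNR when predicting positive iff score >= threshold."""
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= threshold)
    fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < threshold)
    return fp / labels.count(0), fn / labels.count(1)

# hypothetical scored examples, sorted by decreasing score
labels = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)             # → 0.875
rates = fpr_fnr(labels, scores, 0.5)      # → (0.25, 0.25)
```

Sweeping the threshold and plotting TPR (= 1 − FNR) against FPR traces the ROC curve; the EER is the point where FPR equals FNR.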

Third chunk (3 CFU, by prof. Matilde Trevisani)

  • Supervised learning for classification.
    • Training and test error rate; the Bayes classifier.
    • Logistic regression.
    • Linear and quadratic discriminant analysis.
    • The K-nearest neighbors classifier.
  • Unsupervised learning.
    • Dimensionality reduction methods: principal component analysis; biplot.
    • Cluster analysis: hierarchical methods, partitional methods (k-means algorithm).
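The k-means algorithm mentioned above alternates two steps: assign each point to its nearest centroid, then recompute each centroid as the mean of its assigned points. A minimal NumPy sketch (illustrative only; the toy data and `kmeans` function are hypothetical):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    # initialize centroids as k distinct data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assignment step: nearest centroid for each point
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update step: centroid = mean of its assigned points
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

# two well-separated toy clusters
X = np.array([[0., 0.], [0., 1.], [1., 0.],
              [10., 10.], [10., 11.], [11., 10.]])
labels, centroids = kmeans(X, k=2)
```

In practice one would also handle empty clusters and stop when assignments no longer change; both are omitted here for brevity.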

Fourth chunk (3 CFU, by prof. Eric Medvet)

  • Text mining
    • Sentiment analysis
    • Features for text
    • Topic modeling
  • Recommender systems
    • Content-based filtering
    • Collaborative filtering
  • Evolutionary computation
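The high-level working scheme of EAs, one of the stated learning goals, iterates selection, variation, and survival on a population. A minimal hypothetical sketch (not course code) maximizing the classic OneMax toy fitness:

```python
import random

def onemax(bits):
    return sum(bits)  # toy fitness: number of ones in the bit string

def evolve(n_bits=30, pop_size=20, generations=100, seed=0):
    rng = random.Random(seed)
    # random initial population of bit strings
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # selection: tournament of size 3 picks a parent
        parent = max(rng.sample(pop, 3), key=onemax)
        # variation: independent bit-flip mutation with rate 1/n_bits
        child = [b ^ (rng.random() < 1 / n_bits) for b in parent]
        # survival: child replaces the worst individual if not worse
        worst = min(range(pop_size), key=lambda i: onemax(pop[i]))
        if onemax(child) >= onemax(pop[worst]):
            pop[worst] = child
    return max(pop, key=onemax)

best = evolve()
```

Real EAs add crossover and other variation operators, but the select-vary-survive loop above is the scheme common to all of them.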

Exam

Either:

Lessons timetable and course calendar

The course will start on October 8th for the 12 CFU version (DSSC) and on November 6th for the 9 CFU version.
Lessons will be held in Classroom 3B, H2bis building, in Piazzale Europa campus.


Suggested textbooks

  • Kenneth A. De Jong. Evolutionary computation: a unified approach. MIT press, 2006
  • Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer, 2009.
  • Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning, with Applications in R. Springer Series in Statistics. Springer, 2014.

Course material

The course material (slides) for my portion (Medvet, 3+3 CFU) is attached at the bottom of this page.
The full pack of slides might be updated during the course.
The annotated slides will be provided after the lectures.
See also the University Videocenter for the recordings of the lectures.
Subpages (1): Student project

Attachments:
  • lesson01.pdf (10080k), Eric Medvet, Nov 6, 2018
  • lesson02.pdf (5101k), Eric Medvet, Nov 7, 2018
  • lesson03.pdf (2465k), Eric Medvet, Nov 8, 2018
  • lesson04.pdf (7925k), Eric Medvet, Nov 13, 2018