# diochnos/teaching/CS4033-5033/2023S

## CS 4033/5033 – Machine Learning Fundamentals (Spring 2024)

The class is cross-listed as CS 4033 and CS 5033, so that both undergraduate and graduate students can enroll simultaneously. No student may earn credit for both 4033 and 5033.

### Table of Contents

### Course Description

Topics include decision trees, relational learning, neural networks, Bayesian learning, reinforcement learning, multiple-instance learning, feature selection, learning appropriate representations, clustering, and kernel methods.


### Basic Information

#### Syllabus

The syllabus is available here.

#### Time and Location

Mondays, Wednesdays, and Fridays, 11:30am – 12:20pm, Sarkeys Energy Ctr N0202.

#### Contact Information

Please see here.

#### Teaching Assistants

The teaching assistant for the class is Naeem Shahabi Sani.

#### Office Hours

We will be holding our office hours at the following times.

- Mondays
  - 2:00pm – 3:00pm, 230 Devon Energy Hall (Dimitris)
- Tuesdays
  - 11:00am – 12:00pm, 115 Devon Energy Hall (Naeem)
- Wednesdays
  - 2:00pm – 3:00pm, 230 Devon Energy Hall (Dimitris)
- Thursdays
  - 11:00am – 12:00pm, 115 Devon Energy Hall (Naeem)

##### Exceptions to the Regular Schedule of Office Hours

If you want to meet me outside of my office hours, please send me an email and arrange an appointment.

Wednesday, April 17, 2024:
I will be holding office hours between **2:30pm – 3:00pm**.
That is, **I will be available to see students only during the second half of the hour**, as I have
a conflict with another appointment during the first half of the hour.


### Homework Assignments

Assignment 1: Announced on Mon, Jan 22, 2024. Due on Wed, Jan 31, 2024.

Assignment 2: Announced on Mon, Feb 5, 2024. Due on Mon, Feb 19, 2024.

Assignment 3: Announced on Mon, Feb 26, 2024. Due on Fri, Mar 8, 2024.

Assignment 4: Announced on Fri, Mar 8, 2024. Due on Wed, Mar 27, 2024.

Assignment 5: Announced on Wed, Mar 27, 2024. Due on Mon, Apr 8, 2024.

Assignment 6: Announced on Mon, Apr 8, 2024. Due on Sun, Apr 28, 2024.


### Projects

Information related to the projects will show up here.

#### Ideas for projects

Below are some ideas for your projects.

##### Reinforcement Learning Ideas

- Gymnasium from Farama Foundation (continuation of the OpenAI Gym): provides an interface for training your own RL agent to play a computer game. (Try to select a simple game so that it is easier to deal with.)
- Simple board games for RL.
- Variations of bandit problems.
- A gym environment for the classic game of snake is available here.
- Avoid constraint satisfaction problems (e.g., Sudoku, Wordle, etc.).

##### Supervised Learning Ideas

Before we discuss any ideas, as a reminder, you cannot use MNIST, because that dataset has been studied with any conceivable algorithm at this point and all the information is available for free online.

- UCI repository: datasets that are available from the repository that is maintained by the University of California, Irvine.
- Make Moons: a synthetic dataset where you can also compare the performance of the models that you will develop from scratch with the equivalent ones from scikit-learn.
- Kaggle: has lots of datasets and you may actually be able to participate in a competition and see how your algorithm and your approach compare against others.
- KDnuggets: datasets from KDnuggets.
- Smiling or not?
- Fashion MNIST
- ImageNet (though I would suggest that you work with Tiny ImageNet)
- CIFAR-10 and CIFAR-100
- SVHN (Street-View House Numbers)
- Please note that ImageNet, CIFAR-10, CIFAR-100, and SVHN are all difficult datasets.
- As a sidenote, if you want to reduce the dimensionality of some datasets (so that you can speed up computations), you can use dimension reduction techniques (e.g., Principal Component Analysis). You are free to use libraries exclusively for this task.


### Milestones

Week 2: Homework 1 is announced (beginning of week).

Week 3: Homework 1 is due (mid-week). In-class presentations for the reinforcement learning project (end of week).

Week 4: Homework 2 is announced (beginning of week). Project written proposal is due (beginning of week).

Week 6: Homework 2 is due (beginning of week). Project checkpoint is due (end of week).

Week 7: Homework 3 is announced (beginning of week).

Week 8: Homework 3 is due and homework 4 is announced (end of week).

Week 9 (**Spring Break**): Reinforcement learning project is due at the end of week.

Week 10: In-class presentations for the supervised learning project (end of week).

Week 11: Homework 4 is due and homework 5 is announced (mid-week). Project written proposal is due (beginning of week).

Week 13: Homework 5 is due and homework 6 is announced (beginning of week). Project checkpoint is due (end of week).

Week 15: Homework 6 is due (end of week).

Week 16: Supervised learning project is due (end of week).


### Machine Learning Resources

#### Books

The three books that we plan to use for the course are available for free in electronic format at the following links:

- Reinforcement Learning: An Introduction, by Richard S. Sutton and Andrew G. Barto. (1st edition)
- An Introduction to Statistical Learning (with Applications in Python), by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.
- Machine Learning, by Tom Mitchell.

#### Personal Notes

#### Notes by Others

- Recall, Precision, F1, ROC, AUC, and everything, by Ofir Shalev. (local pdf copy)
- Neural Networks and Deep Learning, by Michael Nielsen.

#### Papers

- Machine Learning that Matters, by Kiri L. Wagstaff
- A Few Useful Things to Know About Machine Learning, by Pedro Domingos. (alternate version)
- Perceptron-based learning algorithms, by Stephen I. Gallant. (Optional reading; this is the paper for the pocket algorithm.)


### Class Log

A log for the class will be kept online here.

#### Week 1

#### Class 1 (Jan 17, 2024)

About this Course.

Discussion on syllabus and policies.

#### Class 2 (Jan 19, 2024)

Discussion on projects. Introduction to Machine Learning.

Pretest in class.

Assigned Reading: Elements of Statistical Learning (ESL), Chapter 1.

Assigned Reading: Sutton & Barto: Chapters 1 and 2.

Assigned today: Think about short and long projects. Think about the topic for your RL project.

#### Week 2

#### Classes 3-4-5 (Jan 22-24-26, 2024)

Assigned (Mon): Homework 1.

Introduction to reinforcement learning.

Basic ingredients of RL methods: policy, value function, model.

Discussion on the projects, deadlines, and various expectations. We also went over where to find certain information on Canvas.

Exploration vs Exploitation. The multi-armed bandit problem from the book (Chapter 2).
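As a rough illustration of the exploration vs exploitation tradeoff, here is a minimal sketch of an $\varepsilon$-greedy agent on a multi-armed bandit. The function name `run_bandit`, the Gaussian rewards with unit variance, and the particular arm means are assumptions made for this example; they are not taken from the book or the homework.

```python
import random

def run_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Epsilon-greedy agent on a Gaussian multi-armed bandit (illustrative setup)."""
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k   # Q(a): current estimate of each arm's mean reward
    counts = [0] * k        # N(a): number of times each arm has been pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                            # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])   # exploit
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental sample-average update: Q <- Q + (R - Q) / N
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

# Hypothetical arm means, chosen only for this example.
estimates, counts = run_bandit([0.2, 0.5, 1.5])
```

With $\varepsilon = 0$ the agent can lock onto whichever arm looked good first; a small positive $\varepsilon$ keeps every estimate improving at the cost of some suboptimal pulls.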

The prediction problem and the control problem.

Markov Decision Processes (MDPs).

Discussion on the Bellman Expectation Equations. Backup diagrams and solution of the prediction problem using linear algebra. We revisited the recycling robot example and showed how to evaluate the policy that picks each action with the same probability in each of the two energy states of the robot.
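For reference, the linear-algebra solution of the prediction problem can be written compactly as follows. The matrix notation is an assumption made here for brevity: $r_\pi$ is the vector of expected immediate rewards under the policy $\pi$, $P_\pi$ is the state-to-state transition matrix induced by $\pi$, and $\gamma < 1$ is the discount factor.

$$
v_\pi = r_\pi + \gamma P_\pi v_\pi
\quad\Longrightarrow\quad
v_\pi = (I - \gamma P_\pi)^{-1} r_\pi ,
$$

$$
r_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\, r ,
\qquad
P_\pi(s, s') = \sum_{a} \pi(a \mid s)\, p(s' \mid s, a) .
$$

For an example with two states, such as the recycling robot, this amounts to solving a $2 \times 2$ linear system.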

#### Week 3

#### Classes 6-7-8 (Jan 29-31 and Feb 2, 2024)

Bellman optimality equations and the control problem.

Assigned Reading (Mon, 1/29/2024): Sutton & Barto: Chapter 3.

Introduction to dynamic programming methods.

Due today: Homework 1.

Wed, 1/31/2024: Proposals for reinforcement learning projects; in-class presentations.

Fri, 2/2/2024: Proposals for reinforcement learning projects; in-class presentations.

#### Week 4

#### Classes 9-10-11 (Feb 5-7-9, 2024)

Assigned (Mon, 2/5/2024): Homework 2.

Due Mon, 2/5/2024: Written proposal for the reinforcement learning project.

Concluded our discussion on dynamic programming. Discussed value iteration, along with some final remarks on dynamic programming (complexity, asynchronous backups, etc.).

Assigned Reading: Sutton & Barto: Chapter 4.
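A minimal sketch of value iteration may help when reading Chapter 4. The two-state MDP below is hypothetical and made up for illustration, and the dictionary layout `P[s][a]` is just one convenient representation, not a required one.

```python
def value_iteration(P, gamma=0.9, tol=1e-10):
    """P[s][a] is a list of (probability, next_state, reward) triples."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup: best expected one-step return from s
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in P[s].values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best   # in-place update: later states already see the new value
        if delta < tol:
            return V

# Hypothetical two-state MDP with deterministic transitions.
P = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}
V = value_iteration(P)
```

Here staying in `B` forever is optimal, so the values converge to $V(B) = 2 / (1 - 0.9) = 20$ and $V(A) = 1 + 0.9 \cdot 20 = 19$.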

Started our discussion on model-free methods that are used for prediction.

Overview of Monte-Carlo and Temporal Difference learning.

First-visit and every-visit Monte Carlo methods. Application to Blackjack.

Iterative calculation of empirical average.

Temporal difference learning. Application to the "Return Home" example.

Comparison of Monte Carlo and Temporal Difference learning.
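The difference between the two methods is easiest to see in their update targets. The sketch below uses a constant step size `alpha` and a plain dictionary for the value function; both are choices made for this illustration.

```python
def mc_update(V, state, G, alpha):
    """Monte Carlo: move V(s) toward the full observed return G."""
    V[state] += alpha * (G - V[state])

def td0_update(V, state, reward, next_state, alpha, gamma=1.0):
    """TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')."""
    target = reward + gamma * V[next_state]
    V[state] += alpha * (target - V[state])

# Made-up values: one TD(0) step from "s" to "t", then one MC step for "t".
V = {"s": 0.0, "t": 10.0}
td0_update(V, "s", reward=1.0, next_state="t", alpha=0.5)
mc_update(V, "t", G=4.0, alpha=0.5)
```

Monte Carlo has to wait until the end of the episode to know the return `G`, whereas TD(0) can update after every step because it relies on the current estimate of the next state.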

Assigned Reading (Wed): Sutton & Barto: Sections 5.1, 5.2, 6.1, 6.2, 6.3.

n-Step Returns and Eligibility Traces. Forward view, backward view, and equivalence.

Assigned Reading: Sutton & Barto: Sections 7.1, 7.2, 12.1, 12.2.

#### Week 5

#### Classes 12-13-14 (Feb 12-14-16, 2024)

The control problem. Using an $\varepsilon$-greedy policy to guarantee enough exploration of the state space so that we can compute accurate estimates of the optimal value functions.

Solution with an on-policy Monte-Carlo approach.

Assigned Reading (Mon): Sutton & Barto: Sections 5.3, 5.4.

Discussion on information about the class and the second homework.

Continue our discussion on solving the control problem. This time we use the idea of TD learning, which leads to Sarsa, and we also discuss the extension to Sarsa($\lambda$).

Assigned Reading (Wed): Sutton & Barto: Sections 6.4, 6.5, 12.7.

Solving the control problem using an off-policy method: Q-Learning.
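The on-policy and off-policy updates differ only in the target that they bootstrap from. Here is a small sketch with a tabular `Q` stored as a dictionary keyed by `(state, action)`; the states, actions, and numbers are made up.

```python
def sarsa_update(Q, s, a, r, s2, a2, alpha, gamma):
    """On-policy: the target uses the action a2 actually chosen in s2."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])

def q_learning_update(Q, s, a, r, s2, actions, alpha, gamma):
    """Off-policy: the target uses the greedy (max) action value in s2."""
    best_next = max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Hypothetical action values, used to compare the two updates on the same transition.
Q = {("s", "left"): 0.0, ("s", "right"): 0.0,
     ("t", "left"): 1.0, ("t", "right"): 3.0}
Q_sarsa, Q_qlearn = dict(Q), dict(Q)
sarsa_update(Q_sarsa, "s", "left", 1.0, "t", "left", alpha=0.5, gamma=0.9)
q_learning_update(Q_qlearn, "s", "left", 1.0, "t", ["left", "right"], alpha=0.5, gamma=0.9)
```

If the behavior policy happened to pick the exploratory action `left` in state `t`, Sarsa learns from that choice, while Q-learning learns as if the greedy action `right` had been taken.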

Introduction to function approximation?

#### Week 6

#### Class 15 (Feb 19, 2024)

Due Mon: Homework 2.

Finish our discussion on function approximation.

How we can do linear function approximation and solve the prediction and control problems using our basic methods.

Simple ways to construct features: state aggregation, coarse coding, tile coding. The book has more examples.

Discussion of some examples with function approximation. Among them, Sarsa with linear function approximation on the Mountain Car problem.

Assigned Reading: Sutton & Barto: Sections 9.1 – 9.5, 10.1, 10.2.

#### Classes 16-17 (Feb 21-23, 2024)

Assigned today: Homework 3.

Introduction to supervised learning.

What is supervised learning? Regression and classification problems. The role of inductive bias.

Definitions of the terminology that we will be using throughout the course for supervised learning.

Assigned Reading: ISL 2.1.

Further discussion on the notion of the *hypothesis space* $\mathcal{H}$.
The different algorithms that we will see for supervised learning largely define how we perform the search in this
space and come up with a hypothesis $h$ that we believe will approximate the ground truth $c$ well.

Introduction to nearest neighbor learning. Example with flowers based on sepal length and sepal width.

1-Nearest Neighbor classification and connection to the Voronoi Diagram (this is a topic discussed in Computational Geometry classes).

Assigned Reading: Nearest neighbors: Mitchell 8.1, 8.2 – 8.2.3. Alternatively, ISL 2.2.3 (classification) and 3.5 (regression).
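A minimal sketch of nearest neighbor classification with Euclidean distance and majority voting. The four training points are invented (sepal length, sepal width) measurements, not actual rows from the flower dataset used in class.

```python
import math
from collections import Counter

def knn_predict(train, query, k=1):
    """train is a list of (feature_vector, label) pairs; Euclidean distance."""
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Made-up (sepal length, sepal width) measurements, for illustration only.
train = [((5.0, 3.5), "setosa"), ((4.8, 3.1), "setosa"),
         ((6.4, 2.9), "versicolor"), ((6.7, 3.0), "versicolor")]
prediction = knn_predict(train, (6.5, 3.0), k=1)
```

With `k=1` the predicted label is constant inside each Voronoi cell of the training points, which is the connection to the Voronoi diagram mentioned above.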

#### Sunday, February 25, 2024 (11:59pm)

Due today: Reinforcement learning checkpoint.

#### Week 7

#### Classes 18-19-20 (Feb 26-28 and Mar 1, 2024)

Continue our discussion on Nearest Neighbors. Different metrics. Application to regression problems. Distance-weighted nearest neighbor method.

Naive Bayes for classification. Example on PlayTennis.

The m-estimate and dealing with various corner cases.
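As a quick reminder of how the m-estimate handles the corner case of a zero count, here is a one-line sketch following the formula $(n_c + m p) / (n + m)$ from Mitchell. The numbers in the example are made up.

```python
def m_estimate(n_c, n, p, m):
    """Smoothed estimate of P(attribute value | class): (n_c + m*p) / (n + m)."""
    return (n_c + m * p) / (n + m)

# An attribute value never seen with this class (n_c = 0) among n = 5 examples,
# with a uniform prior over 3 possible values and equivalent sample size m = 3.
smoothed = m_estimate(n_c=0, n=5, p=1 / 3, m=3)
unsmoothed = m_estimate(n_c=0, n=5, p=1 / 3, m=0)
```

Without smoothing (`m = 0`) the estimate is exactly zero, which would zero out the whole product in Naive Bayes; with `m = 3` it becomes $1/8$.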

Naive Bayes for document classification.

Discussion on creating features for document classification: bag-of-words, n-grams, TF-IDF.

Assigned Reading: Naive Bayes: Mitchell 6.9, 6.10. You can also have a look in ISL Section 4.4.4.

Gaussian Naive Bayes (dealing with continuous-valued attributes) and other variants of Naive Bayes for classification.

Naive Bayes is not used for regression problems.

Loss functions: 0-1 loss, square loss.

The quality of our solutions: risk and empirical risk.

Assigned Reading: Please pay attention to the slides for the discussion on loss functions, risk, empirical risk, the ERM principle, etc.

Assigned Reading: You can also look into ISL Section 2.2.1.

#### Week 8

#### Classes 21-22-23 (Mar 4-6-8, 2024)

Discussion on risk and empirical risk and what these quantities look like when we use the 0-1 loss $\ell_{\text{0-1}}$ or the square loss $\ell_{\text{sq}}$. Empirical Risk Minimization (ERM) principle.
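The empirical risk is simply the average loss over the sample, which the following sketch spells out for both losses. The hypotheses and the tiny samples are made up for illustration.

```python
def zero_one_loss(y_pred, y_true):
    return 0.0 if y_pred == y_true else 1.0

def square_loss(y_pred, y_true):
    return (y_pred - y_true) ** 2

def empirical_risk(h, sample, loss):
    """Average loss of hypothesis h over a list of (x, y) examples."""
    return sum(loss(h(x), y) for x, y in sample) / len(sample)

# Classification: a threshold hypothesis that makes one mistake out of four.
clf_sample = [(-2, -1), (-1, 1), (1, 1), (3, 1)]
risk_01 = empirical_risk(lambda x: 1 if x >= 0 else -1, clf_sample, zero_one_loss)

# Regression: h(x) = 2x is off by 1 on the second example.
reg_sample = [(1, 2), (2, 5)]
risk_sq = empirical_risk(lambda x: 2 * x, reg_sample, square_loss)
```

Under the 0-1 loss the empirical risk is the fraction of misclassified examples; under the square loss it is the mean squared error.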

Some remarks on terminology and notation used in statistics and in computer science.

Started our discussion on linear models. Perceptrons, decision boundary, discussion on the representational power of various functions. The update rule that is used for learning using perceptrons.

Assigned Reading: Perceptrons: Mitchell 4.1 – 4.4.2.
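A minimal sketch of the perceptron training rule $w_i \leftarrow w_i + \eta\,(t - o)\,x_i$ as presented in Mitchell, applied to Boolean OR. The learning rate, the epoch limit, and the choice of OR as the target are assumptions made for this example.

```python
def predict(w, x):
    """Threshold unit: w[0] is the bias weight, output is +1 or -1."""
    s = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if s > 0 else -1

def train_perceptron(data, eta=0.1, max_epochs=100):
    w = [0.0] * (len(data[0][0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in data:
            o = predict(w, x)
            if o != t:
                mistakes += 1
                # Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i
                w[0] += eta * (t - o)
                for i, xi in enumerate(x):
                    w[i + 1] += eta * (t - o) * xi
        if mistakes == 0:   # a full pass without mistakes: stop
            break
    return w

# Boolean OR with inputs in {0, 1} and targets in {-1, +1}: linearly separable.
data = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train_perceptron(data)
```

On data that is not linearly separable (e.g., XOR) this loop never reaches a mistake-free pass and stops only because of `max_epochs`.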

Perceptrons on linearly separable data and non-linearly separable data. Transformations that allow the perceptron to learn non-linearly separable data and issues of these transformations. The pocket algorithm.

Linear regression. Ordinary least squares solution.

Solve linear regression problems using gradient descent.
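As a sketch of the gradient descent approach, the code below fits a line $y \approx wx + b$ by minimizing the mean squared error. The step size, the number of steps, and the noiseless toy data are choices made for this illustration.

```python
def fit_line_gd(xs, ys, lr=0.05, steps=5000):
    """Fit y ~ w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # generated from y = 2x + 1, no noise
w, b = fit_line_gd(xs, ys)
```

On this data the iterates approach the ordinary least squares solution $w = 2$, $b = 1$; a step size that is too large would make them diverge instead.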

Assigned Reading: Mitchell 4.4.3 – 4.4.4. Alternatively, ISL 3.1 – 3.3 and 3.5.

Introduction to logistic regression. How is it different from other linear models?

Introduction of the logistic loss.
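For concreteness, here is one common way to write the logistic loss, assuming labels in $\{-1, +1\}$ and a real-valued score $w^\top x$. This label convention is an assumption made here; with labels in $\{0, 1\}$ the same quantity appears as the cross-entropy.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_loss(score, y):
    """Logistic loss for a label y in {-1, +1} and a real-valued score w.x."""
    return math.log(1.0 + math.exp(-y * score))

loss_at_boundary = logistic_loss(0.0, 1)      # score 0 means probability 1/2
loss_confident_right = logistic_loss(5.0, 1)
loss_confident_wrong = logistic_loss(-5.0, 1)
```

The loss equals $-\ln \sigma(y \cdot \text{score})$, so it is small for confident correct predictions and grows roughly linearly for confident wrong ones.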

Assigned Reading: Logistic regression: ISL 4 – 4.3.5.

#### Sunday, March 10, 2024

Due today: Homework 3.

#### Week 9

#### Classes 24-25-26 (Mar 11-13-15, 2024)

Conclude our discussion on the logistic loss. Termination criteria for logistic regression using gradient descent.

Almost finished with our discussion on logistic regression.

Wed, 3/13/2024: Proposals for supervised learning projects; in-class as well as remote presentations.

Fri, 3/15/2024: Proposals for supervised learning projects; in-class as well as remote presentations.

#### Sunday, March 17, 2024

Due today: Reinforcement learning project.

#### Week 10

#### Mar 18-20-22, 2024

Spring break; no classes.

#### Week 11

#### Classes 27-28-29 (Mar 25-27-29, 2024)

Introduction to regularization and stability. Structural Risk Minimization (SRM) and Regularized Loss Minimization (RLM).

Regression and regularization. Ridge regression, lasso regression, and elastic-net. Why lasso is more likely than ridge regression to create a sparse model.
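One way to see the sparsity effect is the one-dimensional case, where both penalized problems have closed-form solutions. In the sketch below `z` plays the role of the unregularized solution; the particular scaling of the objectives is an assumption chosen to keep the formulas short.

```python
def ridge_1d(z, lam):
    """argmin_w 0.5*(w - z)**2 + 0.5*lam*w**2: shrinks z proportionally."""
    return z / (1.0 + lam)

def lasso_1d(z, lam):
    """argmin_w 0.5*(w - z)**2 + lam*abs(w): soft-thresholding of z."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

small_ridge = ridge_1d(0.3, 0.5)   # shrunk, but still nonzero
small_lasso = lasso_1d(0.3, 0.5)   # set exactly to zero
```

Ridge only rescales a coefficient, so it reaches zero only if the coefficient was already zero, whereas lasso sets every coefficient with $|z| \le \lambda$ exactly to zero.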

Discussion on assessing the goodness of fit of linear models. Discussed: Residual Standard Error (RSE), $R^{2}$ statistic, adjusted $R^{2}$ statistic, residual plots, $C_{p}$, AIC, and BIC criteria.

Due Wed: Homework 4.

Assigned Wed: Homework 5.

Bias – variance tradeoff. Underfitting and overfitting and the general ideas behind model selection.

Holdout method using 2-way and 3-way partitioning of the given dataset.

Start discussion on cross-validation.

Assigned Reading: A Few Useful Things to Know About Machine Learning, by Pedro Domingos.

Discussion in class on issues that arise in supervised learning algorithms that we have covered so far: perceptron and bound on number of mistakes, stability, PAC guarantees and distributional assumptions.

Conclude our discussion on cross-validation. Leave-One-Out-Cross-Validation (LOOCV). Stratified cross-validation.
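The bookkeeping behind k-fold cross-validation can be sketched as follows. The helper `k_fold_indices` is a hypothetical name, and it uses contiguous folds for simplicity; in practice one would usually shuffle the data first (or stratify by class).

```python
def k_fold_indices(n, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # The first n % k folds get one extra example so that sizes differ by at most 1.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

folds = list(k_fold_indices(10, 3))     # 3-fold CV on 10 examples
loocv = list(k_fold_indices(5, 5))      # LOOCV is the special case k = n
```

Every example is used for validation exactly once, and the cross-validation estimate is the average of the validation errors over the folds.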

Assigned Reading: ISL Sections 2.2.2 and 2.2.3, Chapter 5, Section 6.2.

#### Week 12

#### Classes 30-31-32 (Apr 1-3-5, 2024)

Metrics Beyond Low Risk.

A nice decomposition of the instance space, and ultimately of the predictions that we make on a dataset, using false positives, false negatives, true positives, and true negatives (as regions in the instance space, or as counts of examples from the dataset).

The confusion matrix.

Complex performance measures based on: recall, precision, specificity. Balanced accuracy and F1-score.
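The measures above all follow from the four counts of the confusion matrix, as the sketch below shows for binary classification. The labels are made up, and the function does not guard against empty classes (division by zero), which a real implementation should handle.

```python
def binary_metrics(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn)           # true positive rate (sensitivity)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)      # true negative rate
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "recall": recall, "precision": precision, "specificity": specificity,
        "balanced_accuracy": (recall + specificity) / 2,
        "f1": 2 * precision * recall / (precision + recall),
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
m = binary_metrics(y_true, y_pred)
```

In this example the accuracy is $6/8 = 0.75$, while recall and precision are both $2/3$, which illustrates why accuracy alone can be misleading on imbalanced data.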

Receiver Operating Characteristic (ROC) curve.

Assigned Reading: ISL second part of the discussion in Section 4.4.2.

Artificial neural networks.

Popular activation functions (and their derivatives).

The backpropagation algorithm.

Explanation of the phenomenon of *vanishing gradients*.
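A small sketch of three popular activation functions and their derivatives, which also hints at why gradients can vanish. The selection of sigmoid, tanh, and ReLU is an assumption about which functions are meant; please refer to the slides for the ones covered in class.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # at most 0.25, attained at z = 0

def d_tanh(z):
    return 1.0 - math.tanh(z) ** 2

def relu(z):
    return max(0.0, z)

def d_relu(z):
    return 1.0 if z > 0 else 0.0  # the value at z = 0 is a convention

# Backpropagation multiplies one derivative factor per layer. Even in the best
# case for the sigmoid (every pre-activation equal to 0), ten layers contribute:
best_case_factor = d_sigmoid(0.0) ** 10
```

Since the sigmoid derivative never exceeds $0.25$, the product of these factors shrinks geometrically with depth (ignoring the weights), which is one way to see the vanishing gradient phenomenon.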

#### Week 13

#### Classes 33-34-35 (Apr 8-10-12, 2024)

Due Mon: Homework 5.

Assigned Mon: Homework 6.

Derivation of the backpropagation rule.

An illustrative example on artificial neural networks. Design decisions and results.

Additional comments and discussion on artificial neural networks.

Assigned Reading: Mitchell Chapter 4. ISL Sections 10.1 – 10.2, 10.6 – 10.7.

Introduction to decision trees.

#### Week 14

#### Classes 36-37-38 (Apr 15-17-19, 2024)

Continuation of decision trees.

Due Wed: Supervised Learning Project Checkpoint.

Conclusion of decision trees.

Assigned Reading: Mitchell Chapter 3. ISL Section 8.1.

#### Week 15

#### Classes 39-40-41 (Apr 22-24-26, 2024)

Ensembles. Bootstrap and bagged predictors. Random forests. Boosting methods.
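The two ingredients of bagging can be sketched in a few lines: drawing bootstrap samples and combining the resulting predictors by majority vote. The function names and the constant "models" below are hypothetical, used only to exercise the code.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) examples uniformly at random with replacement."""
    return [rng.choice(data) for _ in data]

def bagged_predict(models, x):
    """Majority vote over the predictions of the individual models."""
    votes = [model(x) for model in models]
    return max(set(votes), key=votes.count)   # ties are broken arbitrarily

rng = random.Random(0)
sample = bootstrap_sample(list(range(20)), rng)

# Three stand-in "models" that ignore their input, just to show the voting.
models = [lambda x: 1, lambda x: 1, lambda x: 0]
vote = bagged_predict(models, None)
```

Because sampling is with replacement, a bootstrap sample typically contains repeated examples and leaves some of the original examples out.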

Assigned Reading: ISL Section 8.2.

Introduction to Support Vector Machines. Optimal hyperplane and margin.

Assigned Reading: Elements of Statistical Learning: Sections 12.1 – 12.2.

Note that your book defines the margin to be twice the distance between the decision boundary and the closest training example (positive or negative, it does not matter); see Figure 12.1.
However, other books define the margin to be half of this quantity (and I tend to prefer this view),
essentially corresponding to the distance between the support vectors and the decision boundary
(thus, this distance corresponds to a cushion that exists for all the instances in our dataset
before they are misclassified).
For example, the following books follow this view:

- Foundations of Machine Learning, by Mehryar Mohri, Afshin Rostamizadeh and Ameet Talwalkar. See Definition 5.1 and Figure 5.3.
- An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. See Figure 9.3.
- Mining of Massive Datasets, by Jure Leskovec, Anand Rajaraman and Jeffrey David Ullman. See Figure 12.14.
- Machine Learning: An Algorithmic Perspective, by Stephen Marsland. See Figure 8.2.

Just keep this in mind when you discuss this with someone else, because even if you use the same term (margin), you may end up describing a quantity that is off by a factor of 2. At the end of the day it is just a definition and it is more of a personal preference on how one defines this quantity.
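To make the factor of 2 concrete, consider a separating hyperplane $w^\top x + b = 0$ that is scaled so that the closest training examples satisfy $|w^\top x_i + b| = 1$. This normalization and the symbols $w$, $b$ are assumptions made here; they are common, but each book uses its own notation. Then the two conventions correspond to

$$
\min_i \frac{|w^\top x_i + b|}{\|w\|} = \frac{1}{\|w\|}
\quad \text{(distance from the decision boundary to the closest example)},
$$

$$
\frac{2}{\|w\|}
\quad \text{(width of the band between the hyperplanes } w^\top x + b = \pm 1\text{)} .
$$

Maximizing either quantity leads to the same optimal hyperplane, since both amount to minimizing $\|w\|$ under the same constraints.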

Assigned Reading: ISL Sections 9.1 – 9.5.

#### Sunday, Apr 28, 2024

Due today: Homework 6.

#### Week 16

#### Classes 42-43-44 (Apr 29 and May 1-3, 2024)

Mention kernels for support vector machines.

Unsupervised learning: principal component analysis and clustering. (We may cover a subset of these depending on the pace and other plans; e.g., guest lecture.)

Assigned Reading: ISL Sections 12.1 – 12.2 and 12.4.

Potentially a guest lecture.

Devote some time in class for the student evaluations.

Advertisement of computational learning theory.

Ask me anything.

#### Friday, May 10, 2024 (1:30pm – 3:30pm)

Normally this would be the date and time of the final exam. However, we will not have a final exam as the class has a semester-long project.

The supervised learning project will be due at 3:30pm on this day.

Due today: Supervised learning project (whether short or long) write-up and source code.
