# diochnos/teaching/CS4033-5033/2023S

## CS 4033/5033 – Machine Learning (Spring 2023)

The class is cross-listed as CS 4033 and CS 5033, so that both undergraduate and graduate students can enroll simultaneously. No student may earn credit for both 4033 and 5033.

### Course Description

Topics include decision trees, relational learning, neural networks, Bayesian learning, reinforcement learning, multiple-instance learning, feature selection, learning appropriate representations, clustering, and kernel methods. No student may earn credit for both 4033 and 5033.

### Basic Information

#### Syllabus

The syllabus is available here.

#### Time and Location

Mondays, Wednesdays, and Fridays, 11:30am – 12:20pm, Devon Energy Hall 120.

#### Teaching Assistants

The teaching assistant for the class is Shyam Sunda Murali Krishnan.

#### Office Hours

We will be holding our office hours at the following times.

Mondays
2:00pm – 3:00pm, 230 Devon Energy Hall (Dimitris)
Tuesdays
10:00am – 11:00am, 115 Devon Energy Hall (Shyam)
Wednesdays
2:00pm – 3:00pm, 230 Devon Energy Hall (Dimitris)
Thursdays
10:00am – 11:00am, 115 Devon Energy Hall (Shyam)
##### Exceptions to the Regular Schedule of Office Hours

If you want to meet me outside of my office hours, please send me an email and arrange an appointment.

##### Exceptions to the Regular Schedule of Office Hours for the TAs

As exceptions appear along the way, they will also be announced here.

Wednesday, February 1, 2023: I held my office hours between 1pm-2pm (instead of 2pm-3pm) due to a conflict in my schedule.

### Homework Assignments

Assignment 1: Announce on Mon, Jan 23, 2023. Due on Wed, Feb 1, 2023.

Assignment 2: Announce on Mon, Feb 6, 2023. Due on Mon, Feb 20, 2023.

Assignment 3: Announce on Mon, Feb 27, 2023. Due on Fri, Mar 10, 2023.

Assignment 4: Announce on Fri, Mar 10, 2023. Due on Wed, Mar 29, 2023.

Assignment 5: Announce on Wed, Mar 29, 2023. Due on Mon, Apr 10, 2023.

Assignment 6: Announce on Mon, Apr 10, 2023. Due on Sun, Apr 30, 2023.

### Projects

Information related to the projects will show up here.

#### Ideas for projects

Below are some ideas for your projects.

##### Reinforcement Learning Ideas
• Gymnasium from Farama Foundation (continuation of the OpenAI Gym): provides an interface for training your own RL agent to play a computer game.
(Try to select a simple game so that it is easier to deal with it.)
• Simple board games for RL.
• Variations of bandit problems.
• A gym environment for the classic game of snake is available here.
• Avoid constraint satisfaction problems (e.g., Sudoku, Worldle, etc.)
##### Supervised Learning Ideas

Before we discuss any ideas, as a reminder, you cannot use MNIST, because that dataset has been studied with any conceivable algorithm at this point and all the information is available for free online.

### Milestones

Week 2: Homework 1 is announced (beginning of week).

Week 3: Homework 1 is due (mid-week). In-class presentations for the reinforcement learning project (end of week).

Week 4: Homework 2 is announced (beginning of week). Project written proposal is due (beginning of week).

Week 6: Homework 2 is due (beginning of week). Project checkpoint is due (end of week).

Week 7: Homework 3 is announced (beginning of week).

Week 8: Homework 3 is due and homework 4 is announced (end of week).

Week 9 (Spring Break): Reinforcement learning project is due at the end of week.

Week 10: In-class presentations for the supervised learning project (end of week).

Week 11: Homework 4 is due and homework 5 is announced (mid-week). Project written proposal is due (beginning of week).

Week 13: Homework 5 is due and homework 6 is announced (beginning of week). Project checkpoint is due (end of week).

Week 15: Homework 6 is due (end of week).

Week 16: Supervised learning project is due (end of week).

### Machine Learning Resources

#### Books

The three books that we plan to use for the course are available for free in electronic format in the following links:

### Class Log

A log for the class will be held online here.

#### Class 1 (Jan 18, 2023)

Discussion on syllabus and policies.

#### Class 2 (Jan 20, 2023)

Discussion on projects. Introduction to Machine Learning.

Pretest in class.

Assigned Reading: Elements of Statistical Learning (ESL), Chapter 1.

Assigned Reading: Sutton & Barto: Chapters 1 and 2.

#### Classes 3-4-5 (Jan 23-25-27, 2023)

Assigned (Mon): Homework 1.

Introduction to reinforcement learning.

Basic ingredients of RL methods: policy, value function, model.

Discussion on the projects, deadlines, and various expectations. Also, where we can find certain information on Canvas.

Exploration vs Exploitation. The multi-armed bandit problem from the book (Chapter 2).

The prediction problem and the control problem.

Markov Decision Processes (MDPs).

Discussion on the Bellman Expectation Equations. Backup diagrams and solution of the prediction problem using linear algebra. Revisited the recycling robot example and we showed how we can evaluate the policy that picks an action with the same probability at each of the two energy states of the robot.

#### Classes 6-7-8 (Jan 30 and Feb 1-3, 2023)

Bellman optimality equations and the control problem.

Assigned Reading (Mon, 1/30/2023): Sutton & Barto: Chapter 3.

Introduction to dynamic programming methods.

Due today: Homework 1.

Wed, 2/1/2023: Proposals for reinforcement learning projects; in-class presentations.

Fri, 2/3/2023: Proposals for reinforcement learning projects; in-class presentations.

#### Classes 9-10-11 (Feb 6-8-10, 2023)

Assigned (Mon, 2/6/2023): Homework 2.

Due Mon, 2/6/2023: Written proposal for the reinforcement learning project.

Conclude discussion on dynamic programming. Discuss value iterations, as well as we had some last remarks on dynamic programming (complexity, asynchronous backups, etc.)

Assigned Reading: Sutton & Barto: Chapter 4.

Started our discussion on model-free methods that are used for prediction.

Overview of Monte-Carlo and Temporal Difference learning.

First-visit and every-visit Monte Carlo methods. Application to Blackjack.

Iterative calculation of empirical average.

Temporal difference learning. Application to the "Return Home" example.

Comparison of Monte Carlo and Temporal Difference learning.

Assigned Reading (Wed): Sutton & Barto: Sections 5.1, 5.2, 6.1, 6.2, 6.3.

n-Step Returns and Eligibility Traces. Forward view, backward view, and equivalence.

Assigned Reading: Sutton & Barto: Sections 7.1, 7.2, 12.1, 12.2.

#### Classes 12-13-14 (Feb 13-15-17, 2023)

The control problem. Using $\varepsilon$-greedy policy in order to guarantee enough exploration of the state space so that we are able to calculate accurate optimal values for the value functions.

Solution with an on-policy Monte-Carlo approach.

Assigned Reading (Mon): Sutton & Barto: Sections 5.3, 5.4.

Discussion on information about the class and the second homework.

Continue our discussion on solving the control problem. This time we use the idea of TD learning which leads to Sarsa and we will also discuss the extension to Sarsa($\lambda$).

Assigned Reading (Wed): Sutton & Barto: Sections 6.4, 6.5, 12.7.

Solving the control problem using an off-policy method: Q-Learning.

Introduction to function approximation?

#### Class 15 (Feb 20, 2023)

Due Mon: Homework 2.

Finish our discussion on function approximation.

How can do linear function approximation and solve the prediction and the control problem using our basic methods.

Simple ways to construct features: state aggregation, coarse coding, tile coding. The book has more examples.

Discussion of some examples with function approximation. Among them, Sarsa with linear function approximation on the Mountain Car problem.

Assigned Reading: Sutton & Barto: Sections 9.1 – 9.5, 10.1, 10.2.

#### Classes 16-17 (Feb 22-24, 2023)

Assigned today: Homework 3.

Introduction to supervised learning.

What is supervised learning? Regression and classification problems. The role of inductive bias.

Definitions on terminology that we will be using throughout the course for supervised learning.

Further discussion on the notion of the hypothesis space $\mathcal{H}$. The different algorithms that we will see on supervised learning, largely define how we will perform the seach in this space and come up with a hypothesis $h$ that we will believe will approximate well the ground truth $c$.

Introduction to nearest neighbor learning. Example with flowers based on sepal length and sepal width.

1-Nearest Neighbor classification and connection to the Voronoi Diagram (this is a topic discussed in Computational Geometry classes).

Assigned Reading: Nearest neighbors: Mitchell 8.1, 8.2 – 8.2.3. Alternatively, ISL 2.2.3 (classification) and 3.5 (regression).

#### Sunday, February 26, 2023 (11:59pm)

Due today: Reinforcement learning checkpoint.

#### Classes 18-19-20 (Feb 27 and Mar 1-3, 2023)

Continue our discussion on Nearest Neighbors. Different metrics. Application to regression problems. Distance-weighted nearest neighbor method.

Naive Bayes for classification. Example on PlayTennis.

The m-estimate and dealing with various corner cases.

Naive Bayes for document classification.

Discussion on creating features for document classification: bag-of-words, n-grams, TF-IDF.

Assigned Reading: Naive Bayes: Mitchell 6.9, 6.10. You can also have a look in ISL Section 4.4.4.

Gaussian Naive Bayes (dealing with continued-valued attributes) and other variants of Naive Bayes for classification.

Naive Bayes is not used for regression problems.

Loss functions: 0-1 loss, square loss.

The quality of our solutions: risk and empirical risk.

Assigned Reading: Please pay attention to the slides for the discussion on loss functions, risk, empirical risk, the ERM principle, etc.

Assigned Reading: You can also look into ISL Section 2.2.1.

#### Classes 21-22-23 (Mar 6-8-10, 2023)

Discussion on risk and empirical risk and how these quantities look like when we use the 0-1 loss $\ell_{\text{0-1}}$ or the square loss $\ell_{\text{sq}}$. Empirical Risk Minimization (ERM) principle.

Some remarks on terminology and notation used in statistics and in computer science.

Started our discussion on linear models. Perceptrons, decision boundary, discussion on the representational power of various functions. The update rule that is used for learning using perceptrons.

Assigned Reading: Perceptrons: Mitchell 4.1 – 4.4.2.

Perceptrons on linearly separable data and non-linearly separable data. Transformations that allow the perceptron to learn non-linearly separable data and issues of these transformations. The pocket algorithm.

Linear regression. Ordinary least squares solution.

Solve linear regression problems using gradient descent.

Assigned Reading: Mitchell 4.4.3 – 4.4.4. Alternatively, ISL 3.1 – 3.3 and 3.5.

Introduction to logistic regression. How is it different from other linear models?

Introduction of the logistic loss.

Assigned Reading: Logistic regression: ISL 4 – 4.3.5.

#### Sunday, March 12, 2023

Due today: Homework 3.

#### Mar 13-15-17, 2023

Spring break; no classes.

#### Sunday, March 19, 2023

Due today: Reinforcement learning project.

#### Classes 24-25-26 (Mar 20-22-24, 2023)

Conclude our discussion on the logistic loss. Termination criteria for logistic regression using gradient descent.

Almost finished with our discussion on the logistic regression.

Wed, 3/22/2023: Proposals for supervised learning project projects; in-class as well as remote presentations.

Fri, 3/24/2023: Proposals for supervised learning project projects; in-class as well as remote presentations.

#### Classes 27-28-29 (Mar 27-29-31, 2023)

Introduction to regularization and stability. Structural Risk Minimization (SRM) and Regularized Loss Minimization (RLM).

Regression and regularization. Ridge regression, lasso regression, and elastic-net. Why lasso is more likely to create a sparser model compared to ridge regression.

Discussion on assessing the goodness of fit of linear models. Discussed: Residual Standard Error (RSE), R2 statistic, adjusted R2 statistic, residual plots, Cp, AIC, and BIC criteria.

Due Wed: Homework 4.

Assigned Wed: Homework 5.

Bias – variance tradeoff. Underfitting and overfitting and the general ideas behind model selection.

Holdout method using 2-way and 3-way partitioning of the given dataset.

Start discussion on cross-validation.

Assigned Reading: A Few Useful Things to Know About Machine Learning , by Pedro Domingos.

Discussion in class on issues that arise in supervised learning algorithms that we have covered so far: perceptron and bound on number of mistakes, stability, PAC guarantees and distributional assumptions.

Conclude our discussion on cross-validation. Leave-One-Out-Cross-Validation (LOOCV). Stratified cross-validation.

Assigned Reading: ISL Sections 2.2.2 and 2.2.3, Chapter 5, Section 6.2.

#### Classes 30-31-32 (Apr 3-5-7, 2023)

Metrics Beyond Low Risk.

A nice decomposition of the instance space and ultimately of the predictions that we make in a dataset, by using false positives, false negatives, true positives, and true negatives -- regions in the instance space, or counting examples from the dataset.

The confusion matrix.

Complex performance measures based on: recall, precision, specificity. Balanced accuracy and F1-score.

Assigned Reading: ISL second part of the discussion in Section 4.4.2.

Artificial neural networks.

Popular activation functions (and their derivatives).

The backpropagation algorithm.

Explanation of the phenomenon of vanishing gradients.

#### Classes 33-34-35 (Apr 10-12-14, 2023)

Due Mon: Homework 5.

Assigned Mon: Homework 6.

Derivation of the backpropagation rule.

An illustrative example on artificial neural networks. Design decisions and results.

Assigned Reading: Mitchell Chapter 4. ISL Sections 10.1 – 10.2, 10.6 – 10.7.

Introduction to decision trees.

#### Classes 36-37-38 (Apr 17-19-21, 2023)

Continuation of decision trees.

Due Wed: Supervised Learning Project Checkpoint.

Conclusion of decision trees.

Assigned Reading: Mitchell Chapter 3. ISL Section 8.1.

#### Classes 39-40-41 (Apr 24-26-28, 2023)

Ensembles. Bootstrap and bagged predictors. Random forests. Boosting methods.

Introduction to Support Vector Machines. Optimal hyperplane and margin.

Assigned Reading: Elements of Statistical Learning: Sections 12.1 – 12.2.
Note that your book defines the margin to be twice as much of the distance between the decision boundary and the closest training example (positive or negative it does not matter) – see Figure 12.1. However, other books define the margin to be half of this quantity (and I tend to prefer this view), essentially corresponding to the distance between the support vectors and the decision boundary (thus, this distance corresponds to a cushion that exists for all the instances in our dataset before they are misclassified). For example, the following books follow this view:

Just keep this in mind when you discuss with someone else, because even if you use the same term (margin), you may end up describing a quantity that is off by a factor of 2. At the end of the day it is just a definition and it is more of a personal preference on how one defines this quantity.

Assigned Reading: ISL Sections 9.1 – 9.5.

#### Sunday, Apr 30, 2023

Due today: Homework 6.

#### Classes 42-43-44 (May 1-3-5, 2023)

Mention kernels for support vector machines.

Unsupervised learning: principal component analysis and clustering. (We may cover a subset of these depending on the pace and other plans; e.g., guest lecture.)

Assigned Reading: ISL Sections 12.1 – 12.2 and 12.4.

Potentially a guest lecture.

Devote some time in class for the student evaluations.