# diochnos/teaching/CS4033-5033/2023S

## CS 4033/5033 – Machine Learning (Spring 2023)

The class is cross-listed as CS 4033 and CS 5033, so that both undergraduate and graduate students can enroll simultaneously. No student may earn credit for both 4033 and 5033.

### Table of Contents

### Course Description

Topics include decision trees, relational learning, neural networks, Bayesian learning, reinforcement learning, multiple-instance learning, feature selection, learning appropriate representations, clustering, and kernel methods. No student may earn credit for both 4033 and 5033.

[Course Description] [Table of Contents] [Top]

### Basic Information

#### Syllabus

The syllabus is available here.

#### Time and Location

Mondays, Wednesdays, and Fridays, 11:30am – 12:20pm, Devon Energy Hall 120.

#### Contact Information

Please see here.

#### Teaching Assistants

The teaching assistant for the class is Shyam Sunda Murali Krishnan.

#### Office Hours

TBA

##### Exceptions to the Regular Schedule of Office Hours

If you want to meet me outside of my office hours, please send me an email and arrange an appointment.

##### Exceptions to the Regular Schedule of Office Hours for the TAs

As exceptions appear along the way, they will also be announced here.

[Basic Information] [Table of Contents] [Top]

### Homework Assignments

Assignment 1: Announce on Mon, Jan 23, 2023. Due on Wed, Feb 1, 2023.

Assignment 2: Announce on Mon, Feb 6, 2023. Due on Mon, Feb 20, 2023.

Assignment 3: Announce on Mon, Feb 27, 2023. Due on Fri, Mar 10, 2023.

Assignment 4: Announce on Fri, Mar 10, 2023. Due on Wed, Mar 29, 2023.

Assignment 5: Announce on Wed, Mar 29, 2023. Due on Mon, Apr 10, 2023.

Assignment 6: Announce on Mon, Apr 10, 2023. Due on Sun, Apr 30, 2023.

[Homework Assignments] [Table of Contents] [Top]

### Projects

Information related to the projects will show up here.

#### Ideas for projects

Below are some ideas for your projects.

##### Reinforcement Learning Ideas

- OpenAI Gym:
provides an interface for training your own RL agent to play a computer game.

(Try to select a simple game so that it is easier to deal with it.) - Simple board games for RL.
- Variations of bandit problems.

##### Supervised Learning Ideas

- Kaggle: has lots of datasets and you may actually be able to participate in a competition and see how your algorithm and your approach compares against others.
- KDnuggets: datasets from KD nuggets.
- UCI repository: datasets that are available from the repository that is maintained by the University of California, Irvine.
- CIFAR-10 and CIFAR-100
- ImageNet (Though, I would suggest you work with the Tiny ImageNet)
- SVHN (Street-View House Numbers)

[Projects] [Table of Contents] [Top]

### Milestones

Week 2: Homework 1 is announced (beginning of week).

Week 3: Homework 1 is due (mid-week). In-class presentations for the reinforcement learning project (end of week).

Week 4: Homework 2 is announced (beginning of week). Project written proposal is due (beginning of week).

Week 6: Homework 2 is due (beginning of week). Project checkpoint is due (end of week).

Week 7: Homework 3 is announced (beginning of week).

Week 8: Homework 3 is due and homework 4 is announced (end of week).

Week 9 (**Spring Break**): Reinforcement learning project is due at the end
of week.

Week 10: In-class presentations for the supervised learning project (end of week).

Week 11: Homework 4 is due and homework 5 is announced (mid-week). Project written proposal is due (beginning of week).

Week 13: Homework 5 is due and homework 6 is announced (beginning of week). Project checkpoint is due (end of week).

Week 15: Homework 6 is due (end of week).

Week 16: Supervised learning project is due (end of week).

[Milestones] [Table of Contents] [Top]

### Machine Learning Resources

#### Books

The three books that we plan to use for the course are available for free in electronic format in the following links:

- Reinforcement Learning: An Introduction, by Richard S. Sutton and Andrew G. Barto. (1st edition)
- An Introduction to Statistical Learning, by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.
- Machine Learning, by Tom Mitchell.

#### Personal Notes

#### Notes by Others

- Recall, Precision, F1, ROC, AUC, and everything, by Ofir Shalev. (local pdf copy)
- Neural Networks and Deep Learning, by Michael Nielsen.

#### Papers

- Machine Learning that Matters, by Kiri L. Wagstaff
- A Few Useful Things to Know About Machine Learning, by Pedro Domingos. (alternate version)
- Perceptron-based learning algorithms, by Stephen I. Gallant. (Optional reading; this is the paper for the pocket algorithm.)

[Machine Learning Resources] [Table of Contents] [Top]

### Class Log

A log for the class will be held online here.

#### Week 1

#### Class 1 (Jan 18, 2023)

About this Course.

Discussion on syllabus and policies.

#### Class 2 (Jan 20, 2023)

Discussion on projects. Introduction to Machine Learning.

Pretest in class.

Assigned Reading: Elements of Statistical Learning (ESL), Chapter 1.

Assigned Reading: Sutton & Barto: Chapters 1 and 2.

Assigned today: Think about short and long projects. Think about the topic for your RL project.

#### Week 2

#### Classes 3-4-5 (Jan 23-25-27, 2023)

Assigned (Mon): Homework 1.

Introduction to reinforcement learning.

Basic ingredients of RL methods: policy, value function, model.

Discussion on the projects, deadlines, and various expectations. Also, where we can find certain information on Canvas.

Exploration vs Exploitation. The multi-armed bandit problem from the book (Chapter 2).

The prediction problem and the control problem.

Markov Decision Processes (MDPs).

Discussion on the Bellman Expectation Equations. Backup diagrams and solution of the prediction problem using linear algebra. Revisited the recycling robot example and we showed how we can evaluate the policy that picks an action with the same probability at each of the two energy states of the robot.

#### Week 3

#### Classes 6-7-8 (Jan 30 and Feb 1-3, 2023)

Bellman optimality equations and the control problem.

Assigned Reading (Mon, 1/30/2023): Sutton & Barto: Chapter 3.

Introduction to dynamic programming methods.

Due today: Homework 1.

Wed, 2/1/2023: Proposals for reinforcement learning projects; in-class presentations.

Fri, 2/3/2023: Proposals for reinforcement learning projects; in-class presentations.

#### Week 4

#### Classes 9-10-11 (Feb 6-8-10, 2023)

Assigned (Mon, 2/6/2023): Homework 2.

Due Mon, 2/6/2023: Written proposal for the reinforcement learning project.

Conclude discussion on dynamic programming. Discuss value iterations, as well as we had some last remarks on dynamic programming (complexity, asynchronous backups, etc.)

Assigned Reading: Sutton & Barto: Chapter 4.

Started our discussion on model-free methods that are used for prediction.

Overview of Monte-Carlo and Temporal Difference learning.

First-visit and every-visit Monte Carlo methods. Application to Blackjack.

Iterative calculation of empirical average.

Temporal difference learning. Application to the "Return Home" example.

Comparison of Monte Carlo and Temporal Difference learning.

Assigned Reading (Wed): Sutton & Barto: Sections 5.1, 5.2, 6.1, 6.2, 6.3.

n-Step Returns and Eligibility Traces. Forward view, backward view, and equivalence.

Assigned Reading: Sutton & Barto: Sections 7.1, 7.2, 12.1, 12.2.

#### Week 5

#### Classes 12-13-14 (Feb 13-15-17, 2023)

The control problem. Using $\varepsilon$-greedy policy in order to guarantee enough exploration of the state space so that we are able to calculate accurate optimal values for the value functions.

Solution with an on-policy Monte-Carlo approach.

Assigned Reading (Mon): Sutton & Barto: Sections 5.3, 5.4.

Discussion on information about the class and the second homework.

Continue our discussion on solving the control problem. This time we use the idea of TD learning which leads to Sarsa and we will also discuss the extension to Sarsa($\lambda$).

Assigned Reading (Wed): Sutton & Barto: Sections 6.4, 6.5, 12.7.

Solving the control problem using an off-policy method: Q-Learning.

Introduction to function approximation?

#### Week 6

#### Class 15 (Feb 20, 2023)

Due Mon: Homework 2.

Finish our discussion on function approximation.

How can do linear function approximation and solve the prediction and the control problem using our basic methods.

Simple ways to construct features: state aggregation, coarse coding, tile coding. The book has more examples.

Discussion of some examples with function approximation. Among them, Sarsa with linear function approximation on the Mountain Car problem.

Assigned Reading: Sutton & Barto: Sections 9.1 – 9.5, 10.1, 10.2.

#### Classes 16-17 (Feb 22-24, 2023)

Assigned today: Homework 3.

Introduction to supervised learning.

What is supervised learning? Regression and classification problems. The role of inductive bias.

Definitions on terminology that we will be using throughout the course for supervised learning.

Assigned Reading: ESL ...

Further discussion on the notion of the *hypothesis space* $\mathcal{H}$.
The different algorithms that we will see on supervised learning, largely define how we will perform the seach in this
space and come up with a hypothesis $h$ that we will believe will approximate well the ground truth $c$.

Introduction to nearest neighbor learning. Example with flowers based on sepal length and sepal width.

1-Nearest Neighbor classification and connection to the Voronoi Diagram (this is a topic discussed in Computational Geometry classes).

Assigned Reading: Nearest neighbors: ESL 2.3.2, 13.1, 13.2. Alternatively, Mitchell 8.1, 8.2 – 8.2.3.

#### Sunday, February 26, 2023 (11:59pm)

Due today: Reinforcement learning checkpoint.

#### Week 7

#### Classes 18-19-20 (Feb 27 and Mar 1-3, 2023)

Continue our discussion on Nearest Neighbors. Different metrics. Application to regression problems. Distance-weighted nearest neighbor method.

Naive Bayes for classification. Example on PlayTennis.

The m-estimate and dealing with various corner cases.

Assigned Reading: Naive Bayes: Mitchell 6.9, 6.10.

Naive Bayes for document classification.

Discussion on creating features for document classification: bag-of-words, n-grams, TF-IDF.

Gaussian Naive Bayes (dealing with continued-valued attributes) and other variants of Naive Bayes for classification.

Naive Bayes is not used for regression problems.

Loss functions: 0-1 loss, square loss.

The quality of our solutions: risk and empirical risk.

Assigned Reading: Please pay attention to the slides for the discussion on loss functions, risk, empirical risk, the ERM principle, etc.

#### Week 8

#### Classes 21-22-23 (Mar 6-8-10, 2023)

Discussion on risk and empirical risk and how these quantities look like when we use the 0-1 loss $\ell_{\text{0-1}}$ or the square loss $\ell_{\text{sq}}$. Empirical Risk Minimization (ERM) principle.

Some remarks on terminology and notation used in statistics and in computer science.

Started our discussion on linear models. Perceptrons, decision boundary, discussion on the representational power of various functions. The update rule that is used for learning using perceptrons.

Assigned Reading: Perceptrons: Mitchell 4.1 – 4.4.2.

Perceptrons on linearly separable data and non-linearly separable data. Transformations that allow the perceptron to learn non-linearly separable data and issues of these transformations. The pocket algorithm.

Linear regression. Ordinary least squares solution.

Solve linear regression problems using gradient descent.

Assigned Reading: Mitchell 4.4.3 – 4.4.4.

Introduction to logistic regression. How is it different from other linear models?

Introduction of the logistic loss.

Assigned Reading: Logistic regression: ESL 4.4.

#### Sunday, March 12, 2023

Due today: Homework 3.

#### Week 9

#### Mar 13-15-17, 2023

Spring break; no classes.

#### Sunday, March 19, 2023

Due today: Reinforcement learning project.

#### Week 10

#### Classes 24-25-26 (Mar 20-22-24, 2023)

Conclude our discussion on the logistic loss. Termination criteria for logistic regression using gradient descent.

Almost finished with our discussion on the logistic regression.

Wed, 3/22/2023: Proposals for supervised learning project projects; in-class as well as remote presentations.

Fri, 3/24/2023: Proposals for supervised learning project projects; in-class as well as remote presentations.

#### Week 11

#### Classes 27-28-29 (Mar 27-29-31, 2023)

Introduction to regularization and stability. Structural Risk Minimization (SRM) and Regularized Loss Minimization (RLM).

Regression and regularization. Ridge regression, lasso regression, and elastic-net. Why lasso is more likely to create a sparser model compared to ridge regression.

Discussion on assessing the goodness of fit of linear models. Discussed: Residual Standard Error (RSE), R^{2} statistic, adjusted
R^{2} statistic, residual plots, C_{p}, AIC, and BIC criteria.

Due Wed: Homework 4.

Assigned Wed: Homework 5.

Bias – variance tradeoff. Underfitting and overfitting and the general ideas behind model selection.

Holdout method using 2-way and 3-way partitioning of the given dataset.

Start discussion on cross-validation.

Assigned Reading: A Few Useful Things to Know About Machine Learning , by Pedro Domingos.

Discussion in class on issues that arise in supervised learning algorithms that we have covered so far: perceptron and bound on number of mistakes, stability, PAC guarantees and distributional assumptions.

Conclude our discussion on cross-validation. Leave-One-Out-Cross-Validation (LOOCV). Stratified cross-validation.

#### Week 12

#### Classes 30-31-32 (Apr 3-5-7, 2023)

Metrics Beyond Low Risk.

A nice decomposition of the instance space and ultimately of the predictions that we make in a dataset, by using false positives, false negatives, true positives, and true negatives -- regions in the instance space, or counting examples from the dataset.

The confusion matrix.

Complex performance measures based on: recall, precision, specificity. Balanced accuracy and F1-score.

Receiver Operator Characteristic (ROC) curve.

Artificial neural networks.

Popular activation functions (and their derivatives).

The backpropagation algorithm.

Explanation of the phenomenon of *vanishing gradients*.

#### Week 13

#### Classes 33-34-35 (Apr 10-12-14, 2023)

Due Mon: Homework 5.

Assigned Mon: Homework 6.

Derivation of the backpropagation rule.

An illustrative example on artificial neural networks. Design decisions and results.

Additional comments and discussion on artificial neural networks.

Introduction to decision trees.

#### Week 14

#### Classes 36-37-38 (Apr 17-19-21, 2023)

Continuation of decision trees.

Due Wed: Supervised Learning Project Checkpoint.

Conclusion of decision trees.

#### Week 15

#### Classes 39-40-41 (Apr 24-26-28, 2023)

Ensembles. Bootstrap and bagged predictors. Random forests. Boosting methods.

Introduction to Support Vector Machines. Optimal hyperplane and margin.

Assigned Reading:
Elements of Statistical Learning: Sections 12.1 – 12.2.

Note that your book defines the margin to be twice as much of
the distance between the decision boundary and the closest training example
(positive or negative it does not matter) – see Figure 12.1.
However, other books define the margin to be half of this quantity (and I tend to prefer this view),
essentially corresponding to the distance between the support vectors and the decision boundary
(thus, this distance corresponds to a cushion that exists for all the instances in our dataset
before they are misclassified).
For example, the following books follow this view:

- Foundations of Machine Learning, by Mehryar Mohri, Afshin Rostamizadeh and Ameet Talwalkar. See Definition 5.1 and Figure 5.3.
- An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. See Figure 9.3.
- Mining of Massive Datasets, by Jure Leskovec, Anand Rajaraman and Jeffrey David Ullman. See Figure 12.14.
- Machine Learning: An Algorithmic Perspective, by Stephen Marsland. See Figure 8.2.

Just keep this in mind when you discuss with someone else, because even if you use the same term (margin), you may end up describing a quantity that is off by a factor of 2. At the end of the day it is just a definition and it is more of a personal preference on how one defines this quantity.

#### Sunday, Apr 30, 2023

Due today: Homework 6.

#### Week 16

#### Classes 42-43-44 (May 1-3-5, 2023)

Clustering.

Potentially a guest lecture.

Mention kernels for support vector machines.

Devote some time in class for the student evaluations.

Advertisement of computational learning theory.

Ask me anything.

#### Friday, May 12, 2023 (1:30pm – 3:30pm)

Normally this would be the date and time of the final exam. However, we will not have a final exam as the class has a semester-long project.

The supervised learning project will be due at 3:30pm on this day.

Due today: Supervised learning project (whether short or long) write-up and source code.

[Class Log] [Table of Contents] [Top]