# diochnos/teaching/CS5970/2021F

## CS 5970 – Computational Learning Theory (Fall 2021)

### Course Description

Topics of machine learning theory. Learning using membership queries, equivalence queries, version spaces, decision trees, perceptrons. Probably approximately correct (PAC) learning, Occam algorithms, VC-dimension, sample sizes for distribution-independent learning. Representation issues, proper learning, reductions, intractability, learning in the realizable case, agnostic learning. Noise models, statistical queries, PAC learning under noise. Adversarially robust learning against poisoning attacks and against adversarial examples. Distribution-specific learning and evolvability. Online learning and learning with expert advice in the mistake bound model. Weak and strong learning (boosting).

### Basic Information

#### Syllabus

The syllabus is available here.

#### Time and Location

Mondays, Wednesdays, and Fridays, 9:30am – 10:20am, Carson Engineering Center 0121.

#### Office Hours

I will be holding my office hours at the following times.

Mondays
10:30am – 11:30am, 244 Devon Energy Hall
Wednesdays
10:30am – 11:30am, 244 Devon Energy Hall
##### Exceptions to the Regular Schedule of Office Hours

If you want to meet me outside of my office hours, please send me an email and arrange an appointment.

Monday, August 23, 2021: I will not hold office hours on 8/23. I have to give a talk to IJCAI-2021 at the same time.

### Homework Assignments

Assignment 1: Announced on Friday, August 27. Due on Friday, September 10.

Assignment 2: Announced on Friday, September 10. Due on Monday, September 20.

Assignment 3: To be announced on Friday, September 20. To be due on Thursday, September 30.
(Let's try to make this a firm deadline and make the midterm easier.)

### Final Projects

In general I am open to discussion for various ideas regarding these projects. The route that I expect most people will take, is to either work alone or form a group with someone else, read a research paper, and present what they read to others during the poster session that takes place at the end of the semester. Another idea is for 2 people to form a group and implement a library with some algorithms that we saw in class, run some experiments with these algorithms, and present this work during the poster session at the end of the semester.

For a list of candidate papers or book chapters to be presented at the end of the course, please have a look below. The candidate material below is split into one- and multi-person projects.

I am happy to discuss final projects on other subjects that are related to the course. For example, you may want to use a data set that you acquired online for training a classifier and then predicting its generalization ability by performing cross-validation and ultimately compare how close this empirical estimate is with the prediction that you have from PAC learning (for an appropriate tradeoff between the accuracy parameter ε and the confidence parameter δ). I am also open to other suggestions; for example one can implement Shannon's mind reading machine. However, in any case, you need to talk to me well in advance and make sure that I agree with the path that you want to follow for what you plan to deliver for your final project.

#### Candidate Material

A list with candidate papers will be posted here.

• TBA

### Computational Learning Theory Resources

#### Books

The two main books that we plan to use for the course are available for free in electronic format in the following links:

Another book that I like a lot and recommend to people who are starting with machine learning is the following one:

• TBA

### Class Log

A log for the class will be held online here.

#### Class 1 (Aug 23, 2021)

Discussion on syllabus and policies.

Discussion of coupon collector's problem. We derived the expected running time of the process of collecting all $N$ coupons.

There are several things that we can (and we will) discuss here next time.

#### Class 2 (Aug 25, 2021)

Continued our discussion on coupon collection.

Mentioned the existence of the following handouts:

#### Class 3 (Aug 27, 2021)

Started discussion on terminology that we will be using. Agnostic vs non-agnostic learning.

Concept learning and concept class.

Hypothesis space, hypothesis.

Proper learning vs `improper'/representation-independent learning. Learning in the realizable case.

Assigned today: Homework 1.

#### Class 4 (Aug 30, 2021)

Further discussion on machine learning terminology and semantics.

Discussion on the representation of Boolean functions.

#### Class 5 (Sep 1, 2021)

Defined the different types of queries that one can ask for query learning.

Learning monotone conjunctions using membership and equivalence queries. Learning general conjunctions using equivalence queries. Showed negative result (exponential bound in the worst case) when using membership queries.

Lemma 1 from Angluin's paper – a sunflower lemma.

Assigned Reading: From the paper Queries and Concept Learning, Sections 1 to 2.4, as well as Section 6 are mandatory; rest is optional.

#### Class 6 (Sep 3, 2021)

The double sunflower. Using the same concept class we can achieve an exponential lower bound for any algorithm that performs exact identification using any kind of queries.

The more-general-than-or-equal-to relation among functions. Lattice with partial ordering of hypotheses in the hypothesis space. Correspondence between hypotheses in the hypothesis space and sets of examples in the instance space.

Algorithm Find-S.

#### Sep 6, 2021

No class. Labor day.

#### Class 7 (Sep 8, 2021)

Consistency and version spaces.

Algorithm List-Then-Eliminate.

Most specific and most general boundaries.

Version space representation theorem. (Theorem 2.1 in Mitchell's book.)

Candidate Elimination Algorithm. Execution on our toy example.

#### Class 8 (Sep 10, 2021)

Due today: Homework 1.

Assigned today: Homework 2.

Using partially learned concepts to make predictions.

Inductive bias and the futility of bias-free learners.

Assigned Reading: Chapter 2, from Tom Mitchell's book.

Introduction to decision trees.

Entropy, information gain and basic idea for the construction of decision trees.

#### Class 9 (Sep 13, 2021)

Partial execution of the ID3 algorithm in order to identify the root node.

Characteristics of the search performed by ID3. Contrasting this search (and bias) of ID3 with the search performed by the Candidate-Elimination algorithm. Occam's razor.

#### Class 10 (Sep 15, 2021)

Definition of overfitting. Approaches to deal with overfitting.

Partitioning original examples to: training set, validation set, test set.

Handling overfitting: Reduced-Error Pruning and Rule Post-Pruning.

Handling continuous-valued attributes. One needs only to examine boundary points for ID3.

Discussion on missing attributes or attributes that have different costs for testing.

Assigned Reading: Chapter 3 from Tom Mitchell's book.

#### Class 11 (Sep 17, 2021)

Perceptron and geometry of the solution (hypothesis).

The perceptron update rule for learning from linearly separable data.

Assigned Reading: Sections 4.4 – 4.4.2 from Tom Mitchell's book. Feel free to also read the beginning of Chapter 4 in Tom Mitchell's book.

No free-lunch theorems.

Assigned Reading: Perceptrons (beginning of Chapter 4) from Tom Mitchell's book.

#### Class 12 (Sep 20, 2021)

Due today: Homework 2.

Assigned today: Homework 3.

Definition of PAC learning and efficient PAC learning.

Definition of agnostic PAC learning and efficient agnostic PAC learning.

Started our discussion on learning axis-aligned rectangles in the plane.

Assigned Reading: Started Chapter 2 from the book Foundations of Machine Learning. Equivalently, from the book Understanding Machine Learning, we started discussing the content of Chapters 2 and 3.

#### Class 13 (Sep 22, 2021)

Completed our discussion on PAC learning axis-aligned rectangles in the plane.

Assigned Reading: This is Example 2.4 from the book Foundations of Machine Learning. Equivalently, this is the first example in Chapter 1 in the Kearns-Vazirani book.

Efficient PAC learnability of conjunctions (using Find-S).

Optional Reading: This is Section 1.3 in the Kearns-Vazirani book.

TBA

TBA

TBA

#### Oct 1, 2020

Due today: Homework 3.

#### Class 17 (Oct 1, 2021)

Discussion of solutions on homework assignments (all 3) so far.

Preparation for the midterm.

Midterm

TBA

#### Class 20 (Oct 8, 2021)

The current plan is that no class will take place on this day. It has been designated as the 2021 Fall Student Holiday.

TBA

TBA

TBA

TBA

TBA

TBA

TBA

TBA

TBA

#### Class 30 (Nov 1, 2021)

Presentation on financial education by Cami Sheaffer.

TBA

TBA

TBA

TBA

TBA

TBA

TBA

TBA

TBA

#### Nov 24, 2021

Thanksgiving vacation; no classes.

#### Nov 26, 2021

Thanksgiving vacation; no classes.

TBA

TBA

TBA

TBA

TBA

#### Class 45 (Dec 10, 2021)

Most likely no class will be held. Instead we will have poster presentations.

#### Tuesday, December 14, 2021 (8:00am – 10:00am)

Final exam. In-class (Carson Engineering Center 0121 — where we met throughout the semester).