# diochnos/teaching/CS5970/2020F

## CS 5970 – Computational Learning Theory (Fall 2020)

### Course Description

Topics of machine learning theory. Learning using membership queries, equivalence queries, version spaces, decision trees, perceptrons. Probably approximately correct (PAC) learning, Occam algorithms, VC-dimension, sample sizes for distribution-independent learning. Representation issues, proper learning, reductions, intractability, learning in the realizable case, agnostic learning. Noise models, statistical queries, PAC learning under noise. Adversarially robust learning against poisoning attacks and against adversarial examples. Distribution-specific learning and evolvability. Online learning and learning with expert advice in the mistake bound model. Weak and strong learning (boosting).

### Basic Information

#### Syllabus

The syllabus is available here.

#### Time and Location

Mondays, Wednesdays, and Fridays, 9:30am – 10:20am, Devon Energy Hall 0120.

#### Office Hours

I will be holding my office hours at the following times.

Mondays
10:30am – 12:30pm, 244 Devon Energy Hall
Wednesdays
10:30am – 12:30pm, 244 Devon Energy Hall

Please note that while anyone is welcome during the entire 2-hour time period on Mondays and Wednesdays, CS 5970 will have precedence during the first hour (10:30am – 11:30am) and CS 4033/5033 will have precedence during the second hour (11:30am – 12:30pm).

##### Exceptions to the Regular Schedule of Office Hours

If you want to meet me outside of my office hours, please send me an email and arrange an appointment.

Thursday, September 3, 2020: I will be holding office hours between 10:00am – 12:00pm. My regular office hours that were planned for Wednesday, September 2, 2020 are canceled. Please see Canvas for the Zoom link for these makeup office hours.

Thursday, September 10, 2020: I will be holding office hours between 10:00am – 12:00pm. My regular office hours that were planned for Wednesday, September 9, 2020 are canceled. Please see Canvas for the Zoom link for these makeup office hours.

Wednesday, September 16, 2020: I will be holding my office hours between 10:30am – 12:00pm; that is, 30 minutes less today.

### Homework Assignments

Assignment 1: Announced Friday, August 28. Due Friday, September 11.

Assignment 2: Announced Friday, September 11. Due Monday, September 21.

Assignment 3: Announced Tuesday, September 22. Due Thursday, October 1.

Assignment 4: Announced Monday, October 12. Due Wednesday, October 21.

### Final Projects

In general I am open to discussion for various ideas regarding these projects. The route that I expect most people will take, is to either work alone or form a group with someone else, read a research paper, and present what they read to others during the poster session that takes place at the end of the semester. Another idea is for 2 people to form a group and implement a library with some algorithms that we saw in class, run some experiments with these algorithms, and present this work during the poster session at the end of the semester.

For a list of candidate papers or book chapters to be presented at the end of the course, please have a look below. The candidate material below is split into one- and multi-person projects.

I am happy to discuss final projects on other subjects that are related to the course. For example, you may want to use a data set that you acquired online for training a classifier and then predicting its generalization ability by performing cross-validation and ultimately compare how close this empirical estimate is with the prediction that you have from PAC learning (for an appropriate tradeoff between the accuracy parameter ε and the confidence parameter δ). I am also open to other suggestions; for example one can implement Shannon's mind reading machine. However, in any case, you need to talk to me well in advance and make sure that I agree with the path that you want to follow for what you plan to deliver for your final project.

#### Candidate Material

Clicking on the title of a paper takes you to the paper. It might be easier if you first download the pdf file locally to your computer.

### Computational Learning Theory Resources

#### Books

The two main books that we plan to use for the course are available for free in electronic format in the following links:

Another book that I like a lot and recommend to people who are starting with machine learning is the following one:

### Class Log

A log for the class will be held online here.

#### Class 1 (Aug 24, 2020)

Discussion on syllabus and policies.

Mentioned the existence of the following handouts:

Discussion of coupon collector's problem. We derived the expected running time and gave an upper bound on the running time (with high probability) using Markov's inequality.

There is still discussion that we should do here next time.

#### Class 2 (Aug 26, 2020)

Continued our discussion on coupon collection.

Started discussion on terminology that we will be using. Agnostic vs non-agnostic learning.

Concept learning and concept class.

#### Class 3 (Aug 28, 2020)

Assigned today: Homework 1.

Continued our discussion on terminology. Hypothesis space, hypothesis.

Proper learning vs `improper'/representation-independent learning. Learning in the realizable case.

Defined the different types of queries that one can ask for query learning.

#### Class 4 (Aug 31, 2020)

Clarified some points on the representation of functions.

Learning monotone conjunctions using membership and equivalence queries. Learning general conjunctions using equivalence queries. Showed negative result (exponential bound in the worst case) when using membership queries.

Lemma 1 from Angluin's paper – a sunflower lemma.

#### Class 5 (Sep 2, 2020)

The double sunflower. Using the same concept class we can achieve an exponential lower bound for any algorithm that performs exact identification using any kind of queries.

Assigned Reading: From the paper Queries and Concept Learning, Sections 1 to 2.4, as well as Section 6 are mandatory; rest is optional.

The more-general-than-or-equal-to relation among functions. Lattice with partial ordering of hypotheses in the hypothesis space. Correspondence between hypotheses in the hypothesis space and sets of examples in the instance space.

#### Class 6 (Sep 4, 2020)

Consistency and version spaces.

Algorithms Find-S and List-Then-Eliminate.

Most specific and most general boundaries.

Version space representation theorem. (Theorem 2.1 in Mitchell's book.)

#### Sep 7, 2020

No class. Labor day.

#### Class 7 (Sep 9, 2020)

Candidate Elimination Algorithm. Execution on our toy example.

Discussion on the convergence of Candidate Elimination Algorithm (no noise and realizability assumption).

What queries the learners should request next and the Halving algorithm.

#### Class 8 (Sep 11, 2020)

Due today: Homework 1.

Assigned today: Homework 2.

Using partially learned concepts to make predictions.

Inductive bias and the futility of bias-free learners.

Assigned Reading: Chapter 2, from Tom Mitchell's book.

Introduction to decision trees.

Entropy, information gain and description of ID3 algorithm for decision tree learning.

#### Class 9 (Sep 14, 2020)

Partial execution of the ID3 algorithm in order to identify the root node.

Characteristics of the search performed by ID3. Contrasting this search (and bias) of ID3 with the search performed by the Candidate-Elimination algorithm. Occam's razor.

Definition of overfitting. Approaches to deal with overfitting.

Partitioning original examples to: training set, validation set, test set.

#### Class 10 (Sep 16, 2020)

Handling overfitting: Reduced-Error Pruning and Rule Post-Pruning.

Handling continuous-valued attributes. One needs only to examine boundary points for ID3.

Discussion on missing attributes or attributes that have different costs for testing.

Assigned Reading: Chapter 3 from Tom Mitchell's book.

Perceptron and geometry of the solution (hypothesis).

The perceptron update rule for learning from linearly separable data.

Assigned Reading: Sections 4.4 – 4.4.2 from Tom Mitchell's book. Feel free to also read the beginning of Chapter 4 in Tom Mitchell's book.

#### Class 11 (Sep 18, 2020)

Discussion for a few minutes on the perceptron.

No free-lunch theorems.

Definition of PAC learning and efficient PAC learning.

Assigned Reading: Started Chapter 2 from the book Foundations of Machine Learning. Equivalently, from the book Understanding Machine Learning, we started discussing the content of Chapters 2 and 3.

#### Class 12 (Sep 21, 2020)

Due today: Homework 2.

Definition of agnostic PAC learning and efficient agnostic PAC learning.

Efficient PAC learnability of axis-aligned rectangles in the plane (using Find-S).

Assigned Reading: This is Example 2.4 from the book Foundations of Machine Learning. Equivalently, this is the first example in Chapter 1 in the Kearns-Vazirani book.

#### Sep 22, 2020

Assigned today: Homework 3.

#### Class 13 (Sep 23, 2020)

Efficient PAC learnability of conjunctions (using Find-S).

Assigned Reading: This is Section 1.3 in the Kearns-Vazirani book.

#### Class 14 (Sep 25, 2020)

True risk and empirical risk.

Consistent learners and generalization. Theorem for sample complexity of PAC learning using finite hypothesis spaces (in the consistency model).

Assigned Reading: Sections 7.1, 7.2, and 7.3 (up until 7.3.1 but not 7.3.1 yet) from Tom Mitchell's book.

Optional Reading: Occam's Razor, by Anselm Blumer, Andrej Ehrenfeucht, David Haussler, and Manfred K. Warmuth.

#### Class 15 (Sep 28, 2020)

Some clarifications on the perceptron, so that we can have decision boundaries that do not go through the origin of axes.

Discussed some implications of the theorem that we proved last time; e.g., we improved the sample complexity of PAC learning conjunctions without even arguing about the algorithm that can be used for learning. We also saw a specific example of plugging-in certain values for ε and δ.

Went through the Computational Learning Theory Overview slides and briefly discussed how we plan to extend our theorem for finite hypothesis spaces to situations where a hypothesis space is infinite.

We then starting going through the slides on Elements of Computational Learning Theory that summarize with more details some of the things that we have seen so far (no free-lunch theorems, functions that are used as toy examples in computational learning theory, sample size for finite hypothesis spaces) and started our discussion on k-term DNFs where we want to prove the intractability of efficiently (properly) PAC learning k-term DNFs. We determined the sample size that follows immediately from our theorem for finite concept classes and next time we will discuss the reduction.

#### Class 16 (Sep 30, 2020)

Intractability of learning 3-term DNF formulae.

Assigned Reading: This is discussed in Section 1.4 in the Kearns-Vazirani Book. Have a look in the notes that we have in the Elements of Computational Learning Theory slides between pages 24/48 and 29/48 (numbering shown on bottom right part of each slide).

#### Oct 1, 2020

Due today: Homework 3.

#### Class 17 (Oct 2, 2020)

Went through the solutions for homework assignments 1 and 2.

#### Class 18 (Oct 5, 2020)

Midterm.

Midterm was pushed back to Friday, October 9.

Efficiently PAC learning k-term DNF formulae using k-CNF formulae. This was a reduction that allowed us to solve a problem using an algorithm (Find-S) for solving another problem (simple conjunctions). So, the representation of the hypotheses is very important in PAC learning, and whereas it is hard to PAC learn k-term DNF formulae properly, nevertheless, we can efficiently PAC learn k-term DNF formulae in the realizable case using k-CNF formulae.

Set covering and finding an O(log(m)) optimal approximation with a greedy algorithm.

Application of the algorithm to PAC learning conjunctions with few relevant literals by covering the negative examples (that Find-S would otherwise ignore) with the literals obtained from the solution of Find-S (that initially only takes into account the positive examples).

Assigned Reading: For the discussion on learning k-term DNF formulae using k-CNF formulae you can have a look at any one of the following sources:

However, the discussion on the approach with Set Cover and the learnability of conjunctions with few relevant variables is only available in the Kearns-Vazirani book (and in our notes).

#### Class 19 (Oct 7, 2020)

Optional Reading: A Few Useful Things to Know About Machine Learning, by Pedro Domingos.

Went over the solutions for homework 3.

Introduction to the VC-dimension.

Midterm.

#### Class 21 (Oct 12, 2020)

Assigned today: Homework 4.

Announce that a list of candidate papers for final projects are now proposed online.

Mention Occam's Razor paper by Anselm Blumer, Andrej Ehrenfeucht, David Haussler, and Manfred K. Warmuth.

Went over the solutions for the midterm problems.

#### Class 22 (Oct 14, 2020)

Continued our discussion on the VC-dimension. What it means to prove a lower bound and what it means to prove an upper bound.

The growth function $\Pi_{\mathcal{C}}(m)$.

Discussed the VC-dimension of thresholds on a line. Difference compared to a perceptron in 1 dimension.

Discussed the VC-dimension of axis-aligned rectangles in 2 dimensions as well as in d dimensions.

#### Class 23 (Oct 16, 2020)

Discussion on the basic idea for active learning. Example with learning a threshold on a line by using binary search and thus achieving an exponential improvement on the number of labeled examples that can be used in order to accomplish the PAC criterion.

Upper bound on the VC dimension $d$ of a finite hypothesis space; that is, $d \le \lg(|\mathcal{H}|)$.

Application of the above bound so that we can compute tight bounds on the VC dimension of monotone and general conjunctions.

#### Class 24 (Oct 19, 2020)

Using the VC dimension in order to obtain a lower bound on the number of examples that are necessary for PAC learning a concept class in a distribution-free sense.

There is still some details that we need to discuss next time and actually conclude the proof.

A similar bound is given in the Kearns-Vazirani book, in Section 3.6.

#### Class 25 (Oct 21, 2020)

Due today: Homework 4.

Concluded the proof on the lower bound on the number of examples needed by any distibution-free PAC learning algorithm. Namely, we proved that any hypothesis space that has VC dimension d, implies a lower bound of $m > \frac{d-1}{64\varepsilon}$ training examples that will be needed by any distribution-free PAC learning algorithm.

Discussion on the VC dimension and what the different bounds mean, what our goals are, and further discussion between distribution-free and distribution-specific learning.

Proved that $\left(1 + \frac{1}{n}\right)^n \le e \le \left(1+\frac{1}{n}\right)^{n+1}$.

Definition of the $\Phi_d(m)$ function.

TBA.

TBA.

TBA.

TBA.

TBA.

TBA.

TBA.

TBA.

TBA.

TBA.

TBA.

TBA.

TBA.

TBA.

#### Nov 25, 2020

Thanksgiving; no classes.

#### Nov 27, 2020

Thanksgiving; no classes.

TBA.

TBA.

TBA.

TBA.

TBA.

TBA.

#### Tuesday, December 15, 2020 (8:00am – 10:00am)

Final exam. In-class (Devon Energy Hall 0120 — where we met throughout the semester).