diochnos/teaching/CS5970/2022F

CS 5970 – Computational Learning Theory (Fall 2022)

Table of Contents

Course Description

Topics of machine learning theory. Learning using membership queries, equivalence queries, version spaces, decision trees, perceptrons. Probably approximately correct (PAC) learning, Occam algorithms, VC-dimension, sample sizes for distribution-independent learning. Representation issues, proper learning, reductions, intractability, learning in the realizable case, agnostic learning. Noise models, statistical queries, PAC learning under noise. Adversarially robust learning against poisoning attacks and against adversarial examples. Distribution-specific learning and evolvability. Online learning and learning with expert advice in the mistake bound model. Weak and strong learning (boosting).

[Course Description] [Table of Contents] [Top]

Basic Information

Syllabus

The syllabus is available here.

Time and Location

Mondays, Wednesdays, and Fridays, 9:30am – 10:20am, Carson Engineering Center 0121.

Contact Information

Please see here.

Office Hours

TBA

Exceptions to the Regular Schedule of Office Hours

If you want to meet me outside of my office hours, please send me an email and arrange an appointment.

[Basic Information] [Table of Contents] [Top]

Important Coronavirus-Related Information

We have the following links.

[Important Coronavirus-Related Information] [Table of Contents] [Top]

Homework Assignments

Assignment 1: Announced on Friday, August 26 (week 1). Due on Tuesday, September 6 (week 3).

Assignment 2: Announced on Wednesday, September 7 (week 3). Due on Saturday, September 17 (week 4).

Assignment 3: Announced on Saturday, September 17 (week 4). Due on Monday, September 26 Thursday, September 29 (week 6).

Assignment 4: TBA.

Assignment 5: TBA.

Assignment 6: TBA.

[Homework Assignments] [Table of Contents] [Top]

Final Projects

In general I am open to discussion for various ideas regarding these projects. The route that I expect most people will take, is to either work alone or form a group with someone else, read a research paper, and present what they read to others during the poster session that takes place at the end of the semester. Another idea is for 2 people to form a group and implement a library with some algorithms that we saw in class, run some experiments with these algorithms, and present this work during the poster session at the end of the semester.

For a list of candidate papers or book chapters to be presented at theend of the course, please have a look below.

I am happy to discuss final projects on other subjects that are related to the course. For example, you may want to use a data set that you acquired online for training a classifier and then predicting its generalization ability by performing cross-validation and ultimately compare how close this empirical estimate is with the prediction that you have from PAC learning (for an appropriate tradeoff between the accuracy parameter ε and the confidence parameter δ). I am also open to other suggestions; for example one can implement Shannon's mind reading machine. However, in any case, you need to talk to me well in advance and make sure that I agree with the path that you want to follow for what you plan to deliver for your final project.

Candidate Material

A list with candidate papers will be posted here.

[Final Projects] [Table of Contents] [Top]

Computational Learning Theory Resources

Books

The following books are very relevant to our course and are also available for free in electronic format in the following links:

Personal Notes

Papers

Assigned Readings
Optional Readings

[Computational Learning Theory Resources] [Table of Contents] [Top]

Class Log

A log for the class will be held online here.

Week 1

Class 1 (Aug 22, 2022)

Discussion on syllabus and policies.

Discussion of coupon collector's problem. We derived the expected running time of the process of collecting all $N$ coupons.

There are several things that we can (and we will) discuss here next time.

Class 2 (Aug 24, 2022)

Continued our discussion on coupon collection.

Mentioned the existence of the following handouts:

Class 3 (Aug 26, 2022)

Started discussion on terminology that we will be using. Agnostic vs non-agnostic learning.

Concept learning and concept class.

Hypothesis space, hypothesis.

Proper learning vs `improper'/representation-independent learning. Learning in the realizable case.

Assigned today: Homework 1.

Week 2

Class 4 (Aug 29, 2022)

Further discussion on machine learning terminology and semantics.

Discussion on the representation of Boolean functions.

Class 5 (Aug 31, 2022)

Discussion on the homework problems.

Defined the different types of queries that one can ask for query learning.

Learning monotone conjunctions using membership queries.

Class 6 (Sep 2, 2022)

Learning monotone conjunctions using equivalence queries. Learning general conjunctions using equivalence queries. Showed negative result (exponential bound in the worst case) when using membership queries.

Lemma 1 from Angluin's paper – a sunflower lemma.

Assigned Reading: From the paper Queries and Concept Learning, Sections 1 to 2.4, as well as Section 6 are mandatory; rest is optional.

The double sunflower. Using the same concept class we can achieve an exponential lower bound for any algorithm that performs exact identification using any kind of queries.

Week 3

Sep 5, 2022

No class. Labor day.

Class 7 (Sep 7, 2022)

The more-general-than-or-equal-to relation among functions. Lattice with partial ordering of hypotheses in the hypothesis space. Correspondence between hypotheses in the hypothesis space and sets of examples in the instance space.

Algorithm Find-S.

Consistency and version spaces.

Algorithm List-Then-Eliminate.

Most specific and most general boundaries.

Class 8 (Sep 9, 2022)

Due today: Homework 1.

Assigned today: Homework 2.

Version space representation theorem. (Theorem 2.1 in Mitchell's book.)

Candidate Elimination Algorithm. Execution on our toy example.

Assigned Reading: Chapter 2, from Tom Mitchell's book.

Week 4

Class 9 (Sep 12, 2022)

Using partially learned concepts to make predictions.

Inductive bias and the futility of bias-free learners.

Introduction to decision trees.

Entropy, information gain and basic idea for the construction of decision trees.

Class 10 (Sep 14, 2022)

Partial execution of the ID3 algorithm in order to identify the root node.

Characteristics of the search performed by ID3. Contrasting this search (and bias) of ID3 with the search performed by the Candidate-Elimination algorithm. Occam's razor.

Definition of overfitting. Approaches to deal with overfitting.

Assigned Reading: Chapter 3 from Tom Mitchell's book.

Class 11 (Sep 16, 2022)

Partitioning original examples to: training set, validation set, test set.

Handling overfitting: Reduced-Error Pruning and Rule Post-Pruning.

Handling continuous-valued attributes. One needs only to examine boundary points for ID3.

Discussion on missing attributes or attributes that have different costs for testing.

Sep 17, 2022

Due today: Homework 2.

Assigned today: Homework 3.

Week 5

Class 12 (Sep 19, 2022)

Perceptron and geometry of the solution (hypothesis).

The perceptron update rule for learning from linearly separable data.

Assigned Reading: Perceptrons in the beginning of Chapter 4 (Sections 4.4 – 4.4.2) from Tom Mitchell's book.

No free-lunch theorems.

Class 13 (Sep 21, 2022)

Definition of PAC learning and efficient PAC learning.

Definition of agnostic PAC learning and efficient agnostic PAC learning.

Assigned Reading: Started Chapter 2 from the book Foundations of Machine Learning. Equivalently, from the book Understanding Machine Learning, we started discussing the content of Chapters 2 and 3.

Class 14 (Sep 23, 2022)

Started our discussion on learning axis-aligned rectangles in the plane.

Completed our discussion on PAC learning axis-aligned rectangles in the plane.

Assigned Reading: This is Example 2.4 from the book Foundations of Machine Learning. Equivalently, this is the first example in Chapter 1 in the Kearns-Vazirani book.

Week 6

Class 15 (Sep 26, 2022)

Efficient PAC learnability of conjunctions (using Find-S).

Assigned Reading: Section 1.3 in the Kearns-Vazirani book.

Discussion on version spaces, consistent learners, Occam algorithms and generalization.

Theorem for sample complexity of PAC learning using finite hypothesis spaces (in the realizable case).

Discussed some implications of the theorem that we proved last time; e.g., we improved the sample complexity of PAC learning conjunctions without even arguing about the algorithm that can be used for learning. We also saw a specific example of plugging-in certain values for ε and δ.

Assigned Reading: Sections 7.1, 7.2, and 7.3 (up until 7.3.1 but not 7.3.1 yet) from Tom Mitchell's book. This is where Mitchell is introducing PAC learning and is proving the theorem with the sample size for finite hypothesis spaces in the realizable case.

Assigned Reading: From the book Foundations of Machine Learning, Theorem 2.5 corresponds to the Theorem that we proved in class about finite hypothesis spaces in the realizable case and Example 2.6 has the discussion that we did in class for improving the sample size needed for PAC learning conjunctions using Find-S.

Optional Reading: Occam's Razor, by Anselm Blumer, Andrej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. This is the paper that proved this simple but very important theorem.

Class 16 (Sep 28, 2022)

Set covering and finding an O(log(m)) optimal approximation with a greedy algorithm.

Application of the algorithm to PAC learning conjunctions with few relevant literals by covering the negative examples (that Find-S would otherwise ignore) with the literals obtained from the solution of Find-S (that initially only takes into account the positive examples).

Sep 29, 2022

Due today: Homework 3.

Class 17 (Sep 30, 2022)

Discussion on Haussler's example for blowing up the set G that the Candidate-Elimination algorithm maintains.

Discussion of solutions on homework assignments (all 3) so far.

Preparation for the midterm.

Week 7

Class 18 (Oct 3, 2022)

Midterm

Class 19 (Oct 5, 2022)

Solutions of the midterm.

Oct 7, 2022

No class will take place on this day. It has been designated as the 2022 Fall Student Holiday.

Week 8

Class 20 (Oct 10, 2022)

Theorem for agnostic PAC learning using finite hypotheses spaces and empirical risk minimization.

Intractability of learning 3-term DNF formulae.

Assigned Reading: The rest of Section 7.3 from Tom Mitchell's book.

Optional Reading: This is discussed in Section 1.4 in the Kearns-Vazirani Book.

Intractability of learning 3-term DNF properly.

Assigned today: Homework 4.

Class 21 (Oct 12, 2022)

Efficiently PAC learning k-term DNF formulae using k-CNF formulae. This was a reduction that allowed us to solve a problem using an algorithm (Find-S) for solving another problem (simple conjunctions). So, the representation of the hypotheses is very important in PAC learning, and whereas it is hard to PAC learn k-term DNF formulae properly, nevertheless, we can efficiently PAC learn k-term DNF formulae in the realizable case using k-CNF formulae.

Introduction to the VC-dimension. What it means to prove a lower bound and what it means to prove an upper bound.

The growth function $\Pi_{\mathcal{H}}(m)$.

Discussed the VC-dimension of thresholds on a line. Difference compared to a perceptron in 1 dimension.

Discussed the VC-dimension of axis-aligned rectangles in 2 dimensions as well as in d dimensions.

Assigned Reading: For the discussion on learning k-term DNF formulae using k-CNF formulae you can have a look at any one of the following sources:

Optional Reading: From the Kearns-Vazirani book, Sections 1.4 and 1.5 (learning 3-term DNF formulae) as well as Sections 3.1 – 3.3 (VC dimension).

Class 22 (Oct 14, 2022)

We reviewed the notion of the VC dimension and other notions that are related to that (e.g., shattering, the growth function, etc).

Upper bound on the VC dimension $d$ of a finite hypothesis space; that is, $d \le \lg(|\mathcal{H}|)$.

Week 9

Class 23 (Oct 17, 2022)

Application of the bound on the VC dimension for finite hypotheses spaces, so that we can compute tight bounds on the VC dimension of monotone and general conjunctions.

Definition of the $\Phi_d(m)$ function.

Proved that $\left(1 + \frac{1}{n}\right)^n \le e \le \left(1+\frac{1}{n}\right)^{n+1}$.

Proved properties of the $\Phi_d(m)$ function.

Assigned Reading: From the book Foundations of Machine Learning, see Corollary 3.18.

Optional Reading: Lemma 3.2 (Section 3.4) and closing remarks in Section 3.4 of the Kearns-Vazirani book.

We stated the Sauer-Shelah lemma (i.e., that $\Pi_{\mathcal{H}}(m) \le \Phi_d(m)$), but we will prove it next time.

Class 24 (Oct 19, 2022)

Proof of the Sauer-Shelah lemma.

Assigned Reading: From the book Foundations of Machine Learning, see Theorem 3.17

Optional Reading: Sauer's lemma is proved in Lemma 3.1 (Section 3.4) of the Kearns-Vazirani book.

We discussed the big picture on the theorems regarding sample sizes that we have proved as well as what we plan to prove. We then started our discussion on the Fundamental Theorem of Computational Learning Theory (for the upper bound on the number of examples).

Class 25 (Oct 21, 2022)

Proof of the upper bound of the Fundamental Theorem of Computational Learning Theory (upper bound on the number of examples that are enough for PAC learning in the consistency model, using a hypothesis class with finite VC dimension).

We proved that, for a hypothesis $h$ (drawn from a hypothesis-space $\mathcal{H}$ that has finite VC dimension) that is consistent on a training sample $S$, it holds $$ \text{Pr}\left[\ R_{\mathcal{D}}(h, c) \le \frac{2}{m}\cdot\left(\lg\left(\Pi_{\mathcal{H}}(2m)\right) + \lg\left(\frac{2}{\delta}\right)\right) \right] \ge 1 - \delta\,. $$ (This is proved in Section 3.5 from the Kearns-Vazirani book.)

Assigned Reading: Section 28.3 from the book Understanding Machine Learning, which is the upper bound for the realizable case.

Optional Reading: Section 3.5 from the Kearns-Vazirani book.

Optional Reading: A survey of some aspects of computational learning theory, by György Turán.
The paper gives an overview of several important results that are related to computational learning theory; some of which we have already touched, such as the VC dimension, and others that we have not yet touched, such as mistake bounds in the mistake bound model of learning. You can read this paper in pieces as we are making progress with the class, but now is a good time to start digesting the notions that we have covered already in class.

Oct 23, 2022

Due today: Homework 4.

Week 10

Class 26 (Oct 24, 2022)

Assigned today: Homework 5.

Proof on the lower bound on the number of examples needed by any distibution-free PAC learning algorithm. Namely, we proved that any hypothesis space that has VC dimension d, implies a lower bound of $$m > \frac{d-1}{64\varepsilon}$$ training examples that will be needed by any distribution-free PAC learning algorithm.

A similar bound is given in the Kearns-Vazirani book, in Section 3.6.

Overview of the results that we have proved for (distribution-free) PAC learning.

Class 27 (Oct 26, 2022)

MoneyCoach presentation by Cami Sheaffer.

Class 28 (Oct 28, 2022)

Introduction to adversarial machine learning.

Training time attacks: Noise and poisoning attacks.

Test-time attacks: Evasion attacks (adversarial examples).

Attacks involving both phases: Backdoor and hybrid attacks.

For now we focus on noise. Discussion on different noise models.

Assigned Reading: Four types of noise in data for PAC learning, by Robert Sloan. Up to and including Section 2.2 is mandatory reading; the rest is optional.

Optional Reading: PAC Learning with nasty noise, by Nader H. Bshouty, Nadav Eiron, and Eyal Kushilevitz.

Week 11

Class 29 (Oct 31, 2022)

We showed that the malicious noise that can be tolerated by PAC learning algorithms is less than $\varepsilon$, regardless of the concept class being learnt, where $\varepsilon$ is the usual parameter for bounding the risk in PAC learning. The result that we showed was a simple variant of the result of Kearns and Li from their paper Learning in the Presence of Malicious Errors using the method of induced distributions; in particular Theorem 1 from that paper.

Assigned Reading: Notes on what we discussed in class.

Optional Reading: Learning in the Presence of Malicious Errors, by Michael Kearns and Ming Li.

Optional Reading: Learning Disjunctions of Conjunctions, by Leslie Valiant. (The paper that introduced malicious noise.)

Brief discussion on some of the available papers for presentation.

Class 30 (Nov 2, 2022)

Discussion on random classification noise.

We proved that $$m \ge \frac{2}{\varepsilon^2(1-2\eta)^2}\cdot\ln\left(2|\mathcal{H}|/\delta\right)$$ training examples are enough for PAC learning a concept class $\mathcal{C}$ using a hypothesis space $\mathcal{H}$, in the realizable case, under random classification noise rate $\eta$.

Of course it is a question of how we can actually obtain $\eta$ to begin with, as well as solve the computational problem that is hidden. We will discuss these in sequence.

Assigned Reading: Learning From Noisy Examples, by Dana Angluin and Philip Laird. For now please read up to (and including) Section 2.2.

Brief discussion on some of the candidate papers. We will discuss more next time.

Class 31 (Nov 4, 2022)

We will discuss an algorithm for computing an upper bound $\eta_b < 1/2$ of the true noise rate $\eta$.

Assigned Reading: Section 2.3 from the paper Learning From Noisy Examples, by Dana Angluin and Philip Laird.

Week 12

Class 32 (Nov 7, 2022)

Due today: Homework 5.

Assigned today: Homework 6.

PAC learning under class imbalance. Situations where low risk is not the only goal; recall and precision may be very important. A bisection algorithm for computing a lower bound on the rate of the minority class. Experiments on monotone conjunctions and performance guarantees.

Add a link to the SDM paper.

Class 33 (Nov 9, 2022)

Discussion about hardness results.

Minimizing disagreements is $NP$-hard for monotone conjunctions.

Introduction to statistical queries. An SQ algorithm for learning monotone conjunctions.

We need to continue our discussion here.

Class 34 (Nov 11, 2022)

Advantages of learning algorithms that are based on statistical queries.

We will derive a formula for computing $P_{\chi}$ which is the probability that the statistical query $\chi$ has to be satisfied (and will subsequently be returned within $\tau$ of its true value by the STAT oracle).

\[ P_{\chi} = Pr_{EX(c, \mathcal{D})}[\chi = 1] = \frac{(1-\eta)Pr_{EX_{CN}^\eta(c,\mathcal{D})}[\chi = 1] - \eta Pr_{EX_{CN}^\eta(\overline{c}, \mathcal{D})}[\chi=1]}{1-2\eta}\,. \]

The good thing with this last formula is that we can compute the noise-free estimate that we want based on noisy estimates.

Assigned Reading: The lecture notes by Javed Aslam.

Assigned Reading: Sections 1 – 3 from the paper Improved Noise-Tolerant Learning and Generalized Statistical Queries, by Javed A. Aslam and Scott E. Decatur. The rest of the paper is optional.

Optional Reading: The Statistical Query model was introduced in the paper Efficient Noise-Tolerant Learning from Statistical Queries, by Michael Kearns. However, reading this paper is optional.

Though we are far from exhausting the subject on SQ learning, by now we have see several ideas around PAC learning, noise, and learning with statistical queries. The slides from A Tutorial on Statistical Queries, by Lev Reyzin can prove to be useful as a backup and provide additional pointers for further reading. The slides also have a discussion on other topics connecting to SQ learning such as evolvability, differential privacy and adaptive data analysis.

Week 13

Class 35 (Nov 14, 2022)

Poisoning attacks. We used the slides that are available on Canvas.

Optional Reading: The paper Learning under p-tampering poisoning attacks, by Saeed Mahloujifar, Dimitrios Diochnos and Mohammad Mahmoody.

Class 36 (Nov 16, 2022)

Evasion attacks / adversarial examples. We used the slides that are available on Canvas.

Optional Reading: The paper Adversarial Risk and Robustness: General Definitions and Implications for the Uniform Distribution, by Dimitrios Diochnos, Saeed Mahloujifar and Mohammad Mahmoody.

Class 37 (Nov 18, 2022)

Concluded our discussion on adversarial machine learning.

Week 14

Class 38 (Nov 21, 2022)

Distribution-specific learning, evolvability, and the swapping algorithm.

Optional Reading: Notes on Evolution and Learning, Sections 1 – 7. We used the slides Evolvability in Learning Theory in class.

Optional Reading: The paper On Evolvability: The Swapping Algorithm, Product Distributions, and Covariance, by Diochnos and Turán.

Nov 23, 2022

Thanksgiving vacation; no classes.

Nov 25, 2022

Thanksgiving vacation; no classes.

Week 15

Class 39 (Nov 28, 2022)

Introduction to online learning and learning with expert advice.

Upper bound on the number of mistakes for the perceptron learning algorithm.

Nov 29, 2022

Due today: Homework 6.

Class 40 (Nov 30, 2022)

Halving algorithm and an upper bound on the number of mistakes. Randomized halving algorithms and expected number of mistakes.

VC-dimension lower bound on the number of mistakes of any online learning algorithm.

Weighted Majority Algorithm.

Assigned Reading: Section 7.5 from Tom Mitchell's book.

Assigned Reading: Sections 1 and 2 from Avrim Blum's paper On-Line Algorithms in Machine Learning.

Assigned Reading: Theorem 2.1 from Wofgang Maass's paper On-Line Learning with an Oblivious Environment and the Power of Randomization. The rest of the paper is optional.

Class 41 (Dec 2, 2022)

Randomized weighted majority algorithm.

Winnow.

Assigned Reading: Section 3.2 from Avrim Blum's paper On-Line Algorithms in Machine Learning.

Boosting.

Overview of the ideas for boosting the accuracy.

Introduction to boosting. Boosting the confidence and boosting the accuracy.

Overview of boosting the confidence procedure.

Optional Reading: Sections 4.1 and 4.2 from the Kearns-Vazirani book.

Optional Reading: A modest procedure for boosting the accuracy. Sections 4.3.1 and 4.3.2 from the Kearns-Vazirani book.

The above analysis is from the paper The Strength of Weak Learnability, by Robert E. Shapire which was the first paper presenting a boosting method and thus converting a weak learner to a strong learner.

Week 16

Class 42 (Dec 5, 2022)

Continue our discussion on boosting.

Class 43 (Dec 7, 2022)

A much better method for boosting the accuracy in our days is AdaBoost. A short introduction to boosting with AdaBoost is available in the paper A Short Introduction to Boosting, by Yoav Freund and Robert E. Shapire.

Also, a survey paper (book chapter) that has a large overlap with the above paper, is The Boosting Approach to Machine Learning: An Overview, by Robert E. Shapire where this time boosting and logistic regression are also discussed.

All the above-mentioned papers are well worth your time; however, they are optional readings for this class given how late in the semester we discussed boosting this year..

AdaBoost in Pseudocode.

We devoted 10 – 12 minutes for evaluating the course.

Discussion on Gather Town and presentations.

Ask me anything.

Class 44 (Dec 9, 2022)

Most likely no class will be held and instead we will have poster presentations in Gather Town.

Week 17

Tuesday, December 13, 2022 (8:00am – 10:00am)

Final exam. In-class (Carson Engineering Center 0121 — where we met throughout the semester).

[Class Log] [Table of Contents] [Top]