# diochnos/teaching/CS5713/2023F

## CS 5713 – Computational Learning Theory (Fall 2023)

### Table of Contents

### Course Description

Topics in machine learning theory. Learning using membership queries, equivalence queries, version spaces, linear models, decision trees. Probably approximately correct (PAC) learning, Occam algorithms, VC-dimension, sample sizes for distribution-independent learning. Representation issues, proper learning, reductions, intractability, learning in the realizable case, agnostic learning. We explore topics under the broader umbrella of trustworthy machine learning, such as noise models, statistical queries, and adversarially robust learning against poisoning attacks and against adversarial examples. In addition, we expand upon interpretability and explainability aspects, as well as upon fairness concerns. Other topics include distribution-specific learning and evolvability, online learning and learning with expert advice in the mistake bound model, and weak and strong learning (boosting).

[Course Description] [Table of Contents] [Top]

### Basic Information

#### Syllabus

The syllabus is available here.

#### Time and Location

Tuesdays and Thursdays, 10:30am – 11:45am, Carson Engineering Center 0119.

#### Contact Information

Please see here.

#### Office Hours

I will be holding my office hours at the following times.

- Tuesdays
  - 3:00pm – 4:00pm, 230 Devon Energy Hall
- Thursdays
  - 2:00pm – 3:00pm, 230 Devon Energy Hall
- Fridays
  - 10:45am – 11:45am, 230 Devon Energy Hall

Please note that while anyone is welcome during my office hours, students from CS 5713 will have precedence on Fridays, while students from CS 3823 will have precedence on Tuesdays and Thursdays.

##### Exceptions to the Regular Schedule of Office Hours

If you want to meet me outside of my office hours, please send me an email to arrange an appointment.

[Basic Information] [Table of Contents] [Top]

### Homework Assignments

Assignment 1: Announced on Thursday, Aug 31 (week 2). Due on Thursday, Sep 14 (week 4).

Assignment 2: Announced on Tuesday, September 19 (week 5). Due on Thursday, September 28 (week 6).

Assignment 3: Announced on Thursday, September 28 (week 6). Due on Tuesday, October 10 (week 8).

Assignment 4: Announced on Tuesday, October 10 (week 8). Due on Friday, October 27 (week 10).

Assignment 5: Announced on TBA. Due on TBA.

[Homework Assignments] [Table of Contents] [Top]

### Course Projects

Please have a look on Canvas for the project description.

#### Candidate Material

Please have a look here for some candidate papers.

[Course Projects] [Table of Contents] [Top]

### Computational Learning Theory Resources

#### Books

The following books are very relevant to our course and are also available for free in electronic format via the links below:

- Machine Learning, by Tom Mitchell.
- An Introduction to Computational Learning Theory, by Michael J. Kearns and Umesh Vazirani.
- Understanding Machine Learning, by Shai Shalev-Shwartz and Shai Ben-David.
- Foundations of Machine Learning, by Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar.
- Trustworthy Machine Learning, by Kush R. Varshney.
- Interpretable Machine Learning – A Guide for Making Black Box Models Explainable, by Christoph Molnar.
- Fairness and Machine Learning – Limitations and Opportunities, by Solon Barocas, Moritz Hardt, and Arvind Narayanan.

#### Personal Notes

- Essentials on the Analysis of Randomized Algorithms
- Basic Tools and Techniques for Algorithms in Learning Theory
- Tools for Bounding Probabilities
- Learning Conjunctions with Equivalence and Membership Queries
- Upper Bound on Malicious Noise Rate for PAC Learning
- Notes on Evolution and Learning (ignore the (1+1)-EA and the related discussion in Section 8)

#### Papers

##### Assigned Readings

- Queries and Concept Learning, by Dana Angluin.
- Sections 1 to 2.4, as well as Section 6 are mandatory; rest is optional.

- Four types of noise in data for PAC learning, by Robert Sloan.
- Up to and including Section 2.2 is mandatory; rest is optional.

- Learning From Noisy Examples, by Dana Angluin and Philip Laird.
- Sections 1 – 2. Rest is optional.

- Improved Noise-Tolerant Learning and Generalized Statistical Queries, by Javed A. Aslam and Scott E. Decatur.
- Sections 1 – 3. Rest is optional.
- Some related notes by Javed Aslam are available here.
- The Statistical Query model was introduced in the paper Efficient Noise-Tolerant Learning from Statistical Queries, by Michael Kearns. Reading part or all of this paper is optional; I am mentioning it here because it is clearly very relevant.
- Below, listed as an optional reading, are the slides from the Tutorial on Statistical Queries, by Lev Reyzin. These slides are useful as a backup as well as for quick reference in the future. They are now also accompanied by a short manuscript, Statistical Queries and Statistical Algorithms: Foundations and Applications, which is also suggested below as an optional reading.

- On-Line Algorithms in Machine Learning, by Avrim Blum.
- On-Line Learning with an Oblivious Environment and the Power of Randomization, by Wolfgang Maass. Theorem 2.1 is mandatory. The rest of the paper is optional.

##### Optional Readings

- The Lack of A Priori Distinctions Between Learning Algorithms, by David Wolpert.
- A Few Useful Things to Know About Machine Learning, by Pedro Domingos. (alternate version)
- A Theory of the Learnable, by Leslie Valiant.
- Occam's Razor, by Anselm Blumer, Andrej Ehrenfeucht, David Haussler, and Manfred K. Warmuth.
- Learnability and the Vapnik-Chervonenkis Dimension, by Anselm Blumer, Andrej Ehrenfeucht, David Haussler, and Manfred K. Warmuth.
- A survey of some aspects of computational learning theory, by György Turán.
- Learning Disjunctions of Conjunctions, by Leslie Valiant.
- Learning in the Presence of Malicious Errors, by Michael Kearns and Ming Li.
- PAC Learning with nasty noise, by Nader H. Bshouty, Nadav Eiron, and Eyal Kushilevitz.
- Learning under p-tampering poisoning attacks, by Saeed Mahloujifar, Dimitrios Diochnos and Mohammad Mahmoody.
- Adversarial Risk and Robustness: General Definitions and Implications for the Uniform Distribution, by Dimitrios Diochnos, Saeed Mahloujifar and Mohammad Mahmoody.
- Efficient Noise-Tolerant Learning from Statistical Queries, by Michael Kearns.
- Statistical Queries and Statistical Algorithms: Foundations and Applications, by Lev Reyzin. This text is based on the Tutorial on Statistical Queries that Lev Reyzin gave in the conference on Algorithmic Learning Theory (ALT) of 2018.

[Computational Learning Theory Resources] [Table of Contents] [Top]

### Class Log

A log for the class will be held online here.

#### Week 1 (Aug 22 & Aug 24, 2023)

Discussion on syllabus and policies.

How this class is changing from previous years.

Discussion of coupon collector's problem. We derived the expected running time of the process of collecting all $N$ coupons.
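The derivation can be summarized as follows (a sketch of the standard argument, with $T_i$ denoting the waiting time for a new coupon once $i-1$ distinct coupons have been collected):

```latex
% When i-1 distinct coupons are held, a fresh draw is new with
% probability p_i = (N - i + 1)/N, so T_i is geometric with mean 1/p_i.
\mathbb{E}[T] = \sum_{i=1}^{N} \mathbb{E}[T_i]
             = \sum_{i=1}^{N} \frac{N}{N - i + 1}
             = N \sum_{k=1}^{N} \frac{1}{k}
             = N H_N = N \ln N + O(N).
```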

Mentioned the existence of the following handouts:

- Tools for Bounding Probabilities — a simple collection of bounds that can prove to be useful on probabilistic arguments.
- Basic Tools and Techniques for Algorithms in Learning Theory — contains basic knowledge from probability theory, applications, a rather simple approach to PAC algorithms, and a historical overview. Please read Section 3 and perhaps re-read it when we start discussing PAC learning later in the course.
- Essentials on the Analysis of Randomized Algorithms — perhaps the most important and related part to our course is Section 3 which also covers the coupon collector problem.

#### Week 2 (Aug 29 & Aug 31, 2023)

Discussion on basic terminology and enumeration of simple classes of functions.

Begin discussion on concept learning and version spaces.

#### Week 3 (Sep 5 & Sep 7, 2023)

Conclusion of concept learning and version spaces.

Beginning of decision trees.

#### Week 4 (Sep 12 & Sep 14, 2023)

Conclusion of decision trees.

Linear models for classification using perceptrons.
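As a companion to this topic, here is a minimal sketch of the perceptron update rule; the representation and parameter names are my own, not taken from the lecture.

```python
def perceptron(examples, epochs=100):
    """Perceptron learning rule for labels y in {-1, +1}.

    On a mistake for example (x, y), update w <- w + y*x and b <- b + y.
    If the data are linearly separable, the algorithm converges, with the
    classical (R/gamma)^2 mistake bound.
    """
    n = len(examples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # mistake (or point on the boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass with no mistakes: done
            break
    return w, b
```

For instance, on the (linearly separable) AND function over two inputs, a few passes suffice to find a separating hyperplane.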

#### Week 5 (Sep 19 & Sep 21, 2023)

Introduction to PAC learning - definitions.

Assigned Reading: Started Chapter 2 from the book Foundations of Machine Learning. Equivalently, from the book Understanding Machine Learning, we started discussing the content of Chapters 2 and 3.

Axis-aligned rectangles, learning conjunctions.

Assigned Reading: This is Example 2.4 from the book Foundations of Machine Learning. Equivalently, this is the first example in Chapter 1 in the Kearns-Vazirani book.

Theorem for learning in the realizable case using a finite hypothesis space.

Efficient PAC learnability of conjunctions (using Find-S).

Assigned Reading: Section 1.3 in the Kearns-Vazirani book.
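A minimal sketch of Find-S for conjunctions over Boolean attributes follows; the literal encoding and function names are my own, not from the course materials.

```python
def find_s(positive_examples, n):
    """Find-S for conjunctions over n Boolean attributes.

    Start from the most specific hypothesis (all 2n literals) and delete
    every literal contradicted by some positive example.
    A literal is a pair (i, v): (i, 1) means x_i, (i, 0) means not-x_i.
    """
    hypothesis = {(i, 1) for i in range(n)} | {(i, 0) for i in range(n)}
    for x in positive_examples:  # each x is a tuple of n bits
        hypothesis = {(i, v) for (i, v) in hypothesis if x[i] == v}
    return hypothesis

def predict(hypothesis, x):
    """A conjunction labels x positive iff every remaining literal is satisfied."""
    return all(x[i] == v for (i, v) in hypothesis)
```

For example, with target $x_1 \land \lnot x_3$ and positive examples $(1,0,0)$ and $(1,1,0)$, the surviving literals are exactly $x_1$ and $\lnot x_3$.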

#### Week 6 (Sep 26 & Sep 28, 2023)

Discussion on version spaces, consistent learners, Occam algorithms and generalization.

Theorem for sample complexity of PAC learning using finite hypothesis spaces (in the realizable case).

Discussed some implications of the theorem that we proved last time; e.g., we improved the sample complexity of PAC learning conjunctions without even arguing about the algorithm used for learning. We also saw a specific example of plugging in certain values for ε and δ.
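As a quick way to plug in numbers, the sample-size bound for the realizable case can be evaluated directly; the ε and δ values below are illustrative (not necessarily the ones used in class), and the counting $|\mathcal{H}| = 3^n$ for conjunctions (each variable positive, negated, or absent) is one common convention.

```python
import math

def sample_size_realizable(hyp_space_size, epsilon, delta):
    """m >= (1/epsilon) * (ln|H| + ln(1/delta)) suffices in the realizable case."""
    return math.ceil((math.log(hyp_space_size) + math.log(1.0 / delta)) / epsilon)

# Conjunctions over n = 10 Boolean variables: |H| = 3^n under the
# convention that each variable appears positively, negatively, or not at all.
m = sample_size_realizable(3 ** 10, epsilon=0.1, delta=0.05)
```

Note that the bound grows only logarithmically in $|\mathcal{H}|$ and $1/\delta$, but linearly in $1/\varepsilon$.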

Assigned Reading: Sections 7.1, 7.2, and 7.3 (up to, but not including, 7.3.1) from Tom Mitchell's book. This is where Mitchell introduces PAC learning and proves the theorem with the sample size for finite hypothesis spaces in the realizable case.

Assigned Reading: From the book Foundations of Machine Learning, Theorem 2.5 corresponds to the Theorem that we proved in class about finite hypothesis spaces in the realizable case and Example 2.6 has the discussion that we did in class for improving the sample size needed for PAC learning conjunctions using Find-S.

Optional Reading: Occam's Razor, by Anselm Blumer, Andrej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. This is the paper that proved this simple but very important theorem.

Set covering and finding an O(log m)-approximation to the optimal cover with a greedy algorithm.

Application of the algorithm to PAC learning conjunctions with few relevant literals by covering the negative examples (that Find-S would otherwise ignore) with the literals obtained from the solution of Find-S (that initially only takes into account the positive examples).
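A minimal sketch of the greedy set-cover algorithm follows; in the learning application above, the universe would be the set of negative examples and each subset the negatives "covered" (falsified) by one literal from the Find-S solution. The function name and representation are my own.

```python
def greedy_set_cover(universe, subsets):
    """Greedy set cover: repeatedly pick the subset covering the most
    still-uncovered elements. Achieves an O(log m) approximation ratio,
    where m = |universe|.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda s: len(uncovered & s))
        if not (uncovered & best):
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= best
    return chosen
```

For example, covering {1, 2, 3, 4, 5} from the subsets {1, 2, 3}, {2, 4}, {3, 4}, {4, 5} yields the two-set cover {1, 2, 3} and {4, 5}.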

#### Sep 28, 2023

Due today: Homework 2.

Assigned today: Homework 3.

#### Week 7 (Oct 3 & Oct 5, 2023)

Discussion on Haussler's example for blowing up the set G that the Candidate-Elimination algorithm maintains.

Theorem for agnostic PAC learning using finite hypothesis spaces and empirical risk minimization.

Intractability of learning 3-term DNF formulae.

Assigned Reading: The rest of Section 7.3 from Tom Mitchell's book.

Optional Reading: This is discussed in Section 1.4 in the Kearns-Vazirani Book.

Intractability of learning 3-term DNF properly.

Mention Occam's Razor paper by Anselm Blumer, Andrej Ehrenfeucht, David Haussler, and Manfred K. Warmuth.

Optional Reading: A Few Useful Things to Know About Machine Learning, by Pedro Domingos.

Efficiently PAC learning k-term DNF formulae using k-CNF formulae. This reduction allowed us to solve one problem using an algorithm (Find-S) designed for another problem (learning simple conjunctions). So, the representation of the hypotheses is very important in PAC learning: whereas it is hard to PAC learn k-term DNF formulae properly, we can efficiently PAC learn them in the realizable case using k-CNF formulae.
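The reduction rests on distributing $\lor$ over $\land$: a $k$-term DNF is equivalent to a $k$-CNF whose clauses pick one literal from each term. Each clause has at most $k$ literals, so over $n$ variables there are at most $(2n)^k$ candidate clauses, and the $k$-CNF can then be learned as a conjunction over these clauses (e.g., with Find-S).

```latex
T_1 \lor \cdots \lor T_k
  \;=\; \bigwedge_{(\ell_1, \ldots, \ell_k) \in T_1 \times \cdots \times T_k}
        (\ell_1 \lor \cdots \lor \ell_k)
% Example with k = 2:
(x_1 \land x_2) \lor (x_3 \land x_4)
  = (x_1 \lor x_3) \land (x_1 \lor x_4) \land (x_2 \lor x_3) \land (x_2 \lor x_4)
```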

Assigned Reading:

- From Tom Mitchell's book, Sections 7.3.2 and 7.3.3. For the discussion on the VC dimension see Sections 7.4 – 7.4.2 (including 7.4.2).
- From the book Foundations of Machine Learning, see Example 2.6, Example 2.8, and Example 2.9. See also Sections 3.1 and 3.2 for the discussion on the VC dimension. It also has several examples, some of which we discuss in class.
- Alternatively you can have a look at the Kearns-Vazirani book in Sections 1.4 and 1.5, regarding learning 3-term DNF formulae.

#### Week 8 (Oct 10 & Oct 12, 2023)

Introduction to the VC-dimension. What it means to prove a lower bound and what it means to prove an upper bound.

The growth function $\Pi_{\mathcal{H}}(m)$.

Discussed the VC-dimension of thresholds on a line. Difference compared to a perceptron in 1 dimension.

Discussed the VC-dimension of axis-aligned rectangles in 2 dimensions as well as in d dimensions.

Assigned Reading: From the Kearns-Vazirani book, Sections 3.1 – 3.3 (VC dimension).

Optional Reading: From the book Foundations of Machine Learning, see Sections 3.2 and 3.3 for the discussion on the VC dimension. It also has several examples, some of which we discuss in class.

We reviewed the notion of the VC dimension and other notions that are related to that (e.g., shattering, the growth function, etc).

Upper bound on the VC dimension $d$ of a finite hypothesis space; that is, $d \le \lg(|\mathcal{H}|)$.

Application of the bound on the VC dimension for finite hypothesis spaces, so that we can compute tight bounds on the VC dimension of monotone and general conjunctions.
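The counting argument behind the upper bound is short: a shattered set of size $d$ requires all $2^d$ dichotomies to be realized, each by a distinct hypothesis.

```latex
2^d \le |\mathcal{H}|
  \quad\Longrightarrow\quad
  d = \mathrm{VCdim}(\mathcal{H}) \le \lg |\mathcal{H}|.
% Monotone conjunctions over n variables: |\mathcal{H}| = 2^n, so d \le n,
% and this is tight (the n points with exactly one zero coordinate are shattered).
```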

We discussed sample complexity bounds (upper bounds and lower bounds) using the VC-dimension in the realizable case, as well as in the agnostic case.

Introduction to adversarial machine learning.

Training time attacks: Noise and poisoning attacks.

Test-time attacks: Evasion attacks (adversarial examples).

Attacks involving both phases: Backdoor and hybrid attacks.

For now we focus on noise. Discussion on different noise models.

- Bob Sloan gives a nice description of several noise models in Four types of noise in data for PAC learning.
- Apart from the models mentioned in Bob Sloan's paper, we will mention the nasty noise model as a bounded-budget version of the malicious noise model. We will also return to this discussion when we talk more about poisoning attacks (p-tampering being the analogue of malicious noise, and strong "clean label" p-budget attacks being the analogue of nasty noise).

Assigned Reading: Four types of noise in data for PAC learning, by Robert Sloan. Up to and including Section 2.2 is mandatory reading; the rest is optional.

Optional Reading: PAC Learning with nasty noise, by Nader H. Bshouty, Nadav Eiron, and Eyal Kushilevitz.

We showed that the malicious noise rate that can be tolerated by PAC learning algorithms is less than $\varepsilon$, regardless of the concept class being learnt, where $\varepsilon$ is the usual parameter bounding the risk in PAC learning. The result that we showed was a simple variant of the result of Kearns and Li from their paper Learning in the Presence of Malicious Errors, using the method of induced distributions; in particular, Theorem 1 from that paper.

Assigned Reading: Notes on what we discussed in class.

Optional Reading: Learning in the Presence of Malicious Errors, by Michael Kearns and Ming Li.

Optional Reading: Learning Disjunctions of Conjunctions, by Leslie Valiant. (The paper that introduced malicious noise.)

Brief discussion on some of the available papers for presentation.

Discussion on random classification noise.

We proved that $$m \ge \frac{2}{\varepsilon^2(1-2\eta)^2}\cdot\ln\left(2|\mathcal{H}|/\delta\right)$$ training examples are enough for PAC learning a concept class $\mathcal{C}$ using a hypothesis space $\mathcal{H}$, in the realizable case, under random classification noise rate $\eta$.
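The bound above can be evaluated directly; the $\varepsilon$, $\delta$, and $\eta$ values below are illustrative only, and the function name is my own.

```python
import math

def sample_size_rcn(hyp_space_size, epsilon, delta, eta):
    """Sample size sufficient for PAC learning under random classification
    noise rate eta < 1/2 (realizable case, finite hypothesis space):
        m >= 2 / (epsilon^2 * (1 - 2*eta)^2) * ln(2|H| / delta)
    """
    assert 0 <= eta < 0.5, "requires noise rate eta < 1/2"
    return math.ceil(2.0 / (epsilon ** 2 * (1 - 2 * eta) ** 2)
                     * math.log(2 * hyp_space_size / delta))

# As eta approaches 1/2 the bound blows up like 1/(1 - 2*eta)^2;
# e.g., eta = 1/4 quadruples the requirement relative to eta = 0.
m_clean = sample_size_rcn(2 ** 10, epsilon=0.1, delta=0.05, eta=0.0)
m_noisy = sample_size_rcn(2 ** 10, epsilon=0.1, delta=0.05, eta=0.25)
```

Note also the $1/\varepsilon^2$ dependence, compared to $1/\varepsilon$ in the noise-free realizable bound.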

Of course, there remains the question of how we can actually obtain $\eta$ to begin with, as well as how to solve the computational problem that is hidden here. We will discuss these in turn.

Assigned Reading: Learning From Noisy Examples, by Dana Angluin and Philip Laird. For now please read up to (and including) Section 2.2.

#### Oct 10, 2023

Due today: Homework 3.

Assigned today: Homework 4.

#### Week 9 (Oct 17 & Oct 19, 2023)

We will discuss an algorithm for computing an upper bound $\eta_b < 1/2$ on the true noise rate $\eta$.

Assigned Reading: Section 2.3 from the paper Learning From Noisy Examples, by Dana Angluin and Philip Laird.

PAC learning under class imbalance. Situations where low risk is not the only goal; recall and precision may be very important. A bisection algorithm for computing a lower bound on the rate of the minority class.

The discussion on class imbalance is based on the paper Learning Reliable Rules under Class Imbalance.

#### Week 10 (Oct 24 & Oct 26, 2023)

Continued our discussion on learning under random classification noise.

#### Oct 26, 2023

MoneyCoach presentation by Cami Sheaffer.

#### Oct 27, 2023

Due today: Homework 4.

#### Week 11 (Oct 31 & Nov 2, 2023)

Minimizing disagreements is NP-hard for conjunctions.

To the extent possible we will see the same techniques applied to situations where we have class imbalance.

#### Nov 2, 2023

Discussion of solutions to Homework 4 and anything else that will be included in the midterm.

#### Week 12 (Nov 7 & Nov 9, 2023)

#### Nov 7, 2023

Midterm

#### Nov 9, 2023

Discussion on PAC learning under class imbalance.

#### Week 13 (Nov 14 & Nov 16, 2023)

#### Nov 14, 2023

Midterms returned and discussion of solutions to midterms.

#### Nov 16, 2023

Continue our discussion on PAC learning under class imbalance.

Start discussion on adversarial machine learning.

[Class Log] [Table of Contents] [Top]