FALL 2025 Edition
Artificial Intelligence (AI) has the potential to transform society and the economy. However, state-of-the-art AI systems remain shockingly unsafe and can cause significant societal and financial harm, either inadvertently or under adversarial manipulation. Despite the hype around their growing capabilities, we lack a clear, principled understanding of when, why, or how these impressive but fragile models fail. Without such a foundation, we cannot reliably prevent failures or fully unlock AI’s transformative potential in a safe and trustworthy way.
In this course, we will study recent developments at the intersection of formal methods (FM), programming languages (PL), and machine learning (ML) research to develop trustworthy AI-based systems in a principled and systematic manner. The topics covered in the course are:
This class is offered to students of both Urbana-Champaign and Chicago Campuses. In-person lectures will be held approximately 75% in Urbana-Champaign and 25% in Chicago, with Chicago sessions scheduled every other Thursday. Remote participants may join via Zoom (accessible only to members of the University of Illinois system).
10/31/2025: HW4 is released on Gradescope. Due on 11/07, 11:59 pm CT.
10/10/2025: HW3 is released on Gradescope. Due on 10/17, 11:59 pm CT.
09/26/2025: HW2 is released on Gradescope. Due on 10/03, 11:59 pm CT.
09/12/2025: HW1 is released on Gradescope. Due on 09/19, 11:59 pm CT.
08/21/2025: The website is up!
We will compute the final grade using the following table:
| Activity | Grade | Details |
|---|---|---|
| Homeworks | 50% |
|
| Project | 50% |
|
We will use Gradescope for homework submissions. We will share the details to enroll in the course there soon.
We will drop the lowest-scoring homework for grading.
Each homework accounts for 12.5% of the grade.
Each student has a total of three additional days throughout the term to accommodate late submissions for homework assignments. This cumulative allowance can be used at your discretion across different assignments.
If you submit more than 5 minutes late for an assignment, then it is considered as using one full day of the late submission allowance.
Once you exhaust the three-day late submission allowance, any further late homework submissions will result in a penalty of 25% of the original grade for each extra late day.
Unless explicitly stated, use of LLMs for solving homework or cheating from others is not allowed – we will detect it! There will be an oral exam if we detect violations, during which you will be asked to answer a randomly selected question from the homework. If you are unable to show sufficient knowledge, then you will get zero for the full homework.
Project teams should not consist of more than 3 members.
We will specify some course projects based on the material covered in the lectures. You can also propose project, but it must be approved by the course staff.
Tentative Schedule:
Lecture 2-4 Adversarial Attacks
Lecture 5 Formal Specifications
Lecture 6-9 Formal Verification Of Neural Network
Lecture 10-13 Verification-Guided Training
Lecture 14-16 RLHF
Lecture 17-18 Reliable Explanations And Interpretability
Lecture 19-21 Representation Engineering
Lecture 22-24 Grammar-Constrained Generation
Lecture 25-26 Guardrail
Lecture 27-29 Project Presentation
| Date | Topic | Notes | Related readings |
|---|---|---|---|
| 08/26 | Lecture 1. Course Introduction |
Slides / Video | |
| 08/28 | Lecture 2. Adversarial Attacks |
Slides / Video Adversarial attacks, Types of attacks, FGSM, Explanation for existence of adversarial inputs |
First paper on Adversarial Examples, FGSM |
| 09/02 | Lecture 3. Adversarial Attacks (continued) |
Slides / Video Explanation for existence of adversarial inputs, Universal Adversarial Perturbations, GCG attack on LLMs |
Universal Adversarial Perturbations,GCG |
| 09/04 | Lecture 4. Adversarial Attacks (continued) |
Slides / Video Jailbreak and Prompt injection on LLMs, Project Intro |
AutoDAN, PAIR, DeepInception, Multi-turn Attack |
| 09/09 | Lecture 5. Specifications |
Slides / Video Formulation of specifications on neural networks and LLMs |
Box Specifications, LLM Specifications, Geometric Specifications |
| 09/11 | Lecture 6. Formal Verification Of Neural Network |
Slides / Video Formal specifications for neural networks |
General framework for Verification with Abstract Interpretation, Generating specifications |
| 09/16 | Lecture 7. Formal Verification Of Neural Network (continued) |
Slides / Video Zonotope |
General framework for Verification with Abstract Interpretation, Box Verification | 09/18 | Lecture 8. Formal Verification Of Neural Network (continued) |
Slides / Video Introduction to MILP for sound and complete neural network verification. |
MILP for Complete Verification, MILP+Abstract Interpretation,Multi-Neuron relaxations for ReLU, Strong Multi-Neuron MILP formulations for ReLU |
| 09/23 | Lecture 9. Formal Verification Of Neural Network (continued) |
Slides / Video DeepPoly, Probabilistic, and Relational Verifications |
DeepPoly abstract domain, Verifying robustness against geometric perturbations with DeepPoly, Verifying audio classifiers with DeepPoly |
| 09/25 | Lecture 10. Formal Verification Of Neural Network (continued) |
Slides / Video DeepPoly, Probabilistic, and Relational Verifications |
Relational Verifications, Verifications for VAEs |
| 09/30 | Lecture 11. Verification-Guided Training |
Slides / Video Adversarial Training |
Adversarial Training, Multi-Norm Training, TRADES, RAMP, R2D2 |
| 10/02 | Lecture 12. Verification-Guided Training (continued) |
Slides / Video Adversarial Training, Certified Training |
CAT, LAT, IBP |
| 10/07 | Lecture 13. Verification-Guided Training (continued) |
Slides / Video Interval Back Propagation, DiffAI, COLT, SABR, UAP Certified Training, Certified Training against Geometric Perturbations |
IBP, Differentiable Abstract Interpretation, COLT, SABR, UAP Certified Training, Certified Training for Geometric Perturbations |
| 10/09 | Lecture 14. Verification-Guided Training (continued), RLHF |
Slides / Video Interval Back Propagation, DiffAI, COLT, SABR, UAP Certified Training, Certified Training against Geometric Perturbations |
IBP, Differentiable Abstract Interpretation, COLT, SABR, UAP Certified Training, Certified Training for Geometric Perturbations |
| 10/14 | Lecture 15. RLHF |
Slides / Video PPO, Best-of-N Sampling |
PPO, DPO, Best of n sampling, RLHF theory |
| 10/16 | Lecture 16. RLHF (continued) |
Slides / Video PPO, Best-of-N Sampling, DPO |
PPO, DPO, Best of n sampling, RLHF theory, DPO |
| 10/21 | Lecture 17. RLHF (continued), Explanations |
Slides / Video Best-of-N Sampling, Explanations |
LIME, Anchors, Saliency Maps, Integrated Gradients, Smoothed Gradients |
| 10/23 | Lecture 18. Explanations |
Slides / Video LIME, Anchors, Saliency Maps, Integrated Gradients, Smoothed Gradients |
LIME, Anchors, Saliency Maps, Integrated Gradients, Smoothed Gradients |
| 10/28 | Lecture 19. Explanations (continued) |
Slides / Video LIME, Anchors, Saliency Maps, Integrated Gradients, Smoothed Gradients |
LIME, Anchors, Saliency Maps, Integrated Gradients, Smoothed Gradients |
| 10/30 | Lecture 20. Explanations (continued), Mechanistic Interpretability |
Slides / Video Circuit Discovery, Sparse Autoencoders, Casual Abstractions |
Circuit Discovery, Sparse Autoencoders, Casual Abstractions, Activation Patching, ACDC |
| 11/04 | Lecture 21 Mechanistic Interpretability (continued) |
Slides/ Video Circuit Discovery, Sparse Autoencoders, Casual Abstractions |
Circuit Discovery, Sparse Autoencoders, Casual Abstractions, Activation Patching, ACDC |
| 11/06 | Lecture 22 Mechanistic Interpretability (continued) |
Slides/ Video Probing, AutoInterpretability, Representation Engineering, Activation Steering |
Probing, AutoInterpretability, Representation Engineering, Activation Steering |
| 11/11 | Lecture 23. Mechanistic Interpretability (continued) and Grammar Constrained Generation |
Slides/ Video Circuit Breaking, Constrained Generation for VAEs, SynCode, DINGO |
Circuit Breaking, Constrained Generation for VAEs, SynCode, DINGO |
| 11/13 | Lecture 24. Grammar Constrained Generation |
Slides/ Video Constrained Generation for VAEs, SynCode, DINGO |
Constrained Generation for VAEs, SynCode, DINGO, LMQL: Prompting is Programming |
| 11/18 | Lecture 24. Guradrailing |
Slides/ Video LlamaGuard, NVidia's NeMo, Guardrails AI |
LlamaGuard, Nvidia's NeMo, Guardrails AI |
| 11/20 | Lecture 24. Monitoring Partial Generation and Backtracking |
Slides/ Video IterGen, CRANE |
IterGen, CRANE |
| Grade | A+ | A | A- | B+ | B | B- | C+ | C | C- | D | F |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Score | [95,100] | [90,94] | [85,89] | [80,84] | [75,79] | [70,74] | [65,69] | [60,64] | [55,59] | [50,54] | [0,49] |
In our commitment to academic excellence and ethical scholarship, we emphasize the importance of academic integrity in all aspects of our course. Academic integrity is central to the development of individual responsibility and scholarly credibility. Please be aware that the use of Large Language Model (LLM) tools, including ChatGPT, is strictly prohibited in the completion of any coursework and assignments in this class, unless explicitly specified otherwise. These tools, while valuable in certain contexts, do not align with our commitment to original thought and individual academic effort.
As part of this course, it is essential for all students to familiarize themselves with the University of Illinois at Urbana-Champaign Student Code, which is an integral component of our syllabus. Special attention should be paid to Article 1, Part 4, which focuses on Academic Integrity. You can access and review the Student Code at http://studentcode.illinois.edu/.