CS 521: Trustworthy AI Systems

Artificial Intelligence (AI) has the potential to transform society and the economy. However, state-of-the-art AI systems remain shockingly unsafe and can cause significant societal and financial harm, either inadvertently or under adversarial manipulation. Despite the hype around their growing capabilities, we lack a clear, principled understanding of when, why, or how these impressive but fragile models fail. Without such a foundation, we cannot reliably prevent failures or fully unlock AI’s transformative potential in a safe and trustworthy way.

In this course, we will study recent developments at the intersection of formal methods (FM), programming languages (PL), and machine learning (ML) research to develop trustworthy AI-based systems in a principled and systematic manner. The topics covered in the course are:

Adversarial attacks/jailbreaks
Formal safety specifications and verification
Verification-guided training
Reinforcement learning with human feedback
Reliable explanation and interpretability methods
Representation engineering
Grammar-based constrained generation
Runtime monitoring with guardrails

This class is offered to students of both Urbana-Champaign and Chicago Campuses. In-person lectures will be held approximately 75% in Urbana-Champaign and 25% in Chicago, with Chicago sessions scheduled every other Thursday. Remote participants may join via Zoom (accessible only to members of the University of Illinois system).

News

10/31/2025: HW4 is released on Gradescope. Due on 11/07, 11:59 pm CT.
10/10/2025: HW3 is released on Gradescope. Due on 10/17, 11:59 pm CT.
09/26/2025: HW2 is released on Gradescope. Due on 10/03, 11:59 pm CT.
09/12/2025: HW1 is released on Gradescope. Due on 09/19, 11:59 pm CT.
08/21/2025: The website is up!

Lectures:
Every Tuesday / Thursday
9:30 AM - 10:45 AM
Location: 2233 Everitt Laboratory
Forum: Campuswire

Instructor:
Gagandeep Singh
Assistant Professor
Computer Science, UIUC
ggnds@illinois.edu
Office hour:
Every Thursday 11am-Noon
SC 4214/Chicago Campus (TBD)/Zoom

TA:
Enyi Jiang
Computer Science, UIUC
enyij2@illinois.edu
Office hour:
Every Monday 4-5pm
Zoom

Overview

Prerequisites

Students should have taken CS173 (or a similar course) and know basic proofs using induction and proof by contradiction (as taught in CS173). Some background on computability theory (as taught is CS374) would be good. Having taken a course in machine learning (like CS446) would be good.

Grading Policy

We will compute the final grade using the following table:

Activity	Grade	Details
Homeworks	50%	A total of 5 homeworks involving theory and coding. Check the policy below.
Project	50%	Check policy below.

Homework Policy

We will use Gradescope for homework submissions. We will share the details to enroll in the course there soon.

We will drop the lowest-scoring homework for grading.

Each homework accounts for 12.5% of the grade.

Each student has a total of three additional days throughout the term to accommodate late submissions for homework assignments. This cumulative allowance can be used at your discretion across different assignments.

If you submit more than 5 minutes late for an assignment, then it is considered as using one full day of the late submission allowance.
Once you exhaust the three-day late submission allowance, any further late homework submissions will result in a penalty of 25% of the original grade for each extra late day.
Unless explicitly stated, use of LLMs for solving homework or cheating from others is not allowed – we will detect it! There will be an oral exam if we detect violations, during which you will be asked to answer a randomly selected question from the homework. If you are unable to show sufficient knowledge, then you will get zero for the full homework.

Project Policy

Project teams should not consist of more than 3 members.

We will specify some course projects based on the material covered in the lectures. You can also propose project, but it must be approved by the course staff.

Schedule

Tentative Schedule:

Lecture 2-4 Adversarial Attacks
Lecture 5 Formal Specifications
Lecture 6-9 Formal Verification Of Neural Network
Lecture 10-13 Verification-Guided Training
Lecture 14-16 RLHF
Lecture 17-18 Reliable Explanations And Interpretability
Lecture 19-21 Representation Engineering
Lecture 22-24 Grammar-Constrained Generation
Lecture 25-26 Guardrail
Lecture 27-29 Project Presentation

For all video recordings: TBD

Date	Topic	Notes	Related readings
08/26	Lecture 1. Course Introduction	Slides / Video
08/28	Lecture 2. Adversarial Attacks	Slides / Video Adversarial attacks, Types of attacks, FGSM, Explanation for existence of adversarial inputs	First paper on Adversarial Examples, FGSM
09/02	Lecture 3. Adversarial Attacks (continued)	Slides / Video Explanation for existence of adversarial inputs, Universal Adversarial Perturbations, GCG attack on LLMs	Universal Adversarial Perturbations,GCG
09/04	Lecture 4. Adversarial Attacks (continued)	Slides / Video Jailbreak and Prompt injection on LLMs, Project Intro	AutoDAN, PAIR, DeepInception, Multi-turn Attack
09/09	Lecture 5. Specifications	Slides / Video Formulation of specifications on neural networks and LLMs	Box Specifications, LLM Specifications, Geometric Specifications
09/11	Lecture 6. Formal Verification Of Neural Network	Slides / Video Formal specifications for neural networks	General framework for Verification with Abstract Interpretation, Generating specifications
09/16	Lecture 7. Formal Verification Of Neural Network (continued)	Slides / Video Zonotope	General framework for Verification with Abstract Interpretation, Box Verification
09/18	Lecture 8. Formal Verification Of Neural Network (continued)	Slides / Video Introduction to MILP for sound and complete neural network verification.	MILP for Complete Verification, MILP+Abstract Interpretation,Multi-Neuron relaxations for ReLU, Strong Multi-Neuron MILP formulations for ReLU
09/23	Lecture 9. Formal Verification Of Neural Network (continued)	Slides / Video DeepPoly, Probabilistic, and Relational Verifications	DeepPoly abstract domain, Verifying robustness against geometric perturbations with DeepPoly, Verifying audio classifiers with DeepPoly
09/25	Lecture 10. Formal Verification Of Neural Network (continued)	Slides / Video DeepPoly, Probabilistic, and Relational Verifications	Relational Verifications, Verifications for VAEs
09/30	Lecture 11. Verification-Guided Training	Slides / Video Adversarial Training	Adversarial Training, Multi-Norm Training, TRADES, RAMP, R2D2
10/02	Lecture 12. Verification-Guided Training (continued)	Slides / Video Adversarial Training, Certified Training	CAT, LAT, IBP
10/07	Lecture 13. Verification-Guided Training (continued)	Slides / Video Interval Back Propagation, DiffAI, COLT, SABR, UAP Certified Training, Certified Training against Geometric Perturbations	IBP, Differentiable Abstract Interpretation, COLT, SABR, UAP Certified Training, Certified Training for Geometric Perturbations
10/09	Lecture 14. Verification-Guided Training (continued), RLHF	Slides / Video Interval Back Propagation, DiffAI, COLT, SABR, UAP Certified Training, Certified Training against Geometric Perturbations	IBP, Differentiable Abstract Interpretation, COLT, SABR, UAP Certified Training, Certified Training for Geometric Perturbations
10/14	Lecture 15. RLHF	Slides / Video PPO, Best-of-N Sampling	PPO, DPO, Best of n sampling, RLHF theory
10/16	Lecture 16. RLHF (continued)	Slides / Video PPO, Best-of-N Sampling, DPO	PPO, DPO, Best of n sampling, RLHF theory, DPO
10/21	Lecture 17. RLHF (continued), Explanations	Slides / Video Best-of-N Sampling, Explanations	LIME, Anchors, Saliency Maps, Integrated Gradients, Smoothed Gradients
10/23	Lecture 18. Explanations	Slides / Video LIME, Anchors, Saliency Maps, Integrated Gradients, Smoothed Gradients	LIME, Anchors, Saliency Maps, Integrated Gradients, Smoothed Gradients
10/28	Lecture 19. Explanations (continued)	Slides / Video LIME, Anchors, Saliency Maps, Integrated Gradients, Smoothed Gradients	LIME, Anchors, Saliency Maps, Integrated Gradients, Smoothed Gradients
10/30	Lecture 20. Explanations (continued), Mechanistic Interpretability	Slides / Video Circuit Discovery, Sparse Autoencoders, Casual Abstractions	Circuit Discovery, Sparse Autoencoders, Casual Abstractions, Activation Patching, ACDC
11/04	Lecture 21 Mechanistic Interpretability (continued)	Slides/ Video Circuit Discovery, Sparse Autoencoders, Casual Abstractions	Circuit Discovery, Sparse Autoencoders, Casual Abstractions, Activation Patching, ACDC
11/06	Lecture 22 Mechanistic Interpretability (continued)	Slides/ Video Probing, AutoInterpretability, Representation Engineering, Activation Steering	Probing, AutoInterpretability, Representation Engineering, Activation Steering
11/11	Lecture 23. Mechanistic Interpretability (continued) and Grammar Constrained Generation	Slides/ Video Circuit Breaking, Constrained Generation for VAEs, SynCode, DINGO	Circuit Breaking, Constrained Generation for VAEs, SynCode, DINGO
11/13	Lecture 24. Grammar Constrained Generation	Slides/ Video Constrained Generation for VAEs, SynCode, DINGO	Constrained Generation for VAEs, SynCode, DINGO, LMQL: Prompting is Programming
11/18	Lecture 24. Guradrailing	Slides/ Video LlamaGuard, NVidia's NeMo, Guardrails AI	LlamaGuard, Nvidia's NeMo, Guardrails AI
11/20	Lecture 24. Monitoring Partial Generation and Backtracking	Slides/ Video IterGen, CRANE	IterGen, CRANE

Tentative Grading Scale

Grade	A+	A	A-	B+	B	B-	C+	C	C-	D	F
Score	[95,100]	[90,94]	[85,89]	[80,84]	[75,79]	[70,74]	[65,69]	[60,64]	[55,59]	[50,54]	[0,49]

Academic Integrity

In our commitment to academic excellence and ethical scholarship, we emphasize the importance of academic integrity in all aspects of our course. Academic integrity is central to the development of individual responsibility and scholarly credibility. Please be aware that the use of Large Language Model (LLM) tools, including ChatGPT, is strictly prohibited in the completion of any coursework and assignments in this class, unless explicitly specified otherwise. These tools, while valuable in certain contexts, do not align with our commitment to original thought and individual academic effort.

As part of this course, it is essential for all students to familiarize themselves with the University of Illinois at Urbana-Champaign Student Code, which is an integral component of our syllabus. Special attention should be paid to Article 1, Part 4, which focuses on Academic Integrity. You can access and review the Student Code at http://studentcode.illinois.edu/.

Advanced Topics in Programming Systems: Trustworthy AI Systems

News

Overview

Prerequisites

Grading Policy

Homework Policy

Project Policy

Schedule

Lecture 1. Course Introduction

Lecture 2. Adversarial Attacks

Lecture 3. Adversarial Attacks (continued)

Lecture 4. Adversarial Attacks (continued)

Lecture 5. Specifications

Lecture 6. Formal Verification Of Neural Network

Lecture 7. Formal Verification Of Neural Network (continued)

Lecture 8. Formal Verification Of Neural Network (continued)

Lecture 9. Formal Verification Of Neural Network (continued)

Lecture 10. Formal Verification Of Neural Network (continued)

Lecture 11. Verification-Guided Training

Lecture 12. Verification-Guided Training (continued)

Lecture 13. Verification-Guided Training (continued)

Lecture 14. Verification-Guided Training (continued), RLHF

Lecture 15. RLHF

Lecture 16. RLHF (continued)

Lecture 17. RLHF (continued), Explanations

Lecture 18. Explanations

Lecture 19. Explanations (continued)

Lecture 20. Explanations (continued), Mechanistic Interpretability

Lecture 21 Mechanistic Interpretability (continued)

Lecture 22 Mechanistic Interpretability (continued)

Lecture 23. Mechanistic Interpretability (continued) and Grammar Constrained Generation

Lecture 24. Grammar Constrained Generation

Lecture 24. Guradrailing

Lecture 24. Monitoring Partial Generation and Backtracking

Tentative Grading Scale

Academic Integrity