Advanced Topics in Programming Systems: Trustworthy AI Systems

FALL 2025 Edition


Artificial Intelligence (AI) has the potential to transform society and the economy. However, state-of-the-art AI systems remain shockingly unsafe and can cause significant societal and financial harm, either inadvertently or under adversarial manipulation. Despite the hype around their growing capabilities, we lack a clear, principled understanding of when, why, or how these impressive but fragile models fail. Without such a foundation, we cannot reliably prevent failures or fully unlock AI’s transformative potential in a safe and trustworthy way.

In this course, we will study recent developments at the intersection of formal methods (FM), programming languages (PL), and machine learning (ML) research to develop trustworthy AI-based systems in a principled and systematic manner. The topics covered in the course are:

  • Adversarial attacks/jailbreaks
  • Formal safety specifications and verification
  • Verification-guided training
  • Reinforcement learning with human feedback
  • Reliable explanation and interpretability methods
  • Representation engineering
  • Grammar-based constrained generation
  • Runtime monitoring with guardrails

This class is offered to students of both Urbana-Champaign and Chicago Campuses. In-person lectures will be held approximately 75% in Urbana-Champaign and 25% in Chicago, with Chicago sessions scheduled every other Thursday. Remote participants may join via Zoom (accessible only to members of the University of Illinois system).

 

News  

  • 10/31/2025: HW4 is released on Gradescope. Due on 11/07, 11:59 pm CT.

  • 10/10/2025: HW3 is released on Gradescope. Due on 10/17, 11:59 pm CT.

  • 09/26/2025: HW2 is released on Gradescope. Due on 10/03, 11:59 pm CT.

  • 09/12/2025: HW1 is released on Gradescope. Due on 09/19, 11:59 pm CT.

  • 08/21/2025: The website is up!


 Lectures:
 Every Tuesday / Thursday
 9:30 AM - 10:45 AM
 Location: 2233 Everitt Laboratory
 Forum: Campuswire

 Instructor:
 Gagandeep Singh
 Assistant Professor
 Computer Science, UIUC
 ggnds@illinois.edu
 Office hour:
 Every Thursday 11am-Noon
 SC 4214/Chicago Campus (TBD)/Zoom

 TA:
 Enyi Jiang
 Computer Science, UIUC
 enyij2@illinois.edu
 Office hour:
 Every Monday 4-5pm
 Zoom

Overview  

Prerequisites

Students should have taken CS173 (or a similar course) and know basic proofs using induction and proof by contradiction (as taught in CS173). Some background on computability theory (as taught is CS374) would be good. Having taken a course in machine learning (like CS446) would be good.

Grading Policy

We will compute the final grade using the following table:

Activity Grade Details
Homeworks 50%
  • A total of 5 homeworks involving theory and coding. Check the policy below.
Project 50%
  • Check policy below.


Homework Policy

We will use Gradescope for homework submissions. We will share the details to enroll in the course there soon.

We will drop the lowest-scoring homework for grading.

Each homework accounts for 12.5% of the grade.

Each student has a total of three additional days throughout the term to accommodate late submissions for homework assignments. This cumulative allowance can be used at your discretion across different assignments.

  • If you submit more than 5 minutes late for an assignment, then it is considered as using one full day of the late submission allowance.

  • Once you exhaust the three-day late submission allowance, any further late homework submissions will result in a penalty of 25% of the original grade for each extra late day.

  • Unless explicitly stated, use of LLMs for solving homework or cheating from others is not allowed – we will detect it! There will be an oral exam if we detect violations, during which you will be asked to answer a randomly selected question from the homework. If you are unable to show sufficient knowledge, then you will get zero for the full homework.


Project Policy

Project teams should not consist of more than 3 members.

We will specify some course projects based on the material covered in the lectures. You can also propose project, but it must be approved by the course staff.

 

Schedule  

Tentative Schedule:

  • Lecture 2-4 Adversarial Attacks

  • Lecture 5 Formal Specifications

  • Lecture 6-9 Formal Verification Of Neural Network

  • Lecture 10-13 Verification-Guided Training

  • Lecture 14-16 RLHF

  • Lecture 17-18 Reliable Explanations And Interpretability

  • Lecture 19-21 Representation Engineering

  • Lecture 22-24 Grammar-Constrained Generation

  • Lecture 25-26 Guardrail

  • Lecture 27-29 Project Presentation


For all video recordings: TBD
Date Topic Notes Related readings
08/26

Lecture 1. Course Introduction


Slides / Video
08/28

Lecture 2. Adversarial Attacks


Slides / Video
Adversarial attacks, Types of attacks, FGSM, Explanation for existence of adversarial inputs
First paper on Adversarial Examples, FGSM
09/02

Lecture 3. Adversarial Attacks (continued)


Slides / Video
Explanation for existence of adversarial inputs, Universal Adversarial Perturbations, GCG attack on LLMs
Universal Adversarial Perturbations,GCG
09/04

Lecture 4. Adversarial Attacks (continued)


Slides / Video
Jailbreak and Prompt injection on LLMs, Project Intro
AutoDAN, PAIR, DeepInception, Multi-turn Attack
09/09

Lecture 5. Specifications


Slides / Video
Formulation of specifications on neural networks and LLMs
Box Specifications, LLM Specifications, Geometric Specifications
09/11

Lecture 6. Formal Verification Of Neural Network


Slides / Video
Formal specifications for neural networks
General framework for Verification with Abstract Interpretation, Generating specifications
09/16

Lecture 7. Formal Verification Of Neural Network (continued)


Slides / Video
Zonotope
General framework for Verification with Abstract Interpretation, Box Verification
09/18

Lecture 8. Formal Verification Of Neural Network (continued)


Slides / Video
Introduction to MILP for sound and complete neural network verification.
MILP for Complete Verification, MILP+Abstract Interpretation,Multi-Neuron relaxations for ReLU, Strong Multi-Neuron MILP formulations for ReLU
09/23

Lecture 9. Formal Verification Of Neural Network (continued)


Slides / Video
DeepPoly, Probabilistic, and Relational Verifications
DeepPoly abstract domain, Verifying robustness against geometric perturbations with DeepPoly, Verifying audio classifiers with DeepPoly
09/25

Lecture 10. Formal Verification Of Neural Network (continued)


Slides / Video
DeepPoly, Probabilistic, and Relational Verifications
Relational Verifications, Verifications for VAEs
09/30

Lecture 11. Verification-Guided Training


Slides / Video
Adversarial Training
Adversarial Training, Multi-Norm Training, TRADES, RAMP, R2D2
10/02

Lecture 12. Verification-Guided Training (continued)


Slides / Video
Adversarial Training, Certified Training
CAT, LAT, IBP
10/07

Lecture 13. Verification-Guided Training (continued)


Slides / Video
Interval Back Propagation, DiffAI, COLT, SABR, UAP Certified Training, Certified Training against Geometric Perturbations
IBP, Differentiable Abstract Interpretation, COLT, SABR, UAP Certified Training, Certified Training for Geometric Perturbations
10/09

Lecture 14. Verification-Guided Training (continued), RLHF


Slides / Video
Interval Back Propagation, DiffAI, COLT, SABR, UAP Certified Training, Certified Training against Geometric Perturbations
IBP, Differentiable Abstract Interpretation, COLT, SABR, UAP Certified Training, Certified Training for Geometric Perturbations
10/14

Lecture 15. RLHF


Slides / Video
PPO, Best-of-N Sampling
PPO, DPO, Best of n sampling, RLHF theory
10/16

Lecture 16. RLHF (continued)


Slides / Video
PPO, Best-of-N Sampling, DPO
PPO, DPO, Best of n sampling, RLHF theory, DPO
10/21

Lecture 17. RLHF (continued), Explanations


Slides / Video
Best-of-N Sampling, Explanations
LIME, Anchors, Saliency Maps, Integrated Gradients, Smoothed Gradients
10/23

Lecture 18. Explanations


Slides / Video
LIME, Anchors, Saliency Maps, Integrated Gradients, Smoothed Gradients
LIME, Anchors, Saliency Maps, Integrated Gradients, Smoothed Gradients
10/28

Lecture 19. Explanations (continued)


Slides / Video
LIME, Anchors, Saliency Maps, Integrated Gradients, Smoothed Gradients
LIME, Anchors, Saliency Maps, Integrated Gradients, Smoothed Gradients
10/30

Lecture 20. Explanations (continued), Mechanistic Interpretability


Slides / Video
Circuit Discovery, Sparse Autoencoders, Casual Abstractions
Circuit Discovery, Sparse Autoencoders, Casual Abstractions, Activation Patching, ACDC
11/04

Lecture 21 Mechanistic Interpretability (continued)


Slides/ Video
Circuit Discovery, Sparse Autoencoders, Casual Abstractions
Circuit Discovery, Sparse Autoencoders, Casual Abstractions, Activation Patching, ACDC
11/06

Lecture 22 Mechanistic Interpretability (continued)


Slides/ Video
Probing, AutoInterpretability, Representation Engineering, Activation Steering
Probing, AutoInterpretability, Representation Engineering, Activation Steering
11/11

Lecture 23. Mechanistic Interpretability (continued) and Grammar Constrained Generation


Slides/ Video
Circuit Breaking, Constrained Generation for VAEs, SynCode, DINGO
Circuit Breaking, Constrained Generation for VAEs, SynCode, DINGO
11/13

Lecture 24. Grammar Constrained Generation


Slides/ Video
Constrained Generation for VAEs, SynCode, DINGO
Constrained Generation for VAEs, SynCode, DINGO, LMQL: Prompting is Programming
11/18

Lecture 24. Guradrailing


Slides/ Video
LlamaGuard, NVidia's NeMo, Guardrails AI
LlamaGuard, Nvidia's NeMo, Guardrails AI
11/20

Lecture 24. Monitoring Partial Generation and Backtracking


Slides/ Video
IterGen, CRANE
IterGen, CRANE
 

Tentative Grading Scale  

Grade A+ A A- B+ B B- C+ C C- D F
Score [95,100] [90,94] [85,89] [80,84] [75,79] [70,74] [65,69] [60,64] [55,59] [50,54] [0,49]
 

Academic Integrity  

In our commitment to academic excellence and ethical scholarship, we emphasize the importance of academic integrity in all aspects of our course. Academic integrity is central to the development of individual responsibility and scholarly credibility. Please be aware that the use of Large Language Model (LLM) tools, including ChatGPT, is strictly prohibited in the completion of any coursework and assignments in this class, unless explicitly specified otherwise. These tools, while valuable in certain contexts, do not align with our commitment to original thought and individual academic effort.

As part of this course, it is essential for all students to familiarize themselves with the University of Illinois at Urbana-Champaign Student Code, which is an integral component of our syllabus. Special attention should be paid to Article 1, Part 4, which focuses on Academic Integrity. You can access and review the Student Code at http://studentcode.illinois.edu/.