How can we learn from sensitive data collected from individuals, while protecting the privacy of those individuals?
This question is central to the study of data privacy, and is increasingly relevant with the widespread collection of our personal data. Analysis of this data can lead to important benefits for society, including advances in medicine and public infrastructure, but can also result in privacy breaches that expose our most closely held secrets.
This course will explore both threats to privacy and solutions to the data privacy problem. We will demonstrate that traditional approaches to protecting privacy, such as anonymization, are subject to powerful attacks that reveal individuals’ sensitive data. We will see that while more recent approaches for protecting privacy, including k-anonymity and l-diversity, are more resistant to these attacks, they are not immune.
Then, we will explore recent formal notions of privacy, including differential privacy. Differential privacy provides a rigorous formal definition of individual privacy that enables a wide range of statistical analyses while protecting privacy. We will explore a number of differentially private algorithms for analytics and machine learning, and learn about the algorithmic building blocks and proof techniques used to develop them.
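As a small taste of what these algorithms look like, here is a minimal Python sketch of the Laplace mechanism applied to a counting query. The function name and parameters are illustrative only, not code provided by the course:

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon):
    """Return a differentially private answer by adding Laplace noise
    with scale = sensitivity / epsilon; this satisfies
    epsilon-differential privacy for the given query."""
    return true_answer + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: a counting query ("how many rows are in the dataset?") has
# sensitivity 1, because adding or removing one individual's record
# changes the count by at most 1.
data = list(range(10000))
private_count = laplace_mechanism(len(data), sensitivity=1.0, epsilon=0.1)
```

A smaller epsilon means more noise and stronger privacy; much of the course is about reasoning precisely through this trade-off.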
In addition to learning about the mathematical foundations of differential privacy, we will explore its practical implications. We will learn about existing practical systems for enforcing differential privacy and examine the challenges of building such systems. This course will include programming assignments and an end-of-semester project, in which students are expected to demonstrate both mastery of the concepts we explore and understanding of their practical implications by building their own systems that perform privacy-preserving analyses on real data.
By the end of this course, you will be able to:
Please do not buy any books for this course. All required reference material is available online for free.
The primary textbook we will use for this course is:
The following resources may also be useful for additional reading:
[D&R] The Algorithmic Foundations of Differential Privacy
Cynthia Dwork and Aaron Roth.
[Nissim] Differential Privacy: A Primer for a Non-technical Audience
Kobbi Nissim, Thomas Steinke, Alexandra Wood, Micah Altman, Aaron Bembenek, Mark Bun, Marco Gaboardi, David R. O’Brien, and Salil Vadhan.
In addition to these, we will reference a number of academic papers throughout the semester (especially for the section on privacy-preserving machine learning).
Your grade for the course will be determined as follows:
Your final grade will be determined by summing the total number of points awarded and calculating the percentage of the total possible points. This percentage is translated into a letter grade as follows:
There will be two exams: a midterm and a final. You will be allowed unlimited notes for each exam (but please don’t print a whole book). See the schedule below for the dates.
This course will use Python for examples and for programming assignments. Students are expected to be proficient in Python programming. Programming assignments will be distributed and turned in as Jupyter notebooks. Click here for instructions on installing Jupyter Notebook.
Assignment Submission: Homework and in-class exercises will be turned in via Brightspace.
To submit an assignment:
Please do not change the name of the .ipynb file; renaming it makes the grading process more difficult.
Please let me know if you have any questions about the submission process.
100% - Correct or with minor issues
75% - Main idea is on the right path, but parts are incorrect
50% - A decent start, but misses the main idea
0% - Missing/no answer
Solutions and feedback: Homework solutions will be posted on Brightspace under “homework solutions.” Grades will be posted on Brightspace. To see your graded assignment, visit the following link, replacing <your-netid-here> with your actual netid. You will need to log in using your UVM credentials to view your graded assignments. If you have questions about how a question was graded, or if you spot a mistake in grading, please let me know.
Late work may be accepted, but you must make arrangements with me first. If you need to turn something in late, for any reason, please email me before the deadline. Depending on the circumstances, I may (or may not) impose a late penalty on your grade.
Collaboration on the high-level ideas and approach on assignments is encouraged. Copying someone else’s work is not allowed. Any collaboration, even at a high level, must be declared when you submit your assignment, in a note at the top of the assignment. E.g., “I discussed high-level strategies for solving problems 2 and 5 with Alex.”
The official references for the course are listed in the schedule below. Copying from references other than these is not allowed. In particular, code and proofs should not be copied from other sources, including Stack Overflow and other public sources.
Students caught copying work are subject to immediate failure of the course and disciplinary action by the University. All academic integrity misconduct will be treated according to UVM’s Code of Academic Integrity.
The course will include a final project, completed in groups of 1-3 students. The final project will demonstrate your mastery of the concepts covered in this course by implementing a practical system to perform privacy-preserving analysis of realistic data.
Click here for more complete information.
We will not hold class on Friday, September 15. I encourage you to attend CS Student Research Day and learn about the awesome research being done by CS students at UVM!
Note that class will not be held on the following dates:
Note that class will be asynchronous on the following dates:
Important due dates:
Final project writeup/video/implementation
Schedule of topics:
| Topic | Reading |
|---|---|
| Intro to data privacy; de-identification; re-identification (no exercise) | |
| k-Anonymity and l-Diversity (no class Monday) | |
| Intro to differential privacy; Laplace mechanism (no class Friday) | |
| Sensitivity; post-processing; composition & privacy budget; unit of privacy | Ch. 4, 5 |
| Clipping; approximate DP; advanced composition; Gaussian mechanism | |
| Local sensitivity; propose-test-release; smooth sensitivity; sample-and-aggregate | |
| Intermission; review (exam Wednesday; no class Friday; no exercise) | |
| Recent variants of differential privacy | |
| Exponential mechanism; sparse vector technique | Ch. 9, 10 |
| Privacy-preserving machine learning; differentially private SGD | |
| Local differential privacy | |
| Differentially private synthetic data | |
| No class (Thanksgiving) | |
| Privacy in deep learning; practical systems for privacy | |
| Open challenges; review | |
In keeping with University policy, any student with a documented disability interested in utilizing accommodations should contact SAS, the office of Disability Services on campus. SAS works with students and faculty in an interactive process to explore reasonable and appropriate accommodations, which are communicated to faculty in an accommodation letter. All students are strongly encouraged to meet with their faculty to discuss the accommodations they plan to use in each course. A student’s accommodation letter lists those accommodations that will not be implemented until the student meets with their faculty to create a plan. Contact SAS: A170 Living/Learning Center; 802-656-7753; firstname.lastname@example.org; or www.uvm.edu/access
Students have the right to practice the religion of their choice. Each semester students should submit in writing to their instructors by the end of the second full week of classes their documented religious holiday schedule for the semester. An arrangement can then be made to make up the missed work.
In order to be excused from classes, student athletes should submit appropriate documentation to the Professor in advance of all scheduling conflicts within the first two weeks of class. Those missing class are expected to submit make-up assignments within a reasonable time period.