Algorithms for big data

Instructor: Hossein Jowhari
Semester: Spring 2021 (99-2)
Time: SAT, MON 10:30-12:00
Textbook: No textbook.
Evaluation: Homeworks(30%), Final(30%), Student Presentation(40%).

Lectures :

Lecture Subject Material References
0 Introduction to the course slides -
1 Basics of algorithm analysis slides -
2 Sublinear time algorithms, estimating the average degree slides U Feige. On sums of independent random variables with unbounded variance and estimating the average degree in a graph.
3 Concentration bounds (Chebyshev and Chernoff) slides See Nick Harvey's notes on Chernoff bound. Benjamin Doerr, Probabilistic Tools.
4 Estimating the average degree, continued slides A. Czumaj and C. Sohler, sublinear time algorithms (survey)
5 Estimating the number of connected components slides B. Chazelle, R. Rubinfeld, and L. Trevisan. approximating the minimum spanning tree weight in sublinear time. 2005.
6 Approximate median, k-median clustering slides -
7 k-median clustering via sampling slides Mishra, Oblinger, Pitt. kmedian clustering in sublinear time.
8 Estimating metric space properties via sampling slides P. Indyk. Sublinear time algorithms for metric space problems.
- Happy Nowruz! - -
9 Dimensionality reduction: Johnson-Lindenstrauss lemma slides Johnson, Lindenstrauss. extensions of Lipschitz mappings into a Hilbert space, 1984. See also David Witmer's notes for missing details.
10 Data stream model: frequency moments slides Alon, Matias, Szegedy. Space complexity of approximating the frequency moments.
11 k-wise independence, distinct elements slides See Bernhard Haeupler's notes on k-wise independence. Alon, Matias, Szegedy's paper.
12 Algorithms for heavy hitters slides J. Misra and D. Gries. Finding repeated elements. 1982.
13 CountMin data structure slides Cormode and Muthukrishnan. An improved data stream summary: The Count-Min sketch and its applications.
14 Sketching, Mergeable summaries slides -
15 Sparse recovery, L0 sampling slides Cormode and Jowhari. Lp sampling and its applications: a survey. 2019.
16 Graph sketches, dynamic spanning forest slides Ahn, Guha and McGregor. Analyzing graph structure via linear measurements. 2012
17 Space lower bounds slides -
18 Parallel computing and big data slides See Alex Andoni's notes on algorithms for massive data. Mohsen Ghaffari, massively parallel algorithms.
19 Massively Parallel Computing, maximal matching slides Mohsen Ghaffari, massively parallel algorithms.
20 Massively Parallel Computing, sorting slides See Alex Andoni's notes on algorithms for massive data.
21 Three practice problems - -
22 Solutions for practice problems - -
23 Subspace embedding and least square regression slides D. Woodruff. sketching as a tool for numerical linear algebra

Similar courses in other universities :

  • Grigory Yaroslavtsev's course at Indiana University
  • Jelani Nelson's course at Harvard University
  • Amit Chakrabarti's course at Dartmouth College
  • Andre McGregor's course at UMass Amherst
  • Sergei Vassilvitskii's course at Columbia University