The fast evolution and increasing complexity of computing platforms pose a major challenge for developers of high performance software for engineering, science, and consumer applications: it becomes increasingly harder to harness the available computing power. Straightforward implementations may lose as much as one or two orders of magnitude in performance. On the other hand, creating optimal implementations requires the developer to have an understanding of algorithms, capabilities and limitations of compilers, and the target platform’s architecture and microarchitecture.

This interdisciplinary course aims to give the student an understanding of performance and introduces foundations and state-of-the-art techniques in high performance software development using important functionality such as linear algebra algorithms, transforms, filters, and others as examples. The course will focus on optimizing for the memory hierarchy and special instruction sets, thus complementing courses on parallel programming. Much of the material is based on recent research.

Further, a general strategy for performance analysis and optimization is introduced that the students will apply in group projects that accompany the course. Finally, the course will introduce the students to the recent field of automatic performance tuning.

Prerequisites: solid C programming skills, matrix algebra, Master student or above

**Topics Covered**

- Algorithm analysis: Problem versus algorithm, complexity and cost (asymptotic, exact, measured), cost analysis
- Computer architecture (a software point of view): architecture and microarchitecture, memory hierarchy, special instruction sets
- Compilers: strengths, limitations, how to use
- Performance optimization: guide to benchmarking, finding hotspots, code analysis, performance optimization techniques (for memory hierarchy and vector instruction extensions); these techniques are studied using the examples in the next bullet
- Numerical functionality studied in detail (complexity, algorithms, how to write highest performance code): linear algebra kernels, transforms, filters, sparse linear algebra, others, your research project
- Automatic Performance Tuning: ATLAS, LAPACK, BeBOP, FFTW, SPIRAL, others

**Goals of this Course**

Obtain an understanding of runtime performance and how to reason about it Learn a guideline how to write fast numerical code and apply it in homeworks and your research project Understand the connection between algorithms, implementations, and computer architecture

**Background Material**

Chapters 5 and 6 in Computer Systems: A Programmer’s Perspective, 2nd edition (available in library) Research papers, manuals, and other material mentioned in the slides or linked in the table below This small tutorial Academic Integrity

All homeworks in this course are single-student homeworks. The work must be all your own. Do not copy any parts of any of the homeworks from anyone including the web. Do not look at other students’ code, papers, or exams. Do not make any parts of your homework available to anyone, and make sure noone can read your files. The university policies on academic integrity will be applied rigorously.

We will be using the Moss system to detect software plagiarism. This system is amazingly good, because it understands the programming language in question (C, in our case).

It is not considered cheating to clarify vague points in the assignments or textbook, or to give help or receive help in using the computer systems, compilers, debuggers, profilers, or other facilities.

- Teacher: Chih-Hung Liu
- Teacher: Markus Püschel
- Teacher: Gagandeep Singh
- Teacher: Daniele Giuseppe Spampinato
- Teacher: Alen Stojanov

- Teacher: Hans-Christian Estler
- Teacher: Erwin Wen Fang
- Teacher: Andreas Lochbihler
- Teacher: Ognjen Maric

- Dozent: Urs Hauser
- Dozent: Giovanni Serafini
- Dozent: Jacqueline Staub