Why do we need compilers?

There are two views: the traditional view and the modern view.

Traditional View:

  • Translation: high-level language (HLL) programs to low-level machine code
  • Optimization: reduce the number of arithmetic operations via classical optimizations such as common subexpression elimination and loop-invariant code motion (see the short before/after sketch following this list)
  • Ignore data structures: too complex to analyze
    • a data structure is viewed as if it were just a collection of scalar values
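
A minimal before/after sketch of these two classical optimizations, written in C with made-up names (illustrative only, not from the notes):

/* Straightforward version: d[i] + e[i] is computed twice in the same
 * statement (a common subexpression), and x * y never changes inside
 * the loop (it is loop-invariant). */
void before(int n, int a[], const int d[], const int e[], int x, int y) {
	for (int i = 0; i < n; i++)
		a[i] = (d[i] + e[i]) * (d[i] + e[i]) + x * y;
}

/* After common subexpression elimination and loop-invariant code motion:
 * the shared subexpression is computed once per iteration, and the
 * invariant product is hoisted out of the loop. */
void after(int n, int a[], const int d[], const int e[], int x, int y) {
	int t2 = x * y;                /* hoisted: loop-invariant */
	for (int i = 0; i < n; i++) {
		int t1 = d[i] + e[i];      /* common subexpression, computed once */
		a[i] = t1 * t1 + t2;
	}
}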

Modern View:

  • Collection of automatic techniques for extracting meaning from and transforming programs
  • Useful for debugging, optimization, verification, malware detection, translation, etc…
  • Optimization:
    • Restructure (organize) computation to optimize locality and parallelism
    • Reducing the amount of computation is useful but not critical
    • Optimizing data structure accesses is critical
      • Data structures are now treated as entities in their own right, and the compiler reasons about accesses to them

Example Optimization

Prefix sum problem. Use a divide-and-conquer algorithm (SCAN). With enough processors it would be O(log n) time instead of O(n) (see the sketch below).
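
One common formulation is a Hillis–Steele style scan. The following is a minimal sequential C sketch whose comments mark where the parallelism comes from; the O(log n) bound assumes one processor per element:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Inclusive prefix sum, written to expose the parallel structure:
 * there are ceil(log2(n)) passes, and within a pass every element
 * update is independent, so with n processors each pass takes O(1)
 * time -- O(log n) parallel time instead of the O(n) sequential scan. */
void scan(int *a, int n) {
	int *tmp = malloc(n * sizeof *tmp);
	for (int stride = 1; stride < n; stride *= 2) {
		memcpy(tmp, a, n * sizeof *a);
		/* every iteration of this inner loop is independent (parallelizable) */
		for (int i = stride; i < n; i++)
			a[i] = tmp[i] + tmp[i - stride];
	}
	free(tmp);
}

int main(void) {
	int a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
	scan(a, 8);
	for (int i = 0; i < 8; i++)
		printf("%d ", a[i]);       /* prints: 1 3 6 10 15 21 28 36 */
	printf("\n");
	return 0;
}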

Why do we need translators?

  • Bridge the “semantic gap”
    • Programmers prefer to write programs at a high level of abstraction
  • Application portability
    • Compilers can just recompile for different ISAs

Getting Performance

Programs must exploit

  • coarse-grain (thread-level) parallelism
  • memory hierarchy
  • instruction-level parallelism (ILP)
  • registers
  • etc…

How important is it to exploit these hardware features?

  • If you have n cores and only run on one, you get at most 1/n of peak performance (e.g., at most 1/16 ≈ 6% on a 16-core machine)

Software Problem

The problem is that:

  • Programs obtained by expressing most algorithms in the straightforward way may perform poorly
  • Worrying about performance while coding complicates the software development process considerably

Caches are useful only if programs have locality of reference

  • temporal locality: the program references the same memory address many times within a short period
  • spatial locality: references that are close together in the address space occur close together in time (e.g., iterating sequentially over an array); see the sketch below
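
A small C sketch of the spatial-locality point (illustrative, not from the notes): both functions compute the same sum, but the row-wise version walks memory sequentially while the column-wise version strides across it:

#include <stdio.h>

#define N 1024

double x[N][N];

/* Row-wise traversal: consecutive accesses touch adjacent addresses
 * (good spatial locality with row-major storage, as in C). */
double sum_by_rows(void) {
	double s = 0.0;
	for (int i = 0; i < N; i++)
		for (int j = 0; j < N; j++)
			s += x[i][j];
	return s;
}

/* Column-wise traversal: consecutive accesses are N doubles apart,
 * so each access may land on a different cache line (poor spatial locality). */
double sum_by_cols(void) {
	double s = 0.0;
	for (int j = 0; j < N; j++)
		for (int i = 0; i < N; i++)
			s += x[i][j];
	return s;
}

int main(void) {
	printf("%f %f\n", sum_by_rows(), sum_by_cols());
	return 0;
}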

Example: Matrix Multiplication

for I = 1, n
	for J = 1, n
		for K = 1, n
			C(I,J) = C(I,J) + A(I,K) * B(K,J)

Note: this assumes the arrays are stored in row-major order

  • All six permutations of these loops are computationally equivalent
  • Great algorithmic data reuse: each array element is touched O(n) times
  • Execution times of the six can be very different if the machine has a cache (due to the locality mentioned above); see the C sketch below
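
A C sketch of two of the six permutations, assuming row-major storage as in the note; the arithmetic is identical, only the memory access pattern (and hence cache behavior) differs:

#define N 512

double A[N][N], B[N][N], C[N][N];

/* IJK order: the innermost loop walks B down a column (stride N),
 * so B has poor spatial locality. */
void matmul_ijk(void) {
	for (int i = 0; i < N; i++)
		for (int j = 0; j < N; j++)
			for (int k = 0; k < N; k++)
				C[i][j] += A[i][k] * B[k][j];
}

/* IKJ order: the innermost loop walks C and B along rows (stride 1),
 * and A[i][k] is reused across the whole inner loop (temporal locality). */
void matmul_ikj(void) {
	for (int i = 0; i < N; i++)
		for (int k = 0; k < N; k++)
			for (int j = 0; j < N; j++)
				C[i][j] += A[i][k] * B[k][j];
}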

Parts of a Compiler

Front-end

Goal: convert the linear representation of the program (a stream of characters/tokens) into a hierarchical representation (e.g., an abstract syntax tree); a toy sketch follows
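
A toy C sketch (made-up names, not from the notes) of such a hierarchical representation: the linear source text a + b * c is parsed into the tree (+ a (* b c)):

typedef enum { NODE_VAR, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
	NodeKind kind;                 /* which kind of node this is */
	const char *name;              /* variable name, used by NODE_VAR */
	struct Node *left, *right;     /* children, used by NODE_ADD / NODE_MUL */
} Node;

/* The expression a + b * c built by hand as a tree: */
Node a   = { NODE_VAR, "a", 0, 0 };
Node b   = { NODE_VAR, "b", 0, 0 };
Node c   = { NODE_VAR, "c", 0, 0 };
Node mul = { NODE_MUL, 0, &b, &c };
Node add = { NODE_ADD, 0, &a, &mul };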

High-level optimizer

Goal: perform high-level analysis and restructuring (e.g., loop transformations to improve locality and parallelism)

Low-level optimizer

Goal: optimize the low-level (machine-level) intermediate representation

Code generator

Goal: emit target machine code (instruction selection, register allocation, instruction scheduling)