Registers versus Cache In order to devise a coordinated scheme for management of registers and cache, it is first necessary to develop a better understanding of the differences and similarities between these two types of buffer memory. 2.1. Registers 2.1.1. Concepts of Registers Registers, or a "register file”, constitute a relatively small, fast, local memory residing in an address -space distinguished memory. The structure of a is given in Figure 1. from that of main register memory cell name: Figure 1. Structure of Register Memory Cell Since registers are the absolute top of the memory hierarchy (typically with cache just below), register access time is the fastest of all memory systems in a computer and there are typically fewer memory cells in a register file than there are cells in any other level of the memory hierarchy. Each register is usually one word wide, with a total of perhaps 16 or 32 words in the register file. By placing a value in a register, one can reap at least four benefits: [l] The fast access time of values in registers reduces latency. [2] A reference to a register typically does not interfere with references along the path(s) to main memory, thereby effectively increasing usable bandwidth to main memory. [3] Typically, the predictability of register references aids in compile-time optimization of code and simplifies hardware. Optimizations are aided in that reference times can be known at compile time; hardware is simplified in that register references in most machines cannot cause pipeline bubbles. [4] Because register names are typically shorter than memory addresses, referencing values in registers actually decreases the required instruction-fetch bandwidth - even though registers typically cannot hold instructions. 2.1.2. Register Allocation Register allocation - the mapping of values in a program to physical registers - is traditionally handled by the compiler. Most "good” register allocation schemes are based on one of two principles: 0 Usage count: the reference frequency of each value is used as the main criteria for allocating a register for a value. Values with higher reference frequencies should have higher priority to be in registers [Fre74]. a Graph coloring and spilling: a live-range interference graph is constructed using data flow or dependence analysis. In this graph, each node represents a value2 and each arc 2. Instead of using a node for each value, each node may represent. a variable. This is easier to implement, but degrades performance by introducing false dependencies where a variable is 345 linking two nodes represents the fact that the two values have overlapping lifetimes, i.e., are simultaneously live. A mapping of values to registers is found by coloring this graph with n colors, where n is the number of registers available (h ence, each color represents a particular register). Various heuristics have been proposed for finding an n-coloring [ChA81] [Cha82] [Cho83] [ChH84]; should the algorithm fail to find an ncoloring, some values will be placed in memory (spilled from registers) to simplify the graph so that an n-coloring can be found. Both these basic approaches share a number of characteristics: l Rather than using the program’s sequencing of references, a partial order derived from that sequence is used for allocation. 0 Register allocation is visible to the compiler and, in some cases, also to the programmer. 0 Binding of value to a name is defined at compile-time. . It is understood that defining live range in terms of values rather than in terms of variables is beneficial. 0 Conventional registers can only hold data, not instructions. 2.1.3. Limitations of Registers The most important limitation of registers, is that, for most programs, many values cannot benefit from being kept in registers. Although it is true that sometimes a value cannot be kept in a register because the hardware provided too few registers, even given an infinite number of registers, a large fraction of the values computed within any program should not be kept in registers. To understand why some values should not be kept in registers, one must understand a little bit of compiler flow analysis.3 used to hold several different values. 3. The description given here of the ambiguous alias problem is a gross oversimpliEcation intended only to give an intuitive introduction to the problem. This issue is currently one of the richest research areas within compiler technology; more detailed discussions of this problem appear in [AllS3], [Bur84], [BuC66], [AIMS], [Ste86], and [Die87]. Suppose a particular segment of a program refers to two names, one called o and the other called p. If one of o and p is a pointer, or one is a call-by-address argument to this routine and the other is a variable which was accessible in the caller’s scope, or both are elements of the same array (such as a[iJ and ahI), etc., then it is possible that even though (Y and /3 look like different names, they refer to the same object. In other words, changing the value of one might change the value of the other, i.e., Q and ~9 might be aliases for the same object. If compile-time analysis can prove that (Y and p cannot be aliases for the same object, then (Y and p can each be assigned to a register and each can be kept there indefinitely. Instead, if the compiler can prove that (Y and p are always aliases for the same object, then Q and /3 are assigned to share a single register, and again the object can be kept in a register indefinitely. However, if the compiler isn’t sure if o and p refer to the same object, or if Q and p only sometimes refer to the same object, we say that o and p are ambiguously aliased to each other. At this point, it is useful to point out that compile-time analysis techniques for determining if a! and p are aliases for each other are, at best, complex to implement and easy to confuse. Confusion results in the "safe” assumption that (Y and p are ambiguously aliased to each other. In addition, in many cases it is theoretically impossible for the compiler to determine whether o and /3 are aliased, in which case the compiler must again assume that they are ambiguously aliased. A good example of such a case is determining whether a/i] and ab] are aliased in code like Figure 2. read (i, j); a[i+j/ = a/i] + a/i]; Figure 2: Example of Compile-Time Unsolvable Aliasing Problem If the compiler’s best "guess” is that cr and p are ambiguously aliased, then placing either value in a register will require "flushing” that register whenever either (Y or /3 is stored into. This "flushing” is usually needed so often that the cost of referencing Q and p from registers is actually greater than the cost of referencing them from main memory, hence, placing Q and ,6 in registers would degrade, rather than improve, performance. 346