address-code-optimization.org>Offset Assignment

DSPs provide special hardware for data memory address computation. These address generation units (AGUs) support data memory access by indirect addressing in parallel to the execution of other machine operations. After accessing a memory location, the address pointer can be modified at no extra cost within a fixed range which we denote as auto­modify range. However, loading address pointers by immediate addressing require additional instruction cycles. This implies that optimizing the placement of program variables in memory allows to reduce the addressing costs.

AGU Model

DSPs include one or more special address generation units that are dedicated to address calculation. AGUs can perform operations without using the data path of the processor. This allows address calculations to take place in parallel with arithmetic operations on data.

Typical architecture of an AGU. It contains an index register file (IRF), a modify register file (MRF), and a length register file (LRF). The effective address can be computed by zero­cost operations when using the current contents of the selected registers. Loading address registers by an immediate constant "imm" require extra processor cycles.

Most AGUs provide three register files: the index register file (IRF), the modify register file (MRF), and the length register file (LRF). A specific register is selected either by a register pointer (IRP, MRP, or LRP) or a few bits in the instruction word. In general, maximum processor performance is achieved by register-­indirect addressing modes. The index registers contain the actual addresses for data memory access. When data is accessed in register-­indirect mode, the address stored in the selected index register becomes the memory address. AGUs employ a post­modify scheme, that is, an increment is added to the index register value after the address is used to access data in memory. The most common increment is plus or minus one. Often post­increment operations by plus/minus one can be carried out without initializing a register with the increment. Most proces­sors also provide the ability to add values other than ±1 to the index registers. These increment values are stored in a special group of registers, the modify registers. After an indirect data access, the increment value in the specified modify register is added to the selected index register. Some processors allow indirect addressing with indexing. Here the values stored in two address registers (index register and modify register) are added together to form the effective address to be used to access data in memory. In this addressing mode, however, the value in the index register is not modified. Often, circular buffer management is supported by modulo addressing where the selected length register contains the buffer size.

Offset Assignment

Offset assignment is the problem of finding an optimized placement of pro­gram variables in data memory for a given sequence of memory accesses. The goal is to generate a data memory layout such that the number of address offsets of two consecutive memory accesses that are outside the auto-­modify range is minimized. In general, this problem has a large solution space and can be optimally solved for small instances only.

Let V be a set of program variables. Each variable vi ∈ V is identified by index i ∈{1,2,,N} with N = |V |.

A variable access stream S = [vs(1),vs(2),,vs(M)] is defined by a function
s : {1,2,...,M } → {1,2,...,N}
(1)

where M denotes the stream length and N M. The image s() of any ∈{1,2,,M} defines the program variable vs() on position in the access stream S.

A memory layout is a permutation
π : {1,2,...,N } → {1,2,...,N}
(2)

that assigns addresses to all program variables which appear in the access stream S. Equivalently, a layout can be regarded as a string φ of nodes with each node of V appearing exactly once. The correspondence between these two definitions is simply that π(i) = k with vi ∈ V if and only if vi is the kth element of φ.

Let us assume that an address register points to vi and it should be used for accessing vj. To this end, the address pointer has to be modified by address offset π(j) -π(i). AGUs of DSPs support zero-cost address pointer updates for a limited set of offset values. Obviously, costs can be minimized by memory layout optimization. This problem is denoted as offset assignment (OA). Let cij with i,j ∈{1,2,,N} be a cost value for redirecting an address pointer from address i to the new address j. Additionally, let tijS specify how often address j is accessed right after address i in S. For a single address pointer, the address computation costs can be expressed by
     N  N
C1 = ∑  ∑  cijtS    .
     i=1 j=1    π(i)π(j)
(3)

Modifying π such that C1 becomes a minimum in Equ. 3 is a quadratic assignment problem (QAP). Since it is NP-hard, optimum solutions can be found just for small values of N.

Simple Offset Assignment

Offset assignment algorithms were originally developed for linear addressing with auto­modify range ±1. For simple offset assignment (SOA), it is assumed that the target machine has a single index register. SOA heuristics can be used as core procedures for general offset assignment (GOA) that handles multiple index registers. Several SOA heuristics are based on finding a maximum­-weighted Hamiltonian path in the access graph by using a greedy edge selection strategy.

Access graph for access sequence S = a, b, c, d, e, f, a, d, a, d, a, c, d, f, a, d and memory layout optimized for auto­modify range ±1.

General Offset Assignment

General offset assignment (GOA) is the problem of optimizing the memory layout for multiple address pointers. In case of GOA, both memory layout and address pointer assignment are optimized. Assigning address pointers can be regarded as K-coloring the access stream S. For K homogeneous address pointers, an optimum solution has to be found out of KM-1 different colorings by simple exhaustive search. A coloring of S is a partition that decomposes {1,2,,M} into K disjoint subsets. Each subset defines an access stream
Sk = [vsk(1),vsk(2),...,vsk(Mk )]
(1)

with
                                 ∑K
sk : {1,2,...,Mk } → {1,2,...,M } and Mk  = M.
                                 k=1
(2)

For each Sk, all elements appear in the same relative order as in S.

We calculate the overall address computation costs by
     ∑K  N∑  N∑
CK =           cijtSπk(i)π(j).
     k=1 i=1 j=1
(3)

The goal is to minimize CK by optimizing the memory layout π and the partitioning of S. This optimization problem consists of K interdependent QAPs with a solution space size of N!KM-1.

Optimum memory layout and index register assignment for S = b , f , e , j, e , a, h , d , g, a, f, h , e, b, a, j, c, i , c , j defined by the "red" transition matrix and "blue" transition matrix.