## GVPHarmonyOS / OpenArkCompilerC++

Explore and code with more than 2 million developers，Free private repositories ！：）

• C++ 89.0%
• C 10.6%
• Makefile 0.3%
• Other 0.1%
MapleIRDesign.md 81.33 KB
cloudflywu authored 2019-09-03 20:37 . upadte MapleIRDesign.md

[TOC]

# Introduction

MAPLE IR is an internal program representation to support program compilation and execution.  Because any information in the source program may be useful for program analysis and optimization, MAPLE IR aims to provide information about the source program that is as complete as possible.

Program information consists of two parts: declaration part for defining the program constructs and the executable program code.  The former is commonly referred to as the symbol table.

MAPLE IR is target-independent.  It is not pre-disposed towards any specific target processor or processor characteristic.

MAPLE IR can be regarded as the ISA of a virtual machine (VM).  The MAPLE VM can be regarded as a general purpose processor that takes MAPLE IR as input and directly executes the program portion of the MAPLE IR.  MAPLE VM can be regarded as the first consumer of MAPLE IR.  A program compiled into MAPLE IR can be executed on MAPLE VM without the need to finish its compilation to the machine instructions of any target processor.

MAPLE IR is the common representation for programs compiled from different programming languages, which include general purpose languages like C, C++ and Java.  MAPLE IR is extensible.  As additional languages, including domain-specific languages, are compiled into MAPLE IR, more constructs will be added to represent constructs unique to each language.

MAPLE IR is capable of supporting all known program analyses and optimizations, owing to its flexibility of being able to represent program code at different semantic levels.   MAPLE IR's program representation at the highest level exhibits the following characteristics:

• Many language constructs

• Short code sequences

• Constructs are hierarchical

• No loss of program information

Language-specific analyses and optimizations are best performed at the high level.  As compilation proceeds, MAPLE IR is gradually lowered so that the granularity of its operations corresponds more closely to the instructions of a general purpose processor.  At the lower levels, general purpose optimizations are performed.  In particular, at the lowest level, MAPLE IR instructions map one-to-one to machine instructions most of the time, for the mainstream processor ISAs.  This is important to maximize the effects of optimizations at the IR level, as each eliminated operation will have the corresponding effect on the target machine.  At the lowest level, all operations, including type conversion operations, are explicitly expressed at the IR level.

MAPLE IR represents program code intrinsically in the form of trees.  At the highest level, it honors the hierarchical form of the program as it exists at the source level via the tree representation.  It also honors the abstract operations defined by the language.  As compilation proceeds, the abstract operations are lowered into general-purpose operations that require longer code sequences.  The program structure also becomes more flat, as general-purpose processors work by executing lists of instructions sequentially.

Since MAPLE IR is one IR that can exist at multiple levels of semantics, the level of a MAPLE IR program is dictated by the constraints that it adheres to.  These constraints are of the following two types:

• Opcodes allowed - The higher the level, the more types of opcodes allowed, including opcodes generated only from specific languages.  At the lowest level, only opcodes that correspond one-to-one to operations in a general purpose processor are allowed.

• Code structure - The program structure is hierarchical at the higher levels.  The hierarchical constructs become less and less as lowering proceeds.  At the lowest level, the program structure is flat, consisting of sequences of primitive instructions consumed by the general purpose processor.

Though MAPLE IR is target-independent at the highest level, the lowering process will make it become target-dependent.

# Program Representation

There are additional design criteria at the implementation level.  To be friendly to compiler developers, MAPLE IR exists in both binary and ASCII forms.  Conversion between the two forms can be viewed as assembly and dis-assembly.  Thus, the ASCII form can be viewed as the assembler language of the MAPLE VM instructions.  The ASCII form is editable, which implies that it is possible to program in MAPLE IR directly.  Thus, the ASCII form of MAPLE IR is modeled after a typical C program, which is made up of:

• declaration statements - these represent the symbol table information

• executable statements - these represent the executable program code

But an ASCII MAPLE IR is not intended to obey C programming syntax.

The language front-end compiles a source file into a MAPLE IR file.  Thus, each MAPLE IR file corresponds to a CU (compilation unit).  A CU is made up of declarations at the global scope.  Among the declarations are functions, also known as PUs (program units).  Inside PUs are declarations at the local scope followed by the executable code of the function.

There are three kinds of executable nodes in MAPLE IR:

• Leaf nodes - Also called terminal nodes, these nodes denote a value at execution time, which may be a constant or the value of a storage unit.
• Expression nodes - An expression node performs an operation on its operands to compute a result. Its result is a function of the values of its operands and nothing else. Each operand can be either a leaf node or another expression node. Expression nodes are the internal nodes of expression trees. The type field in the expression node gives the type associated with the result of the operation.
• Statement nodes - These represent the flow of control. Execution starts at the entry of the function and continues sequentially statement by statement until a control flow statement is executed. Apart from modifying control flow, statements can also modify data storage in the program. A statement nodes has operands that can be leaf, expression or statement nodes.

In all the executable nodes, the opcode field specifies the operation of the node, followed by additional field specification relevant to the opcode.  The operands for the node are specified inside parentheses separated commas.  The general form is:

opcode fields (opnd0, opnd1, opnd2)

For example, the C statement "a = b" is specified using the direct assignment opcode dassign that assigns the rhs operand b  to a .

dassign $a (dread i32$b)

To enable easy visualization in the ASCII IR, whenever an operand is not a leaf node, we require the start of a new line indented to the right by two spaces relative to the last line.  Thus, the statement "a = b + c" is:

dassign $a ( add i32(dread i32$b, dread i32 $c)) and the statement "a = b + c - d" is: dassign$a (
sub i32(
add i32(dread i32 $b, dread i32$c),

## Pseudo-registers

Pseudo-registers can be regarded as local variables of a primitive type whose addresses are never taken.  They can be declared implicitly by their appearances.  The primitive type associated with a pseudo-register is sticky. With the exception that integer and address types of the same size can be associated with a pseudo-register.

Because pseudo-registers can only be created to store primitive types, the use of field IDs does not apply to them. Pseudo-registers are referred to by the '%' prefix followed by a number. This distinguishes them from other local variables that are not pseudo-registers, as their names cannot start with a number.

The compiler will promote variables to pseudo-registers.  To avoid the loss of high level type information when a variable is promoted to pseudo-registers, reg declarations are used to provide the type information associated with the pseudo-registers.

## Special Registers

Special registers are registers with special meaning.  They are all specified using %% as prefix.  %%SP is the stack pointer and %%FP the frame pointer in referencing the stack frame of the current function.  %%GP is the global pointer used for addressing global variables.

Special registers %%retval0, %%retval1, %%retval2, etc. are used for fetching the multiple values returned by a call.  They are overwritten by each call, and should only be read at most once after each call.  They can assume whatever is the type of the return value.

## Statement Labels

Label names are prefixed with '@' which serves to identify them.  Any statement beginning with a label name defines that label as referring to that text position.  Labels are only referred to locally by goto and branch statements.

## Storage Accesses

Since MAPLE IR is target-independent, it does not exhibit any pre-disposition as to how storage are allocated for the program variables.  It only applies rules defined by the language regarding storage.

In general, there are two distinct kinds of storage accesses: direct and indirect.  Direct accesses do not require any run-time computation to determine the exact address location.  Indirect accesses require address arithmetic before the location can be determined.  Indirect accesses are associated with pointer dereferences and arrays.  Direct accesses are associated with scalar variables and fixed fields inside structures.

Direct accesses can be mapped to pseudo-register if the variable or field has no alias.  Indirect accesses cannot be mapped to pseudo-registers unless the computed address does not change.  But since indirect accesses may represent array and matrix elements, there are many optimizations relevant only to indirect storage accesses.

In MAPLE IR, dassign and dreadare the opcodes for direct assignments and direct references; iassign and iread are the opcodes for indirect assignments and indirect references.

## Aggregates

Aggregates (or composites) are either structures or arrays.  They both designate a grouping of storage elements.  In structures, the storage elements are designated by field names and can be of different types and sizes. In this document, the structure designation includes classes and objects in object-oriented programming languages.  In arrays, the same storage element is repeated a number of times and the elements are accessed via index (or subscript).

Arrays

Array subscripting designate address computation.  Since making the subscripts stand out facilitate data dependency analysis and other loop nest optimizations, MAPLE IR represents array subscripting using the special array opcode, which returns the address resulting from the subscripting operation.  For example, "a[i] = i" is:

iassign<*i32>(
array a32 <* [10] i32> (addrof a32 $a, dread i32$i),              # <* [10] i32> indicates pointer to an array of 10 ints
dread i32 $i) and "x = a[i,j]" is: dassign$x (
iread i32 <* i32>(

var x extern f32 volatile static ## Pseudo-register Declarations syntax: reg <preg-name> <type> The keyword 'reg' designates a declaration statement for a pseudo-register. <preg-name> specifies the pseudo-register prefixed by '%'. <type> is the high level type information. If a pseudo-register is only of primitive type, its declaration is optional. ## Type Specification Types are either primitive or derived. Derived types are specified using C-like tokens, except that the specification is always right-associative, following a strictly left to right order. Derived types are distinguished from primitive types by being enclosed in angular brackets. Derived types can also be thought of as high-level types. Examples: var %p <* i32> # pointer to a 32-bit integer var %a <[10] i32> # an array of 10 32-bit integers var %foo <func(i32) i32> # a pointer to a function that takes one i32 parameter and returns an i32 value (func is a keyword) Additional nested angular brackets are not required, as there is no ambiguity due to the right-associative rule. But the nested angular brackets can be optionally inserted to aid readability. Thus, the following two examples are equivalent: var %q <* <[10] i32>> # pointer to an array of 10 32-bit integers var %q <* [10] i32> # pointer to an array of 10 32-bit integers Inside a struct declaration, field names are prefixed by @. Though label names also use @ as prefix, there is no ambiguity due to the usage context between struct field declaration and label usage being distinct. Example: var %s <struct{@f1 i32, @f2 f64, @f3:3 i32}> # a bitfield of 3 bits A union declaration has the same syntax as struct. In a union, all the fields overlap with each other. The last field of a struct can be a flexible array member along the line of the C99 standard, which is an array with variable number of elements. It is specified with empty square brackets, as in: var %p <* struct{@f1 i32, @f2 <[] u16>}> a flexible array member with unsigned 16-bit integers as elements Structs with flexible array member as its last field can only be dynamically allocated. Its actual size is fixed only at its allocation during execution time, and cannot change during its life time. A struct with flexible array member cannot be nested inside another aggregate. During compilation, the flexible array member is regarded as having size zero. Because its use is usually associated with managed runtime, a language processor may introduce additional meta-data associated with the array. In particular, there must be some language-dependent scheme to keep track of the size of the array during execution time. When a type needs to be qualified by additional attributes for const, volatile, restrict and various alignments, they follow the type that they qualify. These attributes are not regarded as part of the type. If these attributes are applied to a derived type, they must follow outside the type angular brackets. Examples: var %x f64 volatile align(16) # %s is a f64 value that is volatile and aligned on 16 byte boundary var %p <* f32> const volatile # %p is a pointer to a f32 value, and %p itself is const and volatile Alignments are specified in units of bytes and must be power of 2. Alignment attributes must only be used to increase the natural alignments of the types, to make the alignments more stringent. For decreasing alignments, the generator of MAPLE IR must use smaller types to achieve the effect of packing instead of relying on the align attribute. ## Incomplete Type Specification Languages like Java allow contents of any object to be referenced without full definition of the object being available. Their full contents are to be resolved from additional input files in later compilation phases. MAPLE IR allows structs, classes and interfaces to be declared incompletely so their specific contents can be referenced. Instead of the structclass and interface keywords, structincompleteclassincomplete and interfaceincomplete should be used instead. ## Type Declaration syntax: type <id-name> <derived-type> Type names are also prefixed with either '' or '%' based on whether the scope is global or local.  Example:

type $i32ptr <* i32> # the type$i32ptr is a pointer to i32

Primitive types are not allowed to be given a different type name.

Attributes are not allowed in type declaration.

Once a type name is defined, specifying the type name is equivalent to specifying the derived type that it stands for.  Thus, the use of a type name should always be enclosed in angular brackets.

## Java Class and Interface Declaration

A Java class designates more than a type, because the class name also carries the attributes declared for the class.  Thus, we support declaration of Java classes via:

syntax: javaclass <id-name> <class-type> <attributes>

<id-name> must have '$' as prefix as class names always have global scope. For example: javaclass$Puppy <class [{@color](mailto:%7B@color) i32}> public final         # a java class named "Puppy" with a single field "color" and attributes public and final

A javaclass name should not be regarded as a type name as it contains additional attribute information.  It cannot be enclosed in angular brackets as it cannot be referred to as a type.

A java interface has the same form as the class type, being able to extend another interface, but unlike class, an interface can extend multiple interfaces.  Another difference from class is  that an interface cannot be instantiated.  Without instantiation, the data fields in interfaces are always allocated statically.   For example,

interface <$interfaceA> { #this interface extends interfaceA @s1 int32, # data fields inside interfaces are always statically allocated &method1(int32) f32 } # a method declaration MAPLE IR handles an interface declaration as a type declaration. Thus, the above can be specified after the type keyword to be associated with a type name. Separately, the javainterface keyword declares the symbol associated with the interface: syntax: javainterface <id-name> <interface-type> <atteributes> <id-name> must have '$' as prefix as interface names always have global scope.  For example:

javainterface $IFA <interface{&amethod(void) int32}> public static #$IFA is an interface with a single method &amethod

Again, a javainterface name should not be regarded as a type name, as it is a symbol name.  When a class implements an interface, it specifies the javainterface name as part of its comma-separated contents, as in:

class <$PP> { &amethod(void) int32, # this class extends the$PP class, and &amethod is a member function of this class

$IFA } # this class implements the$IFA interface

## Function Declaration

syntax:

 func <func-name> <attributes> (

var <parm0> <type>,

. . .

var <parmn> <type>)  <ret-type0>,  . . .
<ret_typen> {

<stmt0>

. . .

<stmtn> }

<attributes> provides various attributes about the function, like static,extern, etc. The opening parentheses starts the parameter declarations, which can be empty.  Each <parmi> is of the form of a var or reg declaration declaring each incoming parameter.  If the last parameter is specified as "...", it indicates the start of the variable part of the arguments as in C.  Following the parameter declarations is a list of the multiple return types separated by commas. If there is no return value, <ret-type0> should specify void.  Each <ret-typei> can be either primitive or derived. If  no statement follows the parentheses, then it is just a prototype declaration.

funcsize

syntax: funcsize <size-in-bytes>

This directive appears inside the function block to give the Maple IR code size of the function.

framesize

syntax: framesize <size-in-bytes>

This directive appears inside the function block to give the stack frame size of the function.

moduleid

syntax: moduleid <id>

This directive appears inside the function to give the unique id of the module which the function belongs to.

upformalsize

syntax: upformalsize <size-in-bytes>

This directive appears inside the function block to give the size of upformal segment that stores the formal parameters being passed above the frame pointer %%FP.

formalwordstypetagged

syntax: formalwordstypetagged = [ <word-values> ]

This specifies a bitvector initialized to the value specified by the list of space-separated 32-bit integer constants.  The Nth bit in this bitvector is set to 1 if the Nth word in the upformal segment has a type tag, in which case the type tag is at the (N+1)th word.

formalwordsrefcounted

syntax: formalwordsrefcounted = [ <word-values> ]

This specifies a bitvector initialized to the value specified by the list of space-separated 32-bit integer constants.  The Nth bit in this bitvector is set to 1 if the Nth word in the upformal segment is a pointer to a reference-counted dynamically allocated memory block.

localwordstypetagged

syntax: localwordstypetagged = [ <word-values> ]

This specifies a bitvector initialized to the value specified by the list of space-separated 32-bit integer constants.  The Nth bit in this bitvector is set to 1 if the -Nth word in the local stack frame has a type tag, in which case the type tag is at the (-N+1)th word.

localwordsrefcounted

syntax: localwordsrefcounted = [ <word-values> ]

This specifies a bitvector initialized to the value specified by the list of space-separated 32-bit integer constants.  The Nth bit in this bitvector is set to 1 if the -Nth word in the local stack frame is a pointer to a reference-counted dynamically allocated memory block.

## Initializations

When there are initializations associated with a var declaration, there is '=' after the var declaration, followed by the initialization value.  For aggregates, the list of initialization values are enclosed by brackets '[' and ']', with the values separated by commas.  In arrays, the initialization values for the array elements are listed one by one, and nested brackets  must be used to correspond to elements in each lower-order dimension.

In specifying the initializations for a struct, inside the brackets, field-ID followed by '=' must be used to specify the value for each field explicitly. The fields' initialization values can be listed in arbitrary order.  For nested structs, the usage of nested brackets is optional, because field-IDs at the top level can uniquely specify fields in nested structs.  But if nested brackets are used, then field-ID usage within the nested brackets is relative to that corresponding level of sub-struct.  Example:

type %SS <struct { @g1 f64, @g2 f64 }>
var %s [struct{@f1i32,
@f2 <%SS>,
@f3:4 i32} = [ 1 = 99,
2 = [1= 10.0],          # field f2.g1 has field ID 1 in struct %SS and is initialized to 10.0
4 = 22.2,           # field f2.g2 has field ID 4 in struct %s and is initialized to 22.2
5 = 15 ]               # field f3 (4 bits in size) has field ID 5 in struct %s and is initialized to 15

## Type Parameters

Also called generics, type parameters allow derived types and functions to be written without specifying the exact types in parts of their contents.  The type parameters can be instantiated to different specific types later, thus enabling the code to be more widely applicable, promoting software re-use.

Type parameters and their instantiation can be handled completely by the language front-ends.  MAPLE IR provides representation for generic types and generic functions and their instantiation so as to reduce the amount of work in the language front-ends.  A MAPLE IR file with type parameters requires a front-end lowering phase to de-genericize the IR before the rest of the MAPLE IR components can process the code.

Type parameters are symbol names prefixed with "!", and can appear anywhere that a type can appear.  Each type or function definition can have multiple type parameters, and each type parameter can appear more than one time.  Since type parameters are also types, they can only appear inside the angular brackets "<" and ">", e.g. <!T>.  When the definition of a derived type contains any type parameter, the type becomes a generic type.  When the definition of a function contains any type parameter, the function becomes a generic function.  A function prototype cannot contain generic type.

A generic type or generic function is marked with the generic attribute to make them easier to be identified.

A generic type or function is instantiated by assigning specific non-generic types to each of its type parameters.  The instantiation is specified by a list of such assignments separated by commas.  We refer to this as an instantiation vector, which is specified inside braces "{" and "}".  In the case of the instantiation of a generic type, the type parameter is immediately followed by the instantiation vector.  Example:

type $apair <struct {@f1 <!T>, @f2 <!T>}> #$apair is a generic type

var $x <$apair{!T=f32}>           # the type of $x is$apair instantiated with f32 being assigned to the type parameter !T

A generic function is instantiated by invoking it with an instantiation vector. The instantiation vector immediately follows the name of the generic function.  Since the instantiation vector is regarded as type information, it is further enclosed inside the angular brackets "<" and ">".  Invocation of generic functions must be via the opcodes callinstant and callinstantassigned, which correspond to call and callassigned respectively.  Example:

func &swap (var %x <!UU>, var %y <!UU>) void {         # &swap is a generic
function to swap the contents of its two parameters

var %z <!UU>

dassign %z (dread agg %x)

dassiign %x (dread agg %y)

dassign %y (dread agg %z)

return

}

. . .

callinstant &swap<{!UU=<$apair{!T=i32}>}> ( # &swap is instantiated with type argument <$apair{!T=i32}>, itself an instantiated type

dread agg %a,                     # the actual parameters %a and %b must have same type as the same instantiated type

dread agg %b)

# Examples

## Example 1

Function definitions and arithmetic expressions:

C code:

int foo(int i,int j){
return (i + j) * -998;
}

Maple IR:

func &foo (var %i i32, var %j i32) i32 {
return (
mul i32 (
constval i32 -998))}

## Example 2

Loops and arrays

C source:

float a[10];
void init(void){
int i;
for(i=0; i<10; i++)
a[i]=i*3;
}

Maple IR:

var $a <[10] f32> func &init() void{ var %i i32 dassign %i(constval i32 0) while( lt i32 i32(dread i32 %i, constval i32 10)){ iassign<*[10] f32>( array a32<*[10] f32>(addrof a32$a, dread i32 %i),
mul i32(dread i32 %i, constval i32 3))
dassign %i(
add i32(dread i32 %i, constval i32 1))}}

## Example 3

Types and structs

C source:

typedef struct SS{
int f1;
char f2:6;
char f3:2;
}SS;
SS foo(SS x){
x.f2=33;
return x;
}

Maple IR:

type $SS <struct {@f1 i32, @f2:6 i8,@f3:2 i8}> func &foo (var %x <$SS>) <\$SS> {
dassign %x 2 (constval i32 32 )
return (dread agg %x) }

## Example 4

if-then-else, call and block

C source:

int fact(int n) {
if(n!=1)
return n*fact(n-1);
else return 1;
}

Maple IR:

func &fact (var %n i32) i32 {
if (ne i32 (dread i32 %n, constval i32 1)) {
call &fact (sub i32 (dread i32 %n, constval i32 1))
return (mul i32 (dread i32 %n, regread i32 %%retval))
}
return(constval i32 1)
}