Extending C++ using LLVM

Introduction

LLVM is an open-source, modular compiler infrastructure, and Clang is its front-end for C-family languages, including C++.

In this project, I build a custom Clang compiler that introduces a new C++ builtin,
which counts the number of fields in a C++ struct / class / union.

This toy example walks you through Clang’s most important internals -
Parser (AST generation), Semantic Analysis and finally Code Generation.

The new keyword

In this example, we introduce a new keyword - __builtin_struct_field_count(type or variable)

An example usage of this keyword is shown below -

#include <iostream>

class A {
  int a;
  int b;
  double c;
};

int main() {
  A x;
  // Our new builtin, evaluated at compile time.
  std::cerr << __builtin_struct_field_count(A) << std::endl; // prints 3
  std::cerr << __builtin_struct_field_count(x) << std::endl; // prints 3

  return 0;
}

Code changes

The code changes can be found under this commit -

https://github.com/llvm/llvm-project/commit/5df0838ba2e6b9fff4d1c702f663a337a4fa9d58

Let’s go through the changes step by step -

Step 1 : Defining the keyword

Clang’s keywords and builtins are defined inside the clang/include/clang/Basic/ folder.

You first need to figure out what kind of expression your keyword is. Simple function-like builtins can be added directly to the Builtins.td file by defining the Name, Prototype and any other properties (as defined inside Builtins.def). For example, see clang/include/clang/Basic/Builtins.td (defines a simple keyword for an add(int, int) function).

But __builtin_struct_field_count() does not fit any of the prototypes defined under Builtins.def. So after hitting a couple of compilation errors, I decided to check how keywords with a similar shape, like sizeof() or alignof(), are implemented. After some searching, I figured out that __builtin_struct_field_count() is a “Unary Expression or Type Trait” (UETT for short) expression.
UETT covers expressions that can operate on either a type or an expression, such as sizeof(T) or alignof(expr). I defined the new keyword in the file clang/include/clang/Basic/TokenKinds.def
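
To make the "type or expression" distinction concrete, here is a minimal sketch in standard C++ using sizeof, which already accepts both operand forms. The struct Sample and the variable names are made up for illustration and are not part of the commit.

```cpp
#include <cstddef>

// Hypothetical struct, used only for illustration.
struct Sample {
  int a;
  double b;
};

constexpr Sample sample_value{1, 2.0};

// Operand written as a type.
constexpr std::size_t size_from_type = sizeof(Sample);
// Operand written as an expression; it is not evaluated, only its type is used.
constexpr std::size_t size_from_expr = sizeof(sample_value);

static_assert(size_from_type == size_from_expr,
              "both operand forms describe the same type");

// Standard alignof takes a type; Clang and GCC accept an expression
// operand only as an extension.
constexpr std::size_t align_from_type = alignof(Sample);
```

__builtin_struct_field_count(A) and __builtin_struct_field_count(x) from the earlier example follow the same two-form pattern.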

If you read through TokenKinds.def and TypeTraits.h, you’ll find that the macro UNARY_EXPR_OR_TYPE_TRAIT(Spelling, Name, Key) defines your keyword as a token of kind tok::kw_<Spelling> and also defines an enumerator UETT_<Name> in the UnaryExprOrTypeTrait enum (which is used in the next steps).
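
The way one macro entry can produce both a token kind and an enumerator is the "X-macro" pattern. Below is a self-contained toy version of it, not Clang's actual definitions: the TOY_ macros and the toy namespace are hypothetical, the Key argument is dropped, and the list lives in a macro instead of a separate .def file.

```cpp
// Toy version of the X-macro pattern. All names here are hypothetical; the
// third spelling is shortened to avoid an identifier with a double underscore.
#define TOY_UETT_LIST(ENTRY)                  \
  ENTRY(sizeof, SizeOf)                       \
  ENTRY(alignof, AlignOf)                     \
  ENTRY(struct_field_count, StructFieldCount)

namespace toy {

// Expansion 1: one token kind per spelling, e.g. kw_sizeof.
#define TOY_TOKEN(Spelling, Name) kw_##Spelling,
enum TokenKind { TOY_UETT_LIST(TOY_TOKEN) NUM_TOKENS };
#undef TOY_TOKEN

// Expansion 2: one trait enumerator per name, e.g. UETT_SizeOf.
#define TOY_TRAIT(Spelling, Name) UETT_##Name,
enum UnaryExprOrTypeTrait { TOY_UETT_LIST(TOY_TRAIT) NUM_TRAITS };
#undef TOY_TRAIT

// Expansion 3: the spelling of each keyword as a string.
#define TOY_SPELLING(Spelling, Name) #Spelling,
constexpr const char *kSpellings[] = {TOY_UETT_LIST(TOY_SPELLING)};
#undef TOY_SPELLING

}  // namespace toy
```

Adding one more ENTRY line to the list updates all three expansions at once, which is why a single new line in a .def file is enough to register a keyword.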

Step 2 : Parsing Logic

Now that we have defined our new keyword, we want the parser to recognize it and set its ExprKind (the kind of expression) correctly. Again following the implementations of sizeof() and alignof(), I found that these keywords are handled inside the ParseCastExpression() function in ParseExpr.cpp.

At first glance this seems like the wrong place, as our expression is not a cast expression. But the comment at the start of this function says that it is used to parse

all of cast-expression, unary-expression, postfix-expression, and primary-expression. We handle them together like this for efficiency and to simplify handling of an expression starting with a '(' token

It then calls ParseUnaryExprOrTypeTraitExpression(), where another switch statement sets our ExprKind to UETT_StructFieldCount. I also had to add the new keyword to a couple of asserts.

At the end of this step, the parser produces a UnaryExprOrTypeTraitExpr AST node with our custom UETT_StructFieldCount kind.
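
The token-to-kind dispatch can be pictured with a toy model. This is only a sketch under simplifying assumptions, not the code in ParseExpr.cpp: the toy_parse namespace, TraitNode, ParseResult and ParseTraitExpression are all hypothetical, and the operand is reduced to a single flag.

```cpp
namespace toy_parse {

// Hypothetical token and trait kinds, standing in for Clang's.
enum TokenKind { kw_sizeof, kw_alignof, kw_struct_field_count, identifier };
enum TraitKind { UETT_SizeOf, UETT_AlignOf, UETT_StructFieldCount };

// Toy AST node: which trait it is and whether the operand was a type.
struct TraitNode {
  TraitKind kind;
  bool operand_is_type;
};

struct ParseResult {
  bool ok;         // false when the token does not start a trait expression
  TraitNode node;  // meaningful only when ok is true
};

// Maps the keyword token to its trait kind with a switch.
constexpr ParseResult ParseTraitExpression(TokenKind keyword,
                                           bool operand_is_type) {
  switch (keyword) {
    case kw_sizeof:
      return {true, {UETT_SizeOf, operand_is_type}};
    case kw_alignof:
      return {true, {UETT_AlignOf, operand_is_type}};
    case kw_struct_field_count:
      return {true, {UETT_StructFieldCount, operand_is_type}};
    default:
      return {false, {UETT_SizeOf, false}};
  }
}

}  // namespace toy_parse
```

Supporting a new keyword in this model means adding one case to the switch, which is roughly the shape of the change described in this step.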

Relevant changes - clang/lib/Parse/ParseExpr.cpp

Step 3 : Semantic Analysis

The changes inside clang/lib/Sema/SemaExpr.cpp add semantic checks on the parsed expression. These checks generate compilation errors when the code does not conform to the rules defined here. For now I have skipped any handling specific to my new keyword and only added the changes needed for it to go through the generic UETT expression checks.
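
A check specific to this builtin would most likely need to verify that the operand's type is a struct, class or union. The sketch below expresses that rule in standard C++ with type traits. It is an analogy, not Sema code: the toy_sema namespace, is_record, Point and Number are hypothetical names.

```cpp
#include <type_traits>

namespace toy_sema {

// True when T is a class, struct, or union: the only operands
// that have fields to count.
template <typename T>
struct is_record
    : std::integral_constant<bool, std::is_class<T>::value ||
                                       std::is_union<T>::value> {};

// Hypothetical sample types.
struct Point {
  int x;
  int y;
};

union Number {
  int i;
  float f;
};

static_assert(is_record<Point>::value, "a struct is a record");
static_assert(is_record<Number>::value, "a union is a record");
static_assert(!is_record<int>::value, "int has no fields");
static_assert(!is_record<decltype(1)>::value, "the literal 1 has type int");

}  // namespace toy_sema
```

In the compiler, a failed check of this kind would become a diagnostic on the offending expression instead of a failed static_assert.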

Step 4 : Code Generation

This is where we add the final logic for our builtin. Code changes - clang/lib/CodeGen/CGExprScalar.cpp

The VisitUnaryExprOrTypeTraitExpr() function handles all UETT expression kinds. Since the result is a simple scalar (an integer), the handling goes inside the ScalarExprEmitter class. The handling is quite straightforward: we cast the operand’s type to a RecordType, which stores information about classes / structs / unions. We then get the number of fields from the RecordType and return it as a ConstantInt, so there is no runtime overhead.
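
The same logic can be modeled without any LLVM dependency. This is a toy sketch, not the code from the commit: FieldDecl, RecordDecl, Type and Constant below are simplified, hypothetical stand-ins inside a toy_codegen namespace, and the guard for non-record types is an addition that the branch itself does not have.

```cpp
#include <cstddef>
#include <cstdint>

namespace toy_codegen {

// Simplified stand-ins for the compiler's declaration and type classes.
struct FieldDecl {
  const char *name;
};

struct RecordDecl {
  const FieldDecl *fields;
  std::size_t num_fields;
};

// record is nullptr when the type is not a struct, class, or union.
struct Type {
  const RecordDecl *record;
};

// Stand-in for a compile-time integer constant.
struct Constant {
  bool valid;
  std::uint64_t value;
};

// Counts the fields of a record type and wraps the count in a constant.
constexpr Constant EmitStructFieldCount(const Type &type) {
  return type.record != nullptr
             ? Constant{true,
                        static_cast<std::uint64_t>(type.record->num_fields)}
             : Constant{false, 0};
}

// Model of `class A { int a; int b; double c; };` from the example.
constexpr FieldDecl kFieldsOfA[] = {{"a"}, {"b"}, {"c"}};
constexpr RecordDecl kRecordA{kFieldsOfA,
                              sizeof(kFieldsOfA) / sizeof(kFieldsOfA[0])};
constexpr Type kTypeA{&kRecordA};
constexpr Type kTypeInt{nullptr};

static_assert(EmitStructFieldCount(kTypeA).value == 3, "A has three fields");
static_assert(!EmitStructFieldCount(kTypeInt).valid, "int is not a record");

}  // namespace toy_codegen
```

Because the count is known from the declaration alone, the result can be folded into a constant and nothing has to be computed when the program runs.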

Compiling clang

You can compile this custom instance of clang from my llvm-project fork -

https://github.com/ayushbansal07/llvm-project
Branch - feature/struct-field-count

Caveats

This branch skips semantic checks specific to the new builtin, so the compiler crashes when the builtin is used incorrectly. For example, the following snippet crashes it - __builtin_struct_field_count(1)