Location, Location, Location

April 27, 2021

As of a few days ago, a new feature in clang-query allows introspecting the source locations for a given clang AST node. The feature is also available for experimentation in Compiler Explorer. I previously delivered a talk at EuroLLVM 2019 and blogged in 2018 about this feature and others to assist in discovery of AST matchers and source locations. This is a major step in getting the Tooling API discovery features upstream into LLVM/Clang.

Background

When creating clang-tidy checks to perform source-to-source transformations, there are generally two steps common to all checks:

  • Matching on the AST
  • Replacing particular source ranges in source files with new text

To complete the latter, you will need to become familiar with the source locations clang provides for the AST. A diagnostic is then issued with zero or more “fix it hints” which indicate changes to the code. Almost all clang-tidy checks are implemented in this way.
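The second step can be pictured without any Clang machinery. The following is a minimal sketch, assuming a replacement is nothing more than an offset, a length and the new text. The `TextEdit` struct and the `applyEdits` function are hypothetical names chosen for illustration and are not part of the Clang API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a fix-it: replace `length` characters
// starting at `offset` with `text`.
struct TextEdit {
    std::size_t offset;
    std::size_t length;
    std::string text;
};

// Apply non-overlapping edits to a copy of `source`.
// Edits are applied from the highest offset down, so the offsets of
// the remaining edits stay valid while the text changes length.
std::string applyEdits(std::string source, std::vector<TextEdit> edits)
{
    std::sort(edits.begin(), edits.end(),
              [](const TextEdit &a, const TextEdit &b) {
                  return a.offset > b.offset;
              });
    for (const TextEdit &e : edits) {
        assert(e.offset + e.length <= source.size());
        source.replace(e.offset, e.length, e.text);
    }
    return source;
}
```

For example, `applyEdits("mc.pushBack(42);", {{3, 8, "push_back"}})` yields `mc.push_back(42);`. Finding the right offset and length is the hard part, and that is where source locations come in.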

Some of the source locations which might be interesting for a FunctionDecl are illustrated here:

Pick Your Name

A common use case for this kind of tooling is to port a large codebase from a deprecated API to a new API.

A tool might replace a member call pushBack with push_back on a custom container, for the purpose of making the API more like standard containers. It might be the case that you have multiple classes with a pushBack method and you only want to change uses of it on a particular class, so you cannot simply find and replace across the entire repository.

Given test code like

    struct MyContainer
    {
        // deprecated:
        void pushBack(int t);

        // new:
        void push_back(int t);
    };

    void calls()
    {
        MyContainer mc;

        mc.pushBack(42);
    }

A matcher could look something like:

    match cxxMemberCallExpr(
    on(expr(hasType(cxxRecordDecl(hasName("MyContainer"))))),
    callee(cxxMethodDecl(hasName("pushBack")))
    )

Try experimenting with it on Compiler Explorer.

Explaining how to discover and write this AST matcher expression is out of scope for this blog post, but earlier posts on this blog cover that too.

Know Your Goal

Having matched a call to pushBack, the next step is to replace the source text of the call with push_back. The call to mc.pushBack() is represented by an instance of CXXMemberCallExpr. Given the instance, we need to identify the location in the source of the first character after the “.” and the location of the opening paren. Given those locations, we create a diagnostic with a FixItHint to replace that source range with the new method name:

    diag(MethodCallLocation, "Use push_back instead of pushBack")
        << FixItHint::CreateReplacement(
            sourceRangeForCall, "push_back");

When we run our porting tool in clang-tidy, we get output similar to:

warning: Use push_back instead of pushBack [misc-update-pushBack]
    mc.pushBack(42);
       ^~~~~~~~
       push_back

Running clang-tidy with -fix then causes the tooling to apply the suggested fix. Once we have tested it, we can run the tool to apply the change to all of our code at once.

Find Your Place

So, how do we identify the sourceRangeForCall?

One way is to study the documentation of the Clang AST to try to identify what API calls might be useful to access that particular source range. That is quite difficult to determine for newcomers to the Clang AST API.

The new clang-query feature allows users to introspect all available locations for a given AST node instance.

note: source locations here
    mc.pushBack(42);
       ^
 * "getExprLoc()"

note: source locations here
    mc.pushBack(42);
                  ^
 * "getEndLoc()"
 * "getRParenLoc()"

With this output, we can see that the location of the member call is retrievable by calling getExprLoc() on the CXXMemberCallExpr, which happens to be defined on the Expr base class. Because clang replacements can operate on token ranges, the location for the start of the member call is actually all we need to complete the replacement.
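A token range only records where its last token starts, so whatever applies the replacement has to work out where that token ends. The sketch below imitates that idea on a plain string, assuming the token is a C++ identifier. `identifierLengthAt` and `replaceIdentifierAt` are hypothetical helpers written for this illustration, not Clang functions (Clang uses its own lexer for this).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Length of the identifier starting at `offset`, or 0 if no
// identifier starts there.
std::size_t identifierLengthAt(const std::string &source, std::size_t offset)
{
    std::size_t end = offset;
    while (end < source.size()) {
        unsigned char c = static_cast<unsigned char>(source[end]);
        // Digits are allowed anywhere except the first character.
        bool ok = std::isalpha(c) || c == '_' ||
                  (end > offset && std::isdigit(c));
        if (!ok)
            break;
        ++end;
    }
    return end - offset;
}

// Replace the whole identifier that starts at `offset` with `newName`.
// Only the start location is supplied; the end is found by scanning.
std::string replaceIdentifierAt(std::string source, std::size_t offset,
                                const std::string &newName)
{
    std::size_t length = identifierLengthAt(source, offset);
    assert(length > 0 && "offset must point at an identifier");
    source.replace(offset, length, newName);
    return source;
}
```

With the input `mc.pushBack(42);`, the start offset 3 (the position reported as getExprLoc() in the output above) is enough: `replaceIdentifierAt("mc.pushBack(42);", 3, "push_back")` returns `mc.push_back(42);`.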

One of the design choices of the srcloc output of clang-query is that only locations on the “current” AST node are part of the output. That is why, for example, the arguments of a function call are not part of the locations output for a CXXMemberCallExpr. Instead, it is necessary to traverse to the argument and introspect the locations of the node which represents the argument.

By traversing to the MemberExpr of the CXXMemberCallExpr, we can see more locations. In particular, we can see that getOperatorLoc() can be used to get the location of the operator (a “.” in this case, but it could be a “->” for example) and getMemberNameInfo().getSourceRange() can be used to get a source range for the name of the member being called.

The Best Location

Given the choice of using getExprLoc() or getMemberNameInfo().getSourceRange(), the latter is preferable because it is more semantically related to what we want to replace. Aside from the hint in its name that it yields the “source range” of the “member name”, getExprLoc() should be disfavored because that API is usually only used to choose a position to indicate in a diagnostic. That is not what we wish to use the location for.

Additionally, by experimenting with slightly more complex code, we can see that getExprLoc() on a template-dependent call expression does not give the desired source location (at the time of publishing; this behavior is likely undesirable). At any rate, getMemberNameInfo().getSourceRange() gives the correct source range in all cases.
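The more complex code is not shown here. As an assumption about what such input could look like, the following hypothetical test case makes the type of the object a template parameter, so the call to pushBack cannot be resolved until the template is instantiated. The function bodies and the `items` member are added only so that the example compiles and runs on its own.

```cpp
#include <cassert>
#include <vector>

struct MyContainer
{
    // deprecated:
    void pushBack(int t) { push_back(t); }

    // new:
    void push_back(int t) { items.push_back(t); }

    std::vector<int> items;
};

// `c` has the dependent type T, so `c.pushBack(42)` is a
// template-dependent member call inside this definition.
template <typename T>
void callsDependent(T &c)
{
    c.pushBack(42);
}
```

Pointing clang-query at a file like this is one way to compare what each location accessor reports for a dependent call against the simple case.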

In the end, our diagnostic can look something like:

    diag(MethodCallLocation, "Use push_back instead of pushBack")
        << FixItHint::CreateReplacement(
            theMember->getMemberNameInfo().getSourceRange(), "push_back");

This feature is a powerful way to discover source locations and source ranges while creating and maintaining clang-tidy checks. Let me know if you find it useful!