Archive for the ‘clang’ Category

Refactor with Clang Tooling at code::dive 2018

January 2, 2019

I delivered a talk about writing a refactoring tool with Clang Tooling at code::dive in November. It was uploaded to YouTube today:

The slides are available here and the code samples are here.

This was a fun talk to deliver as I got to demo some features which had never been seen by anyone before. For people who are already familiar with clang-tidy and clang-query, the interesting content starts about 15 minutes in. There I start to show new features in the clang-query interpreter command line.

The existing clang-query interpreter lacks many features which the replxx library provides, such as syntax highlighting and portable code completion:

It also allows scrolling through results to make a selection:

A really nice feature is value-based code-completion instead of type-based code completion. Existing code completion only completes candidates based on type-compatibility. It recognizes that a parameterCountIs() matcher can be used with a functionDecl() matcher for example. If the code completion already on the command line is sufficiently constrained so that there is only one result already, the code completion is able to complete candidates based on that one result node:

Another major problem with clang-query currently is that it is hard to know which parenthesis represents the closing of which matcher. The syntax highlighting of replxx help with this, along with a brace matching feature I wrote for it:

I’m working on upstreaming those features to replxx and Clang to make them available for everyone, but for now it is possible to experiment with some of the features on my Compiler Explorer instance on ce.steveire.com.

I wrote about the AST-Matcher and Source Location/Source Range discovery features on my blog here since delivering the talk. I also wrote about Composing AST Matchers, which was part of the tips and tricks section of the talk. Over on the Visual C++ blog, I wrote about distributing the refactoring task among computers on the network using Icecream. My blogs on that platform can be seen in the Clang category.

All of that blog content is repeated in the code::dive presentation, but some people prefer to learn from conference videos instead of blogs, so this might help the content reach a larger audience. Let me know if there is more you would like to see about clang-query!

Advertisements

Composing AST Matchers in clang-tidy

November 20, 2018

When creating clang-tidy checks, it is common to extract parts of AST Matcher expressions to local variables. I expanded on this in a previous blog.

auto nonAwesomeFunction = functionDecl(
  unless(matchesName("^::awesome_"))
  );

Finder->addMatcher(
  nonAwesomeFunction.bind("addAwesomePrefix")
  , this);

Finder->addMatcher(
  callExpr(callee(nonAwesomeFunction)).bind("addAwesomePrefix")
  , this);

Use of such variables establishes an emergent extension API for re-use in the checks, or in multiple checks you create which share matcher requirements.

When attempting to match items inside a ForStmt for example, we might encounter the difference in the AST depending on whether braces are used or not.

#include <vector>

void foo()
{
    std::vector<int> vec;
    int c = 0;
    for (int i = 0; i < 100; ++i)
        vec.push_back(i);

    for (int i = 0; i < 100; ++i) {
        vec.push_back(i);
    }
}

In this case, we wish to match the push_back method inside a ForStmt body. The body item might be a CompoundStmt or the CallExpr we wish to match. We can match both cases with the anyOf matcher.

auto pushbackcall = callExpr(callee(functionDecl(hasName("push_back"))));

Finder->addMatcher(
    forStmt(
        hasBody(anyOf(
            pushbackcall.bind("port_call"), 
            compoundStmt(has(pushbackcall.bind("port_call")))
            ))
        )
    , this);

Having to list the pushbackcall twice in the matcher is suboptimal. We can do better by defining a new API function which we can use in AST Matcher expressions:

auto hasIgnoringBraces = [](auto const& Matcher)
{
    return anyOf(
        Matcher, 
        compoundStmt(has(Matcher))
        );
};

With this in hand, we can simplify the original expression:

auto pushbackcall = callExpr(callee(functionDecl(hasName("push_back"))));

Finder->addMatcher(
    forStmt(
        hasBody(hasIgnoringBraces(
            pushbackcall.bind("port_call")
            ))
        ) 
    , this);

This pattern of defining AST Matcher API using a lambda function finds use in other contexts. For example, sometimes we want to find and bind to an AST node if it is present, ignoring its absense if is not present.

For example, consider wishing to match struct declarations and match a copy constructor if present:

struct A
{
};

struct B
{
    B(B const&);
};

We can match the AST with the anyOf() and anything() matchers.

Finder->addMatcher(
    cxxRecordDecl(anyOf(
        hasMethod(cxxConstructorDecl(isCopyConstructor()).bind("port_method")), 
        anything()
        )).bind("port_record")
    , this);

This can be generalized into an optional() matcher:

auto optional = [](auto const& Matcher)
{
    return anyOf(
        Matcher,
        anything()
        );
};

The anything() matcher matches, well, anything. It can also match nothing because of the fact that a matcher written inside another matcher matches itself.

That is, matchers such as

functionDecl(decl())
functionDecl(namedDecl())
functionDecl(functionDecl())

match ‘trivially’.

If a functionDecl() in fact binds to a method, then the derived type can be used in the matcher:

functionDecl(cxxMethodDecl())

The optional matcher can be used as expected:

Finder->addMatcher(
    cxxRecordDecl(
        optional(
            hasMethod(cxxConstructorDecl(isCopyConstructor()).bind("port_method"))
            )
        ).bind("port_record")
    , this);

Yet another problem writers of clang-tidy checks will find is that AST nodes CallExpr and CXXConstructExpr do not share a common base representing the ability to take expressions as arguments. This means that separate matchers are required for calls and constructions.

Again, we can solve this problem generically by creating a composition function:

auto callOrConstruct = [](auto const& Matcher)
{
    return expr(anyOf(
        callExpr(Matcher),
        cxxConstructExpr(Matcher)
        ));
};

which reads as ‘an Expression which is any of a call expression or a construct expression’.

It can be used in place of either in matcher expressions:

Finder->addMatcher(
    callOrConstruct(
        hasArgument(0, integerLiteral().bind("port_literal"))
        )
    , this);

Creating composition functions like this is a very convenient way to simplify and create maintainable matchers in your clang-tidy checks. A recently published RFC on the topic of making clang-tidy checks easier to write proposes some other conveniences which can be implemented in this manner.

Future Developments in clang-query

November 11, 2018

Getting started – clang-tidy AST Matchers

Over the last few weeks I published some blogs on the Visual C++ blog about Clang AST Matchers. The series can be found here:

I am not aware of any similar series existing which covers creation of clang-tidy checks, and use of clang-query to inspect the Clang AST and assist in the construction of AST Matcher expressions. I hope the series is useful to anyone attempting to write clang-tidy checks. Several people have reported to me that they have previously tried and failed to create clang-tidy extensions, due to various issues, including lack of information tying it all together.

Other issues with clang-tidy include the fact that it relies on the “mental model” a compiler has of C++ source code, which might differ from the “mental model” of regular C++ developers. The compiler needs to have a very exact representation of the code, and needs to have a consistent design for the class hierarchy representing each standard-required feature. This leads to many classes and class hierarchies, and a difficulty in discovering what is relevant to a particular problem to be solved.

I noted several problems in those blog posts, namely:

  • clang-query does not show AST dumps and diagnostics at the same time<
  • Code completion does not work with clang-query on Windows
  • AST Matchers which are appropriate to use in contexts are difficult to discover
  • There is no tooling available to assist in discovery of source locations of AST nodes

Last week at code::dive in Wroclaw, I demonstrated tooling solutions to all of these problems. I look forward to video of that talk (and videos from the rest of the conference!) becoming available.

Meanwhile, I’ll publish some blog posts here showing the same new features in clang-query and clang-tidy.

clang-query in Compiler Explorer

Recent work by the Compiler Explorer maintainers adds the possibility to use source code tooling with the website. The compiler explorer contains new entries in a menu to enable a clang-tidy pane.

clang-tidy in Compiler Explorer

clang-tidy in Compiler Explorer

I demonstrated use of compiler explorer to use the clang-query tool at the code::dive conference, building upon the recent work by the compiler explorer developers. This feature will get upstream in time, but can be used with my own AWS instance for now. This is suitable for exploration of the effect that changing source code has on match results, and orthogonally, the effect that changing the AST Matcher has on the match results. It is also accessible via ce.steveire.com.

It is important to remember that Compiler Explorer is running clang-query in script mode, so it can process multiple let and match calls for example. The new command set print-matcher true helps distinguish the output from the matcher which causes the output. The help command is also available with listing of the new features.

The issue of clang-query not printing both diagnostic information and AST information at the same time means that users of the tool need to alternate between writing

set output diag

and

set output dump

to access the different content. Recently, I committed a change to make it possible to enable both output and diag output from clang-query at the same time. New commands follow the same structure as the set output command:

enable output detailed-ast
disable output detailed-ast

The set output command remains as an “exclusive” setting to enable only one output feature and disable all others.

Dumping possible AST Matchers

This command design also enables the possibility of extending the features which clang-query can output. Up to now, developers of clang-tidy extensions had to inspect the AST corresponding to their source code using clang-query and then use that understanding of the AST to create an AST Matcher expression.

That mapping to and from the AST “mental model” is not necessary. New features I am in the process of upstreaming to clang-query enable the output of AST Matchers which may be used with existing bound AST nodes. The command

enable output matcher

causes clang-query to print out all matcher expressions which can be combined with the bound node. This cuts out the requirement to dump the AST in such cases.

Inspecting the AST is still useful as a technique to discover possible AST Matchers and how they correspond to source code. For example if the functionDecl() matcher is already known and understood, it can be dumped to see that function calls are represented by the CallExpr in the Clang AST. Using the callExpr() AST Matcher and dumping possible matchers to use with it leads to the discovery that callee(functionDecl()) can be used to determine particulars of the function being called. Such discoveries are not possible by only reading AST output of clang-query.

Dumping possible Source Locations

The other important discovery space in creation of clang-tidy extensions is that of Source Locations and Source Ranges. Developers creating extensions must currently rely on the documentation of the Clang AST to discover available source locations which might be relevant. Usually though, developers have the opposite problem. They have source code, and they want to know how to access a source location from the AST node which corresponds semantically to that line and column in the source.

It is important to make use a semantically relevant source location in order to make reliable tools which refactor at scale and without human intervention. For example, a cursory inspection of the locations available from a FunctionDecl AST node might lead to the belief that the return type is available at the getBeginLoc() of the node.

However, this is immediately challenged by the C++11 trailing return type feature, where the actual return type is located at the end. For a semanticallly correct location, you must currently use

getTypeSourceInfo()->getTypeLoc().getAs().getReturnLoc().getBeginLoc()

It should be possible to use getReturnTypeSourceRange(), but a bug in clang prevents that as it does not appreciate the trailing return types feature.

Once again, my new output feature of clang-query presents a solution to this discovery problem. The command

enable output srcloc

causes clang-query to output the source locations by accessor and caret corresponding to the source code for each of the bound nodes. By inspecting that output, developers of clang-tidy extensions can discover the correct expression (usually via the clang::TypeLoc heirarchy) corresponding to the source code location they are interested in refactoring.

Next Steps

I have made many more modifications to clang-query which I am in the process of upstreaming. My Compiler explorer instance is listed as the ‘clang-query-future’ tool, while the clang-query-trunk tool runs the current trunk version of clang-query. Both can be enabled for side-by-side comparison of the future clang-query with the exising one.

API Changes in Clang

September 13, 2018

I’ve started contributing to Clang, in the hope that I can improve the API for tooling. This will eventually mean changes to the C++ API of Clang, the CMake buildsystem, and new features in the tooling. Hopefully I’ll remember to blog about changes I make.

The Department of Redundancy Department

I’ve been implementing custom clang-tidy checks and have become quite familiar with the AST Node API. Because of my background in Qt, I was immediately disoriented by some API inconsistency. Certain API classes had both getStartLoc and getLocStart methods, as well as both getEndLoc and getLocEnd etc. The pairs of methods return the same content, so at least one set of them is redundant.

I’m used to working on stable library APIs, but Clang is different in that it offers no API stability guarantees at all. As an experiment, we staggered the introduction of new API and removal of old API. I ended up replacing the getStartLoc and getLocStart methods with getBeginLoc for consistency with other classes, and replaced getLocEnd with getEndLoc. Both old and new APIs are in the Clang 7.0.0 release, but the old APIs are already removed from Clang master. Users of the old APIs should port to the new ones at the next opportunity as described here.

Wait a minute, Where’s me dump()er?

Clang AST classes have a dump() method which is very useful for debugging. Several tools shipped with Clang are based on dumping AST nodes.

The SourceLocation type also provides a dump() method which outputs the file, line and column corresponding to a location. The problem with it though has always been that it does not include a newline at the end of the output, so the output gets lost in noise. This 2013 video tutorial shows the typical developer experience using that dump method. I’ve finally fixed that in Clang, but it did not make it into Clang 7.0.0.

In the same vein, I also added a dump() method to the SourceRange class. This prints out locations in the an angle-bracket format which shows only what changed between the beginning and end of the range.

Let it bind

When writing clang-tidy checks using AST Matchers, it is common to factor out intermediate variables for re-use or for clarity in the code.

auto valueMethod = cxxMethodDecl(hasName("value"));
Finer->addMatcher(valueMethod.bind("methodDecl"));

clang-query has an analogous way to create intermediate matcher variables, but binding to them did not work. As of my recent commit, it is possible to create matcher variables and bind them later in a matcher:

let valueMethod cxxMethodDecl(hasName("value"))
match valueMethod.bind("methodDecl")
match callExpr(callee(valueMethod.bind("methodDecl"))).bind("methodCall")

Preload your Queries

Staying on the same topic, I extended clang-query with a --preload option. This allows starting clang-query with some commands already invoked, and then continue using it as a REPL:

bash$ cat cmds.txt
let valueMethod cxxMethodDecl(hasName("value"))

bash$ clang-query --preload cmds.txt somefile.cpp
clang-query> match valueMethod.bind("methodDecl")

Match #1:

somefile.cpp:4:2: note: "methodDecl" binds here
        void value();
        ^~~~~~~~~~~~

1 match.

Previously, it was only possible to run commands from a file without also creating a REPL using the -c option. The --preload option with the REPL is useful when experimenting with matchers and having to restart clang-query regularly. This happens a lot when modifying code to examine changes to AST nodes.

Enjoy!