Boost dependencies and bcp

Recently I generated diagrams showing the header dependencies between Boost libraries, or rather, between various Boost git repositories. Diagrams showing dependencies for each individual Boost git repo are here along with dot files for generating the images.

The monster diagram is here:

Edges and Incidental Modules and Packages

The directed edges in the graphs represent that a header file in one repository #includes a header file in the other repository. The idea is that, if a packager wants to package up a Boost repo, they can’t assume anything about how the user will use it. A user of Boost.ICL can choose whether ICL will use Boost.Container or not by manipulating the ICL_USE_BOOST_MOVE_IMPLEMENTATION preprocessor macro. So, the packager has to list Boost.Container as some kind of dependency of Boost.ICL, so that when the package manager downloads the boost-icl package, the boost-container package is automatically downloaded too. The dependency relationship might be a ‘suggests’ or ‘recommends’, but the edge will nonetheless exist somehow.

In practice, packagers do not split Boost into packages like that. At least for debian packages they split compiled static libraries into packages such as libboost-serialization1.58, and put all the headers (all header-only libraries) into a single package libboost1.58-dev. Perhaps the reason for packagers putting it all together is that there is little value in splitting the header-only repository content in the monolithic Boost from each other if it will all be packaged anyway. Or perhaps the sheer number of repositories makes splitting impractical. This is in contrast to KDE Frameworks, which does consider such edges and dependency graph size when determining where functionality belongs. Typically KDE aims to define the core functionality of a library on its own in a loosely coupled way with few dependencies, and then add integration and extension for other types in higher level libraries (if at all).

Another feature of my diagrams is that repositories which depend circularly on each other are grouped together in what I called ‘incidental modules‘. The name is inspired by ‘incidental data structures’ which Sean Parent describes in detail in one of his ‘Better Code’ talks. From a packager point of view, the Boost.MPL repo and the Boost.Utility repo are indivisible because at least one header of each repo includes at least one header of the other. That is, even if packagers wanted to split Boost headers in some way, the ‘incidental modules’ would still have to be grouped together into larger packages.

As far as I am aware such circular dependencies don’t fit with Standard C++ Modules designs or the design of Clang Modules, but that part of C++ would have to become more widespread before Boost would consider their impact. There may be no reason to attempt to break these ‘incidental modules’ apart if all that would do is make some graphs nicer, and it wouldn’t affect how Boost is packaged.

My script for generating the dependency information is simply grepping through the include/ directory of each repository and recording the #included files in other repositories. This means that while we know Boost.Hana can be used stand-alone, if a packager simply packages up the include/boost/hana directory, the result will have dependencies on parts of Boost because Hana includes code for integration with existing Boost code.

Dependency Analysis and Reduction

One way of defining a Boost library is to consider the group of headers which are gathered together and documented together to be a library (there are other ways which some in Boost prefer – it is surprisingly fuzzy). That is useful for documentation at least, but as evidenced it appears to not be useful from a packaging point of view. So, are these diagrams useful for anything?

While Boost header-only libraries are not generally split in standard packaging systems, the bcp tool is provided to allow users to extract a subset of the entire Boost distribution into a user-specified location. As far as I know, the tool scans header files for #include directives (ignoring ifdefs, like a packager would) and gathers together all of the transitively required files. That means that these diagrams are a good measure of how much stuff the bcp tool will extract.

Note also that these edges do not contribute time to your slow build – reducing edges in the graphs by moving files won’t make anything faster. Rewriting the implementation of certain things might, but that is not what we are talking about here.

I can run the tool to generate a usable Boost.ICL which I can easily distribute. I delete the docs, examples and tests from the ICL directory because they make up a large chunk of the size. Such a ‘subset distribution’ doesn’t need any of those. I also remove 3.5M of preprocessed files from MPL. I then need to define BOOST_MPL_CFG_NO_PREPROCESSED_HEADERS when compiling, which is easy and explained at the end:

$ bcp --boost=$HOME/dev/src/boost icl myicl
$ rm -rf boostdir/libs/icl/{doc,test,example}
$ rm -rf boostdir/boost/mpl/aux_/preprocessed
$ du -hs myicl/
15M     myicl/

Ok, so it’s pretty big. Looking at the dependency diagram for Boost.ICL you can see an arrow to the ‘incidental spirit’ module. Looking at the Boost.Spirit dependency diagram you can see that it is quite large.

Why does ICL depend on ‘incidental spirit’? Can that dependency be removed?

For those ‘incidental modules’, I selected one of the repositories within the group and named the group after that one repository. Too see why ICL depends on ‘incidental spirit’, we have to examine all 5 of the repositories in the group to check if it is the one responsible for the dependency edge.

boost/libs/icl$ git grep -Pl -e include --and \
  -e "thread|spirit|pool|serial|date_time" include/
include/boost/icl/gregorian.hpp
include/boost/icl/ptime.hpp

Formatting wide terminal output is tricky in a blog post, so I had to make some compromises in the output here. Those ICL headers are including Boost.DateTime headers.

I can further see that gregorian.hpp and ptime.hpp are ‘leaf’ files in this analysis. Other files in ICL do not include them.

boost/libs/icl$ git grep -l gregorian include/
include/boost/icl/gregorian.hpp
boost/libs/icl$ git grep -l ptime include/
include/boost/icl/ptime.hpp

As it happens, my ICL-using code also does not need those files. I’m only using icl::interval_set<double> and icl::interval_map<double>. So, I can simply delete those files.

boost/libs/icl$ git grep -l -e include \
  --and -e date_time include/boost/icl/ | xargs rm
boost/libs/icl$

and run the bcp tool again.

$ bcp --boost=$HOME/dev/src/boost icl myicl
$ rm -rf myicl/libs/icl/{doc,test,example}
$ rm -rf myicl/boost/mpl/aux_/preprocessed
$ du -hs myicl/
12M     myicl/

I’ve saved 3M just by understanding the dependencies a bit. Not bad!

Mostly the size difference is accounted for by no longer extracting boost::mpl::vector, and secondly the Boost.DateTime headers themselves.

The dependencies in the graph are now so few that we can consider them and wonder why they are there and can they be removed. For example, there is a dependency on the Boost.Container repository. Why is that?

include/boost/icl$ git grep -C2 -e include \
   --and -e boost/container
#if defined(ICL_USE_BOOST_MOVE_IMPLEMENTATION)
#   include <boost/container/set.hpp>
#elif defined(ICL_USE_STD_IMPLEMENTATION)
#   include <set>
--

#if defined(ICL_USE_BOOST_MOVE_IMPLEMENTATION)
#   include <boost/container/map.hpp>
#   include <boost/container/set.hpp>
#elif defined(ICL_USE_STD_IMPLEMENTATION)
#   include <map>
--

#if defined(ICL_USE_BOOST_MOVE_IMPLEMENTATION)
#   include <boost/container/set.hpp>
#elif defined(ICL_USE_STD_IMPLEMENTATION)
#   include <set>

So, Boost.Container is only included if the user defines ICL_USE_BOOST_MOVE_IMPLEMENTATION, and otherwise not. If we were talking about C++ code here we might consider this a violation of the Interface Segregation Principle, but we are not, and unfortunately the realities of the preprocessor mean this kind of thing is quite common.

I know that I’m not defining that and I don’t need Boost.Container, so I can hack the code to remove those includes, eg:

index 6f3c851..cf22b91 100644
--- a/include/boost/icl/map.hpp
+++ b/include/boost/icl/map.hpp
@@ -12,12 +12,4 @@ Copyright (c) 2007-2011:
 
-#if defined(ICL_USE_BOOST_MOVE_IMPLEMENTATION)
-#   include <boost/container/map.hpp>
-#   include <boost/container/set.hpp>
-#elif defined(ICL_USE_STD_IMPLEMENTATION)
 #   include <map>
 #   include <set>
-#else // Default for implementing containers
-#   include <map>
-#   include <set>
-#endif

This and following steps don’t affect the filesystem size of the result. However, we can continue to analyze the dependency graph.

I can break apart the ‘incidental fusion’ module by deleting the iterator/zip_iterator.hpp file, removing further dependencies in my custom Boost.ICL distribution. I can also delete the iterator/function_input_iterator.hpp file to remove the dependency on Boost.FunctionTypes. The result is a graph which you can at least reason about being used in an interval tree library like Boost.ICL, quite apart from our starting point with that library.

You might shudder at the thought of deleting zip_iterator if it is an essential tool to you. Partly I want to explore in this blog post what will be needed from Boost in the future when we have zip views from the Ranges TS or use the existing ranges-v3 directly, for example. In that context, zip_iterator can go.

Another feature of the bcp tool is that it can scan a set of source files and copy only the Boost headers that are included transitively. If I had used that, I wouldn’t need to delete the ptime.hpp or gregorian.hpp etc because bcp wouldn’t find them in the first place. It would still find the Boost.Container etc includes which appear in the ICL repository however.

In this blog post, I showed an alternative approach to the bcp --scan attempt at minimalism. My attempt is to use bcp to export useful and as-complete-as-possible libraries. I don’t have a lot of experience with bcp, but it seems that in scanning mode I would have to re-run the tool any time I used an ICL header which I had not used before. With the modular approach, it would be less-frequently necessary to run the tool (only when directly using a Boost repository I hadn’t used before), so it seemed an approach worth exploring the limitations of.

Examining Proposed Standard Libraries

We can also examine other Boost repositories, particularly those which are being standardized by newer C++ standards because we know that any, variant and filesystem can be implemented with only standard C++ features and without Boost.

Looking at Boost.Variant, it seems that use of the Boost.Math library makes that graph much larger. If we want Boost.Variant without all of that Math stuff, one thing we can choose to do is copy the one math function that Variant uses, static_lcm, into the Variant library (or somewhere like Boost.Core or Boost.Integer for example). That does cause a significant reduction in the dependency graph.

Further, I can remove the hash_variant.hpp file to remove the Boost.Functional dependency:

I don’t know if C++ standardized variant has similar hashing functionality or how it is implemented, but it is interesting to me how it affects the graph.

Using a bcp-extracted library with Modern CMake

After extracting a library or set of libraries with bcp, you might want to use the code in a CMake project. Here is the modern way to do that:

add_library(boost_mpl INTERFACE)
target_compile_definitions(boost_mpl INTERFACE
    BOOST_MPL_CFG_NO_PREPROCESSED_HEADERS
)
target_include_directories(boost_mpl INTERFACE 
    "${CMAKE_CURRENT_SOURCE_DIR}/myicl"
)

add_library(boost_icl INTERFACE)
target_link_libraries(boost_icl INTERFACE boost_mpl)
target_include_directories(boost_icl INTERFACE 
    "${CMAKE_CURRENT_SOURCE_DIR}/myicl/libs/icl/include"
)
add_library(boost::icl ALIAS boost_icl)
#

Boost ships a large chunk of preprocessed headers for various compilers, which I mentioned above. The reasons for that are probably historical and obsolete, but they will remain and they are used by default when using GCC and that will not change. To diverge from that default it is necessary to set the BOOST_MPL_CFG_NO_PREPROCESSED_HEADERS preprocessor macro.

By defining an INTERFACE boost_mpl library and setting its INTERFACE target_compile_definitions, any user of that library gets that magic BOOST_MPL_CFG_NO_PREPROCESSED_HEADERS define when compiling its sources.

MPL is just an internal implementation detail of ICL though, so I won’t have any of my CMake targets using MPL directly. Instead I additionally define a boost_icl INTERFACE library which specifies an INTERFACE dependency on boost_mpl with target_link_libraries.

The last ‘modern’ step is to define an ALIAS library. The alias name is boost::icl and it aliases the boost_icl library. To CMake, the following two commands generate an equivalent buildsystem:

target_link_libraries(myexe boost_icl)
target_link_libraries(myexe boost::icl)
#

Using the ALIAS version has a different effect however: If the boost::icl target does not exist an error will be issued at CMake time. That is not the case with the boost_icl version. It makes sense to use target_link_libraries with targets with :: in the name and ALIAS makes that possible for any library.

4 Responses to “Boost dependencies and bcp”

  1. Links 28/8/2016: Q4OS 1.6, ConnochaetOS 14.2 | Techrights Says:

    […] Boost dependencies and bcp […]

  2. Jeff Trull Says:

    Have you experimented with boostdep? A bit of a different approach from bcp. I used it to list transitive library dependencies, then updated only those submodules. Keeps Boost out of my vcs 🙂

    • steveire Says:

      I also prefer to keep boost out of my vcs :). However, this kind of thing is apparently something people do, so I thought it worth exploring. I have not investigated boostdep though.

  3. Why iterator is not dereferenced as an lvalue - Tutorial Guruji Says:

    […] Note: I am aware of how to reduce boost dependecies thanks to Boost dependencies and bcp. […]

Leave a comment