KBiHash brings simple API for bi-directional hash.

While refactoring KSelectionProxyModel I’ve been trying to address the problem that it’s really hard to implement QAbstractProxyModel.

There’s just so much API to implement. The entire QAbstractItemModel API needs to be implemented along with listener API for the source model (eg, sourceRowsAboutToBeInserted/sourceRowsInserted) along with mapToSource and mapFromSource for the proxy model and the internal mapping structures necessary for that.

The requirement of an internal mapping structure is that it must be possible to map objects representing QModelIndexes in the sourceModel to QModelIndexes in the proxyModel itself and vice-versa. In the refactor-in-progress KSelectionProxyModel I was until recently using a QHash to map parent indexes in the source model to parent indexes in the proxy model. The QHash gives O(1) time complexity for lookups, so mapping from the source model to the proxy is fast.

However, mapping from the proxy model to the source model required a call to QHash::key() to get the key given a particular value. That is a O(N) call. Initially I considered adding another mapping in the other direction to the private class, and maintaining both within the proxy implementation. That would be error prone however and not reusable.

The reason for this refactoring work is to make reusable components to use to implement proxy models with. So I needed a class to implement an efficient mapping in both directions. I implemented KBiHash which provided an opportunity to try out the massif-visualiser Milian has been writing and the QTestLib-Tools.

KBiHash maintains two symmetric QHash objects

KBiHash maintains internally a QHash<T, U> and a QHash<U, T> and provides iterators for both “sides” of the mapping along with API for inserting, removing, updating and querying from both sides. The fact that it is also quite similar in structure to a QSet<QPair<T, U> > is not lost on me, so it also provides some set-like API such as intersect(), subtract() and union(). Because the implementation is a bit naïve though (it’s a quick hack) a few nice-to-haves are a bit restricted, such as operator[] returning an always const reference. Allowing the return value there to be modified would make the internal QHashes out of sync. I suspect though that a real proper implementation might be able to solve that, and be a lot faster, but for now the main benefit of this class is the convenient API for managing the two way mapping. It should be useful in QAbstractItemModel implementations too where you need to map from a void * pointer to something else and vice-versa.

Benchmark results comparing KBiHash with QHash are almost entirely unsurprising. All benchmarks were executed ten times and the results averaged. The results before averaging are here. In the plots below N refers to number of instructions executed as reported by callgrind and size is the size of the container under test. Note also that the x axis is logarithmic.

Insert grows linearly, but is usually 100% larger in the KBiHash case. This is because both the left and right side of the mapping need to be hashed on insert. It could be larger by a different percentage depending on the complexity of the hash function for each type if two different types are in the KBiHash.

Key lookup is comparable as expected. The small relative difference may be caused by the fact the the hash is nested in another class.

Reverse lookup cost grows linearly for calls to QHash::key(value). Reverse lookup for KBiHash is still O(1) as it is still just a hash lookup.

Cost for removing items by key is also as expected. The KBiHash case is more expensive because the item must be removed from both internal hashes

Results for removing values from the KBiHash are also comparable to removal of keys in the KBiHash. Removing values from the QHash is slower for the same reason as reverse lookup.

Updating a key is constant for both QHash and KBiHash.

Changes to values are constant in both QHash and KBiHash

The memory usage of KBiHash is also greater than QHash, as expected when maintaining two hashes. The peak and average memory consumption in the KBiHash case is almost twice as large as in the QHash case.

Memory profile of KBiHash

Memory profile of QHash

The case against boost::bimap

An important part of being a software developer is reinventing what already exists in the standard template library or boost. We all do it I think.

boost already has a class, boost::bimap, which implements a forward and backward mapping based on std::map API. There are a few reasons I didn’t use it here.

Firstly, writing KBiHash was a good learning experience. It’s the sort of thing which has lots of hidden corners that you don’t expect. For example, when inserting a new pair into the mapping, it is necessary to explicitly remove entries in the opposite map. I probably would not have remembered to do that each time if I was simply maintaining both mappings manually, but it’s good to learn.

Secondly, boost::bimap is semantically slightly different to what I need. When inserting into the bimap and the left or right value is already there, it is not overwritten, but the operation is ignored. If using boost::bimap I’d end up using a lot of remove followed by insert anyway to work around that.

Thirdly, as std::map maintains sort order of keys, it may be slower than QHash. The average for key lookup and insertion in a map is O(log n), whereas for QHash it is O(1). It would be nice to include it in the benchmarks here, which brings me to the last point.

The API of KBiHash is more Qt-like. I couldn’t figure out how to use boost::bimap. Even though there’s a long tutorial for it, I couldn’t figure out how to compile the benchmarks. My attempt is ifdef’d out in the benchmarks file though, so if anyone else can figure it out I’ll add it to the benchmark plots.

7 Responses to “KBiHash brings simple API for bi-directional hash.”

  1. bbigras Says:

    Did you have problems building qtestlib-tools?

    I have ‘/usr/lib/gcc/i486-linux-gnu/4.4.3/../../../../lib/crt1.o: In function `_start’:
    (.text+0x18): undefined reference to `main” when trying to build it with Qt 4.6.

  2. Tom G Says:

    off topic question:
    which program did u use to create these plots? they look fantastic! this could spice up my scientific presentations.. 🙂

  3. steveire Says:

    @bbigras: I think you don’t build it from the top level, but from the tools/reportgenerator directory.

    @Tom The interactive plots were made with qtestlib-tools. They are rendered to html canvas elements I believe.

  4. Gof Says:

    Using two QHash is a waste of memory and CPU.
    You could do something way better using a specialized data structure

    In particular, you could share a single node between two hash table using “crossed” linked list.

    struct KBiHashNode
    KBiHashNode *next_left;
    KBiHashNode *next_right;
    uint left_hash;
    uint right_hash;
    T left;
    U right;

    That is, only for the fun of it. If you are satisfied by the performance of KBiHash for your use case, stick with it, as implementing it without QHash is much more complex.

  5. steveire Says:

    Thanks for the pointer.

    That looks like a good direction for a future implementation. I would like to implement it that way, but I’ll wait for the point that performance becomes a problem.

    I initially thought I could use nodes something like that to make it better, but complexity led me to take the easy way out.

  6. Romain Says:

    Great post dude, interesting and nicely presented !

  7. KBiAssociativeContainer. Erm, huh? « Steveire’s Blog Says:

    […] while ago I wrote the KBiHash for encapsulating the content of two symmetrically updated QHash containers. The motivation for it […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: