Navigating Directed Graphs

Introduction

Navigating a large directed graph is an exercise in untangling a hairball. Take the complexity of a city and remove all of its urban planning — at Enigma, this is often the convolution we’re dealing with in our datasets. In the following paragraphs we explore how to unravel these tangled messes and get to tangible insights.

Data visualization of blue dots in a circle

Force-Directed Graphs

The visualization above represents committee-to-committee transactions for the 2017–2018 election cycle (source: FEC). Looks like a lot to figure out, but here’s the kicker: this graph represents just 2,000 transactions, while the dataset has over 850,000 total committee-to-committee transactions.

This chart is a force-directed graph (FDG), a clever tool for visualizing complex networks with a physics-based simulation. An FDG can be an effective way to capture a bird’s-eye view of a network’s topology. One can use it to spot local clusters, densities, and overall distribution.
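To make "physics-based simulation" concrete, here is a minimal sketch loosely in the style of a Fruchterman-Reingold spring embedder: every pair of nodes repels, every edge pulls its endpoints together, and a shrinking "temperature" caps how far a node may move per step. This is an illustration only, not the code behind the chart above; the function name, parameters, and constants are assumptions, and edge direction is ignored for layout purposes.

```python
import math
import random

def force_layout(nodes, edges, iterations=200, k=1.0, seed=0):
    """Toy force-directed layout: all pairs repel, edges act as springs.

    nodes: list of hashable ids; edges: list of (u, v) pairs.
    k is the preferred edge length. Returns {node: (x, y)}.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}

    for step in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsion between every pair of nodes (strength k^2 / distance).
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
                disp[b][0] -= dx / d * f
                disp[b][1] -= dy / d * f

        # Attraction along each edge (strength distance^2 / k).
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f

        # Cooling: cap each move by a temperature that shrinks toward zero.
        temp = 0.1 * (1 - step / iterations)
        for n in nodes:
            dx, dy = disp[n]
            d = math.hypot(dx, dy)
            if d > 0:
                scale = min(d, temp) / d
                pos[n][0] += dx * scale
                pos[n][1] += dy * scale

    return {n: (p[0], p[1]) for n, p in pos.items()}
```

The all-pairs repulsion loop is quadratic in the number of nodes, which is one reason a sample of transactions is easier to draw than the full dataset.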

While an FDG is a good summary of a network, it falls short of comprehensive. The above network is useless to study as an interface — we need to use network algorithms to whittle down to valuable areas of focus, and then use different visualizations for more thorough analysis. The case study below is an example of this process.

Adding Context

Leading into the 2018 midterm elections, we created PAC Paths, a tool for studying committee-to-committee transactions. We first built a directed graph from FEC data that represented all transactions, and then applied Dijkstra’s algorithm to find the shortest weighted path between two committees.
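The shortest-path step can be sketched as follows. This is a minimal illustration of Dijkstra’s algorithm on a directed graph, not the PAC Paths implementation: the article does not say how edge weights were derived, so the weights here are hypothetical non-negative costs, and the committee names are placeholders.

```python
import heapq
from collections import defaultdict

def shortest_path(edges, source, target):
    """Dijkstra's algorithm on a directed graph with non-negative weights.

    edges: iterable of (from_committee, to_committee, weight) tuples.
    Returns (total_weight, [committee, ...]) or None if target is unreachable.
    """
    graph = defaultdict(list)
    for u, v, w in edges:
        graph[u].append((v, w))  # direction matters: u gives to v

    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break  # the first time target is popped, its distance is final
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter route was already found
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if target not in dist:
        return None

    # Walk the predecessor links back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path

# Placeholder transactions: (giver, receiver, hypothetical weight).
transactions = [
    ("A", "B", 1), ("B", "D", 1),
    ("A", "C", 5), ("C", "D", 1),
    ("D", "A", 1),
]
result = shortest_path(transactions, "A", "D")
```

Because the graph is directed, the path from A to D need not match the path from D to A, which is why the direction of each transaction has to be preserved when the graph is built.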

Using the tool, one can determine whether any two committees are connected in election cycles dating back to 1979. We found some interesting cases, like the American Medical Association connected to Big Tobacco and End Citizens United connected to Ted Cruz for Senate. The committees at either end of such a path are likely unaware of their connection, but the density of committee-to-committee transactions often leads to counterintuitive results.

Data visualization of End Citizens United and Ted Cruz for Senate in 2017-2018 — A screenshot of the PAC Paths app.

Above is a screenshot of the interface, and you won’t find a force-directed graph in the tool at all. By applying the shortest path algorithm, we simplified the density of the network substantially, enabling the study of the connecting path rather than a tangled mess. Instead of an FDG, the interface has a simple linear connection (left side) coupled with a radial tidy tree for a selected committee (right side).

Directed Graph Interfaces

The above screenshot is a simplified design for a complex graph. Now we’ll review a hybrid of a force-directed graph and the PAC Paths interface. Compared with the wild network graph at the top of this article, the visualization below should be a breath of fresh air.

Data visualization with blue dots congregating around points of End Citizens United, Ted Cruz for Senate and DAYPAC

This is the result of running Dijkstra’s algorithm and limiting the force-directed graph to the four committees that define the path from End Citizens United to Ted Cruz for Senate. While the force-directed graph above reveals good information (DAYPAC has more connections in common with End Citizens United than with Ted Cruz for Senate), it still doesn’t provide the whole picture. To review a few issues:

Direction

The direction of these transactions is crucial (Big Tobacco giving to the American Medical Association would be a different signifier than the other way around), yet we haven’t represented it in the interface above. We can add arrows to each edge to represent direction, but this could quickly become unwieldy and illegible.

Additional Parameters

Suppose one wanted to take a closer look at a committee’s transaction amount compared to all other transactions made by that committee. How might this be represented? How about committee party affiliation? Or the state in which the committee is headquartered? Additional parameters don’t transfer well to a force-directed graph.

Radial Tidy Tree

A radial tidy tree remedies some of these issues (at least for the study of a particular committee). We can see a radial tidy tree as the zoomed-in version of a particular node in the force-directed graph. Transforming a point into a circle lets us represent multiple dimensions through polar coordinates.

Direction

The left side represents incoming transactions (only one in the image below) while the right side represents outgoing transactions.

Data visualization with one input that fans out to the right from the Daypac point

A radial tidy tree to represent direction of transactions (incoming on the left, outgoing on the right).
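The left/right split can be expressed directly in polar coordinates. The sketch below is an assumption about how such a layout might be computed, not the code behind the image: it spreads incoming counterparties evenly over the left half of a circle and outgoing ones over the right half, with the selected committee at the origin. The function name and the `("in", name)` / `("out", name)` keys are made up for illustration.

```python
import math

def radial_positions(incoming, outgoing, radius=1.0):
    """Place counterparties on a circle around a committee at the origin.

    Incoming counterparties land on the left half (x < 0),
    outgoing ones on the right half (x > 0).
    Returns {(direction, name): (x, y)}.
    """
    def fan(direction, names, start, end):
        # Spread names evenly, strictly inside the angular range (start, end).
        placed = {}
        n = len(names)
        for i, name in enumerate(names):
            angle = start + (end - start) * (i + 1) / (n + 1)
            placed[(direction, name)] = (
                radius * math.cos(angle),
                radius * math.sin(angle),
            )
        return placed

    # Left half of the circle: angles from 90 to 270 degrees.
    positions = fan("in", incoming, math.pi / 2, 3 * math.pi / 2)
    # Right half of the circle: angles from -90 to 90 degrees.
    positions.update(fan("out", outgoing, -math.pi / 2, math.pi / 2))
    return positions

# Placeholder names: one incoming source, three outgoing recipients.
layout = radial_positions(["Donor X"], ["Recipient A", "Recipient B", "Recipient C"])
```

Keying by direction as well as name keeps a committee that both gives and receives from overwriting itself in the result.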

Additional Parameters — Bundling

By using bundling, the user can group the incoming and outgoing transactions based on a parameter for study (in this case, amount, state, and party affiliation).

4 data visualizations laid out in a grid, each showcasing a single input leading to a fan out of points to the right

The four diagrams above represent the same committee, but the bundling parameter is different in each instance. The radial tidy tree gives multiple tiers of data for free, offering spatial and intuitive categorization.
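At the data level, bundling amounts to grouping a committee’s transactions by the chosen parameter before laying them out. Here is a minimal sketch under assumed field names (`to`, `amount`, `state`, `party`); the records and the amount thresholds are made up for illustration.

```python
from collections import defaultdict

def bundle(transactions, key):
    """Group transaction records by a field name or by a function of the record."""
    key_fn = key if callable(key) else (lambda t: t[key])
    groups = defaultdict(list)
    for t in transactions:
        groups[key_fn(t)].append(t)
    return dict(groups)

def amount_tier(t):
    """Hypothetical bucketing for amounts; the thresholds are arbitrary."""
    if t["amount"] < 1_000:
        return "under $1k"
    if t["amount"] < 5_000:
        return "$1k to $5k"
    return "$5k and up"

# Made-up outgoing transactions for a single committee.
outgoing = [
    {"to": "Committee A", "amount": 500, "state": "TX", "party": "REP"},
    {"to": "Committee B", "amount": 2_500, "state": "NY", "party": "DEM"},
    {"to": "Committee C", "amount": 5_000, "state": "TX", "party": "DEM"},
]

by_state = bundle(outgoing, "state")
by_party = bundle(outgoing, "party")
by_amount = bundle(outgoing, amount_tier)
```

Each group can then be given its own angular slice of the tree, so switching the bundling parameter only changes the grouping key, not the layout logic.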

Connecting the Interface

The GIF below represents a tool that transitions from the force-directed graph to a radial tidy tree for study. The network view is a standard force-directed graph. In the path view, the shortest weighted path is represented with a serial-radial tidy tree diagram to give a glimpse of each committee’s connections. These committees can be re-bundled by changing the relevant parameters, and clicking on the central node will zoom in to a specific committee. You can experiment with the live interface here.

Animated data visualization showcasing as you interact with the data set the full ecosystem that it creates

Summary

In our process, we’ve observed that analysis of a network graph is a series of incremental decisions: determining when to apply powerful algorithms, when to review and analyze their results, and when to solidify insights. The interface should therefore be tiered in parallel with these decisions. By developing a tool (and a visualization library) that’s intuitive and somewhat playful, the analysis can hold the user’s attention, and ideally enable the digestion of the hairball.