I scraped voting records for the entire 113th US Senate from GovTrack and analyzed them using a number of network graphing strategies.
All the source code for this project can be found here.
Using the NetworkX Python module, I worked with different ways of visualizing senators' voting behavior. Each node in the network represents a US senator, and the edges (i.e., connections) between nodes are created and strengthened by similar voting behavior: the more often you vote "yea" or "nay" along with another senator, the closer and stronger your connection to that senator. Here's the graph of my first attempt, using a spring layout, a type of force-directed graph (Dems = blue, Reps = red):
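Below is a minimal sketch of how an agreement graph like this might be built with NetworkX. The vote data, senator names, and the `agreement_graph` helper are all hypothetical; the actual GovTrack scraper and data format aren't reproduced here. The assumption is that edge weight is simply the number of roll calls on which two senators cast the same vote.

```python
from itertools import combinations
import networkx as nx

# Hypothetical toy data: each senator's recorded votes, keyed by roll-call ID.
# A missing key would mean the senator did not vote on that roll call.
votes = {
    "Sen_A": {1: "Yea", 2: "Nay", 3: "Yea", 4: "Yea"},
    "Sen_B": {1: "Yea", 2: "Nay", 3: "Yea", 4: "Nay"},
    "Sen_C": {1: "Nay", 2: "Yea", 3: "Nay", 4: "Nay"},
    "Sen_D": {1: "Nay", 2: "Yea", 3: "Yea", 4: "Nay"},
}
party = {"Sen_A": "D", "Sen_B": "D", "Sen_C": "R", "Sen_D": "R"}

def agreement_graph(votes, party):
    """Hypothetical helper: edge weight = number of roll calls where two senators agreed."""
    G = nx.Graph()
    for name, p in party.items():
        G.add_node(name, party=p)
    for a, b in combinations(votes, 2):
        shared = votes[a].keys() & votes[b].keys()  # roll calls both took part in
        agree = sum(votes[a][r] == votes[b][r] for r in shared)
        if agree:  # no edge at all if they never agreed
            G.add_edge(a, b, weight=agree)
    return G

G = agreement_graph(votes, party)

# In spring_layout a larger "weight" means a stronger attractive force,
# so senators who often vote together are pulled closer.
pos = nx.spring_layout(G, weight="weight", seed=42)
colors = ["blue" if G.nodes[n]["party"] == "D" else "red" for n in G]
# nx.draw(G, pos, node_color=colors)  # drawing requires matplotlib
```

With this toy data, Sen_A and Sen_C never vote the same way, so they get no edge at all.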
OK, that tells us something at least: it looks like Republicans tend to vote similarly, as do Democrats. Partisanship visualized. But it's still kind of messy; this is what network visualization folks call a "hairball". A cleaner way of representing the same network is with a Minimum Spanning Tree (MST). An MST keeps just enough edges to connect every node, with no cycles, and chooses them so that the total edge weight is as small as possible. In this context, I've made the edges represent voting difference: a heavier edge means more opposed voting behavior, and a lighter edge means more similar voting behavior. So the lightest tree connecting all the nodes tends to put the most similar voters next to each other. I know it's a bit confusing, but it's easier to understand with a visual:
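Here is a sketch of the difference-weighted MST idea, again on hypothetical toy votes. The assumption is that "difference" is the count of shared roll calls on which two senators disagreed; the post doesn't spell out its exact difference measure.

```python
from itertools import combinations
import networkx as nx

# Hypothetical toy roll-call data.
votes = {
    "Sen_A": {1: "Yea", 2: "Nay", 3: "Yea", 4: "Yea"},
    "Sen_B": {1: "Yea", 2: "Nay", 3: "Yea", 4: "Nay"},
    "Sen_C": {1: "Nay", 2: "Yea", 3: "Nay", 4: "Nay"},
    "Sen_D": {1: "Nay", 2: "Yea", 3: "Yea", 4: "Nay"},
}

# Edge weight "difference" = number of shared roll calls where the two disagreed.
D = nx.Graph()
for a, b in combinations(votes, 2):
    shared = votes[a].keys() & votes[b].keys()
    if shared:
        diff = sum(votes[a][r] != votes[b][r] for r in shared)
        D.add_edge(a, b, difference=diff)

# Kruskal's algorithm (the NetworkX default) keeps the lightest edges
# that connect every senator without forming a cycle.
T = nx.minimum_spanning_tree(D, weight="difference")
print(sorted(tuple(sorted(e)) for e in T.edges()))
# → [('Sen_A', 'Sen_B'), ('Sen_B', 'Sen_D'), ('Sen_C', 'Sen_D')]
```

Of the six possible edges, the tree keeps only three, arranged as a chain A-B-D-C, which is the same "arc" shape the real Senate tree shows at a much larger scale.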
Now we can see the pattern a little more clearly. There's total separation between red and blue: radical voting partisanship. It's worth noting, though, that even in a sharply divided Senate, some individuals are closer to bipartisanship (i.e., in the middle of the arc) and others are more dedicatedly partisan. This brings up the notion of centrality, a family of measures that assess how central or well connected a node is relative to the other nodes in a network. Here I look at one of them, closeness centrality, for our dataset.
Closeness centrality is based on the summed shortest-path distance from a given node to all other nodes in a network. (Technically, it's the inverse of that sum, usually scaled by the number of other nodes, so a smaller total distance means a higher score.) So nodes with high closeness centrality have faster access (i.e., shorter paths) to other network nodes than do those with low closeness centrality.
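For a connected graph with n nodes, NetworkX defines closeness as C(u) = (n - 1) / (sum of shortest-path distances from u to every other node). The sketch below checks `nx.closeness_centrality` against that formula by hand on a small graph; the difference weights are hypothetical, and the attribute name `difference` is an assumption carried over from the sketches above.

```python
import math
import networkx as nx

# Hypothetical difference weights (number of disagreements) between four senators.
G = nx.Graph()
G.add_weighted_edges_from(
    [("Sen_A", "Sen_B", 1), ("Sen_A", "Sen_C", 4), ("Sen_A", "Sen_D", 3),
     ("Sen_B", "Sen_C", 3), ("Sen_B", "Sen_D", 2), ("Sen_C", "Sen_D", 1)],
    weight="difference",
)

# Library version: distances are weighted shortest paths over "difference".
cc = nx.closeness_centrality(G, distance="difference")

def closeness_by_hand(G, u):
    """(n - 1) divided by the summed weighted shortest-path distance from u."""
    dist = nx.single_source_dijkstra_path_length(G, u, weight="difference")
    return (len(G) - 1) / sum(dist.values())  # dist[u] == 0, so it adds nothing

for u in G:
    print(u, cc[u], closeness_by_hand(G, u))
```

Sen_B and Sen_D come out on top (0.5 vs 0.375): they have moderate distances to everyone rather than very short distances to some senators and very long ones to others.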
In terms of partisanship, closeness centrality should be positively related to bipartisan voting. The more you vote in agreement with senators across the aisle, the shorter your distance (here, measured by our 'difference' weights) to any given senator, even the ones on the fringes.
By contrast, if you're extremely partisan, you may be very close to other members of your party, but the long paths needed to reach the partisan nodes of the other party make you, on average, 'far' from many other nodes. This prediction is skewed somewhat by the fact that Democrats held the majority in this Senate, so even relatively partisan Democrats will appear to have higher closeness centrality, simply because there are more of their own party members to vote alongside.
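A toy check of that majority effect, using made-up numbers: in a hypothetical 3-2 chamber where every senator is exactly as partisan as every other (difference 1 within the party, 3 across the aisle), members of the majority still score higher.

```python
import math
from itertools import combinations
import networkx as nx

# Hypothetical 3-2 chamber; every senator is equally partisan.
party = {"D1": "D", "D2": "D", "D3": "D", "R1": "R", "R2": "R"}

G = nx.Graph()
for a, b in combinations(party, 2):
    # Same-party pairs are close, cross-party pairs are far.
    G.add_edge(a, b, difference=1 if party[a] == party[b] else 3)

cc = nx.closeness_centrality(G, distance="difference")
```

Each Democrat's distances sum to 1 + 1 + 3 + 3 = 8, giving 4/8 = 0.5, while each Republican's sum to 1 + 3 + 3 + 3 = 10, giving 4/10 = 0.4, purely because of the seat split.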
Now, I'm a little shaky on the math here, but I found a passage on p. 13 of Borgatti's introduction to the closeness centrality formula (linked from the NetworkX documentation) that helped explain the outliers. Take the first of this pair of closeness centrality equations:
where m is '# of voting instances' and n is '# of potential voting agreements'.
So while the number of senators you could vote alongside stays roughly constant, our numerator depends heavily on the number of votes you take part in. If for some reason you didn't take part in many votes, you'd end up with a reduced closeness centrality, which wouldn't really be an appropriate measure of your degree of partisanship.
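To see the attendance effect concretely, here is a sketch with two hypothetical distance definitions. The post doesn't give its exact difference formula, so `raw_dist = 1 / agreements` is an assumption chosen to mimic a measure built from raw counts. The rate-based `rate_dist` is one possible correction, not something used in the original analysis. Sen_P is a hypothetical part-term senator who agrees with colleagues just as often, but shared only half as many roll calls.

```python
import math
import networkx as nx

# Hypothetical (agreements, shared roll calls) for each pair of senators.
pairs = {
    ("Sen_A", "Sen_B"): (8, 10),
    ("Sen_A", "Sen_C"): (8, 10),
    ("Sen_B", "Sen_C"): (8, 10),
    ("Sen_A", "Sen_P"): (4, 5),  # Sen_P served only part of the term
    ("Sen_B", "Sen_P"): (4, 5),
    ("Sen_C", "Sen_P"): (4, 5),
}

G = nx.Graph()
for (a, b), (agree, shared) in pairs.items():
    G.add_edge(
        a, b,
        raw_dist=1 / agree,           # assumed: distance from raw agreement counts
        rate_dist=1 - agree / shared, # alternative: distance from agreement rate
    )

raw = nx.closeness_centrality(G, distance="raw_dist")
rate = nx.closeness_centrality(G, distance="rate_dist")
```

Under the raw-count distance, Sen_A scores 6.0 and Sen_P only 4.0, even though both agree with colleagues 80% of the time. Under the rate-based distance, the two are identical.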
For instance, the bar graph below shows a few outliers on the low end: Lautenberg, Chiesa, and Booker (all NJ) and Kerry, Cowan, and Markey (all MA). Here's the scoop:
- Lautenberg (D-NJ) has fewer voting instances because he died in office in June 2013.
- Chiesa (R-NJ) was appointed to fill Lautenberg's seat until the special election.
- Booker (D-NJ) won that special election and was just sworn in last month. So all NJ senators had fewer voting instances per individual.
- Kerry, Cowan, and Markey all occupied the same MA seat: Kerry resigned to become Secretary of State, Cowan was appointed as interim senator, and Markey won the special election. Each of them cast fewer votes, the same situation as NJ.
I played around with other ideas for measuring bipartisanship, aside from centrality. It's interesting to redraw the network by weighting edges based only on bipartisan voting: the more you vote across the aisle, the closer you are to another node. What showed up is a smaller group of senators, mostly Democrats, who seem more inclined to vote with their political counterparts:
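One way to sketch "bipartisan-only" weighting is shown below. The assumption is that only Democrat-Republican pairs get edges, weighted by how often they agreed; the counts and names are hypothetical. A maximum spanning tree on agreement counts then keeps the strongest cross-party ties. Up to ties, this is the same tree an MST would give on any distance that shrinks as agreement grows.

```python
import networkx as nx

# Hypothetical counts of roll calls on which each Democrat/Republican pair agreed.
cross_agree = {
    ("Sen_D1", "Sen_R1"): 2, ("Sen_D1", "Sen_R2"): 1,
    ("Sen_D2", "Sen_R1"): 9, ("Sen_D2", "Sen_R2"): 7,
    ("Sen_D3", "Sen_R1"): 3, ("Sen_D3", "Sen_R2"): 8,
}

# Only cross-party pairs get edges, so within-party agreement is ignored entirely.
X = nx.Graph()
for (dem, rep), n in cross_agree.items():
    X.add_edge(dem, rep, cross_agree=n)

# Total cross-aisle agreement per senator: a rough "bridge-maker" score.
bridge = dict(X.degree(weight="cross_agree"))

# Keep the strongest cross-party ties that connect everyone without cycles.
T = nx.maximum_spanning_tree(X, weight="cross_agree")
```

In this toy chamber, Sen_D2 (16 cross-party agreements) ends up as a hub of the tree, while Sen_D1 (3) hangs off the edge, which is the kind of "bridge-maker" clustering the real graph shows.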
The exceptions in this graph lend support to this approach. For instance, Collins (R-ME) has long had a reputation in the Senate for working toward compromise with Democrats, and we see her lone red circle in the middle group of bridge-makers.
On the other side, Manchin (D-WV) votes unusually for a registered Democrat. He has long been a strong supporter of the coal mining industry, and broke ranks with his party to vote against key gay rights measures. So the fact that his blue node shows up in an ocean of red makes sense, considering his history.
In general, I prefer this approach to evaluating partisanship over closeness centrality, as it gets at the cross-aisle voting habits that define bipartisan senators. In particular, the minimum spanning tree in this version revealed the clustering described above, which suggests that, for the most part, it's a group of Democratic senators who lead the effort to bridge partisan politics.
I also looked at other ways of evaluating senators' relative importance, such as the number of bills each one sponsors and the co-sponsors they attract. (This is similar in concept to the PageRank algorithm Google uses to rank its search results.) Simple node degree (the raw number of connections) tracks only loosely with the PageRank weights, as we can see in this bar graph below:
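The gap between node degree and PageRank can be sketched on a hypothetical co-sponsorship graph. The assumption is that an edge points from cosponsor to sponsor, so it reads "this senator endorsed that senator's bill". Sen_Busy collects more endorsements, but Sen_Lead's come from senators who are themselves heavily endorsed.

```python
import networkx as nx

# Hypothetical co-sponsorship links, drawn as cosponsor -> sponsor.
edges = [
    ("Sen_A", "Sen_Lead"), ("Sen_B", "Sen_Lead"),
    ("Sen_C", "Sen_A"), ("Sen_D", "Sen_A"), ("Sen_E", "Sen_A"),
    ("Sen_F", "Sen_B"), ("Sen_G", "Sen_B"), ("Sen_H", "Sen_B"),
    ("Sen_I", "Sen_Busy"), ("Sen_J", "Sen_Busy"),
    ("Sen_K", "Sen_Busy"), ("Sen_L", "Sen_Busy"),
]
G = nx.DiGraph(edges)

degree = dict(G.degree())        # raw number of links, direction ignored
pr = nx.pagerank(G, alpha=0.85)  # endorsements from well-endorsed senators count more

for name in ("Sen_Lead", "Sen_Busy", "Sen_A"):
    print(name, degree[name], round(pr[name], 3))
```

Sen_Lead has the lowest degree of the three (2, versus 4 for the others) but the highest PageRank, which is the same kind of mismatch the bar graph shows for real senators.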
Comparing these rankings with actual leadership scores on GovTrack, PageRank seems to do a far better job of representing leadership. For example, Reid and Menendez are in the top 5 PageRanks, and we can see that they're also at the top of GovTrack's leadership scores:
To be fair, their Node Degree scores are also pretty high.
But consider Ayotte, who is somewhere in the middle of the PageRank pack even though her Node Degree is about as high as that of the Top 5 PageRankers. Her actual (GovTrack) leadership score is far better represented by her PageRank than by her Node Degree:
So PageRank scores seem to track better with actual Senate leadership, at least as GovTrack measures it.
Finally, I tried my hand at creating a visualization in Gephi, a desktop application for network displays. Using the PageRank scores, I was able to generate an image of the "Senate Universe" - a vision of a house sharply divided:
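Gephi can open GEXF files directly, so one way to get a NetworkX graph and its PageRank scores into Gephi is `nx.write_gexf`. The sketch below uses a hypothetical graph and a hypothetical filename, and writes to a temporary directory so that it cleans up after itself. The `party` and `pagerank` node attributes are assumptions about what you'd want to color and size nodes by in Gephi.

```python
import os
import tempfile
import networkx as nx

# Hypothetical co-sponsorship graph with a party label on each senator.
G = nx.DiGraph([("Sen_A", "Sen_C"), ("Sen_B", "Sen_C"),
                ("Sen_C", "Sen_A"), ("Sen_D", "Sen_B")])
nx.set_node_attributes(G, {"Sen_A": "D", "Sen_B": "D", "Sen_C": "D", "Sen_D": "R"}, "party")

# Store PageRank as a node attribute so Gephi can size nodes by it.
nx.set_node_attributes(G, nx.pagerank(G), "pagerank")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "senate_universe.gexf")  # hypothetical filename
    nx.write_gexf(G, path)
    H = nx.read_gexf(path)  # round-trip check; in practice, open the file in Gephi
```

Node attributes in a GEXF file show up as columns in Gephi's Data Laboratory, where they can drive node color (by `party`) and node size (by `pagerank`).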
That's a pretty stark representation of how messed up our legislative branch is. But I guess you probably would have believed me even if I hadn't gone through all that work. Anyway, it was a nice intro for me to the intricacies of network analysis and graph theory.