Correlate… ALL THE THINGS I: Omaha


This is the first post in a four-part series examining the interesting relationships that turn up when you correlate every characteristic/attribute of fine-grained (Census Tract) geographies in the mySidewalk database for a metro and then analyze the interesting ones. Also in the series: definitions, part II
Scope
If this is your first taste of “Correlate… ALL THE THINGS”, I recommend taking a look at this handy/hilarious/handsomely written reference and definitions post.
We’ll be focusing on the Omaha-Council Bluffs Metropolitan Area, which is made up of the two namesake cities and the surrounding area. About 900k people live in this metro, making it the 57th largest metro. Wikipedia can provide the nitty-gritty. For the purposes of this publication we’re using the counties of this metro as a mask to intersect census tracts (click the county and census tracts links for the precise geographies as geojson).
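To make the masking step concrete, here is a minimal sketch of one way to pick out a metro’s tracts. It is not the geometric intersection described above: it swaps in a simpler filter that relies on the fact that an 11-digit census tract GEOID begins with its 5-digit state-plus-county FIPS code. The county set below is deliberately partial (three of the metro’s counties), and the sample GEOIDs and the function name are made up for illustration.

```python
# Partial, illustrative set of county FIPS codes (state + county digits).
METRO_COUNTY_FIPS = frozenset({
    "31055",  # Douglas County, NE
    "31153",  # Sarpy County, NE
    "19155",  # Pottawattamie County, IA
})


def tracts_in_metro(tract_geoids, county_fips=METRO_COUNTY_FIPS):
    """Keep tract GEOIDs whose first five digits match a metro county."""
    return [geoid for geoid in tract_geoids if geoid[:5] in county_fips]


# Hypothetical GEOIDs: three inside the listed counties and one in
# Lancaster County, NE (31109), which is outside the metro.
sample_geoids = ["31055000200", "31153010100", "19155030100", "31109000100"]
selected = tracts_in_metro(sample_geoids)
print(selected)
```

A prefix filter like this only works because tracts nest inside counties; a true spatial intersection against the county geojson would be needed for geographies that do not.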


Correlating ALL the Things


The point of correlating every characteristic of every tract in this metro to every other characteristic is that I don’t have to do any of the backbreaking manual labor that my colleagues in data science refer to lovingly as feature selection (carefully examining the features/dimensions of a large dataset and calling upon your expertise, intuition, and trial/error to limit the analysis, modeling, included dimensions, etc.). Instead, I am literally mashing up every measurement using a fast computer, color coding it by relative strength of relationship, and letting my eyes be my guide.
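For the curious, the brute-force idea can be sketched in a few lines. This is a toy under stated assumptions, not the scripts behind the spreadsheet: it assumes a Pearson correlation coefficient (the post does not say which coefficient was used), and the column names and values are invented. In practice a library call such as pandas’ `DataFrame.corr` would do the same job on a real table.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlate_all(columns):
    """Correlate every column with every other column, no feature selection."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


# Invented values for six imaginary tracts (one list entry per tract).
toy_tracts = {
    "total_area_sq_mi": [1.2, 0.8, 2.5, 40.0, 75.0, 3.1],
    "mean_commute_minutes": [17, 16, 19, 28, 31, 20],
    "pct_public_transit": [4.0, 5.5, 2.0, 0.1, 0.0, 1.5],
}

pairs = correlate_all(toy_tracts)

# List the pairs from strongest to weakest relationship, ignoring sign.
for (a, b), r in sorted(pairs.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{a} vs {b}: {r:+.4f}")
```

The number of pairs grows with the square of the number of columns, which is why a fast computer earns its keep here.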
So, without further delay, I present the following results for your browsing and enlightenment pleasure.


Menacing and hilarious screenshots aside, here’s a link to the actual results:
This spreadsheet and all data within is provided under a Creative Commons Attribution-NonCommercial 4.0 license as…docs.google.com
Some Interesting Highlights
Disclaimer: The data these correlations are derived from is inherently socio-demographic (heck, voter registration data is by definition straight up political) and thus *can be* inaccurately interpreted to have political leanings, opinions, or messages. Statistical relationships between two series of observations do not make judgments, so none should be inferred from the data; any inferences are your own (or maybe mine…). (Another departure: unlike data, I fully believe pets do make judgments and your cat/dog judged you thoroughly for the lazy pajamas and Netflix weekend you indulged in recently.) Additionally, which values of correlation coefficients represent a strong relationship is a matter of infinite debate. Every indication that a strong relationship exists should be taken to mean “relatively strong” and with a grain of salt. Any indications that a correlation implies causation are a mere trick of word choice. Further, this is not an exhaustive list, and I encourage the reader to find their own interesting correlations and share them with everyone!
Highlight #1: Big Tracts, Rural Areas, Internet Service, and Commuting
The area of a census tract is highly correlated to just about anything associated with rural demographics because rural areas have larger census tracts (smaller population densities) and negatively correlated just as strongly to anything urban, because urban areas have smaller census tracts. If you want to examine any given characteristic and determine its relationship to rural qualities, the total area correlation is your best friend.
For example (and reading right from the top of the sheet): Area is strongly correlated (.5831 correlation coefficient) with max internet download speeds of 10 to 25 Mbps. This makes sense, as there are fewer internet service products available in the rural edges of the metro and this speed is apparently one of the more popular ones. Other products, like 1.5 to 3 Mbps and the high-end 100 to 1,024 Mbps service, are found more frequently in smaller census tracts and thus have negative correlations (-.2635 and -.1237, respectively). Area is also relatively strongly correlated (.3453) with the number of houses built prior to 1939, most likely because more rural areas have proportionally less new development.
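Here is a small sketch of using the total area correlation as a rural/urban yardstick. The coefficients are the ones quoted in the paragraph above; the dictionary keys and the function name are made-up labels, not the column names used in the sheet.

```python
# Coefficients against total tract area, copied from the paragraph above.
# The keys are hypothetical labels chosen for this example.
area_correlations = {
    "download_10_to_25": 0.5831,
    "houses_built_before_1939": 0.3453,
    "download_100_to_1024": -0.1237,
    "download_1_5_to_3": -0.2635,
}


def split_by_sign(correlations):
    """Split variables into rural-leaning (positive) and urban-leaning (negative).

    Each list is ordered from the strongest relationship to the weakest.
    """
    ranked = sorted(correlations.items(), key=lambda kv: kv[1], reverse=True)
    rural = [name for name, r in ranked if r > 0]
    urban = [name for name, r in reversed(ranked) if r < 0]
    return rural, urban


rural_leaning, urban_leaning = split_by_sign(area_correlations)
print("Leans rural:", rural_leaning)
print("Leans urban:", urban_leaning)
```

Splitting on the sign is the crudest possible rule; how large a coefficient has to be before it counts as “relatively strong” is still a judgment call.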
In the same vein, commute mean travel time has interestingly strong relationships to internet speed. I assume this to mean that when asked which internet service product they’d like to sign up for, Omahicans living in the western end of the metro (where the tracts are the largest) said, “Well I already have to drive 30 minutes downtown to work every day, there’s no way I’m waiting for HBO Go to buffer when I get home at night.”
Another commuting tidbit: the more recently a household has moved to their current location, the more likely they are to have made the metropolitan choice and use public transit (this still happens very rarely in this metro, however).
Highlight #2: Female Populations
On their face, total female population and total male population of a census tract don’t seem interesting, as the populations are usually pretty close to equal and they tend to correlate with items that correlate with total population for uninteresting reasons. However, when compared to the corresponding correlation for total male population, total female population correlates slightly more strongly with some interesting (and often positive) variables:
- Female population correlates well to most variables indicating a longer-lived population. Ages over 75 start to make a particularly strong showing, with the gap between the genders amounting to ~29% of the male population correlation coefficient (male population coefficient: .2761, female population coefficient: .3550). At age 85 and over the female population dominates, with a gap of ~46% of the male population correlation coefficient (male population coefficient: .2050, female population coefficient: .2994).
- The female correlation is noticeably weaker than the male one for population with less than a 9th grade education (male: .2036, female: .1198). Correlating less strongly with a societal negative (not making it past 9th grade…) is a societal positive!
- The female population correlates noticeably more strongly with graduation rates in general (nearly twice as strongly for high school graduation rate, with a .0915 male correlation and .1704 for female).
- Larger female populations seem to have favored careers in finance/insurance, arts/food/entertainment, retail, and healthcare while eschewing manufacturing, construction, and ag/fishing/mining.
- In two quirky situations for which I can’t even propose a cause, large female populations seem to have preferred housing that is “just right” in age and number of units (meaning male population correlates more strongly with the fewest and most units and with the oldest and newest buildings, while female population correlates well with the middle groupings of those two classes of characteristic).
- Female population correlates more strongly than male with Democratic Party affiliation.
Who run the world?
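The male/female comparisons in the list above boil down to one bit of arithmetic: the gap between the two coefficients expressed as a share of the male coefficient. A quick sketch follows, using the coefficients quoted in the bullets; the labels and the function name are invented for this example.

```python
def relative_gap(male, female):
    """Gap between the two coefficients as a fraction of the male coefficient."""
    return (female - male) / male


# (male coefficient, female coefficient), copied from the bullets above.
coefficient_pairs = {
    "age_75_plus": (0.2761, 0.3550),
    "age_85_plus": (0.2050, 0.2994),
    "less_than_9th_grade": (0.2036, 0.1198),
    "high_school_graduation_rate": (0.0915, 0.1704),
}

gaps = {
    name: relative_gap(male, female)
    for name, (male, female) in coefficient_pairs.items()
}

# Positive means the female correlation is stronger, negative means weaker.
for name, gap in gaps.items():
    print(f"{name}: {gap:+.1%}")
```

Dividing by the male coefficient is just one convention; dividing by the female coefficient or by the average of the two would give different percentages from the same numbers.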
Highlight #3: Political Party Affiliation
Now that we’ve mentioned party affiliation, let’s get this out of the way: percent affiliation between Democrat, Republican, and “other” parties in the Omaha Metro correlates exactly as every other social scientist has ever described it.
I encourage you to take a look at the numbers for yourself but you will be observing relative strengths of relationships between party affiliation and demographics that have been described and understood for a long time.
Quick hits:
- The 2 highest correlation coefficients (and, in my estimation, biggest predictors) for Republican affiliation in Omaha are both income (median household income at .7337 and per capita income at .7238)
- The highest correlation coefficient for Democratic affiliation is total black population (.7580)
Noteworthy: While Democratic and Republican party affiliations predictably invert one another, the Republican correlation is usually stronger than the inverse correlation for Democratic Party affiliation.
Highlight #4: Race, Language, and Poverty
While the relationships between race, language, and poverty are well known (many minority and non-English-speaking populations have higher levels of poverty than white, English-speaking populations) and exist nationwide, the trend is very obvious and, I suspect, particularly strong in the Omaha metro due to the segregated layout of the city and the continuing “white flight” phenomenon. Look for a deeper analysis on this in Part IV.
I hope the correlation data paired with my $.25 analysis has enlightened (or at least entertained) you so that you will rejoin me for parts II through IV. A user-friendly version of this analysis, capable of feature selection, is coming soon to Sidewalk Insights.
Resources to learn more
- Full correlation coefficient dataset as a Google Sheet
- GitHub repo containing the ongoing and up-to-date scripts, geographic information, and raw CSV version of the correlation data. At present, I haven’t shared the underlying characteristic data that formed the basis for the correlations but may share a subset of it in the near future.
About the Author: Matt Barr works at mySidewalk helping to create the technology that powers understanding communities. Civics, democracy, technology, and the great outdoors are his passions.



