Correlate… ALL THE THINGS III: San Francisco

Source: Unsplash, Michael Hirsch; San Francisco, United States

This is the third post in a 4-part series examining the relationships that turn up when you correlate every characteristic/attribute of fine-grained (Census Tract) geographies in the mySidewalk database for an urban region and then analyze the interesting ones. Also in the series: definitions, part I, & part II

Scope: San Francisco, CA

If this is your first taste of “Correlate… ALL THE THINGS”, I recommend taking a look at this handy/hilarious/handsomely written reference and definitions post.

Above: I. Population density for San Francisco and surrounding counties. Note the comparatively low density of the rest of the Bay Area. II. Population density for census tracts significantly covered by SF county. Note that the tract covering the Presidio is included, but Treasure Island and Alcatraz had intersectional coverage so low that adjusting the tolerance to include them would also pull in tracts in the relatively empty Marin Headlands.

In this post, we’ll examine the City and County of San Francisco. San Francisco is California’s only consolidated city-county (during my preliminary research, I found the legal peculiarities of this form of local government extremely interesting). It is the 18th most populous city in the United States and a major part of the 11th largest metro. Fascinatingly, San Francisco has the least land area of any county in California (and 80% of the city’s total area is water). SF’s small size, large population, and phenomenal waterway access are major contributors to its population density (second highest in the nation) and yearly economic growth (highest in the nation by many measures).

Similar to previous issues of this publication, we’re using the county as a mask to intersect and include census tracts (click the links to see the precise geographies). Due to quirks of how the county and relevant census tract geographies are defined by the U.S. Census, this analysis does not include Alcatraz or Treasure Island proper. I asked a former Bay Area resident if this would be a problem and he assured me, “Treasure Island is like a fairground with some shipping containers you can live in and not even Michael Bay would object if we all forgot about The Rock.”

Sure, the script had some problems

Correlating ALL the Things

It bears repeating for part III: we correlate every characteristic with every other characteristic so that we can analyze this region without going through the painstaking process of feature selection. Instead, we brute force the analysis by correlating everything and color coding the results (strong positive correlations are blue, strong negative correlations are red) so that strong correlations and patterns become visually apparent. We also normalize characteristics that scale with raw counts of population, housing units, or households, which attenuates the obvious correlations and (consequently) amplifies the interesting ones.
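
For the curious, here is a minimal sketch of that recipe in plain Python. It is not the mySidewalk pipeline: the tract numbers are made up, the names (`per_capita`, `correlate_all`, `color`) are hypothetical, the 0.5 cutoff for "strong" is a guess, and it assumes the plain Pearson coefficient. It does show why normalizing matters: raw counts correlate simply because bigger tracts have more of everything, while the shares tell a different story.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when a column never varies
    return cov / sqrt(var_x * var_y)


def per_capita(counts, totals):
    """Turn raw counts into shares of a base total (e.g. population)."""
    return [count / total for count, total in zip(counts, totals)]


def correlate_all(columns):
    """Correlate every column with every other column, once per pair."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(sorted(columns), 2)
    }


def color(r, strong=0.5):
    """Blue for strong positive, red for strong negative (cutoff is assumed)."""
    if r >= strong:
        return "blue"
    if r <= -strong:
        return "red"
    return "uncolored"


# Made-up numbers for five hypothetical tracts.
population = [1000, 2000, 4000, 8000, 16000]
renters = [200, 600, 1600, 4000, 9600]
car_owners = [800, 1400, 2400, 4000, 6400]

# Raw counts: both columns mostly track tract size.
raw_r = pearson(renters, car_owners)

# Normalized shares: the size effect is divided out before correlating.
columns = {
    "renter_share": per_capita(renters, population),
    "car_owner_share": per_capita(car_owners, population),
    "population": population,
}
results = correlate_all(columns)
for pair, r in results.items():
    print(pair, round(r, 2), color(r))
```

In this toy data the raw renter and car-owner counts come out strongly positive, while the renter and car-owner shares come out strongly negative. Same tracts, opposite conclusion, which is the whole argument for normalizing first.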

Without further delay, let’s have a look…

This is *traditionally* where I insert a celebrity meme, a paragraph about data prep, and an irreverent zoomed out shot of the color-coded correlation sheet so the reader can get a feel for the weight of the information involved. I then pad it with a joke caption to reduce the reader’s alienation (hashtag bloggin’). I am really angling to keep this read short, though.

So, 82,824 correlations, no waiting:

Some Interesting Highlights

Disclaimer: A correlation coefficient reflects only the direction and strength with which two observed characteristics vary together in a finite population; a coefficient with a large magnitude (positive or negative) indicates that, in the set being examined, those two variables were predictive of one another, and it becomes statistically fair to make statements such as, “areas with high X frequently have high/low Y”. Correlation is neither causation nor demographic cross-tabbing; so while it is statistically fair to say, “areas with large populations of X frequently have large/small populations of Y in San Francisco,” it is invalid to make a statement of the form, “In San Francisco, people who have computer science degrees work in grounds and facilities maintenance.”

Highlight #1: Density (the easy one)

Follow along: You can examine a variable’s relationship to density by following the left-most three columns of the spreadsheet
  • Dense areas have a high occurrence of residents with employment in arts, food, and entertainment. They also have a lower occurrence of residents with employment in health, management, and transportation.
  • The densest tracts in SF have a wide spread of income, reflected by the income to poverty ratio counts (although the population of people with income to poverty ratios 2.0 or higher is lower in denser areas) and GINI income inequality index.
  • Urban areas and density have always attracted people looking for work and SF seems to be no exception. The densest regions have relatively larger populations of the unemployed and of those working full-time with income below the poverty level. The densest areas also offer some of the best access to employment (Employment Access Index, Job Density, Retail Job Density), which often explains why unemployed and underemployed people seek them out as places to live.
  • Fewer families have opted to live in the densest regions, as shown by negative correlation coefficients for age groups under 18 and married households.
  • The densest regions of SF have the largest numbers of housing units built prior to 1939. This is uncommon and may be peculiar to SF’s controversial zoning policies.
  • Populations in dense regions rarely spend money on transit, own cars, or travel very far to work. This is another reason unemployed and underemployed people move to them: they require less startup capital to get to work.

Highlight #2: Racial Integration

Follow along: You can examine racial integration by finding the race totals dataset on the left hand side (~row 232) and scrolling right until you reach the intersection of race totals in the top section (~column FR). This set-up is pictured below.
“Racial Integration Triangle” for San Francisco at the tract level

So, how’s San Francisco doing at racial integration? Poorly. Large White populations are less integrated with minority populations in SF than they were in NYC. Additionally, Asian populations are among the least integrated minority groups, based on this analysis. Hispanic and Black populations are moderately well integrated with minority populations other than Asian ones. In short, these correlations show that tracts are typically made up of either White populations, Asian populations, or all other minorities, and that these three groupings are residentially integrated with relative infrequency.

Highlight #3: Age

Follow along: You can examine age groups from a simplified, generational view by finding the race totals dataset on the left hand side (~row 240). Due to a quirk in the organization of this data, the “Matures” generation is some distance away (~row 211); “Over 65” is an almost perfect slot-in replacement, however, so it’s not worth getting worked up over. Other age-related analyses can be found at “Householder Age” (~row 291), “Median Income” by age (~row 373), and “Median Age” (~row 399).
  • Older populations may be poorly adapted to dense regions. The median incomes of people 45 to 64 and 65 and over are both lower in dense regions than in sparse regions. The correlation coefficient suggests that regions with higher median incomes for people in those age groups are spending a larger percent of those incomes on transportation (plus commuting farther!).
  • Unsurprisingly, large, older populations tend to live in areas where people have lived in their current housing unit the longest. Large Millennial and Gen X populations settle in areas where people have lived in their housing units the least amount of time. It’s possible older populations in SF have less mobility.
  • Generation Z population is the most predictive of the count receiving food stamps (unsurprising as dependents raise the maximum income qualifications for food stamps).

Other Quick Hits

  • Internet Connectivity: SF is a tech mecca, so it’s worth looking at the internet provider stats. Dense, high income areas are targeted by the most providers. Further, in high competition areas (those with a large number of providers) it is more likely that a larger number of providers will be offering the current best internet technology on the market, fiber.
  • Politics: SF’s political affiliations (~rows 387–389 and ~columns X-Z) are more complex than other regions I’ve analyzed. Large Republican and “Other” affiliations are the only integrated populations (stridency among these Democrats?) and many wealth-based predictors of affiliation (besides, oddly enough, home value) are muddier here than other geographies. Large Black and Hispanic populations still show a correlation with Democratic party affiliation but large Asian populations have an unusually strong positive correlation with “Other” affiliation and an equally strong aversion to Democratic party affiliation. Largely uninsured populations have an affinity for any party but the Republicans.
  • Retail: The easiest shopping (highest Retail Access Index) can be found in places with: lots of employment in STEM / business / finance / management, few people currently enrolled in any kind of schooling besides graduate/professional degrees, lots of transplants (residents from out of state), relatively large non-veteran populations, the absence of large residential populations in generations besides “Millennials”, high walkability (big surprise! walking and window shopping are excellent for commerce), and high property values with a lot of units per structure. Good access to retail is also one of the strongest predictors of access to jobs.

Thanks for joining me (or returning) for Act III of “Correlate… ALL THE THINGS”. As always, I hope you were simultaneously entertained and enlightened.

Resources to learn more

Need help finding all the things that correlate in your community? Chat with us and discover how our tool can help you.

About the Author: Matt Barr works at mySidewalk helping to invent the technology that betters our understanding of communities. Civics, democracy, computing, and the great outdoors are his passions.