This map is stunning. It shows the number of small business loans given in each census tract. While it makes sense that there will be high and low lending activity areas, it is still surprising to see the clear boundary of the city represented by the lending behavior of large institutions.

Average Loan Size by Census Tract

Count of Loans by Census Tract

WOW

So what’s causing this? Is it that suburban businesses are more likely to expand? Are there more businesses in those areas, thus driving the count higher? Are banks more likely to loan simply because a business is located on one side of the street?

This is a great opportunity to call out some statistical tools to describe what is happening. I’d first start out with a naive regression to quantify the problem:

Number of loans = b0 + b1InCity (Binary)

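This would simply tell us how many more (or fewer) loans, on average, a tract gets when it is in the city. Here b0 is the average loan count for tracts outside the city, and b1 is the gap between city tracts and everyone else. It’s just a difference in averages.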
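To make the "it's just an average" point concrete, here is a minimal sketch in Python with numpy. The loan counts are made up for illustration and are not taken from the map; the variable names are my own.

```python
import numpy as np

# Made-up loan counts per census tract (illustrative only, not real data).
loans = np.array([12.0, 15.0, 9.0, 40.0, 55.0, 47.0, 38.0])
in_city = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0])  # 1 = tract inside the city

# Design matrix: a column of ones (for b0) and the InCity dummy (for b1).
X = np.column_stack([np.ones_like(in_city), in_city])
(b0, b1), *_ = np.linalg.lstsq(X, loans, rcond=None)

suburb_mean = loans[in_city == 0].mean()
city_mean = loans[in_city == 1].mean()

print(f"b0 = {b0:.2f} (suburban mean = {suburb_mean:.2f})")
print(f"b1 = {b1:.2f} (city mean - suburban mean = {city_mean - suburb_mean:.2f})")
```

With a single binary regressor, least squares has nothing to do except compare the two group means, which is why this model cannot say anything about why the gap exists.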

Multivariate

Then I’d move to something more exciting to parse out the issue. We would want to control for local income and for the size of the loan. Local income will probably affect whether a loan is given because it has an impact on the success of the business – especially if the business is intended to serve a local market. Since we’re trying to parse out JUST the effect of being in the political boundary while not quantifying the other factors that go along with being in the city, this is something we have to address.

We should look at the average loan amount as well. Some business areas, like tech parks, might be a GREAT place to start a business and thus get a loan. A tech park may also sit in a low-income area, because who wants to live next to a tech park? Since we don’t want our InCity variable picking up whether we’re in a high-business area or not, this is something we need to control for. While we’re at it, we should also control for the number of establishments in the tract. It’s a count variable, and so is our number of loans, so a tract with more businesses will tend to have more loans no matter where it is. We would set ourselves up for weird results if we left it out.

We also want to somehow control for the person who is taking out the loan and their business savvy. This is the classic issue with quantifying ability, but since this is a hypothetical exploration, let’s just pretend we have a perfect measure for entrepreneurial ability. Yay pretending.

That leaves us with:

Number of loans = b0 + b1InCity (Binary) + b2tract_avg_income + b3avg_loan_amount + b4Num_Establishments + b5Entrep_Abil
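Here is a rough sketch of why the controls matter, using simulated tracts. Everything in it is an assumption for illustration: the sample size, the coefficients, the built-in city effect `TRUE_B1`, and the `ols` helper are all invented, and the "ability" column is the same pretend measure as above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # hypothetical number of census tracts

# Simulated tract-level data; every number here is invented for illustration.
in_city = rng.integers(0, 2, size=n).astype(float)
income = rng.normal(60.0 - 30.0 * in_city, 10.0)    # avg income ($k), lower in the city
avg_loan = rng.normal(100.0, 20.0, size=n)           # avg loan amount ($k)
num_estab = rng.poisson(50, size=n).astype(float)    # establishments in the tract
ability = rng.normal(0.0, 1.0, size=n)               # the "pretend" ability measure

TRUE_B1 = -5.0  # the city-boundary effect we built into the simulation
loans = (10.0 + TRUE_B1 * in_city + 0.5 * income + 0.02 * avg_loan
         + 0.3 * num_estab + 2.0 * ability + rng.normal(0.0, 2.0, size=n))

def ols(y, *columns):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

naive = ols(loans, in_city)
full = ols(loans, in_city, income, avg_loan, num_estab, ability)

print(f"naive b1 (no controls):  {naive[1]:.2f}")
print(f"b1 with controls:        {full[1]:.2f}  (built-in value: {TRUE_B1})")
```

Because city tracts were simulated with lower incomes, the naive b1 soaks up the income effect and overstates the penalty; adding the controls pulls it back toward the value that was built in.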

To be sure, we could also include any number of other factors that affect the desirability of the area. Factors like crime data, response levels for public service, road conditions, access to transportation infrastructure, proximity to markets… these are all factors that also affect the location of businesses and aren’t completely controlled for in the equation above.

The trouble is, they also probably correlate heavily with the city boundary variable. As long as we’re prepared to allow the city variable to represent ALL of the things that go along with being in Detroit – city services vs. suburb services, etc., then the above equation will make some sense in identifying the impact of being in the city on the number of loans in a tract.

Linear Probability Model

Understanding the motivation behind the raw count is a great start, but it’s not the whole picture. To quantify the likelihood of getting a loan as a Detroit business, I would opt for a Linear Probability Model approach. I want to know how much more likely a loan application is to be rejected, all else equal, solely because the business is located within the city.

For that, we would need to know how many applications there are and how many are rejected. For the rest, we can use the same variables as we used in the multivariate regression above:

Likelihood of loan = b0 + b1InCity (Binary) + b2tract_avg_income + b3avg_loan_amount + b4Entrep_Abil
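As a sketch of what that looks like in practice: an LPM is ordinary least squares with a 0/1 outcome (approved or rejected) on the left-hand side, fit on one row per application. The data below is simulated, and the approval rates, coefficients, and the built-in city penalty `TRUE_B1` are all assumptions made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000  # hypothetical loan applications, one row each

# Simulated application-level data; all values are invented for illustration.
in_city = rng.integers(0, 2, size=n).astype(float)
income = rng.normal(55.0 - 10.0 * in_city, 10.0)   # tract avg income ($k)
loan_amount = rng.normal(100.0, 20.0, size=n)       # requested amount ($k)
ability = rng.normal(0.0, 1.0, size=n)              # pretend ability measure

TRUE_B1 = -0.15  # built-in penalty: approval is 15 percentage points lower in the city
p_approve = (0.55 + TRUE_B1 * in_city + 0.005 * (income - 50.0)
             - 0.0005 * (loan_amount - 100.0) + 0.05 * ability)
approved = (rng.random(n) < p_approve).astype(float)  # 1 = approved, 0 = rejected

# The LPM is just OLS with the 0/1 outcome as the dependent variable.
X = np.column_stack([np.ones(n), in_city, income, loan_amount, ability])
coefs, *_ = np.linalg.lstsq(X, approved, rcond=None)

raw_gap = approved[in_city == 1].mean() - approved[in_city == 0].mean()
print(f"raw approval gap (city - suburb): {raw_gap:+.3f}")
print(f"LPM b1, all else equal:           {coefs[1]:+.3f}")
```

The nice part of the LPM is the reading of b1: it is the change in approval probability, in percentage points, from being in the city with the other variables held fixed. The catch is that nothing forces its fitted values to stay between 0 and 1.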

Although the more I look at that equation, the less I think it’s correct. The variables are reasonable, but the special case of the LPM may not be as appropriate as some other application of a binomial regression that I haven’t considered. That, and I’m sure there is some goofy interaction trick that we could use to elicit some quantifiable evidence of causation.
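One binomial alternative would be a logistic regression, which keeps every predicted probability strictly between 0 and 1. What follows is only a sketch under assumptions: the data is simulated, the coefficients (including the built-in `TRUE_B1`) are invented, and the model is fit with a hand-rolled Newton-Raphson loop rather than any particular library.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000  # hypothetical loan applications

# Simulated data; all values are invented for illustration.
in_city = rng.integers(0, 2, size=n).astype(float)
income_z = rng.normal(-0.5 * in_city, 1.0)   # standardized tract income, lower in city
ability = rng.normal(0.0, 1.0, size=n)        # pretend ability measure

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

TRUE_B1 = -0.8  # built-in effect of the city boundary on the log-odds of approval
X = np.column_stack([np.ones(n), in_city, income_z, ability])
true_beta = np.array([0.5, TRUE_B1, 0.5, 0.3])
approved = (rng.random(n) < sigmoid(X @ true_beta)).astype(float)

# Fit the logit by Newton-Raphson on the log-likelihood.
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = sigmoid(X @ beta)
    gradient = X.T @ (approved - p)
    hessian = X.T @ (X * (p * (1.0 - p))[:, None])
    beta += np.linalg.solve(hessian, gradient)

p_hat = sigmoid(X @ beta)
print(f"estimated b1 (log-odds): {beta[1]:+.3f}  (built-in value: {TRUE_B1})")
print(f"odds ratio for InCity:   {np.exp(beta[1]):.3f}")
print(f"fitted probabilities range from {p_hat.min():.3f} to {p_hat.max():.3f}")
```

The trade-off is interpretation: b1 is now a change in log-odds rather than a change in percentage points, so it takes an extra step to turn it into a statement about approval rates.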

This would be a very interesting paper to write, though this subject is probably well-covered somewhere and I simply didn’t take the time to read up on it. But hey, what do you want? This is a blog.