When people think about inequality of incomes, a key issue is inequality of opportunity. Some people are born to rich parents who can afford private schools, summer camp, SAT tutors, etc., while others have poorer parents who cannot easily afford such things. One might wonder how much of the income inequality we observe can be explained by differences in the resources that people get because of varying parental incomes.
Let me suggest a rough calculation that gives an approximate answer.
The recent paper by Chetty et al. finds that the regression of kids’ income rank on parents’ income rank has a coefficient of 0.3. (See Figure 1.) That implies an R² for the regression of 0.09. In other words, 91 percent of the variance is unexplained by parents’ income.
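The step from the 0.3 coefficient to the 9 percent figure is simple arithmetic. Here is a minimal sketch, assuming (as the post does) that both variables are ranks with equal variance, so that R² is just the squared slope:

```python
# Slope from the regression of kids' income rank on parents' income rank,
# as reported in the post.
beta = 0.3

# Assumption: both sides of the regression have the same variance (they are
# ranks), so R^2 equals the square of the slope coefficient.
r_squared = beta ** 2

# Share of the variance in kids' income rank not explained by parents' rank.
unexplained = 1 - r_squared

print(f"R^2 = {r_squared:.2f}")            # R^2 = 0.09
print(f"unexplained = {unexplained:.2f}")  # unexplained = 0.91
```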
I would be willing to venture a guess, based on adoption studies, that a lot of that 9 percent is genetics rather than environment. That is, talented parents have talented kids partly because of good genes. Conservatively, let’s say half is genetics. That leaves only 4.5 percent of the variance attributed directly to parents’ income.
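Splitting the explained variance between genes and environment is again one multiplication. In this sketch the 50 percent genetic share is the post's own conservative guess, not an estimate, and the variable names are purely illustrative:

```python
# Variance share explained by parents' income rank (squared slope of 0.3).
r_squared = 0.3 ** 2

# The post's conservative guess: half of the explained variance reflects genes.
genetic_share = 0.5

# What is left is the share attributed directly to parents' income.
env_share = r_squared * (1 - genetic_share)

print(f"{env_share:.1%}")  # 4.5%
```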
Now, if you let me play a bit fast and loose with the difference between income and income rank, these numbers suggest the following: If we had some perfect policy intervention (such as universal super-duper pre-school) that completely neutralized the effect of parents’ income, we would reduce the variance of kids' income to 0.955 of what it now is. This implies that the standard deviation of income would fall to 0.977 of what it now is.
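The move from variance to standard deviation is a square root, which is why a 4.5 percent cut in variance becomes only a roughly 2 percent cut in the spread of incomes. Like the post, this sketch treats variance shares for income ranks as if they applied to income itself:

```python
import math

# Share of variance attributed directly to parents' income:
# the squared slope (0.09) times the non-genetic half (0.5).
env_share = 0.3 ** 2 * 0.5

# A policy that fully neutralized parental income would remove that share.
remaining_variance = 1 - env_share            # about 0.955

# Standard deviation scales with the square root of the variance.
remaining_sd = math.sqrt(remaining_variance)  # about 0.977
reduction = 1 - remaining_sd

print(f"{remaining_sd:.3f}")  # 0.977
print(f"{reduction:.1%}")     # 2.3%
```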
The bottom line: Even a highly successful policy intervention that neutralized the effects of differing parental incomes would reduce the gap between rich and poor by only about 2 percent.
This conclusion does not mean such a policy intervention is not worth doing. Evaluating the policy would require a cost-benefit analysis. But the calculations above do suggest that all the money the affluent spend on private schools, etc., explains only a tiny fraction of the income inequality that we observe.
Addendum: A few readers seem confused about how to infer an R² from a coefficient. The key is that the left- and right-hand-side variables in the regression have the same variance. In this case, the R² is the square of the coefficient. This conclusion is a standard result for AR(1) models, which is what we have here, as applied to generational data. (Also, a few readers are confused when they look at the paper's Figure 1. The points plotted are not the raw data but binned averages, so you cannot see the R² in the plot.)
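The underlying identity is that in a simple regression R² = β² · Var(x) / Var(y), so when Var(x) = Var(y) the R² equals β². A quick simulation can illustrate this. The sketch below is only an illustration under stated assumptions: it uses normally distributed data rather than uniform ranks, and the helper `simulate_r_squared` is a hypothetical name. The noise variance is picked so that the child variable has the same variance as the parent variable, as in an AR(1) across generations:

```python
import random


def simulate_r_squared(beta: float, n: int, seed: int = 0) -> tuple[float, float]:
    """Simulate y = beta * x + e with Var(x) = Var(y) = 1; return (OLS slope, R^2).

    Hypothetical helper for illustration; normal draws stand in for ranks.
    """
    rng = random.Random(seed)
    # Noise sd chosen so that Var(y) = beta^2 * 1 + (1 - beta^2) = 1 = Var(x).
    noise_sd = (1 - beta ** 2) ** 0.5
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [beta * x + rng.gauss(0.0, noise_sd) for x in xs]

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx                   # OLS slope of y on x
    r_squared = sxy ** 2 / (sxx * syy)  # R^2 of a simple regression = squared correlation
    return slope, r_squared


slope, r2 = simulate_r_squared(0.3, 200_000)
print(f"slope = {slope:.3f}, slope^2 = {slope ** 2:.3f}, R^2 = {r2:.3f}")
```

With equal variances the printed R² and squared slope should agree closely; if you instead scale up the noise so that Var(y) exceeds Var(x), the two numbers come apart, which is why the equal-variance condition matters.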