So many countries are currently conducting or seriously talking about starting Universal Basic Income (UBI) experiments that it’s becoming hard to keep track. These are not the first experiments in UBI or other forms of Basic Income Guarantee (BIG). Namibia and India conducted UBI experiments in the late 2000s and early 2010s. And between 1968 and 1980, the U.S. and Canadian governments conducted five Negative Income Tax (NIT) experiments. They were the world’s first major social science experiments of any kind. They are worth reviewing because they provide not only inspiration and precedent but also relevant data and important lessons for the current experiments.
I’m working on a book (tentatively titled Basic Income Experiments: The Devil’s in the Caveats) drawing lessons from the ’70s experiments for the current round of experiments. This blog post previews a chapter from that upcoming book providing a review of results from the 1970s experiments. The chapter, in turn, draws heavily on my earlier work on BIG experiments including “A Failure to Communicate: What (if anything) Can We Learn from the Negative Income Tax Experiments” and “A Retrospective on the Negative Income Tax Experiments: Looking Back at the Most Innovative Field Studies in Social Policy.” Next week, I’ll make a blog post showing how poorly understood the NIT experiments were in the media at the time.
Labor market effects
Unfortunately, most of the attention in the ’70s experiments was directed not at the effects of the policy (how much does it improve the welfare of low-income people?) but at one potential side effect (how does it affect the labor hours of test subjects?). And so that issue takes up most of the discussion here.
Table 1 summarizes the basic facts of the five NIT experiments. The first, the New Jersey Graduated Work Incentive Experiment (sometimes called the New Jersey-Pennsylvania Negative Income Tax Experiment or simply the New Jersey Experiment), was conducted from 1968 to 1972. The treatment group originally consisted of 1,216 people and dwindled to 983 (due to dropouts) by the conclusion of the experiment. Treatment group recipients received a guaranteed income for three years.
The Rural Income Maintenance Experiment (RIME) was conducted in rural parts of Iowa and North Carolina from 1970 to 1972. It began with 809 people and finished with 729.
The largest NIT experiment was the Seattle/Denver Income Maintenance Experiment (SIME/DIME), which had an experimental group of about 4,800 people in the Seattle and Denver metropolitan areas. The sample included families with at least one dependent and incomes below $11,000 for single-parent families or below $13,000 for two-parent families. The experiment began in 1970 and was originally planned to be completed within six years. Later, researchers obtained approval to extend the experiment for 20 years for a small group of subjects. This would have extended the project into the early 1990s, but it was eventually canceled in 1980, so that a few subjects had a guaranteed income for about nine years, during part of which time they were led to believe they would receive it for 20 years.
The Gary Income Maintenance Experiment was conducted between 1971 and 1974. Subjects were mostly black, single-parent families living in Gary, Indiana. The experimental group received a guaranteed income for three years. It began with a sample size of 1,799 families, which (due to a large drop-out rate) fell to 967 by the end of the experiment.
The Canadian government initiated the Manitoba Basic Annual Income Experiment (Mincome) in 1975 after most of the U.S. experiments were winding down. The sample included 1,300 urban and rural families in Winnipeg and Dauphin, Manitoba with incomes below C$13,000 per year. By the time the data collection was completed in 1978, interest in the guaranteed income was seriously on the wane and the Canadian government canceled the project before the data was analyzed.
Table 1: Summary of the Negative Income Tax Experiments in the U.S. & Canada
|Name||Location(s)||Data collection||Sample size: initial (final)||Sample characteristics||G*||t**
|The New Jersey Graduated Work Incentive Experiment (NJ)||New Jersey & Pennsylvania||1968-1972||1,216 (983)||Black, white, and Latino, 2-parent families in urban areas with a male head aged 18-58 and income below 150% of the poverty line.||0.5
|The Rural Income-Maintenance Experiment (RIME)||Iowa & North Carolina||1970-1972||809 (729)||Both 2-parent families and female-headed households in rural areas with income below 150% of poverty line.||0.5
|The Seattle/Denver Income-Maintenance Experiments (SIME/DIME)||Seattle & Denver||1970-1976, (some to 1980)||4,800||Black, white, and Latino families with at least one dependent and incomes below $11,000 for single parents, $13,000 for two-parent families.||0.75, 1.26, 1.48||0.5
|The Gary, Indiana Experiment (Gary)||Gary, Indiana||1971-1974||1,799 (967)||Black households, primarily female-headed, head 18-58, income below 240% of poverty line.||0.75
|The Manitoba Basic Annual Income Experiment (Mincome)||Winnipeg and Dauphin, Manitoba||1975-1978||1,300||Families with head younger than 58 and income below $13,000 for a family of four.||C$3,800
* G = the Guarantee level.
** t = the marginal tax rate
Source: Reproduced from Widerquist (2005)
Scholarly and popular media articles on the NIT experiments focused, more than anything else, on the NIT’s “work-effort response”: the comparison of how much the experimental group worked relative to the control group. Table 2 summarizes the findings of several of the studies on the work-effort response to the NIT experiments, showing the difference in hours (the “work reduction”) by the experimental group relative to the control group in forgone hours per year and in percentage terms. Results are reported for three categories of workers: husbands, wives, and “single female heads” (SFH), which meant single mothers. The relative work reduction varied substantially across the five experiments, from 0.5% to 9.0% for husbands, which means that the experimental group worked less than the control group by about ½ hour to 4 hours per week, 20 to 130 hours per year, or 1 to 4 full-time weeks per year. Three studies averaged the results from the four U.S. experiments and found relative work reduction effects in the range of 5% to 7.9%.[i]
The response of wives and single mothers was somewhat larger in terms of hours, and substantially larger in percentage terms because they tended to work fewer hours to begin with. Wives reduced their work effort by 0% to 27% and single mothers reduced their work effort by 15% to 30%. These percentages correspond to reductions of about 0 to 166 hours per year. The labor market response of wives had a much larger range than the other two groups, but this was usually attributed to the peculiarities of the labor markets in Gary and Winnipeg, where particularly small responses were found.
Table 2: Summary of findings of work reduction effect
|Study||Data Source||Work reduction* in hours per year** and percent||Comments and Caveats|
|Robins (1985)||4 U.S.||-89||Study of studies that does not assess the methodology of the studies but simply combines their estimates. Finds large consistency throughout, and “In no case is there evidence of a massive withdrawal from the labor force.” No assessment of whether the work response is large or small or its effect on cost. Estimates apply to a poverty-line guarantee rate with a marginal tax rate of 50%.|
|Burtless (1986)||4 U.S.||-119||Average of results of the four U.S. experiments weighted by sample size, except for the SFH estimates, which are a weighted average of the SIME/DIME and Gary results only.|
|Keeley (1981)||4 U.S.||-7.9%||A simple average of the estimates of 16 studies of the four U.S. experiments|
|Robins and West (1980a)||SIME/DIME||Estimates “labor supply effects.” It goes without saying that this is different from “labor market effects.”|
|Robins and West (1980b)||SIME/DIME||-9%||-20%||-25%||Recipients take 2.4 years to fully adjust their behavior to the new program.|
|Cain et al (1974)||NJ||–||-50||–||Includes caveats about the limited duration of the test and the representativeness of the sample. Notes that the evidence shows a smaller effect than nonexperimental studies.|
|Watts et al (1974)||NJ||-1.4% to||–||–||Depending on size of G and t|
|Rees and Watts (1976)||NJ||-1.5 hpw**||-0.61%||–||Found anomalous positive effect on hours and earnings of blacks.|
|-27%||–||“There must be serious doubt about the implications of the experimental results for the adoption of any permanent negative income tax program.”|
|Moffitt (1979a)||Gary||-3% to -6%||0%||-26% to -30%||No caveat about missing demand, but careful not to imply the results mean more than they do.|
|Hum and Simpson (1993a)||Mincome||-17||Smaller response to the Canadian experiment was not surprising because of the make-up of the sample and the treatments offered.|
* The negative signs indicate that the change in work effort is a reduction
** Hours per year except where indicated “hpw,” hours per week.
NJ = New Jersey Graduated Work Incentive Experiment
SIME/DIME = Seattle / Denver Income Maintenance Experiment
Gary = Gary Income Maintenance Experiment
RIME = Rural Income Maintenance Experiment
Mincome = Manitoba Basic Annual Income Experiment
SFH = Single Female “head of household.”
Source: Reproduced from Widerquist (2005)
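The comments in Table 2 mention two ways of pooling estimates: Burtless (1986) weights results by sample size, while Keeley (1981) takes a simple average. The sketch below only illustrates how those two methods can diverge. The sample sizes are the initial figures from Table 1, used as stand-in weights (they mix people and families, and are not necessarily the weights any study used), and the per-experiment effects are hypothetical placeholders, not results from the experiments.

```python
# Illustration only: pooling work-reduction estimates two ways.
# "n" values are the initial sample sizes from Table 1, used as stand-in weights.
# "effect_pct" values are HYPOTHETICAL placeholders, not findings from any study.
experiments = {
    "NJ": {"n": 1216, "effect_pct": -2.0},
    "RIME": {"n": 809, "effect_pct": -4.0},
    "SIME/DIME": {"n": 4800, "effect_pct": -8.0},
    "Gary": {"n": 1799, "effect_pct": -5.0},
}


def simple_average(data):
    """Every experiment counts equally, regardless of size."""
    effects = [row["effect_pct"] for row in data.values()]
    return sum(effects) / len(effects)


def weighted_average(data):
    """Each experiment counts in proportion to its sample size."""
    total_n = sum(row["n"] for row in data.values())
    weighted_sum = sum(row["n"] * row["effect_pct"] for row in data.values())
    return weighted_sum / total_n


print(f"simple average:   {simple_average(experiments):.2f}%")
print(f"weighted average: {weighted_average(experiments):.2f}%")
```

With these made-up inputs the weighted figure is pulled toward the largest sample, which is one reason pooled estimates from the same experiments need not agree.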
All or most of the figures reported above are raw comparisons between the control and experimental groups: they are not predictions of how labor market participation is likely to change in response to an NIT or UBI. There are many reasons why these figures can’t be taken as predictions of responses to a national program. The many difficulties of relating experimental results to such predictions are a major theme in the book I’m writing. I’ll mention just four of them now.
First, the study participants were drawn only from a small segment of the population: people with incomes near the poverty line, about the point at which people are most likely to work less in response to an income guarantee because the potential grant is high relative to their earned income. Thus, the response of this group is likely to be much larger than the response of the entire workforce to a national program. One study using computer simulations estimated that the work reduction in response to a national program would be only about one-third of the reduction in the Gary experiment (1.6% rather than 4.5%).[ii] Although simulations are an important way to connect experimental data with what we really want to know, the more researchers rely on them, the less their reports are driven by their experimental data.
Second, the figures do not include any demand response, which economic theory predicts would lead to higher wages and a partial reversal of the work-reduction effect. One study using simulation techniques to estimate the demand response found it to be small.[iii] Another found, “Reduction in labor supply produced by these programs does tend to raise low-skill wages, and this improves transfer efficiency.”[iv] That is, it increases the benefit to recipients from each dollar of public spending.
Third, the figures were reported in average hours per week and very often misinterpreted to imply that 5% to 7.9% of primary breadwinners dropped out of the labor force. The reduction in labor hours was not primarily caused by workers reducing their hours of work each week (as few workers are able to do even if they want to). Moreover, few if any workers simply dropped out of the labor force for the duration of the study, as knee-jerk reactions to guaranteed income proposals often assume.[v] Instead, the reduction was mainly caused by workers taking longer to find their next job if and when they became nonemployed.
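To see why an average reduction in hours says nothing about how many people stopped working, consider a rough sketch with entirely hypothetical numbers (100 workers, a 52-week year, and invented spell lengths). Two very different patterns of behavior produce the same 5% average reduction.

```python
# Hypothetical illustration: the same average work reduction can come from
# very different behavior. None of these numbers come from the experiments.
WORKERS = 100
WEEKS_PER_YEAR = 52


def pct_reduction(weeks_not_worked):
    """Percent of the group's potential work-weeks that were not worked."""
    return 100.0 * sum(weeks_not_worked) / (WORKERS * WEEKS_PER_YEAR)


def full_year_dropouts(weeks_not_worked):
    """Number of workers who did not work at all during the year."""
    return sum(1 for weeks in weeks_not_worked if weeks == WEEKS_PER_YEAR)


# Scenario A (the common misreading): 5 workers leave the labor force all year.
dropout = [WEEKS_PER_YEAR] * 5 + [0] * 95

# Scenario B (closer to the pattern described above): nobody drops out, but
# 26 workers who become nonemployed each take 10 extra weeks to find a job.
longer_search = [10] * 26 + [0] * 74

for name, scenario in [("dropout", dropout), ("longer search", longer_search)]:
    print(name, pct_reduction(scenario), full_year_dropouts(scenario))
```

Both scenarios lose 260 of 5,200 possible work-weeks, so a reader who sees only the average cannot tell them apart.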
Fourth, the experimental group’s “work reduction” was only a relative reduction in comparison to the control group. Although this language is standard for experimental studies, it doesn’t imply that receiving the NIT was the major determinant of labor hours. In fact, in some studies, labor hours increased for both groups, and the labor hours of both groups tended to rise and fall together along with the macroeconomic health of the economy, implying that when more or better jobs were available, both groups took them, but when they were less available, the control group searched harder or accepted less attractive jobs.[vi]
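A small numerical sketch may make the point about relative reductions concrete. The hours below are hypothetical and chosen only for illustration: both groups work more during the study than before it, yet the comparison still registers as a “work reduction.”

```python
# Hypothetical average annual hours per worker; NOT data from any experiment.
hours = {
    "control": {"before": 1600.0, "during": 1750.0},
    "experimental": {"before": 1600.0, "during": 1680.0},
}


def change(group):
    """Change in a group's own hours from before the study to during it."""
    return hours[group]["during"] - hours[group]["before"]


def relative_work_reduction():
    """Experimental minus control hours during the study (hours, percent of control)."""
    gap = hours["experimental"]["during"] - hours["control"]["during"]
    return gap, 100.0 * gap / hours["control"]["during"]


gap_hours, gap_pct = relative_work_reduction()
print(f"control change:          {change('control'):+.0f} hours")
print(f"experimental change:     {change('experimental'):+.0f} hours")
print(f"relative work reduction: {gap_hours:+.0f} hours ({gap_pct:+.1f}%)")
```

In this invented case the experimental group's hours rise by 80 and the control group's by 150, so the reported effect is a 70-hour (4%) relative reduction even though nobody in the example worked less than before.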
As I’ll show in my next article about the NIT experiments, most laypeople writing about the results assumed any work reduction, no matter how small, to be an extremely negative side effect. But it is not obviously desirable to put unemployed workers in the position where they are desperate to start their next job as soon as possible. It’s obviously bad for the workers and families in that position. It’s not only difficult to go through, but it also reduces their ability to command good wages and better working conditions. Increased periods of nonemployment might have a social benefit if they lead to better matches between workers and firms.
The focus of the 1970s experiments on work effort is in one way surprising because, presumably, the central goals of a UBI involve its effects on poverty and the wellbeing of relatively low-income people, and assessing these issues requires looking at non-labor-market effects.
The experimental results for various quality-of-life indicators were substantial and encouraging. Some studies found significant positive influences on elementary school attendance rates, teacher ratings, and test scores. Some studies found that children in the experimental group stayed in school significantly longer than children in the control group. Some found an increase in adults going on to continuing education. Some of the experiments found desirable effects on many important quality-of-life indicators, including reduced incidence of low-birth-weight babies, increased food consumption, and increased nutritional content of the diet. Some even found reduced domestic abuse and reduced psychiatric emergencies.[vii]
Much of the attention to non-labor-market effects focused not on the presumed goals of the policy but on another side effect: a controversial finding that the experimental group in SIME/DIME had a higher divorce rate than the control group. Researchers argued forcefully on both sides with no conclusive resolution in the literature. The finding was not replicated by the Manitoba experiment, which found a lower divorce rate in the experimental group. The higher divorce rate in some studies examining SIME/DIME was widely presented as a negative effect, even though the only explanation for it that researchers on either side offered was that the NIT must have relieved women from financial dependence on husbands.[viii] It is at the very least questionable to label one spouse staying with another solely because of financial dependence as a “good” thing.
An overall comparison?
Most of the researchers involved considered the results extremely promising overall. Comparisons of the control and experimental groups indicated that the NIT was capable of significantly reducing the material effects of poverty, and the relative reductions in labor effort were probably within the affordable range and almost certainly within the sustainable range.
But experiments of this type were not capable of producing a bottom line. Non-specialists examining these results might find themselves asking: What was the cost exactly? How much were the material effects of poverty reduced? What is the verdict from an overall comparison of costs and benefits?
Experiments cannot produce an answer to these questions. Doing so would involve taking positions on controversial normative issues, combining the experimental results with a great deal of nonexperimental data, and plugging it into a computer model estimating the micro- and macroeconomic effects of a national policy. The results of that effort would be driven more by those normative positions, nonexperimental data, and modeling assumptions than by the experimental results that such a report would be designed to illustrate.
Whichever strategy experimental reports take, nonspecialists will have difficulty grasping the complexity of the results and the limits of what they indicate about a possible national policy. No matter how well the experiment is conducted, the results are vulnerable to misunderstanding, misuse, oversimplification, and spin. My blog post next week will show how badly this happened when the results of NIT experiments were reported in the United States in the 1970s.
[i] G. Burtless, “The Work Response to a Guaranteed Income: A Survey of Experimental Evidence,” in Lessons from the Income Maintenance Experiments, ed. A. H. Munnell (Boston: Federal Reserve Bank of Boston, 1986). M.C. Keeley, Labor Supply and Public Policy: A Critical Review (New York: Academic Press, 1981). P.K. Robins, “A Comparison of the Labor Supply Findings from the Four Negative Income Tax Experiments,” Journal of Human Resources 20, no. 4 (1985).
[ii] R.A. Moffitt, “The Labor Supply Response in the Gary Experiment,” ibid. 14 (1979).
[iii] D.H. Greenberg, “Some Labor Market Effects of Labor Supply Responses to Transfer Programs,” Socio-Economic Planning Sciences 17, no. 4 (1983).
[iv] J.H. Bishop, “The General Equilibrium Impact of Alternative Antipoverty Strategies,” Industrial and Labor Relations Review 32, no. 2 (1979): 205-223.
[v] Robert Levine et al., “A Retrospective on the Negative Income Tax Experiments: Looking Back at the Most Innovative Field Studies in Social Policy,” in The Ethics and Economics of the Basic Income Guarantee, ed. Karl Widerquist, Michael A. Lewis, and Steven Pressman (Aldershot: Ashgate, 2005).