Bad Macroeconomics and the Current Recession in Three Plots

One of the big lessons of cheap talk models is to exert some degree of skepticism when the incentives of those doing the talking, often sharing mostly unverifiable opinions, are not fully aligned with ours. Recently, President Biden, Fed Chair Jerome Powell, and the NBER took the historically unprecedented step of refusing to call a two-quarter decline in GDP a “recession.” The reasons given are, well, bizarre, and remind us of the market for excuses that Watts and Zimmerman originally described.

“It’s just supply shocks,” as if a decline in activity did not hurt the economy when it is due to exogenous causes. Under this definition, the first oil shocks of the 70s would not qualify as a recession, nor would any of the post-war shocks after the economy returned to normalcy.
“It’s temporary,” as if a recession had to be permanent to matter, and as if there were a well-accepted definition of how temporary a shock needs to be.
“It’s ok if unemployment is low;” so, to be clear, the theory is that a recession must demonstrate a labor market inefficiency and, if, say, the willingness to work unexpectedly decreased to the point of dramatically reducing output, this is NOT a recession.

The response is very much like a person cheating on their spouse and explaining it shouldn’t be called cheating in the first place because (1) it’s just a shock to the supply of attractive partners, (2) it’s temporary until the fling passes, and (3) it’s ok because that’s what they wanted to do in the first place.

To be sure, all the current players have their own agendas. Biden wants to maintain confidence to complete his political promises, Powell is an ex-investment banker trying to keep the Covid stock market and real estate bubbles alive, and the NBER is paranoid about creating a self-fulfilling recession if there is any remote chance it might be avoided. Perhaps, among the three, the NBER response is the most disappointing, as its leadership is no longer acting as a purely disinterested research institute telling us about average events based on prior established science, but is instead making new calls about the societal consequences of its findings and the words it uses, without the rigorous process of academic documentation.

An important disclaimer: I am not a macroeconomist, and my perspective has been greatly influenced by taking the neo-classical macro courses taught by Finn Kydland and Ben McCallum at Carnegie Mellon as a grad student. They are among the most scholarly individuals I have ever met, and they think about the science as its own goal, not as a means of persuasion. Macro is one of the most complex subsets of economics I have seen, crossing micro, aggregation, and complex systems, all with limited, correlated data that require solid models to fill the gaps. So, my objective here is not to make any sort of deep analysis. However, what these scholars also teach is that macro is also a matter of a common-sense approach to concrete facts and behaviors, jointly informed by data and theory. To this effect, I will share three plots of fundamental economic quantities, taught by these economists, that conflict with the current spin of policy makers. Let me also point to the excellent blog by NYU’s Nouriel Roubini, who, unlike me, is a legit macroeconomist and has been pointing to many of these worrisome aspects.

The first plot below shows the monetary aggregates M1 and M2, and CPI inflation (from the St. Louis Fed). For reference, M1 is mainly currency plus checking and savings accounts, while M2 adds money market holdings (short-term debt) that are almost equivalent but not immediately spendable. The CPI inflation is for all consumer goods in U.S. cities, including energy.

A link between money and prices is expressed by the quantity equation associated with Friedman's monetarism, which states that:

M x V = P x Y

and is interpreted as the money stock (M) times the velocity of money (V) being equal to the price level (P) times real output (Y).

At its core, the equation is an accounting tautology, given that velocity V is simply the value of transactions P x Y divided by the amount of money (for example, V equals one if all money circulates exactly once). However, behind this simple principle, it helps organize the Friedman monetarist doctrine. We have no particular reason to expect long-term changes in the propensity to hold idle cash balances (V should stay constant), and it makes sense to keep the rate of inflation low to avoid productive disruptions (P should stay approximately constant or grow slowly). Hence, M should grow at the rate of output Y, say, about 3% a year.

A unique aspect of the Covid pandemic is that V abruptly decreased because most consumers could not engage in normal purchases, implying a temporary, abnormally low rate of inflation. At the same time, this was offset by an enormous increase in the money supply (see plot above), which avoided deflation. Post Covid, money velocity will return to its pre-Covid level, so, from the quantity equation, we can simply infer the implied cumulative inflation from the growth in M1 and M2. Relative to January 2020, M1 grew by a whopping 414% and M2 grew by 41%. Using even the more conservative M2 estimate, we shouldn't expect just 10% inflation over a few months, but sustained inflation over multiple years.
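
To make this back-of-the-envelope logic concrete, here is a minimal sketch in Python. The money-growth figures are the ones quoted above; the trend output growth of about 2% a year is my own illustrative assumption, not a number from the plot.

```python
# Sketch: implied cumulative inflation from the quantity equation M*V = P*Y,
# assuming velocity V returns to its pre-Covid level (so V cancels out) and
# real output Y grows at a modest trend rate. Money growth figures are those
# quoted in the text; the output-growth assumption is illustrative.

def implied_cumulative_inflation(money_growth, real_output_growth):
    """With V back at its old level: P_new / P_old = (M_new / M_old) / (Y_new / Y_old)."""
    return (1 + money_growth) / (1 + real_output_growth) - 1

# Cumulative real output growth since January 2020 (assumption: ~2.5 years at ~2%/yr)
y_growth = 1.02 ** 2.5 - 1

for label, m_growth in [("M1", 4.14), ("M2", 0.41)]:
    pi = implied_cumulative_inflation(m_growth, y_growth)
    print(f"{label}: money +{m_growth:.0%} -> implied cumulative inflation ~{pi:.0%}")

# M2 implies roughly a +34% price level, i.e. sustained inflation over several
# years rather than a one-off 10% episode; M1 implies far more.
```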

Unless, of course, the Fed removes that liquidity from circulation. But note that this is not done via interest rate policy but by open market operations that unwind the balance sheet, which the Fed has repeatedly delayed and which, to be honest with ourselves, would create a major economic breakdown, as there would be no one else to buy the short-term debt. Unwinding such large amounts of debt is just not a feasible policy in any near future unless we see measurable productivity gains.

By way of an anecdote, I was giving a talk at a central bank earlier this year, and I asked the economists why they were still predicting low levels of inflation despite the fact that the money supply had exploded to levels that would justify a cumulative abnormal rise in inflation between 30% and 40% (or even more under M1). They pointed out that they did not believe in the Friedman model, because the Great Recession of 2008 showed enormous increases in the money supply with no effect on inflation. In fact, they also admitted that central banks no longer use economic models in which money is explicitly modelled (as per the models taught by McCallum) but instead work with models of the real economy in which inflation comes from an exogenous relationship with unemployment and the output gap. This model is useful for understanding the real economy, but is it an appropriate model when the main concern is predicting inflation? It isn't, nor is it being used for that purpose. The current inflation predictions we see circulated are primarily driven by statistics, not economic theory. Statistics do not work for unique events for which past data may not be relevant, and one has to understand the core economic equations of the problem.


The second plot is about the asset bubble currently under way. During Covid, asset markets exploded despite a widely documented decrease in productivity (which we have all experienced) during the stay-at-home mandates. It is possible that a year or two of productivity has been lost, and possibly more if many workers have gotten used to a less effective way of working. Wall Street never corrects for inflation, so all earnings-per-share growth numbers include inflation. Many firms are reporting spectacular, near double-digit growth in earnings.

To be clear, there is nothing wrong with working in nominal terms: in my financial statement analysis course, I ask students to use nominal cost-of-capital rates to discount nominal future cash flows, so the comparison is apples to apples without any adjustment for inflation. So, the high earnings growth is good news, no? Besides, aren't real assets a hedge against inflation? That is not true empirically: asset markets have done worse during periods of high inflation (especially throughout the seventies). The reason is that real assets do very well during periods of unexpected short-term inflation, because the cost of capital does not immediately adjust to higher inflation. However, if inflation is long-term, as the plot suggests, the risk-free rate will increase by approximately the expected additional inflation to cover the shortfall.

The plot above is a valuation calculated using the Gordon growth formula. The assumption is that there is an asset paying $1 in the first year, growing at 3% per year, whose value is calculated by discounting at a risk premium of 5% plus the risk-free rate. The risk-free rate is then varied from zero to 20%. Conservatively, the long-term interest rate was about 3% until the recent period (much lower in the EU) and may be about 4% higher if the 30-40% cumulative inflation is spread out over a long horizon. In principle, the net effect of an increase in the interest rate exactly matching the additional inflation is zero, so real asset values should not respond to increases in earnings due to inflation (although nominal market capitalization should increase at the rate of inflation). Except that markets have consistently responded positively to earnings news driven solely by inflation!

So a more direct calculation is to evaluate the effect of the market realizing that the increase in earnings will be matched by an offsetting increase in interest rates – something that is not fully reflected in prices at the moment. Comparing 3% to 7% implies a 40% readjustment in the value of assets – more or less the difference between peak post Covid markets and pre-Covid levels. The market is thus quite far from pricing the consequences of inflation on interest rates.
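
For completeness, here is a minimal sketch of that Gordon-growth arithmetic. The $1 payout, 3% growth, 5% risk premium, and the 3% versus 7% risk-free rates are those described above; the rest is just the formula.

```python
# Sketch of the Gordon growth calculation: an asset paying $1 next year,
# growing at 3% per year, discounted at the risk-free rate plus a 5% risk premium.
# Value = payout / (r - g), with r = risk_free + premium and g = growth.

def gordon_value(risk_free, premium=0.05, growth=0.03, payout=1.0):
    return payout / (risk_free + premium - growth)

v_pre = gordon_value(0.03)   # pre-Covid long-term rate of ~3%
v_post = gordon_value(0.07)  # a rate ~4 points higher if inflation is long-lived

print(f"value at rf=3%: {v_pre:.1f}")             # 20.0
print(f"value at rf=7%: {v_post:.1f}")            # 11.1
print(f"readjustment: {1 - v_post / v_pre:.0%}")  # ~44%, roughly the 40% figure quoted above
```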

The last plot is perhaps the most important. It has been repeatedly stated that unemployment is low, which, the argument goes, is incompatible with a recession. Both neo-classical and neo-Keynesian models offer reasons for changes in unemployment: in the former, these can be caused by shocks to total factor productivity and, in the latter, there are many possible shocks, including shocks to the relative disutility of labor. If unemployment is low, then employment should be high, no? Well, not exactly, as the plot below shows.

Labor market participation is lower than it was in January 2020. The low unemployment rate is due to phenomena we have come to know as the Great Resignation, or changes in working habits. To compare apples to apples, consider the following calculation. From 2010 to 2020, the labor force grew by 7.3%, which implies that, under normal conditions, it should have been (1+0.073)^(1.5/10) higher in July 2022 than in January 2020; this amounts to 166.3 million, versus 164 million currently, so about 2.3 million people are not being counted as unemployed because they no longer officially count themselves as part of the labor force (whether they are looking for jobs is a different question). The unemployment rate is currently 3.5%, but if we add these 2.3 million back to the current total of 5.9 million unemployed, we obtain a corrected unemployment rate of (2.3+5.9)/166.3 ≈ 4.9%, which is larger than historical averages and well within accepted ranges for the start of a recession.
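
The adjustment can be reproduced in a few lines. The figures are those quoted above, in millions; the assumption doing the work is that the "missing" labor force participants would otherwise be counted as unemployed.

```python
# Sketch of the participation-adjusted unemployment rate described above.
# Figures are those quoted in the text (millions); the key assumption is that
# the "missing" labor force participants would otherwise count as unemployed.

actual_labor_force = 164.0    # July 2022 labor force, millions
trend_labor_force = 166.3     # labor force extrapolated from the 2010-2020 trend
unemployed = 5.9              # officially unemployed, millions

missing = trend_labor_force - actual_labor_force             # ~2.3 million
official_rate = unemployed / actual_labor_force              # ~3.6%, close to the official 3.5%
corrected_rate = (unemployed + missing) / trend_labor_force  # ~4.9%

print(f"missing participants: {missing:.1f}M")
print(f"official unemployment rate: {official_rate:.1%}")
print(f"participation-adjusted rate: {corrected_rate:.1%}")
```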

How Casey Broke Roe v. Wade

The Supreme Court is now populated by six out of nine justices chosen by Republican presidents. To the non-surprise of anyone watching politics, the Court upheld Mississippi's legislation banning abortion after 15 weeks by 6-3 and overturned Roe v. Wade by 5-4, with Chief Justice John Roberts declining to join the overruling, Justice Elena Kagan, voting against, noting that it “eliminates a 50-year-old constitutional right that safeguards women’s freedom and equal station,” and Justice Samuel Alito, voting in favor, advocating to “return the issue of abortion to the people’s elected representatives.”

On average, the U.S. population generally supports legal abortion. According to a Gallup poll, about 50% support abortion rights under certain circumstances; only 35% support making abortion legal under any circumstance, and a small minority of 15% opposes abortion under any circumstance. According to an AP-NORC poll, 61% of Americans believe that abortion should be legal in the first trimester, but only 34% support abortion rights in the second trimester and 19% in the last trimester. In other words, the U.S. is far more moderate on abortion rights than public debates would suggest.

Consider the comparative experience of Europe, which obviously shares many cultural and political traits, with the example of France as a relatively representative country. France legalized abortion in 1975 by a law voted in Parliament. Abortion is currently legal until the 14th week of pregnancy. Some groups favor more restrictive legislation, but it is not an important political topic. Except for Sweden, at 18 weeks, and the Netherlands and the United Kingdom, at 24 weeks, all other European countries set the maximum term at 14 weeks or less, with the median at 12 weeks.

Understanding the role of SCOTUS requires some discussion of what a judicial process is and isn’t. Judges make statements about law by relying on deductive analysis from past judgments (stare decisis) as well as legal principles. That a choice may be broadly favored by the people is a consideration, but it is neither a replacement for, nor even the core of, an opinion. Often, existing principles are insufficient to resolve the legal question, so judges will search for a generalizable principle from which to deductively resolve it, in the spirit of a mathematician adding an axiom to resolve an undecidable proposition.

Roe v. Wade affirmed the right to an abortion based on the due process clause of the Constitution (14th amendment), which prevents an infringement on the right to privacy unless a sufficient interest of the State can be shown. In the decision, the justices also affirmed that there was no unconditional right to an abortion, noting in the majority opinion:

“The privacy right involved, therefore, cannot be said to be absolute. In fact, it is not clear to us that the claim asserted by some amici that one has an unlimited right to do with one’s body as one pleases bears a close relationship to the right of privacy previously articulated in the Court’s decisions. The Court has refused to recognize an unlimited right of this kind in the past. Jacobson v. Massachusetts, 197 U.S. 11, 25 S.Ct. 358, 49 L.Ed. 643 (1905) (vaccination); Buck v. Bell, 274 U.S. 200, 47 S.Ct. 584, 71 L.Ed. 1000 (1927) (sterilization).”

The question of gestational limits was reopened with Planned Parenthood v. Casey, a decision that was likely as important as Roe. To put it into context, Casey was decided amid the concern that an overly broad interpretation of the State's interest might put obstacles in the way of abortions, including extremely restrictive gestational limits. Yet what SCOTUS cannot easily do is argue for an ad-hoc limit, say of a trimester, because such a limit cannot be logically derived from principles. It was left with two options: one, let the State decide on the scope of its interest, which seems an open door to practically overruling Roe v. Wade, or, two, determine a new principle from which reasonable restrictions can be derived. The Court chose the latter option, affirming a constitutional protection of abortion rights before “fetal viability,” generally placed at about two trimesters. The decision passed 5-4, in a court where all but one justice had been appointed by Republican presidents. For comparison, Roe v. Wade passed 7-2, in a court where a majority of justices had also been appointed by Republican presidents, with Justice Byron White (a Democratic appointee) writing one of the two dissents.

Casey made the position almost untenable, because a two-trimester rule does not represent the opinions of the population (see the AP-NORC poll), nor does it remotely approximate the laws of countries where abortion is well-accepted. This came to be because the Court is unable, by design, to incorporate soft information, such as people's feelings about the question, into its primary argument. At 24 weeks, the fetus is about 30 cm long and has a substantial chance of survival with proper care. Allowing abortions at 23 weeks is unacceptable to pro-life persons, just as banning abortion altogether would be repellent to pro-choice persons. John Roberts, who voted against striking down Roe, wrote, referring to the viability rule and international precedent:

“In short, the viability rule was created outside of the ordinary course of litigation, is and always has been completely unreasoned, and fails to take account of the state interests since recognized as legitimate. It is indeed “telling that other countries almost uniformly eschew” a viability line.”

“Only a handful of countries (…) permit elective abortions after twenty weeks; the rest have coalesced around a 12-week line. See The World’s Abortion laws, Center for Reproductive Rights (Feb. 23, 2021)”

Prior to the reversal, the U.S. had one of the most pro-choice legal regimes in the world, despite a significantly more religious population. Just after the reversal, (1) 10 states ban abortion, (2) only two states allow abortion with a limit of 6 or 15 weeks, and (3) 7 states allow abortion without any gestational limit. All remaining states allow abortion beyond 20 weeks. It is no surprise that, under these extremes, there is no hope for common ground or even dialogue.

The problem should be rephrased not as one of demonizing abortion or its opponents (here, I invite everyone to re-read Jonathan Haidt's always insightful The Righteous Mind: Why Good People Are Divided by Politics and Religion; see my review here), but as one of finding a mutually acceptable law that reflects citizens' rights to take responsibility for their ethical choices, with some partial reflection in the law. There is helpful precedent from other countries that have persuaded their populations to come to a reasonable common ground respectful of all sides.

Returning to SCOTUS, perhaps the problem is one of strict majority voting. With a qualified majority of six, Roe would have passed in 1973 and would not have been reversed in 2022; the limit in the Mississippi legislation (roughly the first trimester), which is consistent with the framework in Roe, would have been upheld; and Casey would not have been decided based on viability. Furthermore, a qualified majority is in line with the principle of stare decisis, since it requires more votes to change the status quo.

Is America drowning in Bureaucracy?

In the nearly 20 years I have been in the U.S., I have experienced a country where people on all sides of the political spectrum believe in the value of enterprise, including in the many cases where some degree of intervention is needed to restore fairness and efficiency in the marketplace. However, a pernicious new trend has been changing everyone's way of life for the worse: the rise of a powerful bureaucracy that favors inaction and the status quo, often with terrible consequences.

The May 24th shooting in Uvalde is an extreme example. For those who have been living under a rock, 19 children and two teachers lost their lives, and 17 others were injured. Shootings are, sadly, a common and avoidable drama, and speaking to the human suffering is just too hard and not something this post can do. What is unique to Uvalde is that 19 police officers were on site, and it took them an extra 50 minutes to intervene and stop the shooting. This wasn't the fault of the rank-and-file, but of a bureaucrat who exerted power to prevent action until the arrival of a specialized SWAT team. Most of those lives would likely have been saved if the police had been allowed to intervene.

If a bureaucrat can exert this kind of power over life-and-death decisions, where it generates clear public outrage, we can only imagine the power of bureaucracy over lesser actions that are much harder to monitor but that grind decision-making to a halt. It may no longer be a stretch to say that, except for a few companies with hands-on leadership, most institutions are now unable to implement any action that is not routine.

In an excellent paper presented at the Harvard IMO conference, researchers document the emergence of RegTech, defined as bureaucratic processes that impose controls at every step of the organization. These software solutions are expensive to implement, but that is likely the smallest cost. They are mostly pre-programmed, resetting them requires technical knowledge that does not exist in most organizations, and they offer almost no ex-post flexibility to respond to the needs of customers or to product innovation. RegTech is the ultimate bureaucrat: not a person but a piece of code denying approvals and transactions based on automated processes. As a result of RegTech's adoption in the financial advisory domain, many firms have transformed into client-seeking operations, outsourcing the process of investing (which they are paid to do and which their customers believe they are actually doing) to a few large institutions that do the investments, simply because the constraints of RegTech are too large. Financial advisors have become salesmen for generic and expensive investment products.


This poses the question: what is a bureaucrat? It would be circular to define a bureaucrat as an ineffective process that prevents good decision-making. A bureaucrat should be defined in terms of the characteristics of a process and, on this, I take great inspiration from, but in the end largely disagree with, the work of Prendergast on this question, which I nevertheless recommend to anyone interested in these issues. Like any economist, Prendergast's approach is that it is all about incentives, with the bureaucrat being insulated from complaints. All of this is true, of course, but most organizations do not use strict incentive schemes, even for positions that have nothing to do with bureaucracy. And the description becomes obscure when we realize that implicit incentive schemes, including those related to promotion, are universal even in bureaucracies. So, the philosophical concept of incentives is too vague a qualitative criterion to identify and fix a dysfunctional bureaucracy.

I will give below four tentative necessary conditions, which I hope could also be jointly sufficient.

1/ Unilateral power to preserve the status quo and block decisions, in a way that cannot be easily or practically reversed over the horizon of the decision. In Uvalde, the commanding officer could stop any police officer from intervening, and even an elected politician would not have been able to resolve this within the time horizon of the intervention;

2/ Has no direct stake in the costs and benefits of the decisions they make, or, put differently, is delegated the task of acting on behalf of another without sharing the pains or benefits. Prendergast would call this a delegation problem, and I agree this is an important characteristic, but not the only one.

3/ Has no official decision rights to change or deviate from procedures, but is expected to interpret existing procedures in the manner that most favors the status quo, with more severe consequences for violating a procedure than for giving it an overly broad interpretation. One should send in the police if there is sufficient certainty of an active shooting, so a deviation from the status quo will be made only under near-certainty that a shooting is taking place, even if the rule does not literally say so. Any incompleteness in the rule will be interpreted as forbidding whatever is not explicitly noted and allowed (in contrast to the law).

4/ Sets no deadlines for itself, and has no workflow requirement explaining when a particular task must be completed or when feedback on the process is due, although, occasionally, some non-binding guidance may be given.

The problem in the U.S. is that, unlike in other countries, such as my home country of France, Americans are very respectful of rules. This is clearly a good thing: institutions work well because people follow the rules. But when there is a bureaucracy in place and bad or inappropriate rules, the usual response in Europe would be to favor good decisions and grant an exception to the rule. For example, the process of approving the Covid vaccine was much, much quicker in most of Europe, where it was a non-issue, than in the U.S., where only emergency authorization was initially given (creating even more confusion as to whether the vaccine was even safe), with full approval coming only on August 23rd, 2021.

We need a change, and all of us should take joint responsibility for the terrible cost of bureaucracy. In doing so, let us be mindful that, because of 3/, the individuals implementing the needs of the bureaucracy are not the ones to blame. We should not be fighting against bureaucrats, which is cheap, counter-productive, and can be callous toward those doing the hard job of verifying procedures. We should, however, hold the decision-makers who put these bureaucratic systems in place accountable for their costs. These are not bureaucrats but actual decision-makers. Often, they are uninformed about the cost because of the lack of feedback. So, it is on us to provide, in a professional manner, the right amount of feedback so that we keep bureaucracy minimal and confined to where it is needed, with clear, documented evidence that its benefits dominate its many costs.

There is no simple solution to reverse this trend, except that of giving more feedback on unnecessary and likely detrimental bureaucratic controls. But there are small actions that do make a difference. Bureaucrats are not the problem, since they do a job that they have been asked to do. The problem is the higher-level decision-making process that decides to put these controls in place. These decision-making bodies, not the bureaucrats, are the ones that have control. They need to explain, with quantitative facts and cost-benefit analyses, where these controls are truly necessary.

Does size matter for the impact of an area?

Measuring impact by the number of citations has become a de facto standard for measuring quality in academic research. Promotion committees, journals, or even Google searches will pick up on the most cited papers. In this entry, I would like to clarify a common misunderstanding about citation counts:

Do smaller areas have lower citation count?

This is an argument that I've often heard and, combined with the assumption that areas of different sizes might all have comparable social value, it implies that the value of each cite should be scaled by the size of the area. To simplify, if area A is twice the size of area B, perhaps authors in area A will receive twice the cites of authors in area B, and each A cite should be counted as half of a B cite.

This looks plausible; after all, larger areas in the sciences have the highest numbers of cites. Except that the logic behind this conclusion is wrong. So, let us do the calculation correctly, first with the simplest possible set of assumptions.

Suppose, for now, that (1) each area only cites itself, (2) a paper in an area cites k other papers, and (3) there are n papers in the area. The total number of cites in the area is k times n, which is increasing in the size of the area. But the number of cites per paper is k times n divided by n (the number of papers), which is equal to k. This is a mathematical tautology: the average number of cites received by each paper is equal to the average number of papers cited by each paper (and has nothing to do with size).
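
A tiny simulation makes the tautology concrete (the numbers below are arbitrary; only the accounting matters):

```python
# Sketch: in a closed area where each of n papers cites k other papers in the
# same area, total cites = n * k, so average cites received per paper = k,
# independently of the area's size n. The numbers are arbitrary.

import random

def average_cites_received(n_papers, k):
    received = [0] * n_papers
    for citing in range(n_papers):
        # each paper cites k distinct other papers in the same area
        for cited in random.sample([p for p in range(n_papers) if p != citing], k):
            received[cited] += 1
    return sum(received) / n_papers

for n in (50, 500, 2000):
    print(n, average_cites_received(n, k=20))   # always exactly 20.0
```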

Once we have accepted this general equilibrium equality of incoming and outgoing cites, we can make some progress in understanding how to improve our use of cites. k is plausibly an area convention that is not causally connected to quality. For example, even if an area weren't making any scientific discoveries and was unproductive but used a high k, its cite count would be high regardless of its actual quality. So, while I've occasionally seen such comparisons made in committees, we can't compare cite counts across disconnected areas (i.e., areas that do not cite each other).

Things get more, not less, confusing once we start comparing across connected areas. Relaxing assumption (1), with an equal number of outgoing cites k in each area, the area whose net flow of cites across areas is positive will have a higher cite count (above k) per paper. Suppose we concluded that the more cited area is, hence, more important and should be expanded, while the other area should shrink. Immediately, as the more cited area grew, its relative cite advantage would shrink, falling back toward k if the less cited area disappeared. So, the more important area is only important if we accept incoming cites from the less cited area as also important.
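
Here is a minimal sketch of that two-area accounting; the area sizes and cross-citation shares are purely illustrative.

```python
# Sketch: two connected areas, A and B, with the same outgoing-cite norm k.
# Cites per paper in A exceed k only because of a favorable net cross-area flow,
# and that advantage shrinks as A grows relative to B. All parameters are illustrative.

def cites_per_paper_in_A(n_A, n_B, k=20, share_A_to_B=0.05, share_B_to_A=0.20):
    # incoming cites to A = cites A keeps internal + cites B sends to A
    incoming = n_A * k * (1 - share_A_to_B) + n_B * k * share_B_to_A
    return incoming / n_A

for n_A in (100, 200, 400, 800):
    print(n_A, round(cites_per_paper_in_A(n_A, n_B=100), 1))
# 100 -> 23.0, 200 -> 21.0, 400 -> 20.0, 800 -> 19.5: the advantage over k=20 melts away.
```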

I assumed so far that k is identical across areas, but this is at odds with the reality of academic practice. Many areas differ in terms of social conventions about k. In more technical areas, for example, many factors tend to keep k low:

  • Authors cite sparingly only the narrowly defined research that speaks directly to the results – if a paper is on the same topic but answers a different question or uses a very different type of approach, it may not be cited. The criterion for citation would be “does this other study help create or understand my research design, or does it illuminate the nature of my results?”
  • Technical research tends to build on frameworks and will cite a key paper, often an older paper, but skip other recent papers using the framework for a different question. This causes the effective k (cites to recent work) to be even smaller, because older papers do not count for measures of impact factor and, furthermore, in the domain of promotions, older papers are written by senior faculty who are no longer being reviewed. This makes k even smaller for junior faculty starting in a technical area.
  • Authors in an applied area may prefer to cite a key paper from a different, older, and more established area rather than a more recent reference, especially if it is assumed that work in the area is derivative of more foundational work.

While one may disagree with the trend toward greater weight on citations, it is the reality of academic performance evaluation, so areas with citation conventions that are very different from the norm can create dramatic confusion. To address the points above, let me recommend a few good practices for those areas or sub-areas: read the practices in all related areas that publish in connected journals and evaluate k. If k is too low, consider changing citation practices to fit broader social norms, including (i) cite recent research on related (but not distant) topics, (ii) do not assume that an old reference is a substitute for recent work, and do not only reference work by senior authors, (iii) cite generously toward another area if that area finds a use for the research, as evidenced by incoming citations from it, and (iv) cite research in the area that has two-way connections, that is, research that informs the study and whose follow-ups may be informed by the results.

Let me conclude with a related problem that is, unfortunately, poorly understood when assessing the impact of a paper. Not all cites are equal, and there may be very little information in a raw cite count that does not examine how a paper is cited. At one extreme, a citation could come from a paper in a similar area, cited as part of an introduction along with other papers; at the other extreme, the citing paper may engineer a new approach that builds directly on the cited paper's discoveries. Below is an ordinal criterion to rank the impact of a cite, from most impactful to least impactful.

  1. Design of the paper, including research question, was uniquely and thoroughly affected by the cited paper.
  2. Same as 1/, except that the discussion includes one or more other references that were also critical in affecting the research design.
  3. The citation refers to research methods that improved the analysis, but were not essential to the research question.
  4. The research refers to a study on the same topic but one that has no direct consequence for the validity or interpretation of the results.
  5. The cite refers to minor research design choices that could have been used even if the cited paper did not exist.
  6. The cite refers to work on a different topic, of potential interest to reader.

In practice, it may be difficult to evaluate cites based on what are essentially subjective assessments of the link between the citing paper and the cited paper; yet, I suspect that there is a very, very strong overlap between the verifiable formatting of the cite and the 1-6 ranking: for example, is the cite in the text or in a footnote? Is it stated as a bulk cite with other papers or discussed in a separate sentence or sentences?

Let’s conclude with yet another myth. Cites are not an objective measure: the counting process is objective, but it is an aggregation of subjective references, many of which are extremely noisy (see above). Summing over very noisy and poorly understood signals and using this information without consideration of the arithmetic of k is a problematic trend in academic research. No count replaces the joy of reading a good paper.

Action: The True Meaning of Theory

As of the end of March 2020, it is dawning on us that we live in troubled times. One random mutation in a single strand of RNA is now rampaging through the world, posing challenges to health systems, disrupting entire economies, and changing our way of life. But, looking ahead, this is also an alarm bell to reflect on the paradigms of contemplative science that have led us to where we are.

By now, about 1 billion people have been instructed to stay at home and minimize non-essential trips. The rest of the world is implementing social distancing norms that are changing human interaction, by no means for the better, and the closure of many means of transportation is grounding industry. Why didn't we prepare better? Why didn't we act earlier when the problem was still manageable? Why did we wait for data about cases to accumulate before taking action?

Let’s seriously ask ourselves how we got here: over the last decades, we’ve seen a call to arms in the social sciences to rely on evidence-based science. This paradigm of science has a very well-defined mission statement: in the manner of double-blind medical experiments, answer any question by, first, collecting data and, second, applying statistical models as the sole means of intellectual rigor. But statistics require extensive databases of past repeatable evidence and ideal conditions to identify remedies. Unfortunately, in most crises, we have very few such ideal circumstances to rely on: global pandemics are rare, with the recent ones (SARS, MERS, Swine Flu, Ebola) having been kept under control. We cannot observe past evidence of a remedy because we do not have one yet. Evidence-based statistical research, because of the tight intellectual prison in which it lives, in effect argues that science should provide no help for policy-making, because evidence can only be collected after the fact.

Except that we knew very, very well the theory of epidemic dynamics in models of diseases with high infection rates (check this excellent simulation video here). The current growth in cases follows a textbook exponential trajectory that, unchecked, can be easily recovered from a very simple equation. Instead, we waited for evidence to arrive before taking action; in fact, we waited for the current situation to occur before even thinking about the fragility of healthcare systems and their resources, and about preventative infrastructure to avoid further spread.
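
As an illustration of how little machinery this takes, here is a minimal sketch of the textbook dynamics (a standard discrete-time SIR model; the parameter values are illustrative, not calibrated to Covid-19):

```python
# Minimal sketch of textbook epidemic dynamics (a discrete-time SIR model).
# With a reproduction number well above 1, early growth in cases is essentially
# exponential. Parameter values are illustrative, not calibrated.

def sir(population=1e7, infected0=100, beta=0.30, gamma=0.10, days=120):
    s, i, r = population - infected0, infected0, 0.0
    path = []
    for _ in range(days):
        new_infections = beta * s * i / population
        recoveries = gamma * i
        s, i, r = s - new_infections, i + new_infections - recoveries, r + recoveries
        path.append(i)
    return path

path = sir()
# Early on, active cases multiply by roughly 1.2 per day:
print([round(x) for x in path[:10]])
# ...until the susceptible pool is depleted and the curve bends, which is exactly
# why acting before the exponential phase overwhelms hospital capacity matters.
```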


We are now paying the price of the just-a-theory movement, which prescribes complete skepticism toward theory absent testable statistical facts and, in the meantime, sets a path of inaction. This is not a frame of mind that policy makers can afford to have.

For most serious policy questions, we will not have the luxury of a statistical record of history repeating itself to draw upon, and experimentation by trial will be either infeasible or prohibitively expensive. Only generalizable theory can help us make decisions. In fact, nearly all important decisions of our times, from war and peace to forms of government or macroeconomic policies, were taken based on theories with sound assumptions and logic, rather than statistics. Theory comes first, action comes after, and statistical evidence may or may not come, usually after the smoke has settled.

The entire debate on the credibility of theory is a deep philosophical misunderstanding of what doing theory is all about. There is a place for contemplative theory, but it is always a means to an end: devising theory for action in places where we cannot experiment. To be trivial, we cannot build a bridge by simple experimentation – we could hardly try every possible configuration, build each a thousand times to acquire statistical significance and, then, pick the best design. We can't solve a pandemic by waiting for a significant portion of the population to be infected. We can't solve global warming by waiting for 5% statistical confidence that temperatures have risen. We can't repair an economic system if we have to wait for existing structures to implode.

Theory is not about ignoring empirical evidence: it is about using evidence of many sorts and origins that cannot be organized in terms of a statistical model. Experience, multiple types of data, common-sense priors (generally informed by shared experiences), and generalizations from related settings are all empirical inputs that theory uses to provide solutions to practical problems. In fact, theory is generally more empirically minded than traditional statistics because, while the latter must simplify the statistical model to objects that can be organized as data points, theory is well-suited to synthesize many objective facts about the real world.

This is very, very different from evidence-based statistical social science, which seeks motivation from theory but, ultimately, relies on computing statistical objects on a well-organized dataset. Certainly, evidence-based research is useful when practical, but it requires conditions that do not exist for fundamental problems that need action. When faced with a new challenge, or a challenge for which we have not reached the countless repetitions of the same setting required for statistical analysis, we need to start acting like scientists and, instead of reclassifying the problem as outside of science, think about our role in guiding policy makers' choices (see Debraj Ray's excellent blog about these issues).

So how do we use theory now, and how does it matter? With so little evidence to work with, everyone is relying on raw priors. Priors uneducated by reasoning need not be entirely wrong, but they are a set of disconnected assumptions about outcomes that need not be consistent with or disciplined by one another. These priors can be extreme: one third of the world views the problem as solved, one third as an extremely serious Spanish-flu-like event, and one third is not taking it seriously, treating it as just-a-flu. At the same time, we have very good priors about the medical risks, the rate of serious cases for each age group, and the infection rates in different contexts. So, a very simple theory could help us do the following:

1/ How many beds/respirators are being put into commission, and what is our aggregate treatment capacity relative to infection rates and their growth? How many days of capacity (at current growth) would we have if no measures had been taken, versus under the current system with (plausibly) very limited additional transmission in high-risk zones or contexts? (A back-of-the-envelope version of this calculation is sketched after this list.)

2/ What is the threshold at which we should shut down an activity or quarantine a certain risk? Should we shut down large meetings (obviously)? Should we shut down public transport (again, that one is obvious)? Should we monitor public places like beaches and parks (very likely)? But how about places with non-zero but much, much smaller transmission risks, like a school where the same people attend all the time, or restaurants and stores with safe distances? Only a very naive model would set a target infection rate of exactly zero: rather, a good model would prescribe keeping infections well within the capacity obtained in 1/, even under pessimistic scenarios.

3/ What is the social cost of blocking the economic activity of entire sectors of the economy? Macroeconomists calculate welfare, so what is the welfare cost per day per person of quarantine? To compare apples to apples, what is the effect of welfare spending on supporting healthcare (such as the widely accepted notion that universal healthcare would improve health outcomes; after all, the US lags behind in terms of health outcomes)? How does the marginal dollar of welfare lost to quarantine compare to the marginal dollar spent on healthcare? We know the latter from many studies, but for the former we need to rely on theory and models.

4/ In just a week, Congress signed off on spending of over 2 trillion dollars. Let's first note what an amazing job Congress did to coordinate so many people to agree so fast; further, the spending bill itself is a compromise deal, but not a mess. Yet, no one is really thinking straight about what government spending is, because there isn't wide understanding of what real vs. monetary models are. So, no, the government is not creating 2 trillion dollars' worth of resources for people to use. Consider, for example, a thought experiment in which the government were to double every cash balance. Would giving people 'free money' really change anything, or would it just scale up prices and nominal economic activity?

5/ The government is increasing the amount of money in circulation, partly directly (sending a check to individuals) and partly as a form of employment subsidy. Those who have money, mainly those owning government debt such as pension funds (pre-retirees), will see their money diluted, as if they were shareholders in a company giving away shares for free. And, if this is supposed to be paid off in the future, how does it play out with Ricardian equivalence (given that rational investors would fully readjust their choices in anticipation of future taxes) and, more generally, how big a tax increase should we expect in future years? If it's inflation, wouldn't the spending cause the prices of products demanded by less well-off groups to increase the most (supply and demand economics), implying more inflation for necessities and, therefore, a decrease in purchasing power even for those with some inflation protection? And what about those who have no job and no inflation protection?
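
Returning to point 1/, here is the back-of-the-envelope sketch promised there; every number in it is a placeholder, not an estimate.

```python
# Sketch for point 1/: days until hospital capacity is exhausted under
# exponential growth in severe cases. All numbers are placeholders.

import math

def days_until_capacity(current_severe, daily_growth, capacity):
    """Days until severe cases exceed capacity, assuming unchecked exponential growth."""
    return math.log(capacity / current_severe) / math.log(1 + daily_growth)

beds = 100_000      # placeholder: available ICU beds / respirators
severe_now = 5_000  # placeholder: current severe cases needing a bed

print(days_until_capacity(severe_now, daily_growth=0.20, capacity=beds))  # ~16 days unchecked
print(days_until_capacity(severe_now, daily_growth=0.05, capacity=beds))  # ~61 days with mitigation
```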

All of these questions are telling us that we need to rely on past data AND make use of sensible models to advocate for precautionary action before crises get worse. Although it is clear we will have the medical part of the current crisis under control, I am worried that the economic part isn’t, because of the wait-and-see attitude to economic consequences. Instead, theory should prescribe clear continuous action to tell us what data we need and how we use it.

Reviewing Theory in Accounting

I’ve delayed this long enough, and decided I should say a word about the situation in accounting, where meaningless reviews destroy incentives to innovate and often send years of hard work by authors to the garbage bin. So here is the question: how should you review a theory paper?

I believe I have some credibility in answering this question. I received an award from one of the accounting journals. I got this award not for quantity (I don't review that much for this journal) and not for rejecting papers. I got it for accepting (good) papers and for, instead of throwing a paper away and moving on with my own research, putting a lot of work into understanding the mindset of a paper and helping out. I started doing this because I got fed up, as a struggling postdoc, with evil behaviors in which competitive researchers play a zero-sum game and ignore the damage of dishing out rejections that are not commensurate with the quality of what the journal publishes.

Ok, so here is the short answer: if you write a review, read at least two recent issues of the journal – all papers, all methods – read them carefully and learn to appreciate the questions and approaches. Do not criticize; keep an open attitude and assume these are all useful contributions. Now, set the bar for accepting at the bottom tercile of this set, across methods. Recognize that it's hard to obtain perfect or even deep answers regardless of the method used, and benchmark a reasonable expectation against this. Once you have done this, ask whether, with sufficient work by you and the authors, the paper has either met this target or may meet it after two revisions. If the answer is yes, then do not reject the paper. When in doubt, always leave the door open for editors to make the final call.

Few people talk about this, but let me next offer reasons that should NEVER, EVER show up as a primary motive for rejection. Then, I'll give the three primary reasons why you may, sometimes, err toward rejection. Caveats: (1) this applies only to top journals, (2) these reasons are primary, but there could be secondary comments that are neither necessary nor sufficient (but could be useful for revising), (3) even these reasons should be applied with extra caution, maybe not at all, for unusual papers that step out of common paradigms, (4) always apply with caution: when in doubt, give the authors a chance.

Four reasons not to reject a paper

  1. Not realistic. The word realistic should never show up in a referee report as a motive for rejection. Stylized models are meant to be unrealistic, because their point is to refine a trade-off by assuming away aspects that have nothing to do with the trade-off. It's very, very rare that something assumed away would completely subsume the trade-off and, if so, this should be shown formally. A realistic model is a bad model, because it would necessarily fail at providing a clean argument. If you don't believe me, check the most successful theory papers in accounting, from Verrecchia (1983), Dye (1985) to Kanodia and Lee (1998) or Fischer and Verrecchia (2000), among many, many others. Are these models remotely realistic? Quite the opposite: ask authors to be realistic and you will get ugly models with lots of things going on and unclear trade-offs.
  2. Math too hard, model too general. Lazy reviewers will sometimes reject a paper because they don't want to check that the appendix is correct, or because there may be an error somewhere. A version of this is results that are surprising but require a lot of analysis to get there. This forces authors to restrict their papers to closed-form binary/uniform/normal specifications even when the argument is much more robust. Ideally, a paper should be as general as it can be, as long as this preserves the main result and comparative statics. The tendency of accounting papers to be unnecessarily dumbed down has caused many cases in which an economics paper comes out that solves the same problem slightly more generally (but with no extra insight) and then appropriates the credit. So, don't do it: if authors have a nice, general, elegant model that they spent years crafting, reward them, don't punish them. Keep in mind that you're not responsible for errors – yes, you're not. You're responsible for basic due diligence on the appendix, just as an auditor does not check every single transaction. The authors, and only the authors, are responsible for errors.
  3. Not my framework. A very large portion of bad reviews look like this: you model this, but there is a type of model that looks at this problem that I like (or maybe don't like, but have seen somewhere), and you don't use it; so what you are doing must be wrong. This type of approach often dismisses entire branches of the literature by imposing that everyone should adopt a particular framework, even when this framework does not directly speak to the question, would not subsume the insight, and would be very cumbersome to drag along. It also completely neutralizes innovation by rejecting any research that does not follow a particular dominant paradigm. These reviews completely ignore the fact that, in most cases, authors point out that the framework they do use is well-accepted and useful. What's going on here? It's not your job as a reviewer to judge broad frameworks of the literature. If you're not interested, don't reject; have the courage to tell the editor that this is not a framework you want to review because you have no interest in it. Don't ask authors to revise a framework that works well (or the semantics of their framework) because you have a taste against it. In the end, you should always ask yourself whether adding an ingredient to a model, even one that is among plausible first-order effects, really alters the main economic trade-off or intuition; if not, this ingredient should NOT be part of the model. A theoretical model is about illustrating one force in a stylized setting, not about incorporating all first-order and second-order effects into a general messy theory.
  4. The revision does not obey. This is not everyone, but some reviewers have a dominance problem where they think they should impose their will on authors. In revisions, these reviewers will treat as 'word of truth' anything they wrote down in a prior round and evaluate whether the author has completed ALL demands in the prior report. As a reviewer, don't do that. Even for the primary concerns, I need to check whether I was correct to ask for something, and it can turn out that it was not to the benefit of the paper. Authors live in fear of pushing back against even a bad comment because of this attitude of certain reviewers. I am confident enough to admit when I get something wrong, and my role as a reviewer is to make sure this does not contaminate the process or, worse, contaminate the published paper by adding mess to an otherwise clean argument. I have a process for this: if a point is not addressed, it should be stated in the author letter, and one needs to be very, very careful, as a reviewer, not to simply repeat the point (which is very bad, intellectually deaf) but to respond to what the authors say in the letter and in the text, explaining why the concern remains or does not. Ask yourself whether this ingredient of the framework would remove an insight. Ask yourself why you think this other framework is indefensible when many good people use it to generate insight.

The only three reasons to have concerns

It’s not a competition: the field works better when everyone contributes and, ideally, we would not need to have rejections and would simply have projects being built until they meet the objectives for broad circulation. Yet, as the system is designed right now, there are places where an editor would need to know about some concerns. Whether these concerns are sufficient to advise rejection, or should be left to the authors to address is not something I can easily answer. However, what I can do is to speak about what I think are the ONLY reasons  why a paper might need more work. Caveat: this does not describe the editor’s problem, but remember that as a reviewer you are in charge of providing input to the editor, not of making the actual decision.

  1. No distance between assumptions and results. I'm sure that if I write this, at least some people will start putting it in their reviews, and I'll start being blamed for rejections. Anyway, I do strongly believe in this. Our work as theorists is not 'to assume' but to build bridges from assumptions to insights. If the result is what we assume, then there is no work being done and it all becomes a very matter-of-fact assertion about what the world is. Evaluating the distance is not checking the length of an appendix, so don't do that. To evaluate, there is a simple test. Write down the assumption in words, and write down the result; then try to make the argument verbally and check how much the formalism has helped in making that argument a tight one that is not confusing to a listener. Do it with someone who has not read the model yet and check for flaws in the verbal argument; check whether the work done by the assumptions resolves these verbal logic flaws.
  2. Model and results have been done before. We can’t repeat the same model, same trade-off ad infinitum if this amounts to the same insight, even if the model is interesting. Now, an important detail here. We use a model as a building block to many arguments, and that’s fine, and we rely on the trade-off to have more things to say, that’s fine as well. What’s important here is to make sure that the same exact result within the same exact model is not already there in the literature. In that case, some vague notion that this is well-known is not okay: the claim must be supported by a clear reference to the result and a reference to a paper that derives that exact result. Say, showing that full-information is not optimal is true in many models, but whether the same exact form occurs for the same exact reasons is what we want to check.
  3. Unreadable mess. Among all the reasons, this is the very worst one. Theory is about explaining things, so what can we do when the paper is incomprehensible? Maybe the assumptions are unclear. Maybe the results are unclear: lots of cases in all directions that never get regrouped, or a lot of algebra that leaves the effects ambiguous all the way through. I don't like that at all, as it loses the reason why we do theory. We make strong assumptions to get clean insights, so messy results are evidence that the assumptions are not well-built. A theory is not meant to show that things are complicated in the real world – we know that already – it is to show why and when things go one way or the other. We understand there may be cases, because context matters, but it's important to have a sense of when each case should apply. A big mess, where the implication depends on complicated unobservable forces, or where we only end up with a result so limited in scope that it does not answer the question, is not helpful. So, be general if you want, but don't be so general that you don't get any result.


A Primer for financial structural econometrics: the 2017 Mitsui Summer School

This week, I was fortunate to attend this year’s University of Michigan’s summer school on structural econometrics in financial economics, co-organized by Luke Taylor and Toni Whited (LW). There was a very dense web of knowledge to be gained from the camp, especially for structural newbies such as myself, and I would like to share a few of these insights with those of you who might (and should!) attend in future years, whether as an author or a consumer of this literature.

LW start their first lecture with the obvious question: (in my own words) why should we even bother with structural? This may seem obvious ex-post but I have to confess that, in graduate school, this question was never really asked because, duh, structural is about estimating economic models like an economist.  But LW offer a more detailed answer, by telling us about three applied benefits of structural work. A structural approach

  1. Estimates economic primitives, often in the form of institutional or behavioral characteristics that determine a choice. Because we have an explicit model of choice, we can claim that we are drawing from revealed preferences, a core axiom of economic analysis.
  2. Provides deep tests of a theory that go beyond a directional prediction. Are the magnitudes of a theory economically significant? Which of multiple theories contribute the most to explain a phenomenon? Which are suitable proxies to test the theory?
  3. Allows researchers to conduct counter-factuals, such as inferring potential consequences of a change in policy that was NOT observed in the empirical sample.

Let’s not hide it: structural work is hard! However, hard (to implement, or to do well..) does not mean obscure. In fact, if one were to summarize one big non-technical insight from the course, it is the following:

The quality of structural work lies in how transparent it makes its assumptions and the way those assumptions extract economic insight from the data (i.e., identification).

This, again, seems obvious ex-post but it is a template for organizing and reading papers in this area. Because it is new in some fields, I would like to cover next various ways in which this general idea is implemented (of course, LW did not present this as a hard set of rules for any paper – so, reviewers, hold your rejection checklist – but as suggestions to organize one’s mind on a problem).

Step 1: Features of the data and model. The model picks up on a subset of presumed first-order effects in a setting of interest, but LW strongly recommend, before any fitting exercise, identifying which features of the data are the object of interest AND whether these features of the data pin down the parameters of the model. How do we do this? By knowing one’s economic model very, very well: which characteristics of the model (a moment, a distribution, etc.) appear to change the most with the parameters, and which do not? By plotting comparative statics over observable implications of the model, we can see which of these observables would be best suited to incorporate in the estimation.
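As a rough illustration of this step (my own toy example in Python, not course material), the sketch below simulates a stylized AR(1) “model” of a scaled firm outcome over a grid of the persistence parameter and traces how a few candidate moments respond. A moment that barely moves with the parameter is a poor estimation target; a moment with a steep, monotone response is a good one.

```python
# A toy sketch (not from the course): which candidate moments move with the
# persistence parameter rho of a simulated AR(1) outcome, and which do not?
import numpy as np

def simulate(rho, sigma, n_firms=500, n_years=20, seed=0):
    rng = np.random.default_rng(seed)
    y = np.zeros((n_firms, n_years))
    for t in range(1, n_years):
        y[:, t] = rho * y[:, t - 1] + sigma * rng.standard_normal(n_firms)
    return y

def moments(y):
    lag, lead = y[:, :-1].ravel(), y[:, 1:].ravel()
    return {
        "mean": y.mean(),                                 # ~0 whatever rho is: uninformative
        "variance": y.var(),                              # rises with rho (and sigma)
        "autocorrelation": np.corrcoef(lag, lead)[0, 1],  # tracks rho directly
    }

# Comparative statics: sweep rho and see which moments respond.
for rho in [0.1, 0.4, 0.7, 0.9]:
    m = moments(simulate(rho, sigma=0.2))
    print(f"rho={rho:.1f}  " + "  ".join(f"{k}={v: .3f}" for k, v in m.items()))
```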

Step 2: Being conscious of data-model limitations (LW use the phrase “not taking the model too seriously,” but I do not like that phrase because not taking a model seriously is as sinful as not taking data seriously, and it adds obscurity about which part of the model is serious and which is not). Many data sets used in the social sciences have serious limitations, whether unobserved heterogeneity, alternative economic forces, or variables that are not quite what we think they are, to name a few examples. Building a consistent general model of all these limitations is not feasible at this stage of the science (and, LW argue, might not even be that useful because it would soon turn into a black box). So what do we do? LW suggest using empirical methods that are suited to these limitations. A few examples follow: use GMM or SMM for simpler models, as they are more robust to misspecification and can target specific aspects of interest; control as much as possible for sources of variation outside of the economic model (e.g., scale variables to remove size effects, or run a firm and time fixed effects regression on the variables and extract the residual).
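As a rough illustration of the last suggestion (my own sketch with made-up data, not LW’s code), the snippet below removes firm and year effects from an outcome variable and keeps the residual variation, which is what the structural model would then be asked to explain.

```python
# A sketch with made-up data: strip firm and year effects before estimation,
# so the structural model only has to explain the residual variation.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "firm": np.repeat(np.arange(100), 10),          # 100 hypothetical firms
    "year": np.tile(np.arange(2005, 2015), 100),    # 10 years each
})
df["investment"] = rng.normal(size=len(df))         # stand-in for the scaled outcome

# Two-way fixed effects regression; the residual is fed to the moment matching.
fit = smf.ols("investment ~ C(firm) + C(year)", data=df).fit()
df["investment_resid"] = fit.resid
print(df["investment_resid"].describe())
```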

Step 3: Recognize that the value of models is in understanding an economic trade-off. LW spent some time on a misunderstanding in empirical research about model rejection (noting that this is still an ongoing disagreement with some of their co-authors). To quote, “all models are bad” and will be rejected (if one tries hard enough!). So we learn little from whether or not we are able to reject a model, and a failed J test is not a deal-breaker. However, we learn a lot from analyzing where the model has failed, because it informs researchers on improvements to an approach. In fact, over-fixing a model by adding more structure to fit better can be counter-productive, in that it turns the model into a black box that no longer makes clear what is or is not explained. LW suggest looking at moments that were not matched well and examining (as clearly as possible) why these aspects would not be explained well.
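A minimal sketch of that moment-by-moment diagnostic follows (all numbers are made up for illustration): rather than stopping at the overall J statistic, compare each data moment with its simulated counterpart, scaled by the data moment’s standard error, and ask which gap points to a missing economic force.

```python
# A sketch of the moment-by-moment diagnostic; every number below is invented.
import numpy as np

names = ["mean", "variance", "autocorrelation"]
data_moments = np.array([0.12, 0.45, 0.30])    # estimated from the data
model_moments = np.array([0.11, 0.44, 0.05])   # simulated at the estimated parameters
se = np.array([0.01, 0.03, 0.04])              # standard errors of the data moments

for name, d, m, s in zip(names, data_moments, model_moments, se):
    print(f"{name:16s} data={d: .2f}  model={m: .2f}  t-like gap={(d - m) / s: .1f}")
# Here the autocorrelation is badly missed: that gap, not the overall J test,
# is what points to the economic force the model leaves out.
```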

Step 4: LW note that this is not possible for every paper or every data set, but recommend using out-of-sample tests to provide external validity to structural estimates. Disclaimer: I do not like the term “out-of-sample” because out-of-sample refers to a sample that was not used in the estimation (different setting, different periods, etc.), but what they suggest is completely different. Semantics aside, their advice is very, very important for applied researchers. How do we suspend disbelief given some of the big questions answered by structural papers? We go back to the data and look at other implications of the model, seeing if they match aspects of the data for which the field may have strong-enough priors. Did the model do well at matching moments that were not targeted in the estimation? Should we expect the coefficients to go one way or another in particular subsamples? Can we estimate the model in a subsample with an exogenous shock such that we have a strong prior about how the parameters should change?
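Continuing the toy AR(1) example from Step 1 (again my own illustration, not course material), a validation check of this kind might compare a non-targeted implication of the estimated model with its empirical counterpart: if only the first-order autocorrelation was targeted, the model implies the second-order autocorrelation should be close to rho squared.

```python
# A sketch of a validation check on a non-targeted implication of the toy AR(1):
# only the first-order autocorrelation was targeted, so the second-order one
# (rho_hat**2 under the model) can be held up against the data.
import numpy as np

rho_hat = 0.7                                   # pretend this came out of the estimation
rng = np.random.default_rng(2)
y = np.zeros((500, 20))
for t in range(1, 20):                          # "observed" panel, simulated here for illustration
    y[:, t] = 0.7 * y[:, t - 1] + 0.2 * rng.standard_normal(500)

def autocorr(panel, k):
    return np.corrcoef(panel[:, :-k].ravel(), panel[:, k:].ravel())[0, 1]

print("model-implied 2nd-order autocorrelation:", rho_hat ** 2)
print("observed 2nd-order autocorrelation:     ", autocorr(y, 2))
```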

Another place where the summer school was highly successful is in giving a sense of where the literature stands at this point. With the increase in computing speed and advances in econometric methods, there is an enormous opportunity to use these techniques in places where this was not possible even a decade ago. LW note that only 2% of theoretical models have been estimated (I think it is a lot less..) and that entire areas are yet to be approached: very little has yet been done on bankruptcy, household finance, labor finance, or financial reporting, and areas such as banking are receiving growing interest.

So, okay, you’re sold, what to do next? Well, first things first, the summer school was intended primarily for Ph.D. students, and I would recommend getting in touch with one of the attendees and staying posted for next year’s announcement and program (here). Otherwise, the summer school had a great learning project that was completed over the four-day course period. It is also useful, before going to the summer school, to gain some working experience with Matlab or an equivalent language. Although LW and their TAs shared their code and helped attendees along with their project, I found it very useful to come with some moderate programming experience to better appreciate the course.

Unregistered Report on the JAR 2017 Registered Reports Conference

On May 12th and 13th, 2017, the Journal of Accounting Research conducted its annual conference as Registered Reports (RR). In RR, the review occurs primarily at the research design stage, before results are known. RR protect science against practices that correlate with acceptance likelihood but bias test statistics in ways that cannot be rigorously controlled for, e.g., changing hypotheses, data or specifications in the face of unexpected results or non-results; see below, from the discussion paper by Professor C. Chambers.

[Figure: the stages of a Registered Report, from C. Chambers’s discussion paper]

Overall, the experience has been a great success and provides a template for adoption as a separate submission path in addition to existing channels. In this entry, I will speak in broad terms of what I have learnt about RR in accounting research as well as some specific papers presented at the conference, all of which make a significant contribution to a variety of questions. Then, I’ll cover possible improvements to put on the table if editors want to make this model sustainable. Lastly, I will engage some bigger issues that RR connects to and the leadership by example that JAR is setting for accounting and non-accounting journals.

RR teach us that many questions are very hard to answer without pre-commitment to a research design, in cases where the universe of possible tests is unknown but likely large. This is true for experimental studies (run 100 experiments about various implications of a theory), for archival work (correlate an accounting event with 100 possible outcome variables), and even for theory (try 100 assumptions and keep the counter-intuitive implications). Within this state of affairs, the field has become skeptical of virtually any result that comes out of the major journals, leading to a situation of research relativism that is entirely unacceptable. RR offer the means for authors to focus on sound methods, without fearing their findings. It is therefore to the benefit of the community and to the benefit of the authors.

A secondary insight from the conference is that non-results can be the stars of the show. Many of the studies were well-motivated and intuitive, and one would have expected some effect – indeed, the studies went after hypotheses widely believed to be true by the researchers studying them. Side by side with analyses of power, the non-results are prior-changing breakthroughs. They also remind us that knowing what we don’t know is the first step to seeding the next papers.

The opening paper by Y. Ertimur, C. Rawson, J. Rogers and S. Zechman is a fine example of RR as a tool to encourage risky data collection, that is, data that may not pan out with an easy-to-interpret or consistent message. ERRZ hand-collected evidence about the inactivity spells of CEOs, a common occurrence with external hires. I would have conjectured, for the CEO labor market, that high-ability CEOs come with more sensitive knowledge, implying a greater likelihood of non-compete agreements, longer spells, and higher subsequent pay as a result of ability differentials and compensation for the gap. By contrast, according to conventional labor market unemployment theories, a longer gap may indicate a CEO searching for a good match, and a longer unemployment spell may publicly indicate lower innate ability or even directly erode it. Surprisingly enough, ERRZ find results consistent with the conventional theory, thus showing that CEO gaps may not be conceptually that different from unemployment gaps. Perhaps this is a first step toward unifying the study of the CEO labor market with the many robust insights from labor market theories with search frictions.

The paper by K. Allee, M. DeAngelis and J. Moon uses the RR format for an entirely different purpose. A criticism of the text analysis literature is that the degrees of freedom for cranking out (yet another) measure of information in text documents are too large to separate data mining from truth. What ADM do is construct a new measure on sound principles, before its relation to other variables is known. They define scriptability as a score capturing the difficulty of operationalizing machine reading of common accounting documents. Interestingly, the firm characteristics that were conjectured to correlate with scriptability do so remarkably well, a non-trivial feat in an RR setting. Against all expectations, however, scriptability did not seem to have a clear connection to the mechanisms through which information comes into price. Associations with price discovery, bid-ask spreads and volumes are all over the place – often insignificant or even with the wrong sign depending on the document. The paper makes a strong case that machine reading may not have first-order consequences on market prices. Beyond the non-result, the paper shows us how the construction of a measure, in settings where many ad hoc researcher choices are required, can gain additional value and credibility when conducted in the context of RR.

Two papers went after a field experiment. Field experiments let researchers observe real subjects involved in their habitual decisions, but with the benefit of experimental control samples. S. Li and T. Sandino ran a field experiment testing the effect of information-sharing systems on creativity. Most of the literature provides experiments that support the conventional view that creativity can be affected by intervention, but there is a valid concern that the experiments that did not support this view were rejected or never written into a paper. LS created an app in which a sample of storekeepers in India could share promotional posters that they designed, and learn from posters submitted by others. The results? The experimental treatment failed to elicit much change in the quality of posters submitted by storekeepers. Perhaps the benefits of such information-sharing systems are low relative to what people can learn on their own from direct physical interactions. This result may not be unique to India, and may extend to creative settings in which interpersonal interactions already do a good job of sharing information. It puts a caveat on current beliefs that social media will dramatically increase flows of creative information.

The next field experiment, by D. Bernard, N. Cade and F. Hodge, is a fascinating example of a creative research design to answer a controversial question: does giving away shares help the company sell products? On the one hand, many researchers believe that individuals dramatically exaggerate the effect of their actions – after all, how many of us vote despite the fact that our chances of being pivotal are virtually zero? On the other hand, rational choice advocates argue that, for such obvious decision problems, biases are unlikely to change customer purchases. I invite the reader to stop here and make a bet on which side wins the argument. BCH pre-committed to giving a $20 share of ownership in Starbucks to a treatment sample and then following up on Starbucks purchases against a control sample given a $20 share of ownership in a different company. The results? No effect is found. Hence, a non-targeted IPO might do very little good for company sales.

The paper was also an interesting learning experience because there was somewhat of a blowup midway into the paper, as the authors conducted very compelling analyses that did suggest the existence of an effect. But these analyses were unplanned, and conducted ex-post in response to the absence of a main effect. This may seem obviously misguided, but let’s review what these analyses were. Again, they were very compelling: companies distribute shares to customers, and even an IPO would be over-subscribed by existing clients. Hence, a natural test would be to select subjects who drink coffee. In this subsample, it was found that stock ownership increases purchases. But these analyses were only conducted because the main effect was not found, opening the question as to which other reasonable analyses would have been in the universe of possible tests. The message here is clear – “report but do not conclude” – and the door is wide open to further work.

The paper by H. Eyring and V.G. Narayanan is another take on the degrees-of-freedom problem in some of the questions we ask. EN were given permission to experiment with the information displayed to takers of Harvard’s online courses, showing either means or upper quartiles of the performance distribution, and measuring whether student achievement improved. The issue with such a question is that one could have reported various quantiles, or reported them in different ways, until some effect was observed. Among the papers, this was the one with the strongest confirmation: reporting a bar that could reasonably be met increased performance, while a bar that could not reasonably be met, or was too low, reduced performance or had no effect.

We can debate ad infinitum whether the same insight applies to settings in which financial incentives are also a choice, but (i) there are compensation settings in which employees cannot primarily be rewarded based on financial incentives (e.g., unionized positions, bureaucracies), and (ii) the education application is, even on its own, a very important one that can be directly used at very little cost by the many organizations that provide online learning tools. This is one of the rare studies in our area that immediately translates into concrete useful applications, and the RR has given additional validity to its insights.

Surprisingly, there was only one experimental paper at the conference. Among the entire set, this is the paper that had the most unambiguously clear non-result – a fact that may be a benefit of clear laboratory data where opportunities to seek other variables are limited. Z. Kowaleski, B. Mayhew and A. Tegeler investigate a setting where interactions between audit and non-audit services could create an environment conducive to collusion between the firm and its auditor. They conduct a repeated game and examine whether the equilibrium that is being played seems to shift toward more or less manipulation in a group with both audit and non-audit services. The results? Not at all. The average occurrence of misreporting and auditor enforcement appears to be the same. This also strikes against a widely-held view in the post-Enron world, namely, that auditors should be strictly regulated because most of their activities will contaminate the quality of audits. The paper speaks to the possibility of collusion in a repeated game even without non-audit services and how the consulting culture does not appear to be a factor moving the equilibrium.

The final paper was an outlier. It was a fundamental contribution to accounting knowledge but also not the type of paper that starts with a well-defined hypothesis to reject, or even strong researcher priors. Here, the RR format nevertheless played a major role by eliciting a huge effort in data collection. L. Hail, A. Tahoun and C. Wang went back to old news archives in countries all around the world, up to two hundred years in the past when possible, to search for occurrences of scandals and enforcement actions. This was meant to provide descriptive evidence about the lead-lag relation between regulation and scandals, a fact that was previously unknown. This is critical evidence that speaks to a fundamental question: what is the positive model for what regulators do? It’s an odd duck because, as one of the journal’s editors noted after the talk, this was not a paper going after some risky result, so the reason for pre-committing to analyses, versus a direct exploratory review of the data, was not self-evident. The paper did suggest that journals may consider an alternative format with pre-acceptance of data collections that are of general interest but resource-consuming, without some of the strict requirements of the full RR.

In a follow-up panel session, Chris Chambers, a world expert in operationalizing RR, noted that the accounting profession has taken the format as its own, adapting it to the needs and challenges of the field. Also, all involved authors strongly supported opening a path in journals for RR, as one possible submission format. I’d like to share here some additional personal thoughts about what might be part of future iterations of this idea.

Methodology. Most of the review process was focused on data and experimental design questions, probably because these are the first-order aspects to be considered. In terms of methods, most studies fell back on linear regression as the main tool. There was very little Bayesian analysis (the workhorse for these methods in the natural sciences) or even any of the economic analysis tools that are standard in the economic sciences, e.g., estimation of preferences or of some choices made within the data set. That’s too bad, because good data is the main obstacle to methodologies that are rich in insights, so the transparency about data in RR is a complement to stepping beyond interpreting conditional means. In fact, RR that focus on the analysis methods are perfectly suited to respond to overfitting in the choice of model, since there is a large universe of statistical or economic models suited to a data set. I hope future iterations of the RR will use the format as a springboard for cutting-edge methods to draw information from data.

Theory. A striking feature of the seven studies is that they were all based on a relevant hypothesis; none of them was based on a theory. The difference between a hypothesis and a theory is that a theory is a broad set of concepts that leads to multiple hypotheses. Even if we find that, say, social media increases creativity, or that a particular game induces collusion, it is not clear which theory is being tested or what theory led to these hypotheses. Starting from a primitive theory is difficult, but RR offer an ambitious agenda to go after research designs that falsify theories by looking jointly at the many hypotheses they imply.

Power. C. Chambers also noted in the panel that power analysis is part of RR, but not necessarily a feasible first step in the absence of an easy formulation of a Bayesian prior. This echoed many concerns from the audience that accounting problems are relatively unstructured and do not lend themselves to analyses of power, relative to fields with more structured knowledge of disturbances. Measuring power requires assumptions. Nevertheless, I find counter-intuitive the principle that not making assumptions and not measuring power should dominate a best-effort estimate of power, one that is transparent about the assumptions that have been made and, in the worst cases, offers alternatives. Without power, many of us were unable to set confidence intervals on the non-results, and some of the potential value of RR is not realized.
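For what it is worth, a best-effort power estimate does not require much machinery once the assumptions are on the table. The sketch below (all numbers hypothetical, and my own illustration rather than anything presented at the conference) assumes the smallest effect we care about is 0.2 standard deviations with two groups of 150 observations, and simulates how often a two-sided t-test at the 5% level rejects.

```python
# A best-effort power simulation under stated (hypothetical) assumptions:
# smallest effect of interest = 0.2 SD, two groups of 150, 5% two-sided t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, effect, n_sims = 150, 0.2, 5000
rejections = 0
for _ in range(n_sims):
    control = rng.standard_normal(n)
    treated = rng.standard_normal(n) + effect
    _, p = stats.ttest_ind(treated, control)
    rejections += p < 0.05
print(f"simulated power: {rejections / n_sims:.2f}")
# With power around 0.4, a non-result says little; reporting the assumed effect
# size and the implied power lets readers calibrate how much to update.
```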

Conclusions. C. Chambers also made a point, toward the end, that unplanned results contaminate the RR and, while helpful, should be separated in the conclusions. It was striking that the authors, not the referees, felt compelled not to stop at non-results and worked very hard to provide supplementary analyses. The quest for meaning is, perhaps, not surprising, but it does show us, once again, why the RR format is so desirable. For the label RR not to lose what it conveys, special care should be taken to separate these supplementary parts from the main conclusion, perhaps by omitting them altogether from the abstract and requiring that such findings be outlined in a separate paragraph.

To conclude this blog, let us keep in mind that the RR experience, conducted here within the strictest rules by the editors, can teach us about mixed models that borrow from RR and a conventional review process.  C. Chambers mentioned offering authors a result-blind submission path, in which authors submit their paper without results. This is different from RR in that the analysis has been conducted and is known by the author, and does have caveats relative to RR. However, it is a path to re-open important non-results or unexpected results that exist but have been buried by the publishing process. It’s also an opportunity to work hand-in-hand with other follow-up RR, or push the community toward the replications that need to be conducted.

As to the broader message, RR teach us about the value of transparency, namely, transparency about research protocols. The transparency is not just to outsiders but also to ourselves, and it makes us realize how the process can distort significance levels.

I will add three short notes, which are probably slightly on the more controversial side but could be nevertheless items to think about.

First, part of the skepticism in research has led to p-value thresholds that are probably too stringent for a strict RR report. Requiring significance at the 10% level is fine when authors are expected to pick among many regressions, but a less stringent threshold might still tell us a lot about noisy problems with large economic significance. E.g., should we only accept global warming once we reach 10% significance, or should we act before this? The answer may be the latter if we know that the research has been conducted without bias. Perhaps more discussion is needed about acceptable p-values, and the format of reporting significance stars should be changed.

Second, are we ready for universal adoption of RR as the method of empirical analysis? Absolutely not! Most of what we do is exploratory in nature, as we don’t have good theories, and data is setting-specific and contains sources of noise that are hard to structure. RR might inform us that the current focus on “testing” is excessive as part of our exploratory work, and that we should instead rethink conventional studies as helping design better theories – once this is done, RR will be the format for the actual testing.

Third, the RR is part of a broader effort, also engaged by JAR, to offer transparency in how the research has been conducted. Under the current policy, which is adopted by only 9 journals in the Financial Times 50 list (of which the Journal of Finance is the only one in financial economics), authors must share their code; see the list at the end of this post. Since we do not expect most papers to be RR, this transparency approach gives the community tools to evaluate robustness to alternate specifications or hypotheses, and is the standard companion to RR when RR are not possible.

We shall all be looking ahead to a wider adoption of the RR guidelines in accounting, so that accounting takes its natural role as a leader in promoting transparency.
Appendix: data policy in the Financial Times 50

 

Journal (with link to policy) Code sharing Non-proprietary data sharing
 American Economic Review Yes Yes
 Econometrica Yes Yes
 Journal of Accounting Research Yes No
 Journal of Applied Psychology Yes No
 Journal of Finance Yes No
 Journal of Political Economy Yes Yes
 Marketing Science Yes Yes
 Quarterly Journal of Economics Yes Yes
 Review of Economic Studies Yes Yes
 Academy of Management Journal No No
 Academy of Management Review No No
 Accounting, Organizations and Society No No
 Administrative Science Quarterly No No
 Contemporary Accounting Research No No
 Entrepreneurship Theory and Practice No No
 Harvard Business Review No No
 Human Relations No No
 Human Resource Management No No
 Information Systems Research No No
 Journal of Accounting and Economics No No
 Journal of Business Ethics No No
 Journal of Business Venturing No No
 Journal of Consumer Psychology No No
 Journal of Consumer Research No No
 Journal of Financial and Quantitative Analysis No No
 Journal of Financial Economics No No
 Journal of International Business Studies No No
 Journal of Management No No
 Journal of Management Information Systems No No
 Journal of Management Studies No No
 Journal of Marketing No No
 Journal of Marketing Research No No
 Journal of Operations Management No No
 Journal of the Academy of Marketing Science No No
 Management Science No No
 Manufacturing and Service Operations Management No No
 MIS Quarterly No No
 Operations Research No No
 Organization Science No No
 Organization Studies No No
 Organizational Behavior and Human Decision Processes No No
 Production and Operations Management No No
 Research Policy No No
 Review of Accounting Studies No No
 Review of Finance No No
 Review of Financial Studies No No
 Sloan Management Review No No
 Strategic Entrepreneurship Journal No No
 Strategic Management Journal No No
 The Accounting Review No No
 Total 9 6

A guide to fancy living in NYC by the cost-conscious academic

Every year, a few thousand academics come to NYC to join one of the many universities of the City. Having been around for a little over five years, let me share a few tips for newcomers on the lifestyle I’d best describe as ‘Bohemian Nerd’: someone who is conscious about optimizing comfort while staying true to the diversity and open-mindedness of what the City has to offer.

Housing. Few newcomers consider Uptown as a location of choice in Manhattan, yet the blocks north of Central Park (110th-125th), around the stops of the Red, Blue or Green lines, have become some of the best locations in Manhattan for active young singles or couples. On the west, between Amsterdam and Malcolm X, Morningside Park offers impressive views of Harlem (on Morningside Drive) and combines playgrounds for kids (on 123rd or 116th) with green open spaces below (on 110th). Frederick Douglass, between 110th and 125th, with its many cafes and restaurants (for example, Harlem Tavern), is a great place to have brunch or dinner after a long day of work, and easily accessible via the A/B/C lines. One can pick from one of the many high-end condos or more typical apartments in townhouses on the cross streets (my preferred choice since NYC can be noisy on the main avenues). The East side is a little cheaper, but home to both a Whole Foods (125th) and a Costco, as well as many stores on 125th. North Central Park has a large public pool open all summer and a skating rink in the winter, and is a great place to relax or meditate. Note that these areas are diverse and extremely safe, in fact as safe as any other area in Manhattan (which, in my experience, is even safer than an already fairly safe city such as Paris). We’ve crossed the area late at night (1am) and, as to Morningside Park, seen people walking their dogs at late hours – we’ve never had any problem. The area further west, Riverside-Amsterdam from 105th to 125th, has nice park areas by the Hudson and a beautiful walk, but is slightly underwhelming as to both character and things to do, and has become slightly on the expensive side. If you live there, you’ll have to go down to the Upper West Side, around 96th.

What about other special areas such as Park Slope (Brooklyn), the Upper East Side, or Soho/Tribeca? Park Slope is very nice (with the park nearby) but far if you need to commute to Manhattan, perhaps a good compromise for larger families. The Upper East Side is very expensive; count about double relative to Uptown. Soho/Tribeca is, hands down, the most delightful area of NYC with its low-rises and nice shops, but it can be noisy and, for the better parts, can be even more expensive than the Upper East Side. It’s just easier to travel there rather than live there every day, especially considering that it does not have the benefit of Central Park nearby. The main advantage of these three options is that public schools are top-level, especially for families with 2+ children (more on this later on). The Upper West Side is always a safe bet, with a greater focus on the busy executive and a lesser focus on character. Lastly, many academics live in the northern suburbs (Scarsdale/Westchester), but count between an hour and an hour and a half of commute. This is the residential experience, except at a higher price point than most other cities, and I do not advise it since the commute can interfere with being at the office every day and adds to fatigue. It may nevertheless be an option to move there with 2+ children of school age (but only at that moment!) since private schools in Manhattan can be expensive, more on this later on.


Food/Grocery shopping. Food shopping is a complex task in NYC because there is no central Walmart where one could pick up everything and be done with it. Every place has its own specialty and is not-so-good for the rest. Trader Joe’s (around the city) offers one of the best generalist compromises, with high quality at a good price. But let me improve the experience with a few specialized ideas. Zabar’s is the best place in NYC for inexpensive and decent cheese (try their Manchego, Fourme d’Ambert and Comte), fresh pasta and a delicious Scottish salmon. Street vendors – all over the place – are the best place to get fruit and veggies at low prices (about half the price of Whole Foods), although quality can be variable. Avoid the pricey local supermarkets of the fancy neighborhoods, which give you low quality at high prices, but I recommend Best Yet Market for a large supermarket with a great selection of vegetables at very low prices. If convenient, Costco (East side) and BJ’s (Bronx) have very nice groceries. Many New Yorkers go to Fairway. It presents very well but is incredibly pricey – however, I found no better alternative when it comes to seafood, and the olive oil there is quite nice as well.

Let’s not underestimate online grocery shopping, which is a must-use in NYC. At the high end, FreshDirect offers a zero-cost delivery subscription (for orders of $30+) and has great food, especially milk, meat and seafood. Prices are not competitive for fruits and vegetables. They also have a nice selection of other things, such as their brioche or chocolate croissants, and competitive prices for bottled water (such as Perrier). For anything durable, such as cans, rice or other household stuff, I found no better than walmart.com, with free shipping on orders of $50+ and half the price of a regular supermarket.

For wine lovers, what not to do is pick up wine at a wine store. These are well presented but expensive, and usually mix low-price but near-undrinkable bottles with fine wine at heavy markups. Alcohol in general should be bought online in New York. I strongly recommend Wine Advocate’s weekly wine buys for unbeatable 90+ bottles starting as low as $15. The only downside is that the stores vary, and this takes a little time to set up. I personally like the site lastbottle.com, which sells a new bottle each day, at top prices and quite often with ratings in the mid-90s. What I like most about the site is that it proposes the bottles (no subscription needed though), so this saves on search costs. If you want to try it out and you don’t have a friend who uses it yet, my code is here (they will also credit you with $10 in wine credits).

Moving around. The subway in NYC is a mixed experience. The North-South line arrangement and the ability to move between express and local trains make the subway cover a lot of distance in reasonable time. You can easily traverse half the city in less than 30 minutes, and sometimes only 20 minutes on some lines, so the City allows faster commuting than most other locations. In addition, there are many, many trains, and wait time is minimal. However, there are some big problems. The subway is very crowded during rush hour and it’s not unusual to be squished against a side; not great unless one is into out-of-body meditation. In addition, it is common for subways to behave strangely. Suddenly a local train turns express and drops you 40 streets north of your stop, or some problem on the line forces you to reroute via another line. If one has a time commitment (like picking up children or a meeting), the subway is completely unreliable, and many times I had to catch a cab.

But are there any alternatives? Cabs are very easy to get but expensive, and so is Uber (Lyft has now become much more expensive than Uber in NYC), though about 30% less than regular cabs. The bus is atrocious. Heavy traffic and stops at (almost) every traffic light make a bus ride take double the time of the subway – of course, for that reason, buses are not too crowded. So things look quite bad at this point.

Fortunately, a few new services have completely changed the situation recently. The service Via now offers car-pooling for $69 a week (less with a monthly pass), as many times as desired, or a flat $6 pay-as-you-go fare. Besides, it’s pre-tax money, so the comparison to the cost of a cab is misleading. Cars are generally large SUVs (Toyota Highlander), very comfortable and easy to get work done in during the trip. Because of heavy traffic, it is slower than the subway during rush hour (count 1.5x), but it’s a great option if you can do some work – like read a paper or work on a laptop – in the car. Also, one thing I love about Via is that you cannot miss your ride; cars are almost always of the same brand (unlike other services) and they tell you at what corner to pick it up. If you sign up, ask a friend whether they have an account; you can enter their code and you will both receive some credit (currently $10). Uber also launched a pool service, but it’s inferior to Via – good prices only apply at particular rush hours and rides are not as comfy as Via’s.

The Arts. Academics and students must use tdf.org, a service that can only be used by people in education. It works a little like the last-minute discount ticket booths for theater, but with three big differences: about half the price (around $40 for a prime show, a lot less for off-Broadway), one books one or two weeks in advance and, most important, seats are typically spectacular as long as the booking is not for too many people (2 or 3 is ideal). tdf does not have a huge inventory of shows, but it is quite nice and regularly renewed. We’ve seen some very nice theater (like top productions of Sartre or Shakespeare) as well as new musicals such as Rocky or Spiderman. In particular, it’s easy to miss out on the non-musical theater scene, which is quite good in NYC, and tdf is the place to go to keep current on what’s there.

NYC has, of course, great museums. Remember that many museums in NYC have suggested rather than mandatory admission, so you can donate, but it’s up to you. The Museum of Natural History is a great place for families and the special exhibits are always great, although these cost extra. The MET and MoMA need no introduction, of course. For those interested in museums, I strongly advise using Chase bank, which offers its private clients, for free, a pass to many of the museums that are not free, such as the Guggenheim. For museums that are technically free, the pass gives preferential access without lines and discounted prices for activities as a corporate member. Also for families, day trips to the Liberty Science Center and the New York Hall of Science are worth it, but I’d advise first taking a Via to Brooklyn to cut the cost – in both cases, commuting via public transport is a nightmare.
Interestingly, buying art happens in the most unexpected of places: on the street by a museum, among the tourists (especially at the Met), a few painters sell beautiful art pieces. The same pieces retail for a much higher price on the internet or from their studio (usually in Brooklyn), but there is more choice there. So, it’s probably best to first discover a painter’s art in the street, and then visit their studio for more choice, even if it comes at a steeper price – usually 3x-5x. Of course, the pieces in the studio usually have more distinctive motifs.

Restaurants. Little known is that the street truck food is some of the best in the country. The typical truck has amazing lamb with rice and veggies, Italian sausage and, for vegetarians, falafel that I’ve rarely seen matched, all of this for around $5. Trucks can be found at most street corners.

People say that restaurants are great in NYC, but I’ve rarely found this to be true. Most restaurant food, even when it is served in a fancy room, is very simple and does not strike you as any more elaborate than what you could easily do at home, plus the extra price. A few caveats to that. I’ve always been pleasantly surprised with oysters in NYC, and Groupon offers reasonable deals (at full price, it’s horrible though). Some staples of NYC are always great, so you can get a nice Reuben even at a touristy place and not go wrong; restaurant burgers are not too expensive and quite nice in almost all restaurants. For the everyday, the chain Chipotle is now widely distributed in NYC and presents one of the best picks in the City (in fact, taste-wise it beats most of the fancy restaurants).

A slightly bohemian American spot with music to try out is the Harlem Tavern on 116th, especially for Saturday/Sunday brunch. For the more exotic, Jin Ramen on 125th offers the authentic ramen experience, delicious! Another great pick that never disappoints is La Tartine in Soho – an affordable French spot that beats in quality any fancier restaurant – and you can bring your own wine, but beware that this is a well-known secret and wait lines can be very, very long during peak times if you don’t come early.

Shopping. Shopping is special in NYC, but one needs to know where to go. Surprisingly, the best shopping is not at the grand names on Madison but at discount stores such as Marshalls, TJ Maxx or Century 21. Because of their location, these stores receive the brand-name inventory of each season and resell it at extremely heavy discounts, from half to a quarter of the price. Note that these can be top makers: it’s not unusual to find a YSL, Gucci or D&G shirt or tuxedo there. They also offer tons of other things, such as toys or perfume, also at discounted prices.

There are a few outlet malls further outside of the city. An example is Tanger Outlets at Riverhead on Long Island. It’s very extensive and good to visit, and can be combined with a day in the Hamptons or at the water park Splish Splash nearby. Prices there are better than on the web or in stores in the city but, in the end, it’s more about the experience. Note that one gets there by train, and there are taxis or buses at the station that will drive you anywhere in the area.

Families. NYC is great for families, but one needs a few insights about what to do. For the little ones, the Swedish Cottage in Central Park has nice marionette shows that change throughout the year. For all ages, I recommend the Blue Man Group show, which is also on tdf most weeks. I’ve been very disappointed by most off-Broadway productions for families.

During the weekend, kids love Coney Island and there are other things to do for grown-ups as well. The beach itself is just okay, and the water tends to be a little cold for extended swimming – besides, it is a bit crowded. However, Coney Island is accessible by subway with almost no connection – so it’s very practical. The boardwalk is nice and spacious, and there is music and a nice atmosphere to it. There are also a few parks with rides that cater to all ages, especially little ones, who have a ton of fun every time (and pricing is very reasonable). Lesser known, if one moves away from the beach, there are some cafes with a nice Caribbean feel to them, perfect for cocktails in the evening. For the high-end experience, Sandy Hook in New Jersey is a set of very beautiful beaches in a national park. To get there, one needs to take the ferry (from many locations) and it takes about an hour to reach a complete out-of-city experience where most of the area is pristine. The ferry ride is quite impressive as well and never gets old. Expect, however, things to be extremely windy, which can be unpleasant if the day is cold.

Snow and ice are great in the winter. Central Park has various skating rinks (Lasker, in the north, for example, is a skating rink in the winter and an outdoor swimming pool in the summer). There are also many places in the parks where families can ‘hit the slopes,’ and you can use a regular piece of cardboard to slide or get a better-looking sled at any sports store. Last but not least, for kids, skiing is fairly easy from NYC. Even if you don’t have a car, take the Metro-North train from the Harlem-125th Street station and head for Patterson – about 1h40 and very easy. From there, a free shuttle goes to Thunder Ridge (5 more minutes), a small ski resort that’s perfect for kids to learn skiing, with a long green slope that runs from the top to the bottom of the main hill. Otherwise, Mountain Creek in New Jersey is the closest (and a little more interesting for adults) but requires driving, about 1h20 door to door.

As to other activities, the Y has many locations all over the city and offers classes for all ages in everything from hip-hop to swimming or soccer. These classes are excellent and reasonably priced. It’s usually possible to arrange them during a weekend and back-to-back, in order to have a sequence of things to do. Note that one does need to be a member to sign up for an activity at the Y.

Another challenge for families is schools. On the positive side, NYC has implemented a free public pre-K program, so that’s something to take advantage of. The program is offered in public schools as well as many private programs. Going to K and beyond, school achievement can be unequal, but people tend to overweight the importance of a school district because, I can only assume, there is a quantitative measure publicly available on the site of the Department of Education. I’ll offer some thoughts about this.

Some public schools have an excellent reputation through their specialized dual-language programs; see for example P.S. 84 Lillian Weber for a French or Spanish dual program. These programs are designed to combine native and non-native speakers and teach in both languages. They are separate from the rest of the school and benefit from attracting the most adventurous families. For tiger moms and dads, NYC has developed a gifted and talented program that includes an entrance test (yes, a test for a 5-year-old!), so this requires some active training by parents – not my cup of tea. Further, gifted and talented programs give a lot of homework, as we have heard from other parents whose kids joined these programs. Perhaps there is a great upside to this in the future, though. Another option is private schools, and it is not widely known that there are some excellent but not excessively pricey private schools run by religious institutions. The diocese has very good schools with excellent traditional education (e.g., Corpus Christi on 121st); other denominations have great schools as well, as I’ve heard from very satisfied parents. Note that these schools are very respectful of diversity and do not impose the school’s beliefs on children or parents. Lastly, there are some amazing private schools that give you some of the best teachers and activities; for small families of one or two children, this usually cannot go wrong. For example, the School at Columbia is Columbia University’s magnet school with experience-based education and was recently handed a distinction for science education, in person, by President Obama.

A few more remarks. A great thing about the City is the number and variety of playgrounds, and it’s a pleasure to vary across the many options, which can often be at walking distance. During the summer, most playgrounds have water fountains that kids absolutely love. I have a few favorites. The Tarr Family playground is one of the best, with a beautiful pyramid, water fountains and lots of activities, including a sand area. It’s also next to large open spaces in Central Park to enjoy a relaxing day. The Heckscher playground is the largest in NYC, and has lots of space and nearly everything – unfortunately, kids love to play in the rocky area, and this is not for the faint-of-heart parent. Third, the Billy Johnson playground is small, but it has the best slide in Central Park and is right next to the Central Park Zoo, a repeat must-see activity.

That’s all I have. If you want to share more tips, feel free to comment below!

How empiricists should read theory

I was at a conference a few days ago and, after a theory speaker had finished his talk, a senior empirical faculty member turned to me looking angry and said “you know, now that you’ve got tenure, you should write a paper about how empiricists should read theory.”

The problem is, I don’t know myself how theorists should read theory – so, next best thing to writing a paper about something you don’t know – I’ll jot down a few thoughts in a blog and hope it all comes together at the end.

Thought 1: Read more than the title

Make a prediction that can be tested in a data set: chances are more than one theory could deliver this prediction.

I’ll take a running example: consider the confirmation theory of Dye (1983) and Gigler and Hemmer (1999). The theory predicts that a verifiable message (e.g., earnings) makes otherwise unverifiable communication (e.g., forecasts) credible. But this can also occur in the very different theory of Einhorn and Ziv (2008), a world where disclosures are credible by assumption but the manager can pretend to be uninformed and withhold information. Realized earnings, in this model, inform investors about the manager’s information endowment.

Theories are about mechanisms or the means to get to a prediction, and within each paper, there is enough information to test which mechanism is operating. So a good reading of theory should identify the empirical content of the mechanisms (to be tested as well), not just the end prediction. 

Returning to our running example, under confirmation theory, credibility is assured by writing a contract that punishes the manager for missing a forecast, controlling for earnings. Do we see this in the sample? Under dynamic disclosure theory, we should see that low earnings, indicative that the manager probably withheld strategically, should cause more disclosure in the next period, a very different time-series prediction.

Thought 2: Normative research provides guidance about what to study empirically, and why.

An old tradition in accounting – which by the way is no longer mainstream even in empirical research – is to view normative research (“how things ought to be”) as suspect and probably wrong. Recall that this came to be, over the 70s (see Demski 1973) because a number of individuals were dishing out accounting knowledge in the form of religious edicts, and this had to stop for serious scientific research to begin.

The majority of theory work has clear normative implications, and some of it does not contain much in terms of testable predictions, so having appreciation for what it is trying to do is important.

To talk about what normative research is, I’ll make a quick parenthesis along STEM, Science-Technology-Engineering-Mathematics. To over-simplify, let’s identify Science with the scientific method and take testability as a core principle. Let’s also set aside Technology and Mathematics, since these are tools. Engineering, however, is different. If I want to build a bridge, I make plans to do it based on sound principles validated by science. I’m not building the bridge only for the purpose of testing and, if I were to always require testability, no first bridge would ever be built.

This is what normative research is all about, namely, making plans for improvements that do not yet exist. Normative research requires good assumptions as inputs, hopefully assumptions that we believe have been tested. This is important: after all, without engineering/normative work, all the scientific knowledge we could accumulate would have no means of creating better outcomes.

So, how should empiricists consume normative theory? Normative theory is the natural end-point of the knowledge we create but it makes many assumptions. Knowing whether these assumptions are descriptive or not is the realm of positive theory and empirical research. Therefore, normative theory gives guidance to empiricists over what meaningful assumptions should be tested. It makes the empirical exercise relevant.

Thought 3: Find the selected math that summarizes.

Ten years ago, I remember attending a week-long seminar by an accounting theorist. I was shocked: rather than tell us about all the great things he had found, he started his talk by saying that “math is an unnecessary evil” and then, looking at one particular faculty member in the crowd, said they should shut down a top econ journal this faculty member had published in (for, apparently, having published a few papers there, he was now its representative).

Anecdote aside, let’s ask the question: why do theorists use symbols when they do their work? Is it something that’s back-end material to be entirely ignored by empiricists?

Math in argumentation has many purposes, one of them being to support a tight logical argument. For example, how often is it that one gets lost in a wordy hypothesis where everything seems to float in the air and multiple logics seem to be operating?

But is math only useful as a method of validity, so that empirical readers may safely ignore the math once a referee or journal has verified its correctness? To answer this question, consider absorbing Adam Smith’s Wealth of Nations, and compare it to an undergrad micro textbook treatment of the welfare theorems: which one is easier?

Empiricists can learn from the math in a study because a few equations can provide a concise summary of otherwise difficult trade-offs. You don’t need to bury yourself in the appendix, or even to follow every main argument, to get to these equations. So, to use a theory paper, remember those few equations that summarize the assumptions and main results; this is often much easier than remembering the convoluted steps and implications of a wordy logic.

Thought 4: Take a theory seriously, not literally.

Any theory is a simplification of reality; it does not aim to be descriptive of everything and, in the social sciences, the most successful theories only get at first-order effects. We can’t take theories literally as exact representations, but we can take a theory seriously enough that it may explain empirical behavior.

Unfortunately, many empiricists view theory as merely motivational. By motivational, I mean that a theory is there to introduce a topic but nothing more, or that it is part of an enormous bundle of theories that have nothing to do with one another and together deliver predictions – some of which may occur in one of the theories and not the others, in all of them, or in none.

So taking one theory seriously really means the following: let’s shut down for a moment all other theories, and assume that the world that we see in a data set has been generated by this theory (and only this theory). Is what I see in line with what the theory says? Is the theory complete enough to speak to all the empirical facts I want to study?

Doing this requires some specialization: if one is serious about a theory, one needs to know it very well – but the payoff is to be able to test multiple predictions of something parsimonious and clear. Few studies can claim this type of transparency.

What about alternative explanations? The good thing is that, once we are bound to multiple deep tests of a theory (being serious about it, that is!), most alternative theories will be naturally ruled out by these tests without having to design specific extra tests.

Thought 5: Theories tell you about (unobserved) exogenous variation in observational samples.

Some disagree, but I like the idea of theory as a poor man’s substitute for unavailable data. If we lived in a world where we could experiment on anything instantly and at no cost, we would not need any theory to make better things – we would simply proceed with an infinite set of free experiments.

I think that’s the deep problem with advocates of the self-styled ‘credibility’ revolution who believe that experimentation is required to solve any problem. Yes, experimentation is better, but it isn’t free or always feasible – so, wherever it is missing, we need to rely on theory.

The greatest example of this is observational data. I was at a conference at Kellogg (on law) and a statistician complained that most research designs in the social sciences would not meet the standards of medical science – he was quite critical of observational data outside of a controlled experiment and believed that instrumental variable methods (even assuming the exclusion restriction held) had serious statistical flaws. I think his point of view was that only carefully planned experimentation could meet the standard of proof, noting that economists always asked him to “Believe..”

Yes, indeed, theory does require us to believe. Believing makes things less credible, but there is often no alternative course. But let’s be more precise now: what does ‘believe’ mean in an observational design? Theory is a statement about a source of exogenous (but often unobservable) variation and about how this exogenous variation can cause outcomes. The theoretical exercise accepts that we need to make assumptions, but requires that these assumptions be clear and logically used – this is the least we can do.

What does the theory conjecture is the source of exogenous variation? While it does not offer certainties, this can inform the empirical design about what assumptions are being made, and (within the theory) how these assumptions are used in a consistent manner.

The evil word here is, of course, endogeneity. Endogeneity is a fundamental characteristic of any observational study. Theory does not solve endogeneity, if by solving we mean providing the same level of confidence as if there had been an experiment. However, theory does clarify a plausible mechanism for the endogenous relationship between variables and links them to an exogenous source. Theory clarifies the assumed source of (unobserved) exogenous variation.