Author Archives: jeremybertomeu

Reviewing Theory in Accounting

I’ve delayed this long enough, and decided I should say a word about the situation in accounting, where meaningless reviews destroy incentives to innovate and often send years of hard work by authors to the garbage bin. So this is a question: How should you review a theory paper?

I believe I have credibility in answering this question. I received an award from one of the accounting journals. I got this award not for quantity (I don’t review that much for this journal) and not for rejecting papers. I got it for accepting (good) papers and, instead of throwing a paper away and moving on with my own research, putting a lot of work into understanding the mindset of a paper and helping out. I started doing this because, as a struggling postdoc, I got fed up with evil behaviors in which competitive researchers play a zero-sum game and ignore the damage of dishing out rejections that are not commensurate with the quality in a Journal.

Ok, so here is the short answer: if you write a review, read at least two recent issues of the Journal – all papers, all methods – read them carefully and learn to appreciate questions and approaches. Do not criticize, keep an open attitude and assume these are all useful contributions. Now, set the bar for acceptance within the bottom tercile of this set, across methods. Recognize it’s hard to obtain perfect or even deep answers regardless of the method used and benchmark a reasonable expectation to this. Once you have done this, think about whether, with sufficient work by you and the authors, the paper has either met this target or may meet it after two revisions. If the answer is yes, then do not reject the paper. When in doubt, always leave the door open for editors to make the final call.

Few people talk about this but let me next offer reasons that should NEVER, EVER show up as a primary motive for rejection. Then, I’ll give the three primary reasons why you may, sometimes, err toward rejection. Caveats: (1) this applies only to top journals, (2) these reasons are primary, but there could be secondary comments that are neither necessary nor sufficient (but could be useful for revising), (3) even these reasons should be applied with extra caution, maybe not at all, for unusual papers that step out of common paradigms, (4) always apply with caution: when in doubt, give the authors a chance.

Four reasons not to reject a paper

  1. Not realistic. The word realistic should never show up in a referee report as a motive for rejection. Stylized models are meant to be unrealistic because their point is to refine a trade-off by assuming away aspects that have nothing to do with the trade-off. It’s very, very rare that something assumed away would completely subsume the trade-off and, when it does, this should be shown formally. A realistic model is a bad model, because it would necessarily fail at providing a clean argument. If you don’t believe me, check the most successful theory papers in accounting, from Verrecchia (1983), Dye (1985) to Kanodia and Lee (1998) or Fischer and Verrecchia (2000), among many, many others. Are these models remotely realistic? In fact, quite the opposite: ask authors to be realistic and you will get ugly models with lots of things going on and unclear trade-offs.
  2. Math too hard, model too general. Lazy reviewers will sometimes reject the paper because they don’t want to check that the appendix is correct or because there may be an error somewhere. A version of that is rejecting results that are surprising but require a lot of analysis to get there. This forces authors to restrict their papers to closed-form binary/uniform/normal settings even when the argument is much more robust. Ideally, a paper should be as general as it can be as long as this preserves the main result and comparative statics. The tendency of accounting papers to be unnecessarily dumbed down has caused many cases in which an economics paper comes out that solves the same problem slightly more generally (but with no extra insight) and then appropriates the credit. So, don’t do it: if authors have a nice general elegant model that they spent years crafting, reward them, don’t punish them. Keep in mind that you’re not responsible for errors – yes, you’re not. You’re responsible for basic due diligence on the appendix just like an audit does not check every single transaction. The authors, only the authors, are responsible for errors.
  3. Not my framework. A very large portion of bad reviews look like this. You model this, but there is a type of model that looks at this problem and I like (or maybe I don’t like, but I’ve seen it somewhere) and you don’t use it. So, what you are doing must be wrong. This type of approach often dismisses entire branches of literature by imposing that everyone should adopt a particular framework, even when this framework does not directly speak about the question, would not subsume the insight and would be very cumbersome to drag along. It also completely neutralizes innovation by rejecting any research that does not follow a particular dominant paradigm. These reviews completely ignore the fact that, in most cases, authors point out that the other framework that they use is actually well-accepted and useful. What’s going on here? It’s not your job as a reviewer to judge broad frameworks of the literature. If you’re not interested, don’t reject, have the courage to tell the editor that’s not a framework you want to review because you have no interest in it. Don’t ask authors to revise a framework that works well (or the semantics of their framework) because you have a taste against it. In the end, you should always ask yourself whether adding an ingredient to a model, even one that is among plausible first-order effects, really does alter the main economic trade-off or intuition; if not, this ingredient should NOT be part of the model. A theoretical model is about illustrating one force in a stylized setting, not about incorporating all first-order and second-order effects into a general messy theory.
  4. The revision does not obey. This is not everyone but some reviewers have a dominance problem where they think they should impose their will on authors. In revisions, these reviewers will view as ‘word of truth’ anything they wrote down in a prior round and evaluate whether the author has completed ALL demands in a prior report. As a reviewer, don’t do that. Even for the primary concerns, I need to check whether I was correct to ask for something, and it can turn out that this was not to the benefit of the paper. Authors live in fear of pushing back against even bad comments, because of this attitude of certain reviewers. I am confident enough to admit if I get something wrong and my role here as a reviewer is to make sure this does not contaminate the process or, worse, contaminate the published paper by adding mess to an otherwise clean argument. I have a process for this: if a point is not addressed, it should be stated in the author letter and one needs to be very, very careful, as a reviewer, not to repeat the point (which is very bad, intellectually deaf) but to respond to what the authors say in the paper and in the letter, and to explain why the concern remains or does not remain. Ask yourself whether this ingredient of the framework would remove an insight. Ask yourself why you think this other framework is indefensible and, yet, many good people use it to generate insight.

The only three reasons to have concerns

It’s not a competition: the field works better when everyone contributes and, ideally, we would not need to have rejections and would simply have projects being built until they meet the objectives for broad circulation. Yet, as the system is designed right now, there are places where an editor would need to know about some concerns. Whether these concerns are sufficient to advise rejection, or should be left to the authors to address is not something I can easily answer. However, what I can do is to speak about what I think are the ONLY reasons  why a paper might need more work. Caveat: this does not describe the editor’s problem, but remember that as a reviewer you are in charge of providing input to the editor, not of making the actual decision.

  1. No distance between assumptions and results. I’m sure that if I write this, at least some people will start putting this in their reviews, and I’ll start being blamed for rejections. Anyways, I do strongly believe in this. Our work as theorists is not ‘to assume’ but to build bridges from assumptions to insights. If the result is what we assume, then there is no work being done and it all becomes a very matter-of-fact assertion about what the world is. Evaluating the distance is not checking the length of an appendix, so don’t do that. To evaluate, there is a simple test. Just write down the assumption in words, and write down the result, then try to argue it away and check how much the formalism has helped in making that argument a tight one that is not confusing to a listener. Do it with someone who has not read the model yet and check for flaws in the verbal argument; check if the work done by the assumptions resolves these flaws in the verbal logic.
  2. Model and results have been done before. We can’t repeat the same model, same trade-off ad infinitum if this amounts to the same insight, even if the model is interesting. Now, an important detail here. We use a model as a building block to many arguments, and that’s fine, and we rely on the trade-off to have more things to say, that’s fine as well. What’s important here is to make sure that the same exact result within the same exact model is not already there in the literature. In that case, some vague notion that this is well-known is not okay: the claim must be supported by a clear reference to the result and a reference to a paper that derives that exact result. Say, showing that full-information is not optimal is true in many models, but whether the same exact form occurs for the same exact reasons is what we want to check.
  3. Unreadable mess. Among all the reasons that’s the very worst one. Theory is about explaining things, so what can we do when the paper is incomprehensible? Maybe the assumptions are unclear. Maybe the results are unclear: lots of cases in all directions that never get regrouped, or a lot of algebra that is ambiguous about effects all the way. I don’t like that at all as this loses the reason why we do theory. We make strong assumptions to get clean insight and so messy results are evidence that the assumptions are not well-built. A theory is not meant to show that things are complicated in the real world – we know that already – it is to show why and when things go one way or the other. We understand there may be cases, because context matters, but it’s important to have a sense when we should apply cases. A big mess where the implication depends on complicated unobservable forces or we only end up with a result that is so limited in scope that it does not answer the question, is not helpful. So, be general if you want, but don’t be so general that you don’t get any result.


A Primer for financial structural econometrics: the 2017 Mitsui Summer School

This week, I was fortunate to attend this year’s University of Michigan’s summer school on structural econometrics in financial economics, co-organized by Luke Taylor and Toni Whited (LW). There was a very dense web of knowledge to be gained from the camp, especially for structural newbies such as myself, and I would like to share a few of these insights with those of you who might (and should!) attend in future years, whether as an author or a consumer of this literature.

LW start their first lecture with the obvious question: (in my own words) why should we even bother with structural? This may seem obvious ex-post but I have to confess that, in graduate school, this question was never really asked because, duh, structural is about estimating economic models like an economist.  But LW offer a more detailed answer, by telling us about three applied benefits of structural work. A structural approach

  1. Estimates economic primitives, often in the form of institutional or behavioral characteristics that determine a choice. Because we have an explicit model of choice, we can claim that we are drawing from revealed preferences, a core axiom of economic analysis.
  2. Provides deep tests of a theory that go beyond a directional prediction. Are the magnitudes of a theory economically significant? Which of multiple theories contribute the most to explain a phenomenon? Which are suitable proxies to test the theory?
  3. Allows researchers to conduct counter-factuals, such as inferring potential consequences of a change in policy that was NOT observed in the empirical sample.

Let’s not hide it: structural work is hard! However, hard (to implement, or to do well..) does not mean obscure. In fact, if one were to summarize one big non-technical insight from the course, it is the following:

The quality of structural work lies in its ability to make transparent both its assumptions and how they extract economic insights from data sets (i.e., identification).

This, again, seems obvious ex-post but it is a template for organizing and reading papers in this area. Because it is new in some fields, I would like to cover next various ways in which this general idea is implemented (of course, LW did not present this as a hard set of rules for any paper – so, reviewers, hold your rejection checklist – but as potential suggestions to organize one’s mind on a problem).

Step 1: Features of the data and model. The model picks up on a subset of presumed first-order effects in a setting of interest but LW strongly recommend, before any fitting exercise, to identify which features of the data are the object of interest AND whether these features of the data pin down the parameters of the model. How do we do this? By knowing one’s economic model very, very well: which characteristics of the model (a moment, a distribution, etc.) appear to change the most with parameters of the model, which do not? By plotting comparative statics over observable implications of the model, we can see which of these observables would be best suited to incorporate in the estimation.
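As a minimal sketch of this step (not LW's own code), the Python snippet below uses a toy AR(1) process as a stand-in for an economic model. The model, the parameter names rho and sigma, and the two moments are all assumptions chosen for illustration. It perturbs each parameter by a small step, holding the random shocks fixed, and reports how much each simulated moment moves.

```python
import random
import statistics


def simulate_ar1(rho, sigma, n=5000, seed=0):
    """Toy stand-in model: x_t = rho * x_{t-1} + sigma * eps_t."""
    rng = random.Random(seed)  # same seed => same shocks across parameter values
    x, path = 0.0, []
    for _ in range(n):
        x = rho * x + sigma * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def moments(path):
    """Two candidate moments: variance and first-order autocorrelation."""
    mean = statistics.fmean(path)
    var = sum((a - mean) ** 2 for a in path) / len(path)
    cov = sum((a - mean) * (b - mean) for a, b in zip(path, path[1:])) / (len(path) - 1)
    return {"variance": var, "autocorr": cov / var}


def sensitivity(base, name, step=0.05):
    """Central finite difference of each moment with respect to one parameter."""
    up, down = dict(base), dict(base)
    up[name] += step
    down[name] -= step
    m_up, m_down = moments(simulate_ar1(**up)), moments(simulate_ar1(**down))
    return {k: (m_up[k] - m_down[k]) / (2 * step) for k in m_up}


base = {"rho": 0.5, "sigma": 1.0}
table = {param: sensitivity(base, param) for param in base}
for param, row in table.items():
    print(param, {k: round(v, 2) for k, v in row.items()})
```

In this toy case the autocorrelation responds to rho but is essentially flat in sigma, while the variance responds to both, so the pair of moments separates the two parameters. A moment that barely moves with any parameter would be a poor candidate for the estimation.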

Step 2: Being conscious of data-model limitations (LW use the term of not taking a model too seriously but I do not like that term because not taking a model seriously is as sinful as not taking data seriously and adds obscurity about which part of the model is serious and which is not). Many data sets used in the social sciences have serious limitations, whether unobserved heterogeneity, alternate economic forces or variables that are not quite what we think they are, to note a few examples. Building a consistent general model of all these limitations is not feasible at this stage of the science (and, LW argue, might not even be that useful because it would soon turn into a black box). So what do we do? LW suggest using empirical methods that are suited to these limitations. A few examples follow: use GMM or SMM for simpler models as they are more robust to misspecification and can target specific aspects of interest; control as much as possible for sources of variation outside of the economic model (e.g., scale variables to remove size effects, run a time-firm fixed effects regression on the variables and extract the residual).
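To make the SMM idea concrete, here is a rough sketch under strong simplifying assumptions: the "economic model" is a toy AR(1) process, the "data" are themselves simulated from known parameters, the weighting is an unweighted sum of squared percentage deviations, and the optimizer is a plain grid search. None of this comes from the course material; it only illustrates the logic of picking parameters so that simulated moments match data moments.

```python
import random
import statistics


def draw_shocks(n, seed):
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def ar1_path(rho, sigma, shocks):
    """Toy stand-in model: x_t = rho * x_{t-1} + sigma * eps_t."""
    x, path = 0.0, []
    for eps in shocks:
        x = rho * x + sigma * eps
        path.append(x)
    return path


def moments(path):
    """Targeted moments: variance and first-order autocorrelation."""
    mean = statistics.fmean(path)
    var = sum((a - mean) ** 2 for a in path) / len(path)
    cov = sum((a - mean) * (b - mean) for a, b in zip(path, path[1:])) / (len(path) - 1)
    return (var, cov / var)


# Stand-in "data": in a real application these moments come from the sample.
data_moments = moments(ar1_path(0.6, 1.2, draw_shocks(2000, seed=123)))

# Shocks for the simulated model are drawn once and reused for every
# candidate parameter vector, so the objective does not jump around.
sim_shocks = draw_shocks(2000, seed=7)


def distance(rho, sigma):
    """Sum of squared percentage gaps between simulated and data moments."""
    sim_moments = moments(ar1_path(rho, sigma, sim_shocks))
    return sum(((s - d) / d) ** 2 for s, d in zip(sim_moments, data_moments))


# Crude grid search: rho in [0, 0.9], sigma in [0.5, 1.5], steps of 0.05.
grid = [(r / 20, s / 20) for r in range(0, 19) for s in range(10, 31)]
rho_hat, sigma_hat = min(grid, key=lambda p: distance(*p))
print("estimates:", rho_hat, sigma_hat)
```

A real application would use an efficient weighting matrix, a proper optimizer and standard errors; the point here is only that the researcher chooses which moments to target, which is what makes the approach tolerant of a model that is wrong about everything else.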

Step 3: Recognize that the value of models is in understanding an economic trade-off. LW spent some time noting a misunderstanding in empirical research about model rejection (noting that this is still an ongoing disagreement with some of their co-authors). To quote: “All models are bad” and will be rejected (if one tries hard enough!). So, we learn little from whether or not we are able to reject a model, and a failed J test is not a deal-breaker. However, we learn a lot from analyzing where the model has failed, because it informs researchers on improvements to an approach. In fact, over-fixing a model by adding more structure to fit better can be counter-productive in that it turns it into a black box that no longer makes it clear what is or is not explained. LW suggest looking at moments that were not matched well and examining (as clearly as possible) why these aspects would not be explained well.

Step 4: LW note that this is not possible for every paper or every data set, but recommend using out-of-sample tests to provide external validity to structural estimates. Disclaimer: I do not like the term “out-of-sample” because out-of-sample refers to a sample that was not used in the estimation (different setting, different periods, etc.), but what they suggest is completely different. Semantics aside, their advice is very, very important for applied researchers. How do we suspend disbelief given some of the big questions answered by structural papers? We go back to the data and look at other implications of the model, seeing if they match aspects of the data for which the field may have strong-enough priors. Did the model do well at matching moments that were not matched in the estimation? Should we expect the coefficients to go one way or another in particular subsamples? Can we estimate the model in a subsample with an exogenous shock such that we have strong priors about changes in parameters?

Another place where the summer school was highly successful is in giving a sense of where the literature stands at this point. With the increase in computing speeds and improvements in economic methods, there is an enormous opportunity in using these techniques in places where this was not possible even a decade ago. LW note that only 2% of theoretical models have been estimated (I think it is a lot less) and that entire areas are yet to be approached: very little has yet been done on bankruptcy, household finance, labor finance, financial reporting, and areas such as banking are receiving growing interest.

So, okay, you’re sold, what to do next? Well, first things first, the summer school was intended primarily for Ph.D. students and I would recommend getting in touch with one of the attendees and keeping posted for next year’s announcement and program. Otherwise, the summer school had a great learning project that was done over the four-day course period. It is also useful, before going to the summer school, to gain some working experience with Matlab or some other equivalent language. Although LW and their TAs shared their code and helped attendees along with their project, I find it very useful to come with some moderate programming experience to better appreciate the course.

Unregistered Report on the JAR 2017 Registered Reports Conference

On May 12th and 13th 2017, the Journal of Accounting Research conducted its annual conference as Registered Reports (RR). In RR, the review occurs primarily at the research design stage, before results are known. RR protect science against practices that correlate with acceptance likelihood but bias test statistics in ways that cannot be rigorously controlled for, e.g., changing hypotheses, data or specifications in the face of unexpected results or non-results; see the discussion paper by Professor C. Chambers.


Overall, the experience has been a great success and provides a template for adoption as a separate submission path in addition to existing channels. In this entry, I will speak in broad terms of what I have learnt about RR in accounting research as well as some specific papers presented at the conference, all of which make a significant contribution to a variety of questions. Then, I’ll cover possible improvements to put on the table if editors want to make this model sustainable. Lastly, I will engage some bigger issues that RR connects to and the leadership by example that JAR is setting for accounting and non-accounting journals.

RR teach us that many questions are very hard to answer without pre-commitment to a research design, in cases where the universe of possible tests is unknown but likely large. This is true for experimental studies (run 100 experiments about various implications of a theory), for archival studies (correlate an accounting event to 100 possible outcome variables), and even for theory (try 100 assumptions and keep the counter-intuitive implications). Within this state of affairs, the field has become skeptical of virtually any result that comes out of the major journals, leading to a situation of research relativism that is entirely unacceptable. RR offer the means for authors to focus on sound methods, without fearing their findings. It is therefore to the benefit of the community and to the benefit of the authors.
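A back-of-the-envelope calculation shows why a large universe of tests is a problem. The sketch below assumes, purely for illustration, that the 100 tests are independent, that every null hypothesis is true, and that each test uses a 5% significance level; none of these numbers come from the conference.

```python
def prob_any_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent true-null tests rejects."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(n_tests, alpha=0.05):
    """Average number of spurious rejections among n true-null tests."""
    return n_tests * alpha


p_any = prob_any_false_positive(100)
n_spurious = expected_false_positives(100)
print(round(p_any, 3), n_spurious)  # roughly 0.994 and 5
```

Under these assumptions a researcher who tries 100 specifications is almost certain to find something "significant", which is the situation that pre-committing to a design rules out.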

A secondary insight from the conference is that non-results can be the stars of the show. Many of the studies were well-motivated, intuitive and one would have expected some effect – indeed, the studies went after hypotheses widely-believed to be true by researchers studying them. Side by side to analyses of power, the non-results are prior-changing breakthroughs. They also reveal that knowing what we don’t know is the prior step to seeding the next papers.

The opening paper by Y. Ertimur, C. Rawson, J. Rogers and S. Zechman is a fine example of RR as a tool to encourage risky data collection, that is, data that may not pan out with an easy-to-interpret or consistent message. ERRZ hand-collected evidence about the inactivity spells of CEOs, a common occurrence with external hires. I would have conjectured, for the CEO labor market, that high-ability CEOs come with more sensitive knowledge, implying greater likelihood of non-compete agreements, longer spells, and higher subsequent pay as a result of ability differentials and compensation for the gap. By contrast, according to conventional labor market unemployment theories, a longer gap may indicate CEOs seeking a valid match, where a longer unemployment spell may publicly indicate lower innate ability and may even directly erode it. Surprisingly enough, ERRZ find results consistent with the conventional theory, thus showing that CEO gaps may not be conceptually that different from unemployment gaps. Perhaps this is a first step toward unifying the study of the CEO labor market with the many robust insights from labor market theories with search frictions.

The paper by K. Allee, M. DeAngelis and J. Moon uses the RR format for an entirely different purpose. A criticism of the text analysis literature is that the degrees of freedom for cranking out (yet another) measure of information in text documents are too large to separate data mining from truth. What ADM do is to construct a new measure on sound principles, before its relation to other variables is known. They define scriptability as a score capturing the difficulty in operationalizing machine reading of common accounting documents. Interestingly, the firm characteristics that were conjectured to correlate with scriptability do so remarkably well, a non-trivial feat in an RR setting. Against all expectations, however, scriptability did not seem to have a clear connection to the mechanisms through which information comes into price. Associations with price discovery, bid-ask spreads and volumes are all over the place, often insignificant or even with the wrong sign depending on the document. The paper makes a strong case that machine reading may not have first-order consequences on market price. Beyond the non-result, the paper shows us how the construction of a measure, for settings in which many ad hoc researcher choices are required, can gain additional value and credibility when conducted in the context of RR.

Two papers went after a field experiment. Field experiments let researchers observe real subjects involved in their habitual decisions, but with the benefit of experimental control samples. S. Li and T. Sandino ran a field experiment testing the effect of information sharing systems on creativity. Most of the literature provides experiments that support the conventional view that creativity can be affected by intervention, but there is valid concern that the experiments that did not support this view were rejected or not written into a paper. LS created an app in which a sample of storekeepers in India could share promotional posters that they designed, and learn from posters submitted by others. The results? The experimental treatment failed to elicit much change in the quality of posters submitted by storekeepers. Perhaps the benefits of such information-sharing systems are low relative to what people can learn on their own from direct physical interactions. This result may not be unique to India, and may extend to creative settings in which interpersonal interactions already do a good job at sharing information. It puts a caveat on current beliefs that social media will dramatically increase flows of creative information.

The next field experiment, by D. Bernard, N. Cade and F. Hodge, is a fascinating example of a creative research design to answer a controversial question: does giving away shares help the company sell products? On the one hand, many researchers believe that individuals dramatically exaggerate the effect of their actions – after all, how many of us vote despite the fact that our chances of being pivotal are virtually zero? On the other hand, rational choice advocates argue that, for such obvious decision problems, biases are unlikely to change customer purchases. I invite the reader to stop here and make a bet on which side wins the argument. BCH pre-commit to give a $20 share of ownership of stock in Starbucks and then follow up on Starbucks purchases in the treatment sample against a control sample given a $20 share of ownership in a different company. The results? No effect is found. Hence, a non-targeted IPO might do very little good for company sales.

The paper was also an interesting learning experience because there was somewhat of a blowup, midway into the paper, as authors conducted very compelling analyses that did suggest the existence of an effect. But these analyses were unplanned, and conducted ex-post in response to the absence of a main effect. This may seem obviously misguided, but let’s review what these analyses were. Again, they were very compelling: companies distribute shares to customers and even IPOs can be over-subscribed by existing clients. Hence, a natural test would be to select subjects who drink coffee. In this subsample, it was found that stock ownership increases purchases. But these analyses were only conducted because the main effect was not found, opening the question as to which other reasonable analyses would have been in the universe of possible tests. The message here is clear: “report but do not conclude”; the door is wide open to further work.

The paper by H. Eyring and V.G. Narayanan is another take on the degree-of-freedom problem in some of the questions we ask. EN were given permission to experiment with the information displayed to takers of Harvard’s online courses, showing either means or upper quartiles of the performance distribution, and measuring whether student achievements improved. The issue with such a question is that one could have reported various quantiles, or reported them in different ways, until some effect were observed. Among the papers, this was the one with the strongest confirmation, as it was clear that the reporting of a bar that could be reasonably met increased performance, while a bar that could not be reasonably met or was too low reduced performance or had no effect.

We can debate ad infinitum as to whether the same insight can be applied to settings in which financial incentives are also a choice, but (i) there are compensation settings in which the employees cannot be primarily rewarded based on financial incentives (e.g., unionized positions, bureaucracies), and (ii) the education application is, even on its own, a very important application that can be directly used at very little cost by the many organizations that provide online learning tools. This is one of the rare studies in our area that immediately translates into concrete useful applications, and the RR has given additional validity to its insights.

Surprisingly, there was only one experimental paper at the conference. Among the entire set, this is the paper that had the most unambiguously clear non-result – a fact that may be a benefit of clear laboratory data where opportunities to seek other variables are limited. Z. Kowaleski, B. Mayhew and A. Tegeler investigate a setting where interactions between audit and non-audit services could create an environment conducive to collusion between the firm and its auditor. They conduct a repeated game and examine whether the equilibrium that is being played seems to shift toward more or less manipulation in a group with both audit and non-audit services. The results? Not at all. The average occurrence of misreporting and auditor enforcement appears to be the same. This also strikes against a widely-held view in the post-Enron world, namely, that auditors should be strictly regulated because most of their activities will contaminate the quality of audits. The paper speaks to the possibility of collusion in a repeated game even without non-audit services and how the consulting culture does not appear to be a factor moving the equilibrium.

The final paper was an outlier. It was a fundamental contribution to accounting knowledge but also not the type of paper that starts with a well-defined hypothesis to reject, or even strong researcher priors. Here, the RR format nevertheless played a major role by eliciting a huge effort in data collection. L. Hail, A. Tahoun and C. Wang go back to old news archives in countries all around the world, up to two hundred years in the past when possible, to search for occurrences of scandals and enforcement actions. This was meant to provide descriptive evidence about the lead-lag relation between regulation and scandals, a fact that was previously unknown. This is critical evidence that speaks to the fundamental question: what is the positive model for what regulators do? It’s an odd duck because, as one of the journal editors noted after the talk, this was not a paper that was after some risky results and so the reason for pre-committing to analyses, versus a direct exploratory review of the data, was not self-evident. The paper did inform us that journals may consider an alternate format with pre-acceptance of data collection that is of general interest but resource-consuming, without some of the strict requirements in the full RR.

In a follow-up panel session, Chris Chambers, a world expert in operationalizing RR, noted that the accounting profession has taken the format as its own, adapting it to the needs and challenges of the field. Also, all involved authors strongly supported opening a path in journals for RR, as one possible submission format. I’d like to share here some additional personal thoughts about what might be part of future iterations of this idea.

Methodology. Most of the review process was focused on data and experimental design questions, probably because these are the first-order aspects to be considered. In terms of methods, most studies fell back on linear regression as the main tool. There was very little Bayesian analysis (the workhorse for these methods in the natural sciences) or even any of the economic analysis tools that are standard in the economic sciences, e.g., estimation of preferences or some choices made within the data set. That’s too bad, because good data is the main obstacle to methodologies that are rich in insights, so the transparency in data in RR is a complement to stepping beyond interpreting conditional means. In fact, RR that focus on analysis methods are perfectly suited to respond to theoretical overfitting, since there is a large universe of statistical or economic models suited to a data set. I hope future iterations of the RR will involve using the format as a stepping stone toward using cutting-edge methods to draw information from data.

Theory. A striking feature of the seven studies is that, while each was based on a relevant hypothesis, none of them was based on a theory. The difference between a hypothesis and a theory is that a theory is a broad set of concepts that will lead to multiple hypotheses. Even if we find that, say, social media increases creativity, or that a particular game induces collusion, it is not clear which theory is being tested or what theory led to these hypotheses. Starting from a primitive theory is difficult, but the RR offers an ambitious agenda to go after research designs that will falsify theories by looking jointly at the many hypotheses they imply.

Power. C. Chambers also noted in the panel that power analysis is part of RR, but not necessarily a feasible first step in the absence of an easy formulation of a Bayesian prior. This echoed many concerns by the audience that accounting problems are relatively unstructured and do not lend themselves to analyses of power, relative to fields with more structured knowledge of disturbances. Measuring power requires assumptions. Nevertheless, I find counter-intuitive the principle that not making assumptions and not measuring power should dominate a best-effort estimate of power, one that is transparent in the assumptions that have been made and, in the worst cases, offers alternatives. Without power, many of us were unable to set confidence intervals on the non-results, and some of the potential value of RR is not realized.
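To make the idea of a best-effort, transparent power estimate concrete, here is a minimal sketch. It assumes a two-group comparison of means, a normal approximation, and a standardized effect size that the researcher is willing to state up front; the function name and the grid of effect sizes are hypothetical choices for illustration, not something taken from the conference papers.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test for a difference in means.

    Assumptions (stated openly, as the text suggests):
    - two groups of equal size n_per_group,
    - normal approximation to the test statistic,
    - effect_size is the mean difference divided by the common
      standard deviation (a standardized effect).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the z-statistic under the assumed effect.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of rejecting in either tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Report power under several alternative assumptions rather than one.
    for effect in (0.2, 0.5, 0.8):
        for n in (50, 200):
            power = two_sample_power(effect, n)
            print(f"effect={effect:.1f}, n per group={n}: power={power:.2f}")
```

Reporting the whole grid, rather than a single number, is one way to be transparent about the assumptions and to offer alternatives when no single effect size is defensible.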

Conclusions. C. Chambers also made a point, toward the end, that unplanned results contaminate the RR and, while helpful, should be separated in the conclusions. It was striking that authors, not referees, felt compelled not to stop at non-results, and worked very hard to provide supplementary analyses. The quest for meaning is, perhaps, not surprising but does show us, once again, why the RR format is so desirable. For the word RR not to lose what it conveys, special care should be taken to separate these supplementary parts from the main conclusion, perhaps by omitting them altogether from the abstract and requiring these findings to be outlined in a separate paragraph.

To conclude this blog, let us keep in mind that the RR experience, conducted here within the strictest rules by the editors, can teach us about mixed models that borrow from RR and a conventional review process.  C. Chambers mentioned offering authors a result-blind submission path, in which authors submit their paper without results. This is different from RR in that the analysis has been conducted and is known by the author, and does have caveats relative to RR. However, it is a path to re-open important non-results or unexpected results that exist but have been buried by the publishing process. It’s also an opportunity to work hand-in-hand with other follow-up RR, or push the community toward the replications that need to be conducted.

As to the broader message, RR teach us about the value of transparency, namely, transparency about research protocols. The transparency is not just to outsiders but also to ourselves, and it makes us realize how the process can distort significance levels.

I will add three short notes, which are probably slightly on the more controversial side but could nevertheless be items to think about.

First, part of the skepticism in research has led to significance thresholds that are probably too stringent for a strict RR report. Requiring significance at the 10% level is fine when authors are expected to pick among many regressions, but a less stringent threshold might speak a lot to noisy problems with large economic significance. E.g., should we only accept global warming if we get to 10% significance, or should we act before this? The answer may be the latter if we know that the research has been conducted without bias. Perhaps more discussion is needed about acceptable p-values, and the format of reporting significance stars should be changed.
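A small worked example may help show why the threshold matters so much when authors can pick among many regressions. The sketch below assumes, purely for illustration, that the candidate regressions are independent and that the null is true in all of them; real specifications are correlated, so the numbers are only indicative. The function name is hypothetical.

```python
def prob_at_least_one_significant(alpha, n_specifications):
    """Chance that at least one of several independent tests rejects
    a true null at level alpha (illustrative independence assumption)."""
    return 1 - (1 - alpha) ** n_specifications


if __name__ == "__main__":
    # One pre-committed test (as in an RR) versus picking among many.
    for k in (1, 5, 10, 20):
        p = prob_at_least_one_significant(0.10, k)
        print(f"{k:2d} specification(s) at the 10% level: {p:.3f}")
```

With a single pre-committed test the nominal level is the actual level, which is the sense in which a strict RR could afford a less stringent threshold.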

Second, are we ready for universal adoption of the RR as the method of empirical analysis? Absolutely not! Most of what we do is exploratory in nature, as we don’t have good theories and data is setting-specific and contains sources of noise that are hard to structure. RR might inform us that the current focus on “testing” is excessive as part of our exploration work, and we should instead rethink conventional studies as helping design better theories – once this is done, RR will be the format for the actual testing.

Third, the RR is part of a broader effort, also pursued by JAR, to offer transparency in how the research has been conducted. Under the current policy, which has only been adopted by 9 journals in the Financial Times 50 list (of which only the Journal of Finance in financial economics), authors must share their code; see the end of this post for a list. Since we do not expect most papers to be RR, this transparency approach gives the community tools to evaluate the robustness to alternate specifications or hypotheses, and is the standard companion to RR when RR are not possible.

We shall all be looking ahead to a wider adoption of the RR guidelines in accounting, so that accounting takes its natural role as a leader in promoting transparency.
Appendix: data policy in the Financial Times 50

| Journal | Code sharing | Non-proprietary data sharing |
|---|---|---|
| American Economic Review | Yes | Yes |
| Econometrica | Yes | Yes |
| Journal of Accounting Research | Yes | No |
| Journal of Applied Psychology | Yes | No |
| Journal of Finance | Yes | No |
| Journal of Political Economy | Yes | Yes |
| Marketing Science | Yes | Yes |
| Quarterly Journal of Economics | Yes | Yes |
| Review of Economic Studies | Yes | Yes |
| Academy of Management Journal | No | No |
| Academy of Management Review | No | No |
| Accounting, Organizations and Society | No | No |
| Administrative Science Quarterly | No | No |
| Contemporary Accounting Research | No | No |
| Entrepreneurship Theory and Practice | No | No |
| Harvard Business Review | No | No |
| Human Relations | No | No |
| Human Resource Management | No | No |
| Information Systems Research | No | No |
| Journal of Accounting and Economics | No | No |
| Journal of Business Ethics | No | No |
| Journal of Business Venturing | No | No |
| Journal of Consumer Psychology | No | No |
| Journal of Consumer Research | No | No |
| Journal of Financial and Quantitative Analysis | No | No |
| Journal of Financial Economics | No | No |
| Journal of International Business Studies | No | No |
| Journal of Management | No | No |
| Journal of Management Information Systems | No | No |
| Journal of Management Studies | No | No |
| Journal of Marketing | No | No |
| Journal of Marketing Research | No | No |
| Journal of Operations Management | No | No |
| Journal of the Academy of Marketing Science | No | No |
| Management Science | No | No |
| Manufacturing and Service Operations Management | No | No |
| MIS Quarterly | No | No |
| Operations Research | No | No |
| Organization Science | No | No |
| Organization Studies | No | No |
| Organizational Behavior and Human Decision Processes | No | No |
| Production and Operations Management | No | No |
| Research Policy | No | No |
| Review of Accounting Studies | No | No |
| Review of Finance | No | No |
| Review of Financial Studies | No | No |
| Sloan Management Review | No | No |
| Strategic Entrepreneurship Journal | No | No |
| Strategic Management Journal | No | No |
| The Accounting Review | No | No |
| Total | 9 | 6 |

A guide to fancy living in NYC by the cost-conscious academic

Every year, a few thousand academics come to NYC to join one of the many universities of the City. Having been around for a little over five years, let me share a few tips for the newcomers in the lifestyle I’d best describe as ‘Bohemian Nerd,’ or someone who is conscious about optimizing comfort, while staying true to the diversity and open-mindedness of what the City has to offer.

Housing. Few newcomers consider Uptown as a location of choice in Manhattan, yet the areas north of Central Park (110th-125th), around the stops of the Red, Blue or Green lines, have become some of the best locations in Manhattan for active young singles or couples. On the west, between Amsterdam and Malcolm X, Morningside Park offers impressive views of Harlem (from Morningside Drive), and combines playgrounds for kids (on 123rd or 116th) with green open spaces below (on 110th). Frederick Douglass, between 110th and 125th, with its many cafes and restaurants (for example, Harlem Tavern), is a great place to have brunch or dinner after a long day of work, and is easily accessible via the A/B/C lines. One can pick from one of the many high-end condos or more typical apartments in townhouses on the cross streets (my preferred choice since NYC can be noisy on the main avenues). The East side is a little cheaper, yet home to both Whole Foods (125th) and a Costco, as well as many stores on 125th. The north end of Central Park has a large public pool open all summer and a skating rink in the winter and is a great place to relax or meditate. Note that these areas are diverse and extremely safe, in fact as safe as any other area in Manhattan (which, in my experience, is even safer than an already fairly safe city such as Paris). We’ve crossed the area late at night (1am) and, in Morningside Park, seen people walking their dogs at late hours – we’ve never had any problem. The area further west, Riverside-Amsterdam from 105th to 125th, has nice park areas by the Hudson and a beautiful walk, but is slightly underwhelming as to both character and things to do and has become slightly on the expensive side. If you live there, you’ll have to go down to the Upper West, around 96th, for things to do.

What about other special areas such as Park Slope (Brooklyn), Upper East, Soho/Tribeca? Park Slope is very nice (with the park nearby) but far if you need to commute to Manhattan; it is perhaps a good compromise for larger families. Upper East is very expensive: count about double relative to Uptown. Soho/Tribeca is, hands down, the most delightful area of NYC with its low-rises and nice shops, but it can be noisy and, for the better parts, can be even more expensive than Upper East. It’s just easier to travel there rather than live there every day, especially considering that it does not have the benefits of Central Park nearby. The main advantage of these three options is that public schools are top-level, which matters especially for families of 2+ children (more on this later on). The Upper West is always a safe bet, with a greater focus on the busy executive and a lesser focus on character. Lastly, many academics live in the northern suburbs (Scarsdale/Westchester), but count between 1 hour and 1 hour and a half of commute. This is the residential experience, except at a higher price point than most other cities, and I do not advise it since the commute can interfere with being at the office every day and adds to fatigue. It may nevertheless be an option to move there with 2+ children of school age (but only at that moment!) since private schools in Manhattan can be expensive, more on this later on.


Food/Grocery shopping. Food shopping is a complex task in NYC because there is no central Walmart where one could pick up everything and be done with it. Every place has its own specialty and is not-so-good for the rest. Trader Joe’s (around the city) presents one of the greatest generalist compromises, with high quality at a good price. But let me improve the experience with better specialized ideas. Zabar’s is the best place in NYC for inexpensive and decent cheese (try their Manchego, Fourme d’Ambert and Comté), fresh pasta and a delicious Scottish salmon. Street vendors – all over the place – are the best place to get fruit and veggies at low prices (about half of Whole Foods), although quality can be variable. Avoid the pricey local supermarkets of the fancy neighborhoods, which give you low quality at high prices; I recommend Best Yet Market for a large supermarket with a great selection of vegetables at very low prices. If this is convenient, Costco (East side) and BJ’s (Bronx) have very nice groceries. Many New Yorkers go to Fairway. It presents very well but is incredibly pricey – however, I found no better alternative when it comes to seafood, and the olive oil there is quite nice as well.

Let’s not underestimate online grocery shopping, which is a must-use in NYC. At the high end, FreshDirect offers a zero-cost delivery subscription (for orders starting at $30+) and has great food, especially milk, meat and seafood. Prices are not competitive for fruits and vegetables. They also have a nice selection of other things, such as their brioche or chocolate croissant. They also have competitive prices for bottled water (such as Perrier). For anything that is durable, such as cans, rice or even other household stuff, I found no better than with free shipping from $50+ and half the price of a regular supermarket.

For wine lovers, what not to do is to pick up wine at a wine store. These are well presented but are expensive and usually mix up low-price but near-undrinkable bottles with fine wine at heavy markups. Alcohol in general should be bought online in New York. I strongly recommend Wine Advocate’s weekly wine buys for unbeatable bottles rated 90+ starting as low as $15. The only downside is that stores vary and this takes a little time to set up. I personally like the site, which sells a new bottle each day, at top prices and with ratings quite often in the mid 90s. What I like the most about the site is that it proposes bottles (no subscription needed though) so this saves the search costs. If you want to try out and you don’t have a friend who uses it yet, my code is here (they will also credit you with $10 in wine credits).

Moving around. The subway in NYC is a mixed experience. The North-South line arrangement and the ability to move between express and local trains let the subway cover a lot of distance in reasonable time. You can easily traverse half the city in less than 30 minutes, and sometimes only 20 minutes on some lines, so the City allows faster commuting than most other locations. In addition, there are many, many trains, and wait time is minimal. However, there are some big problems. The subway is very crowded during rush hour and it’s not unusual to be squished against a side; not great unless one is into out-of-body meditation. In addition, it is common for trains to run erratically. Suddenly a local train will turn express and drop you 40 streets north of your stop, or some problem on the line will force you to reroute via another line. If one has a time commitment (like picking up children or a meeting), the subway is completely unreliable, and many times I had to catch a cab.

But are there any alternatives? Cabs are very easy to get but expensive, and so is Uber (Lyft has now become much more expensive than Uber in NYC), even if it is about 30% less than regular cabs. The bus is atrocious. Heavy traffic and stops at (almost) every traffic light make a bus ride take double the time of the subway – of course, for that reason, buses are not too crowded. So things look quite bad at this point.

Fortunately, a few new services have completely changed the situation recently. The service Via now offers car-pooling for $69 a week (less for a monthly pass), as many times as desired, or at a flat $6 pay-as-you-go cost. Besides, it’s pre-tax money, so a direct comparison to the cost of a cab is misleading. Cars are generally large SUVs (Toyota Highlander) that are very, very comfortable and where it’s easy to get work done during the trip. Because of heavy traffic, it takes longer than the subway during rush hour (count 1.5x), but it’s a great option if you can do some work – like read a paper or work on a laptop in the car. Also, one thing I love about Via is that you cannot miss your ride; cars are almost always of the same make (unlike other services) and they tell you at what corner to get picked up. If you sign up, ask a friend who has an account for their code, so that you both receive some credit (currently $10). Uber also launched a pool service, but it’s inferior to Via – good prices only apply at particular rush hours and rides are not as comfy as Via.

The Arts. Academics and students must use tdf, a service that can only be used by people in education. This works a little like the ticket booth, the last-minute service to get discount tickets to theater, but with three big differences: it is about half the price (around $40 for a prime show, a lot less for off-Broadway), one books one or two weeks in advance and, most important, seats are typically spectacular as long as the booking is not for too many people (2 or 3 is ideal). tdf does not have a huge inventory of shows, but it is quite nice and is renewed regularly. We’ve seen some very nice theater (like top productions of Sartre or Shakespeare) as well as new musicals such as Rocky or Spiderman. In particular, it’s easy to miss out on the non-musical theater scene, which is quite good in NYC, and tdf is the place to go to keep current on what’s there.

NYC has, of course, great museums. Remember that many museums are free in NYC, so you can donate but it’s up to you. The Museum of Natural History is a great place for families and the special exhibits are always great, although these cost extra. The Met and MoMA need no introduction of course. For those interested in museums, I strongly advise using Chase bank, which offers its private clients a free pass to many of the museums that are not free, such as the Guggenheim. For museums that are technically free, the pass gives preferential access without lines and discounted prices for the activities as a corporate member. Also for families, day trips to the Liberty Science Center and New York Hall of Science are worth it, but I’d advise first taking a Via to Brooklyn to cut the cost – in both cases, commuting via public transport is a nightmare.

Interestingly, buying art occurs in the most unexpected of places: on the street by a museum among the tourists (especially the Met), a few painters sell beautiful art pieces. The same pieces retail for a much higher price on the internet or from their studio (usually in Brooklyn), but there is more choice there. So, it’s probably best to first discover a painter’s art in the street, and then visit their studio for more choice, even if it comes at a steeper price – usually x3-x5. Of course, the pieces in the studio usually have more distinctive motifs.

Restaurants. Little known is that the street food from trucks is among the best in the country. The typical truck has amazing lamb with rice and veggies, Italian sausage and, for the vegetarians, falafel that I’ve rarely seen matched, all of this for around $5. Trucks can be found at most street corners.

People say that restaurants are great in NYC, but I’ve rarely found this to be true. Most restaurant food, even when it is served in a fancy room, is very simple and does not strike you as more elaborate than what you could easily make at home, plus the extra price. A few caveats to that. I’ve always been pleasantly surprised by oysters in NYC, and Groupon offers reasonable deals (at full price, it’s horrible though). Some staples of NYC are always great, so you can get a nice Reuben even at a touristy place and not go wrong; burgers are not too expensive and quite nice in almost all restaurants. For the everyday, the chain Chipotle is now widely present in NYC and is one of the best picks in the City (in fact, taste-wise it beats most of the fancy restaurants).

A slightly bohemian American restaurant with music to try out is the Harlem Tavern on 116th, especially for Saturday/Sunday brunch. For the more exotic, Jin Ramen on 125th offers the authentic ramen experience, delicious! Another great pick that never disappoints is La Tartine in Soho – an affordable French restaurant that beats any other fancy restaurant in quality – and you can bring your own wine, but beware that this is a well-known secret and wait lines can be very, very long during peak times if you don’t come early.

Shopping. Shopping is special in NYC but one needs to know where to go. Surprisingly, the best shopping is not at the grand names on Madison but at discount stores such as Marshalls, TJ Maxx or Century 21. Because of their location, these stores receive the brand-name inventory of each season and resell it at extremely heavy discounts, from half to a quarter of the price. Note that these can be top makers: it’s not unusual to find a YSL, Gucci or D&G shirt or tuxedo there. They also offer tons of other things such as toys or perfume, also at discounted prices.

There are a few outlet malls further outside of the city. An example is Tanger Outlet at Riverhead on Long Island. It’s very extensive and good to visit, and can be combined with a day at the Hamptons or at the nearby water park Splish Splash. Prices there are better than on the web or in stores in the city, but, in the end, it’s more about the experience. Note that one goes there via train, and then there are taxis or buses at the station that will drive you anywhere in the area.

Families. NYC is great for families but one needs a few insights about what to do. For the little ones, the Swedish Cottage in Central Park has nice marionette shows that change throughout the year. For all ages, I recommend the Blue Man Group show, which is also on tdf most weeks. I’ve been very disappointed by most off-Broadway productions for families.

During the week-end, kids love Coney Island and there are other things to do for grown-ups as well. The beach itself is just okay, and the water tends to be a little cold for extended swimming – besides, it is a bit crowded. However, Coney Island is accessible by subway with almost no connection – so it’s very practical. The boardwalk is nice and spacious, and there is music and a nice atmosphere to it. There are also a few parks with rides that cater to all ages, especially little ones, who have a ton of fun every time (and pricing is very reasonable). Lesser known, if one moves away from the beach, there are some cafes with a nice Caribbean feel to them that are perfect for cocktails in the evening. For the high-end experience, Sandy Hook in New Jersey is a set of very beautiful beaches in a national park. To get there, one needs to take the ferry (from many locations) and it takes about an hour to get to a complete out-of-city experience where most is pristine. The ferry ride is quite impressive as well and never gets old. Expect, however, things to be extremely windy, which can be unpleasant if the day is cold.

Snow and ice are great in the winter. Central Park has various skating rinks (Lasker in the North, for example, is a skating rink in the winter and an outdoor swimming pool in the summer). There are also many places in the parks where families can ‘hit the slopes,’ and you can use a regular piece of cardboard to slide or get something better looking at any sports store. Last but not least, for kids, skiing is fairly easy from NYC. Even if you don’t have a car, take the Metro North train from the Harlem 125th station, and head for Patterson – about 1h40 minutes and very easy. From there, a free shuttle will go to Thunder Ridge (5 more minutes), a small ski resort that’s perfect for kids to learn skiing, with a long green slope that goes from the top to the bottom of the main hill. Otherwise, Mountain Creek in New Jersey is the closest (and a little more interesting for adults) but requires driving, about 1h20 minutes door to door.

As to other activities, the Y has many locations all over the city and can offer classes for all ages in things that go from hip-hop to swimming or soccer. These classes are excellent and reasonably priced. It’s usually possible to arrange it during a week-end and back-to-back in order to have a sequence of things to do. Note that one does need to be a member to sign up for an activity at the Y.

Another challenge for families is schools. On the positive side, NYC has implemented a free public pre-K program, so that’s something to take advantage of. This program is offered in public schools as well as many private programs. Going to K and beyond, school achievement can be unequal, but people tend to overweight the importance of a school district because, I can only assume, there is a quantitative measure publicly available on the site of the Department of Education. I’ll offer some thoughts about this.

Some public schools have an excellent reputation through their specialized dual-language programs, see for example P.S. 84 Lillian Weber for a French or Spanish dual program. These programs are designed to combine native speakers and non-native speakers and teach in both languages. They are separate from the school and benefit from collecting the most adventurous families. For tiger moms and dads, NYC has developed a gifted and talented program which includes an entrance test (yes, a test for a 5-year old!), so this requires some active training by parents – not my cup of tea. Further, gifted and talented programs give a lot of homework, as we have heard from other parents whose kids joined these programs. Perhaps there is a great upside to this in the future though. Another option is private schools, and it is not widely known that there are some excellent but not excessively pricey private schools run by religious institutions. The diocese has very good schools with excellent traditional education (e.g., Corpus Christi on 121st); other denominations have great schools as well, as I’ve heard from very satisfied parents. Note that these schools are very respectful of diversity and do not impose the school’s belief on children or parents. Lastly, there are some amazing private schools that give you some of the best teachers and activities; for small families of one or two, this usually cannot go wrong. For example, the School at Columbia is Columbia University’s magnet school with experience-based education and was recently handed a distinction for science education, in person by President Obama.

A few more remarks. A great thing about the City is the number and variety of playgrounds, and it’s a pleasure to vary across the many options, which can often be within walking distance. During the summer, most playgrounds have water fountains that kids absolutely love. I have a few favorites. The Tarr Family playground is one of the best, with a beautiful pyramid, water fountains and lots of activities, including a sand area. It’s also next to large open spaces in Central Park to enjoy a relaxing day. The Heckscher playground is the largest in NYC, and has lots of space and nearly everything – unfortunately, kids love to play in the rocky area and this is not for the faint-of-heart parent. Third, the Billy Johnson playground is small but it has the best slide in Central Park and is right next to the Central Park Zoo, a repeat must-see activity.

That’s all I have. If you want to share more tips, feel free to comment below!

How empiricists should read theory

I was at a conference a few days ago and, after a theory speaker had finished his talk, a senior empirical faculty member turned to me, looking angry, and said “you know, now that you’ve got tenure, you should write a paper about how empiricists should read theory.”

The problem is, I don’t know myself how theorists should read theory – so, next best thing to writing a paper about something you don’t know – I’ll jot down a few thoughts in a blog and hope it all comes together at the end.

Thought 1: Read more than the title

Make a prediction that can be tested in a data set: chances are more than one theory could deliver this prediction.

I’ll take a running example: consider the confirmation theory of Dye (1983) and Gigler and Hemmer (1999). The theory predicts that a verifiable message (e.g., earnings) makes otherwise unverifiable communication (e.g., forecasts) credible. But this can also occur in the very different theory of Einhorn and Ziv (2008), a world where disclosures are credible by assumption, but the manager can pretend to be uninformed and withhold information. Realized earnings, in this model, inform investors about the manager’s information endowment.

Theories are about mechanisms or the means to get to a prediction, and within each paper, there is enough information to test which mechanism is operating. So a good reading of theory should identify the empirical content of the mechanisms (to be tested as well), not just the end prediction. 

Returning to our running example, under confirmation theory, credibility is assured by writing a contract that punishes the manager for missing a forecast, controlling for earnings. Do we see this in the sample? Under dynamic disclosure theory, we should see that low earnings, indicative that the manager probably withheld strategically, should cause more disclosure in the next period, a very different time-series prediction.
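As a sketch only, the two mechanism tests above could be written as two separate regressions. The variable names and the linear form are illustrative assumptions of mine, not specifications taken from the cited papers: $\text{Miss}_{it}$ is an indicator that firm $i$ missed its forecast in period $t$, $\text{LowEarnings}_{it}$ an indicator of low realized earnings, $\text{Pay}_{it}$ the manager's compensation and $\text{Disclose}_{i,t+1}$ a measure of disclosure in the next period.

```latex
\begin{align}
% Confirmation theory: the contract punishes a missed forecast,
% controlling for earnings.
\text{Pay}_{it} &= \alpha_0 + \alpha_1 \text{Earnings}_{it}
    + \alpha_2 \text{Miss}_{it} + \varepsilon_{it},
    & \text{prediction: } \alpha_2 < 0 \\
% Dynamic disclosure theory: low earnings today are followed by
% more disclosure in the next period.
\text{Disclose}_{i,t+1} &= \beta_0 + \beta_1 \text{LowEarnings}_{it}
    + u_{i,t+1},
    & \text{prediction: } \beta_1 > 0
\end{align}
```

The point of writing both down is that the two theories share the end prediction but put their empirical content in different places: one in the cross-section of contracts, the other in the time series of disclosure.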

Thought 2: Normative research provides guidance about what to study empirically, and why.

An old tradition in accounting – which by the way is no longer mainstream even in empirical research – is to view normative research (“how things ought to be”) as suspect and probably wrong. Recall that this came to be, over the 70s (see Demski 1973) because a number of individuals were dishing out accounting knowledge in the form of religious edicts, and this had to stop for serious scientific research to begin.

The majority of theory work has clear normative implications, and some of it does not contain much in terms of testable predictions, so having appreciation for what it is trying to do is important.

To talk about what normative research is, I’ll make a quick parenthesis along STEM, Science-Technology-Engineering-Mathematics. To over-simplify, let’s identify Science as the scientific method and take testability as a core principle. Let’s also set aside Technology and Mathematics since these are tools that are rare. However, Engineering is different. If I want to build a bridge, I make plans to do it based on sound principles validated by science. I’m not building a bridge only for the purpose of testing and, if I were to require testability always, no first bridge would ever be built.

This is what normative research is all about, namely, making plans for improvements that do not yet exist. Normative research requires good assumptions as inputs, hopefully assumptions that we believe have been tested. This is important: after all, without engineering or normative research, all the scientific knowledge we could accumulate would have no means of creating better outcomes.

So, how should empiricists consume normative theory? Normative theory is the natural end-point of the knowledge we create but it makes many assumptions. Knowing whether these assumptions are descriptive or not is the realm of positive theory and empirical research. Therefore, normative theory gives guidance to empiricists over what meaningful assumptions should be tested. It makes the empirical exercise relevant.

Thought 3: Find the selected math that summarizes.

Ten years ago, I remember attending a week-long seminar by an accounting theorist. I was shocked: rather than tell us about all the great things he had found, he started his talk by saying that “math is an unnecessary evil” and then, looking at one particular faculty member in the crowd, said they should shut down one top econ journal this faculty member had published in (for, apparently, having published a few papers there, he was now its representative).

Anecdote aside, let’s ask the question: why do theorists use symbols when they do their work? Is it something that’s back-end material to be entirely ignored by empiricists?

Math in argumentation has many purposes, one of them being to support a tight logical argument. For example, how often is it that one is lost in a wordy hypothesis where everything seems to float in the air, and multiple logics seem to be operating?

But is math only useful as a method of validity, so that empirical readers may safely ignore the math once a referee or journal has verified its correctness? To answer this question, consider absorbing Adam Smith’s Wealth of Nations, and compare it to an undergrad micro textbook treatment of the welfare theorems: which one is easier?

Empiricists can learn from math in a study because a few equations can provide a concise, simple summary of otherwise difficult trade-offs. You don’t need to bury yourself in the appendix, or even to follow every main argument to get to these equations. So, to use a theory paper, remember those few equations that summarize the assumptions and main results; this is often much easier than remembering convoluted steps and implications of a wordy logic.

Thought 4: Take a theory seriously, not literally.

Any theory is a simplification of reality: it does not aim to be descriptive of everything and, in social sciences, the most successful theories only get to first-order effects. We can’t take theories literally to be exact representations but we can take a theory seriously enough so that it may explain empirical behavior.

Unfortunately, many empiricists view theory as merely motivational. By motivational, I mean that a theory is here to introduce a topic but nothing more, or that it is part of an enormous bundle of theories that have nothing to do with one another and together deliver predictions – some of which may occur in one of the theories and not the other, in both or in none.

So taking one theory seriously really means the following: let’s shut down for a moment all other theories, and assume that the world that we see in a data set has been generated by this theory (and only this theory). Is what I see in line with what the theory says? Is the theory complete enough to speak to all the empirical facts I want to study?

Doing this requires some specialization: if one is serious about a theory, one needs to know it very well – but the payoff is to be able to test multiple predictions of something parsimonious and clear. Few studies can claim this type of transparency.

What about alternative explanations? The good thing is that, once we are bound to multiple deep tests of a theory (being serious about it, that is!), most alternative theories will often be naturally ruled out by these tests without having to design specific extra tests.

Thought 5: Theories tell you about (unobserved) exogenous variation in observational samples.

Some disagree, but I like the idea of theory as a poor man’s substitute for unavailable data. If we lived in a world where we could experiment on anything instantly and at no cost, we would not need any theory to make better things – we would simply proceed by an infinite set of free experiments.

I think that’s the deep problem with advocates of the self-styled ‘credibility’ revolution who believe that experimentation is required to solve any problem. Yes, experimentation is better but it isn’t free or always feasible – so, wherever it is missing, we need to rely on theory.

The greatest example of this is observational data. I was at a conference at Kellogg law and a statistician complained that most research designs in the social sciences would not meet the standards of medical science – he was quite critical of observational data outside of a controlled experiment and believed that instrumental variable methods (even assuming the exclusion restriction held) had serious statistical flaws. I think his point of view was that only carefully planned experimentation could meet the standard of proof, noting that economists always asked him to “Believe..”

Yes, indeed, theory does require belief. Believing makes things less credible, but there is often no alternate course. But let’s be more precise now: what does ‘believe’ mean in an observational design? Theory is a statement about a source of exogenous (but often unobservable) variation and how this exogenous variation can cause outcomes. The theoretical exercise accepts we need to make assumptions, but requires that these assumptions be clear and logically used – this is the least we can do.

What does the theory conjecture to be the source of exogenous variation? While it does not offer certainties, this can inform the empirical design about what assumptions are being made, and (within the theory) how these assumptions are used in a consistent manner.

The evil word, here, is of course endogeneity. Endogeneity is a fundamental characteristic of any observational study. Theory does not solve endogeneity if, by solving, we mean providing the same level of confidence as if there had been an experiment. However, theory does clarify a plausible mechanism for the endogenous relationship between variables, and links them to an exogenous source. Theory clarifies the assumed source of (unobserved) exogenous variation.
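
To make this concrete, here is a minimal simulation sketch, not taken from any particular paper. It assumes a hypothetical linear model in which an unobserved exogenous shock `z` drives both an observed choice `x` and an observed outcome `y`; the function names and the parameter values (`effect`, `z_on_x`, `z_on_y`) are illustrative assumptions. A naive regression of `y` on `x` does not recover the causal effect, but the assumed model says what the naive slope should be.

```python
import random


def simulate(n, effect, z_on_x, z_on_y, seed=0):
    """Draw (x, y) pairs from a hypothetical linear model.

    z is the unobserved exogenous shock; it moves both x and y,
    which is what makes x endogenous in a regression of y on x.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)                # unobserved by the empiricist
        x = z_on_x * z + rng.gauss(0.0, 1.0)   # observed choice
        y = effect * x + z_on_y * z + rng.gauss(0.0, 1.0)  # observed outcome
        xs.append(x)
        ys.append(y)
    return xs, ys


def ols_slope(xs, ys):
    """Slope of a simple regression of y on x (with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov_xy / var_x


def implied_slope(effect, z_on_x, z_on_y):
    """Population slope implied by the assumed model (all shocks have unit variance)."""
    return effect + z_on_y * z_on_x / (z_on_x ** 2 + 1.0)


xs, ys = simulate(n=20000, effect=0.5, z_on_x=1.0, z_on_y=1.0)
naive = ols_slope(xs, ys)
print(f"naive slope: {naive:.2f}")
print(f"model-implied slope: {implied_slope(0.5, 1.0, 1.0):.2f}")
print("true effect: 0.50")
```

Under these assumptions the naive slope lands near the model-implied value rather than the true effect. The theory does not remove the endogeneity, but it states which unobserved variation is being assumed and what that assumption implies for the data.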

What’s in a co-author?

I wanted to write this entry about a delicate issue, but one that is important to many of us: namely, what type of input does it take to be a co-author? This issue is important because many Ph.D. students and junior faculty feel obliged to include senior co-authors in their work, generally their dissertation, even though they have done everything and have been working hard for years on a paper. When it comes to tenure reviews, it is all too often that these same people are accused of not being independent thinkers – sometimes even not getting tenure for influential co-authored papers – even though they have been doing the research on their own all along and someone more established is getting the credit. In fact, I have even seen senior faculty refusing to put their name on papers by juniors they like a lot, precisely to avoid this type of expectation: this type of response by good people is telling.

I will share an opinion here; hopefully it is a positive drop of water on this issue. I have worked with many co-authors and have been very pleased with this. Someone once said at the junior faculty consortium “co-authors make better papers” and I entirely agree. I have also seen various cases that did not end well. So, I’d like to share some experience on that. The reader may obtain a more formal discussion by leading scientists here, which largely parallels my points.

(Either of) conditions for a co-author. In my view, any of these things would make for a good, valuable co-author, and this is hopefully the case for the majority of co-authorships.

  • Writing. Any contribution to writing a full section (such as either intro, main text, or extensive paragraphs on results) is part of a valuable contribution. Good writing is difficult and time-consuming, even when it is just a few difficult sentences to craft, and it requires deep understanding to transmit an argument.
  • Core model/technique. Writing the model (in a theoretical paper) or drafting a new estimation technique is also a core contribution to a paper – by technique or model, I mean more than stating an empirical regression but sufficient details on the design so that it becomes implementable.
  • Proofs/code. Some co-authors will do programming or difficult proofs, which is the unseen part of a paper but, nevertheless, where the true nature of the paper is. Anyone will say that this is perhaps the most important contribution to a paper.

Aspects that are not expected. Let me now mention aspects that are not necessary conditions and are typically not true in most successful co-authoring partnerships (as far as I could see).

  • are not equal contributors (even without ranking of authors). It is very rare that all co-authors would have spent the same amount of time or effort on a paper; first, skills vary; second, the tasks of each co-author may not be exactly identical; and, third, one co-author may initially be less involved and then catch up on some revision work.
  • are not technical experts. That someone does not contribute to the technical parts of a study (like proofs or code) does not mean this author is not valuable. In accounting, a paper is of no value unless the idea is stated in words, and doing this is different from crafting technique.
  • are not friends that help each other. We work with people because we have complementary skills or interests, as in any professional relationship. That someone co-authors with someone else does not presume they have compatible personalities or would necessarily view what the other does with other people as “good work;” i.e., good people have their own professional judgment independently of their co-authoring relationships.

Should you ever take a co-author off a paper, say, if none of the “either of” conditions are met? I think the answer here is clear and unambiguous: you should never, ever, remove a co-author. Any attempt to do so will have serious consequences. The reason is simple: even not being interested in a paper (which is fine) is no reason to receive, as “punishment,” the public awkwardness of being forced to step down from a paper. The person would and should respond to this. This cannot end well and will hurt everybody involved, including the research.

There is only one caveat to this, but only because it is, strictly speaking, not removing an author. This occurs if a paper is not working well at all, and does not seem publishable so that one of the authors has an interest in trying something related but different. This can only be done if the new paper does not supersede the prior paper, that is, the analysis of the original paper would still stand on its own. Thus, it is not removing an author but working on a different project on the same general issue. In a way, this simply means that one works with a co-author on one paper, not on an entire stream of research about one topic.

Even then, though, my advice would be to avoid any such outcome and try to keep co-authors as much as possible. This may seem unfair if you are doing all the work but, remember, you did choose to put someone’s name on a paper, and this is like a contract (albeit an incomplete one); one should not renege on a contract without consequences that far outweigh any remote benefits.

For more, see:

ICMJE statement on co-authorship

Don’t be (data) evil

A few months ago, one of my coauthors e-mailed this guy who had published a paper in JFE to ask him for his data. We meant to use this data for a different purpose. No response. I understand; let me explain. This is an analysis that could, in principle, be redone from scratch (i.e., not from proprietary data) so the guy probably thought “these guys are lazy, so why bother: they should do their homework.”

Okay, I understand plausible reasons but, nevertheless, I would categorize this kind of attitude as evil. Replications don’t come for free: they are time-consuming in effort that could be spent doing something new, and can cause extra error since the follow-up research, whose focus is different, may not apply the same standards as the original. Plus, we all know that there are always many small choices that are not always reported and add up to alterations in the results.

Consider the following. In an NBER replication study of 67 papers published in major economics journals, only 1 paper out of 3 could be replicated, which was raised to slightly less than 1 out of 2 with ‘assistance’ from the authors.

We should all applaud the decision made by some journals including, but not exclusively, journals such as AER or JAR, to require code and data. Information is a public good and I wonder how much will be achieved from relying on data charity (my experience has not been good there). However, I am still worried that two fundamental problems are not being formally addressed by authors and editors.

  • Requiring a “credible source policy” for any empirical claim used in the journal

Even journals that have a data policy do not require their articles to use sources from journals or authors (possibly voluntarily – by public posting) that meet the same data policy. Think about it: (a) the data policy is present because there are doubts as to the credibility of any result that cannot be replicated and (b) a general policy of academic journals is not to include any claim that has not been rigorously established. Should (a) and (b) imply that no reference can be used that does not meet the data policy – or, if it is used, that failure to meet the data policy should be noted and qualify the claim as ‘tentative’?

Compare this to theory. I know of no article that would reference a claim whose theoretical proof is unavailable even if, say, the proof is too large to fit in the margin, as more than a conjecture. Hard sciences are not protected from fraud, but all serious journals implement strict policies about sharing data. The lack of a credible source policy in the social sciences may, in the end, encourage less transparent journals to free-ride and hurt the very same journals implementing the policy.

  • Requiring peer-review over the supplementary documents

Can rules work without enforcement? I have noticed that, even in journals with a data policy, the supplementary documents do not seem to have been refereed with the same standards as the original article, if refereed at all.

Most of the code that is shared is not commented in a way that allows the user to know what variables are and what they do, and for the most part, appears to be the incomplete notes of one co-author. On occasion, the code builds on a data set that comes from a different paper (and whose code is not available), completely defeating the purpose of the replication. There is very little guidance as to which files should be executed and in which order, as well as how this relates to each result in the paper. In other words, this code has simply not been designed to be re-used by a third party. 

I’ll point here to J. Shapiro and M. Gentzkow’s excellent advice. They say it all but I’d like to pick up on one point that connects to my experience as a theory person. Writing code should be like writing a proof: clear, elegant and as concise as possible. As with a proof, one would rework a correct argument to make it cleaner and more direct, and remove the unnecessary repetitions. The data portion and code should be refereed, and the referee should be able to run the code, understand what it does and suggest improvements.
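
As a rough illustration of code designed to be re-used by a third party, here is a minimal sketch of a single entry-point script. Everything in it is hypothetical: the toy data, the variable names (`size`, `disclosure`), and the "Table 1" and "Table 2" labels are placeholders, not taken from any paper or from the advice cited above. The point is only that the variables are documented, the execution order is explicit, and each step is tied to a result in the paper.

```python
"""Hypothetical replication driver: run this one file to regenerate every result."""
from statistics import mean


def build_sample():
    """Step 1: build the analysis sample.

    A real package would read the raw files here; this toy data set is a placeholder.
    Variables: 'size' is firm size; 'disclosure' is 1 if the firm disclosed, else 0.
    """
    return [
        {"size": 1.0, "disclosure": 0},
        {"size": 2.0, "disclosure": 1},
        {"size": 4.0, "disclosure": 1},
        {"size": 3.0, "disclosure": 0},
    ]


def table_1(sample):
    """Step 2: Table 1 (hypothetical): share of firms that disclose."""
    return mean(row["disclosure"] for row in sample)


def table_2(sample):
    """Step 3: Table 2 (hypothetical): mean size of disclosers minus non-disclosers."""
    disclosers = [row["size"] for row in sample if row["disclosure"] == 1]
    others = [row["size"] for row in sample if row["disclosure"] == 0]
    return mean(disclosers) - mean(others)


# The execution order and the mapping from code to paper results live in one place.
PIPELINE = [
    ("Table 1", table_1),
    ("Table 2", table_2),
]


def run_all():
    """Build the sample once, then produce every result in the order of the paper."""
    sample = build_sample()
    return {label: step(sample) for label, step in PIPELINE}


if __name__ == "__main__":
    for label, value in run_all().items():
        print(f"{label}: {value}")
```

A referee handed something like this can run one file, see which function produces which table, and suggest improvements step by step.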