If this is your first time reading about Effective Altruism Policy Analytics, you can look at our website here for more information: https://eapolicy.wordpress.com/
Effective Altruism Policy Analytics finished its last policy comment of the summer on Saturday, September 12th, 2015. The project ran longer than its initial funding period due to Effective Altruism Global events and one team member's need to take a part-time job to secure future employment. In this write-up, we present a summary of the project, what we learned, the mistakes we made, and a narrative describing our work on policy comments.
Over the course of 15 weeks, Effective Altruism Policy Analytics produced policy comments on the United States Federal Register with the intent of improving regulatory action and collecting data on how to affect regulatory action. Since our last update, we have continued making changes in the same direction while also improving the formatting of our comments. We also devoted some attention to regulatory reviews, which offer the potential to be more proactive in targeting bad policies for improvement, at the cost of a slightly lower probability of good feedback. Given this project's speculative nature, predicting impact is difficult; in expectation, however, we believe the project was worth it.
Even assuming that most of our comments did nothing, the comment we think is most likely to succeed involves roughly doubling the cost-effectiveness of a $7.7 million program by changing building requirements and improving prioritization. As far as we could tell, our comment had no competition, and it was modeled on a successful change to international building codes that was accomplished via comment. If only this comment succeeds, we would be generating over $2,500 in housing benefits per hour worked for some of the poorest Native Americans. If, in the coming year, we receive feedback that we successfully influenced such a policy, the project will almost certainly have been worth the time invested, even before considering its experimental value. Furthermore, many of the comments we worked on involved much larger possible impacts, though with much more uncertainty about our chance of influence.
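To illustrate the shape of a benefit-per-hour estimate like this, here is a minimal Fermi sketch. The hours figure and the doubling assumption below are hypothetical placeholders chosen for illustration, not our actual project accounting:

```python
# Hypothetical Fermi sketch of a benefit-per-hour estimate.
# All inputs are illustrative assumptions, not the project's actual figures.

program_budget = 7_700_000      # size of the program, in dollars
effectiveness_multiplier = 2.0  # assume the comment roughly doubles cost-effectiveness
added_benefit = program_budget * (effectiveness_multiplier - 1)  # extra housing benefit

total_project_hours = 3_000     # assumed total hours worked across the whole project

benefit_per_hour = added_benefit / total_project_hours
print(f"${benefit_per_hour:,.0f} per hour worked")  # ≈ $2,567/hr at these inputs
```

The point of a sketch like this is that the conclusion is robust to the placeholders: even at several times the assumed hours, the per-hour benefit would remain far above typical earn-to-give benchmarks.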
Nevertheless, the primary value of this project is likely to be experimental value. With failed comments, agency responses will give us a window into what mistakes we made and other possible ways to influence policy. Successful comments may also help generate ideas for where effective altruists should focus their efforts to influence policy.
Overall, we think it is likely that this or a similar project will be worth funding in the future, though not in the same manner. We would benefit greatly from having people more experienced in economics, data science, and public policy working full-time on the project. Beyond this, good changes are hard to predict, and future direction should be informed by the policy feedback we receive over the coming year.
General Lessons Learned:
The comments that we think are most likely to cause changes were targeted at small programs with many obvious technical problems. For example, a housing improvement program run by the Department of the Interior used proxies for building codes in areas that have none, and its requirements are clearly worse than current building codes. We proposed a simpler and less stringent set of requirements that aligns better with current building codes and would reduce building costs, allowing more people to obtain housing through the program. We believe agencies could avoid oversights like this simply by consulting economists and engineers.
The more expensive a policy is, the more attention it will attract, and the more research and opposition research will have already been done. This made our comments on larger regulations more uncertain, presenting information rather than always making a strong argument for change in a particular direction. In some cases, we simply tried to simplify the way policies and equations were designed so that regulators could more easily understand and change them in the future. In a Department of Energy comment, for instance, we took ten lines of text describing an equation for program funding and rewrote them in equation format. We highlighted that the original equation may not have been in line with the regulation's intentions and that it may be possible to use a simpler equation that aligns more closely with the agency's actual goals.
U.S. government agencies vary widely. It can be hard to predict where there are opportunities for improvement and difficult to understand the effects a policy may have. Specialization is often important in order to provide advice that exceeds the knowledge an agency already has.
A factually incorrect comment is less likely to have influence than a factual one, even if the incorrect comment is written convincingly. This is why we believe policy comments based on Fermi estimates can still have value: they draw attention to the big picture. Where the commenter lacks accurate information, regulators can fill in the blanks to make a better policy.
In terms of time, it is very expensive to read through the Federal Register, especially without domain-specific background knowledge. Much of our time on the project was spent searching for policies to comment on. At first glance, it can be very difficult to tell whether a proposed rule is worth investigating, and it can take a very long time to fully understand a policy. In some cases, we actually began drafting a policy comment before realizing that we had misunderstood the proposed rule. Several proposed rules contained mistakes and ambiguities that prevented us from properly understanding their contents. A key lesson was to review possible mistakes to understand how they were made. This could mean checking for typos by consulting other places where the agency had posted its proposed rule and making sure the posts are identical. Prevention could take the form of looking through older agency databases, looking for mechanisms that counter an apparent policy problem, and reading the regulatory history surrounding an issue.
Economists like Richard Bruns are very good at working around hard problems quickly and finding decent proxy or estimated data when none is available for cost-benefit analysis. Many of the skills economists have can be used to quickly estimate the positive and negative effects of a policy. This is a contributing factor to why policy comments during the semester were faster than ones over the summer: we had full-time access to Richard on the weekends when we did policy comments. Likewise, familiarity with data science and modeling can be valuable for generating high-quality comments rapidly, as we noticed from our interactions with David Roodman.
Correspondence can be very attention-draining. Communication with outside groups and individuals who didn't respond pulled attention away from our direct work, and many experts who did respond brought us no new useful information. Of the many people who wanted to be occasional volunteers at one point or another, only about five were consistently productive to work with. Until we made an email form, most requests for feedback were not worth the time we put into them. To counter some of these problems, we developed a good habit of assigning small, well-defined tasks to occasional volunteers, and asked experts questions that were more likely to nudge them into giving us information we didn't already have. It is not easy to increase the project's scale without focusing primarily on scaling and volunteer management. Over time we concluded that without high-level guidance like we received from Richard, new versions of the project at other universities would have difficulty operating independently. Until that type of infrastructure can be established, we think it would be best to deprioritize scaling the project up.
It is very difficult to develop good performance metrics for producing policy comments. Feedback is slow, and because of that, it is difficult to assess the probability of success. In addition, comments are reactive rather than proactive, so we could not propose new ideas unless they were relevant to content on the Federal Register. However, the benefits of influencing regulations in the Federal Register are potentially massive: a single comment could influence millions of dollars and save many quality-adjusted life years (QALYs). While metrics like dollars and QALYs can be used to estimate how effective a given comment could be, they don't tell us much about our own performance, as we don't yet know the likelihood that our suggestions will be adopted. We tried to use feedback from experts and people working in policy to improve our comments over time, but that isn't an objective metric. As we receive feedback from government agencies, we will gain insight into the success rate of our policy comments and become better able to make concrete predictions about the expected value of future comments. Direct feedback should start returning within 6 to 18 months.
Time management for this type of project is difficult. Forming arguments and equations before completing citations, formatting comments, or following up on communication can greatly speed up the production of comments, but often leads to last-minute surprises and large, time-consuming changes. We would first conduct a Fermi estimate, then replace estimated numbers with better data, to prevent the need to overhaul a comment from the ground up based on new information. However, rigorously verifying specific facts before using them to fully form arguments is a risky time investment, since the facts that initially seem most relevant may not be relevant in the finished product. Better division of labor and communication norms improved our performance over time.
Our small sample size, paired with the complexity of regulations, will make it difficult to draw strong causal inferences from regulatory outcomes alone. We will be relying on regulators' responses to verify that we actually caused any impact. Because regulators must respond to comments with arguments, there are several cases where we should be able to find out whether they made changes specifically due to ours. In the first response we received, our group was addressed by name.
Our comments were not optimally formatted at first (numbered, responding question by question, etc.). Formal correspondence in the federal government is fairly redundant, so including sections like executive summaries feels counterproductive when the goal is to pitch an idea succinctly, and it greatly increases the workload per comment (resulting in fewer comments). Though our formatting improved over time, we still think future comment efforts should pay more attention to format, and that considerable attention should be given ahead of time to the problem of how to change formats quickly. Rigorously checking every format change for errors wastes a lot of time, but failing to do so can lead to large problems.
We had an equation for break-even analysis assuming that many people have a rational, risk-informed preference to ride motorcycles without a helmet or with an unsafe helmet. In our submission, we accidentally used an unintended ratio that made the new helmet regulation fare worse in the break-even analysis than our estimate implies. This is likely inconsequential: despite the mistake, the proposed changes still passed break-even analysis.
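For readers unfamiliar with break-even analysis, the basic structure is simple: compute the minimum effect a rule would need to have for its benefits to equal its costs, then judge whether that threshold is plausible. A minimal sketch with hypothetical inputs (these are not the numbers from our actual helmet comment):

```python
# Break-even analysis sketch: how many lives per year must a helmet rule save
# for its benefits to equal its costs? All inputs are hypothetical illustrations.

annual_cost = 50_000_000  # assumed annual compliance cost of the rule, dollars
vsl = 9_600_000           # assumed value of a statistical life, dollars

# Benefits equal costs when (lives saved per year) * VSL == annual cost.
break_even_lives = annual_cost / vsl
print(f"Rule passes break-even if it saves more than {break_even_lives:.1f} lives/year")
```

A rule "passes" break-even analysis when the required effect size is clearly smaller than the effect the rule can plausibly be expected to have, which is why our ratio error above did not change the conclusion.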
We had over ten cancelled comments, which was a significant drain on time. Fortunately, many of these had not received a large amount of work before we decided to cancel them; rather, we had marked them for consideration and performed introductory research before determining that the comment was not worthwhile and further time investment would be a waste.
We had inconsistent values for the value of a statistical life (VSL) due to finding better sources over time, while also discovering that some agencies' currently used VSLs are not public. The method we settled on was to use the relevant agency's most recent publicly published VSL and adjust it for the time elapsed.
Regulatory responses can take quite a lot of time. While we searched for comment opportunities on regulations with guaranteed feedback deadlines (usually final rules), most of the opportunities we found involved very specific issues on which we could not gain the relevant domain knowledge (railroad taxes, compliance costs for a specific set of businesses, etc.) within our time constraints. Since we were uncertain we could make correct comments on such issues, we more often commented on proposed rules and advance notices of proposed rulemaking.
Based on estimates from Richard and contacts in think tanks, we estimate it will take between 6 and 18 months for responses to come back. Accordingly, our only response received so far is from one of our experimental comments before EAPA started.
EPA response to SNAP comment:
Miles “Milo” King
If you'd like to learn more, comment below or read our in-depth project narrative here which goes through what we worked on and commented on during our project.
Would it be easy to send a link to the comment about the $7.7M program you mentioned?
Sorry for taking so long to respond. This is the comment:
This is the referenced program:
What data science needs do you have?
Really glad you did this. I see some similarities with my work as a journalist. I've previously argued that journalism has never attempted systematic evaluation of government, e.g. department by department, so it's fantastic to see someone attempt this. Your problems regarding domain knowledge, slow or unhelpful responses from officials, inconsistent transparency, etc. are spot on and well known to reporters. Keep up the good work!
Thanks for this detailed update! What's your plan for the future, depending on various outcomes?
This is my current heuristic, though if we learn unexpected things from feedback I could imagine updating in a different direction:
If positive feedback (successful comment) --> Try to restart project
If really good negative feedback --> Make a better lessons learned post and propose a different type of project
If ambiguous negative feedback --> Recommend people avoid experimenting with this type of policy action and focus on other policy interventions.
Amazing work, Matthew! The "lessons learned" section is very useful. Looking forward to more updates on whether the policies you suggested get implemented.
Matthew, this is awesome! I read the project narrative, and I'm looking forward to hearing more about this when you get more replies.
One issue that came to mind: I don't know how you arrived at the $2,500/hr estimate for the building code comment. Is this something like the equivalent of making the federal housing agency $2,500 more effective? If so, you still face the problem of estimating how much well-being is created with that $2,500. If another use of your time could generate, say, $25/hr of value to AMF, perhaps that would still beat $2,500/hr to the US housing agency.
Having said that, this project was almost surely worth your time for the exploration value alone.