Lizka

Content Specialist @ Centre for Effective Altruism
12426 karma · Joined Nov 2019 · Working (0-5 years)

Bio

I run the non-engineering side of the EA Forum (this platform), run the EA Newsletter, and work on some other content-related tasks at CEA. Please feel free to reach out! You can email me. [More about my job.]

Some of my favorites among my own posts:

I finished my undergraduate studies with a double major in mathematics and comparative literature in 2021. I was a research fellow at Rethink Priorities in the summer of 2021 and was then hired by the Events Team at CEA. I've since switched to the Online Team. In the past, I've also done some (math) research and worked at Canada/USA Mathcamp.

Some links I think people should see more frequently:

Sequences (5)

Classic posts (from the Forum Digest)
Forum updates and new features
Winners of the Creative Writing Contest
Winners of the First Decade Review
How to use the Forum

Comments (413)

Topic Contributions (233)

Lizka
3d · 64

Thanks for sharing this! I might try to write a longer comment later, but for now just a quick note that I'm curating this post. I should note that I haven't followed any of the links yet.

Lizka
7d · 60

Thanks for sharing this, I appreciate it! I'm really excited about the study. 

I haven't read the full study yet, but I came across a Twitter thread by one of the authors, and I thought it was helpful: https://twitter.com/aaronrichterman/status/1663957463291265032?s=46&t=A7sa4lqau2E-U-pxX7DJWQ

Key points from the thread (on top of what you summarized in the post):

  • Table with results: "We used difference-in-difference models[1] to show these programs led to a 20% reduction in mortality for women, and an 8% reduction in risk of death for children under 5" [image]
  • Mortality reductions in different groups over time: "Mortality reductions began within 2 years of program introduction and generally got larger over time" [image]

(Can someone make a more easily parsable version of this graphic? The data from the study is publicly available.)

  1. ^

    I didn't know this term, but it's the method that I was imagining. From Wikipedia: [This method] calculates the effect of a treatment [...] on an outcome [...] by comparing the average change over time in the outcome variable for the treatment group to the average change over time for the control group.
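
To make that concrete, here's a toy version of the calculation with made-up numbers (purely illustrative; not the study's data, and the actual models adjust for much more than this):

```python
# Toy difference-in-differences calculation (made-up numbers, not the study's data).
# Mortality rate (deaths per 1,000) before and after the cash transfer program starts,
# in areas that got the program ("treatment") and areas that didn't ("control").
treatment_before, treatment_after = 10.0, 7.5
control_before, control_after = 10.0, 9.5

# Change over time within each group
treatment_change = treatment_after - treatment_before  # -2.5
control_change = control_after - control_before        # -0.5 (the background trend)

# DiD estimate: how much more the treatment group changed than the control group
did_estimate = treatment_change - control_change
print(f"Estimated effect of the program: {did_estimate:+.1f} deaths per 1,000")  # -2.0
```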

Lizka
9d · 148

I appreciate this post, thanks for sharing it! I'm curating it. I should flag that I haven't properly dug into it (curating after a quick read), and don't have any expertise in this. 

Another flag is that I would love to see more links for parts of this post like the following: "In an especially egregious example, one of the largest HVAC companies in the state had its Manual J submission admin go on vacation. The temporary replacement forgot to rename files and submitted applications named for their installed capacity (1 ton, 2 ton, 3 ton, etc.), revealing that the company had submitted copies of the same handful of designs for thousands of homes." I don't fully understand what happened or why (the company was cutting corners and was pretending the designs were customized when they were actually not?), and a link would help me learn more (and see if I agree with your use of this example!). (Same with the example about building managers in schools in the early days of COVID.)

I'm really grateful that you've shared this; I think the topic is relevant, and I'd be excited to see more experts sharing their experiences and what various proposals might be missing. I particularly appreciated that you shared a bit about your background, that you used a lot of examples, and that the "Complexity and opacity strongly predict failure" heuristic is clear and makes sense. I think further work here would be great! I'd be particularly excited about something like a lit review on many of the interventions you listed as promising (which would also help collect more readings), and estimates for their potential costs and impacts. 

Minor suggestion/question: would you mind if I made the headings in your post into actual headings? Then they'd show up in the table of contents on the side, and we could link to them. (E.g.)

Lizka
10d · 20

No need to apologize, and thanks for making the topic page, Will! I batch-approve and remove new Wiki entries sometimes (and reorganize the Wiki more generally), but I'm not prioritizing this right now. I do hope that we'll get more attention on the Wiki soon, though (in the next couple of months). I've added a note to the Wiki FAQ — thanks for that suggestion! 

Lizka
10d · 112

Note that this was covered in the New York Times (paywalled) by Kevin Roose. I found it interesting to skim the comments. (Thanks for working on this, and sharing!) 

Lizka
14d · 20

Following up on this: we've expanded the Community section on the Frontpage to show 5 posts instead of 3. Nothing else should have changed with this section right now. 

Lizka
14d · 20

Re Library page: I agree with and appreciate this suggestion. I'd be excited for that to be a list you can sort in different ways. I think it's on the list of things to prioritize, but I'll make sure. 

Re top right search bar: I think they do, but they're at the bottom of the results, and in some cases that might get cut off. But you can also use the full search page for this, e.g.: https://forum.effectivealtruism.org/search?contentType=Sequences&query=classic%20posts%20from%20the%20&page=1 
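
In case it's useful, here's a rough sketch of how that search URL is built (I'm just reusing the parameters that appear in the URL above; I haven't checked what other options the search page supports):

```python
# Build an EA Forum search URL like the one above (parameters copied from that URL).
from urllib.parse import urlencode

params = {
    "contentType": "Sequences",          # only return sequences
    "query": "classic posts from the ",  # search text
    "page": 1,
}
url = "https://forum.effectivealtruism.org/search?" + urlencode(params)
print(url)
# https://forum.effectivealtruism.org/search?contentType=Sequences&query=classic+posts+from+the+&page=1
# (urlencode uses '+' for spaces; '%20' as in the original link is an equivalent encoding)
```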

Lizka
15d · 20

I also find the following chart interesting (although I think none of this is significant). In particular, "pause training of dangerous models" and "security standards" got more agreement from people who aren't in AGI labs, while (at a glance) the following got more agreement from people who are in labs:

  • inter-lab scrutiny
  • avoid capabilities jumps
  • board risk committee
  • researcher model access
  • publish internal risk assessment results
  • background checks

(In general, apparently "experts from AGI labs had higher average agreement with statements than respondents from academia or civil society".)

Note that 43.9% of respondents (22 people?) are from AGI labs. 

Lizka
15d · 40

I'm surprised at how much agreement there is about the top ideas. The following ideas all got >70% "strongly agree" and at most 3% "strongly disagree" (note that not everyone answered each question, although most of these 14 did have all 51 responses):

  1. Pre-deployment risk assessment. AGI labs should take extensive measures to identify, analyze, and evaluate risks from powerful models before deploying them.
  2. Dangerous capability evaluations. AGI labs should run evaluations to assess their models’ dangerous capabilities (e.g. misuse potential, ability to manipulate, and power-seeking behavior).
  3. Third-party model audits. AGI labs should commission third-party model audits before deploying powerful models.
  4. Safety restrictions. AGI labs should establish appropriate safety restrictions for powerful models after deployment (e.g. restrictions on who can use the model, how they can use the model, and whether the model can access the internet).
  5. Red teaming. AGI labs should commission external red teams before deploying powerful models.
  6. Monitor systems and their uses. AGI labs should closely monitor deployed systems, including how they are used and what impact they have on society.
  7. Alignment techniques. AGI labs should implement state-of-the-art safety and alignment techniques.
  8. Security incident response plan. AGI labs should have a plan for how they respond to security incidents (e.g. cyberattacks).
  9. Post-deployment evaluations. AGI labs should continually evaluate models for dangerous capabilities after deployment, taking into account new information about the model’s capabilities and how it is being used.
  10. Report safety incidents. AGI labs should report accidents and near misses to appropriate state actors and other AGI labs (e.g. via an AI incident database).
  11. Safety vs capabilities. A significant fraction of employees of AGI labs should work on enhancing model safety and alignment rather than capabilities.
  12. Internal review before publication. Before publishing research, AGI labs should conduct an internal review to assess potential harms.
  13. Pre-training risk assessment. AGI labs should conduct a risk assessment before training powerful models.
  14. Emergency response plan. AGI labs should have and practice implementing an emergency response plan. This might include switching off systems, overriding their outputs, or restricting access.

The ideas that had the most disagreement seem to be: 

  • 49 (48 in the graphic above) - Avoid capabilities jumps. AGI labs should not deploy models that are much more capable than any existing models.
    • 11% somewhat disagree, 5% strongly disagree, and only 22% strongly agree, 35% somewhat agree, 37 responses
  • 48 (49 in the graphic above) - Inter-lab scrutiny. AGI labs should allow researchers from other labs to scrutinize powerful models before deployment.
    • 13% somewhat disagree, 3% strongly disagree, 41% somewhat agree, 18% strongly agree, 37 responses
  • 37 - No [unsafe] open-sourcing. AGI labs should not open-source powerful models, unless they can demonstrate that it is sufficiently safe to do so.
    • 14% (somewhat and strongly) disagree, 57% strongly agree, and 27% somewhat agree, 51 responses
  • 42 - Treat updates similarly to new models. AGI labs should treat significant updates to a deployed model (e.g. additional fine-tuning) similarly to its initial development and deployment. In particular, they should repeat the pre-deployment risk assessment.
    • 14% somewhat disagree, but 45% strongly agree and 35% somewhat agree, 51 responses
  • 50 - Notify other labs. AGI labs should notify other labs before deploying powerful models.
    • 11% somewhat disagree, 3% strongly disagree, 11% strongly agree, 32% somewhat agree, 38 responses

And

  • 21 - Security standards. AGI labs should comply with information security standards (e.g. ISO/IEC 27001 or NIST Cybersecurity Framework). These standards need to be tailored to an AGI context.
    • Got the most strong disagreement: 6% (51 responses), although it's overall still popular (61% strongly agree, 18% somewhat agree)

 

 (Ideas copied from here — thanks!)

Lizka
15d · 40

In case someone finds it interesting, Jonas Schuett (one of the authors) shared a thread about this: https://twitter.com/jonasschuett/status/1658025266541654019?s=20 

He says the thread covers the survey's:

  • Purpose 
  • Main findings 
    • "Our main finding is that there is already broad support for many practices!"
    • "Interestingly, respondents from AGI labs had significantly higher mean agreement with statements than respondents from academia and civil society."
  • Sample (response rate 55.4%, 51 responses)
  • Questions 
  • Limitations 
    • E.g.: "We likely missed leading experts in our sampling frame that should have been surveyed." and "The abstract framing of the statements probably led to higher levels of agreement."
  • Policy implications

 

Also, there's a nice graphic from the paper in the thread: 

[image]