
Note: I used LLMs to draft different parts of this. I've checked almost everything, but there might be some mistakes remaining.

Apologies for posting this on Christmas Eve. I wanted to get this out the door before the end of the year. Questions welcome, and if it's easy to pull metrics to answer them, I will.

Summary

80,000 Hours launched a video program in 2025 focused on longform, cinematic, personality-driven content about AI risks. Our first two longform releases were the videos referred to below as AI 2027 and MechaHitler.

Both videos significantly outperformed our expectations (we'd anticipated 15-50K views for the first). The cost per engagement hour ($0.11 and $0.39 respectively, including staff time) compares favorably to other 80,000 Hours programs.

This post covers: what we spent, what we got, why we think it worked, and what we'd do differently.

The numbers

Costs

| Category | AI 2027 | MechaHitler |
| Direct costs | ~$50K | ~$64K |
| Staff hours | ~450 hrs | ~450 hrs (Note: I'm assuming this is about the same as for AI 2027; I didn't re-ask people how much time they spent.) |
| Total cost (making some assumptions about how we should incorporate staff time) | ~$160K | ~$174K |

The MechaHitler video cost more due to Bay Area crew and studio rates, travel costs, and more spending on props. I'm overall fine with this (I was worried the gap would be much larger because the Bay Area is so expensive), though I would like to get prop costs down a bit, since I think some of that spending was pretty unnecessary. If we avoid reshoots, we also won't need to rent a second studio.

Timing

AI 2027 took about 9 weeks to produce, and MechaHitler about 11 weeks:

  • Ideation (minimal for AI 2027, 1-3 weeks for MechaHitler)
  • Outline and scripting (3-4 weeks)
  • Shoot (3 days)
  • Editing: 4.5 weeks from end of shooting to launch

Results

Note: Some of these results will be an artifact of AI 2027 having been out for three months longer, but I think it's clear that views for both videos have very much tailed off.

| Metric | AI 2027 | MechaHitler | Extra context from Claude + commentary from me |
| Views (as of Dec 2025) | 8.9M | 2.7M | |
| Watch hours | 1.4M | 419K | |
| Likes | 363,413 (~4% of views) | 156,203 (~5.7% of views) | Claude: 2-4% is typical/healthy; 4-6% is strong engagement; above 6% is excellent (though it can also indicate a smaller, dedicated audience). |
| New subscribers | ~197K (0.021 subs/view) | ~50K (0.017 subs/view) | Claude: 0.5-2% is typical (so 1 new sub per 50-200 views); 2-4% is strong, especially for discoverable content reaching new audiences. |
| Comments | 35K (0.4% of views) | 14.5K (0.5% of views) | Claude: 0.5-2% is typical for most content; 2-5% suggests high engagement (controversial topics, strong CTAs, community feel). Commentary: I'm really surprised we're on the low end for this, since it felt like AI 2027 had a huge number of comments. (This may have been more true closer to launch and less true now.) |
| Avg. view duration (% of video) | 29% | 35% | Claude: for long-form content, 40-50% average percentage viewed is solid, 50-60% is strong, and above 60% is excellent. Commentary: So we may not be doing awesomely here, though Claude's benchmark seems way too high to me. Our numbers are similar to those for the problem profile videos (23.3% for AI, 34.8% for biosecurity). |
| Cost per engagement hour | $0.11 | $0.39 | |
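As a rough check on the arithmetic, here is a minimal sketch that recomputes cost per engagement hour from the rounded figures in the tables above. It assumes the metric is simply total cost (direct costs plus staff time) divided by watch hours; the post does not spell out the formula, and the function and variable names are illustrative only.

```python
# Rounded figures copied from the cost and results tables in this post.
VIDEOS = {
    "AI 2027": {"total_cost_usd": 160_000, "watch_hours": 1_400_000},
    "MechaHitler": {"total_cost_usd": 174_000, "watch_hours": 419_000},
}


def cost_per_engagement_hour(total_cost_usd: float, watch_hours: float) -> float:
    """Assumed definition: total cost divided by total hours watched."""
    if watch_hours <= 0:
        raise ValueError("watch_hours must be positive")
    return total_cost_usd / watch_hours


for name, figures in VIDEOS.items():
    cost = cost_per_engagement_hour(
        figures["total_cost_usd"], figures["watch_hours"]
    )
    print(f"{name}: ${cost:.2f} per engagement hour")
```

With these rounded inputs the sketch gives about $0.11 for AI 2027 and about $0.42 for MechaHitler. The second figure is slightly above the $0.39 reported, which may reflect unrounded inputs or a different data snapshot; that explanation is a guess, not something the post states.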

We are overall really happy that we were able to make another video with >1M views; it makes clear that AI 2027 was not just a flash in the pan. At the same time, AI 2027 (blue line) is clearly very special: since the MechaHitler launch, AI 2027 has had more views on most days. (MechaHitler is the purple line.)

How valuable is a video watch hour?

This is genuinely uncertain. Compared to someone reading the 80,000 Hours website:

  • Video viewers are much less selected for being in our target audience
  • They're more counterfactually reached (wouldn't have found us otherwise)
  • They're likely less deeply engaged (passive watching vs. active reading)

Reasonable estimates range from 1/3 to 1/300 as valuable as a web reading hour. Even at 1/300, the cost per quality-adjusted hour is still on the lower end of the range of other programs 80k considers worthwhile.
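To make the range concrete, here is a small sketch of the quality adjustment. It assumes the adjustment is a simple multiplication: if a watch hour is worth 1/N of a web reading hour, one reading-hour equivalent costs N times the cost per engagement hour. The names are illustrative, and the inputs are the reported $0.11 and $0.39.

```python
# Cost per engagement hour as reported in this post (USD).
REPORTED_COST_PER_HOUR = {"AI 2027": 0.11, "MechaHitler": 0.39}

# A watch hour is assumed to be worth between 1/3 and 1/300 of a web
# reading hour, so the cost is scaled up by a factor of 3 to 300.
DISCOUNT_FACTORS = (3, 300)


def quality_adjusted_cost(cost_per_hour: float, discount_factor: float) -> float:
    """Cost of one reading-hour equivalent when a watch hour is worth
    1/discount_factor of a reading hour (an assumed, simplified model)."""
    if discount_factor <= 0:
        raise ValueError("discount_factor must be positive")
    return cost_per_hour * discount_factor


for name, cost in REPORTED_COST_PER_HOUR.items():
    low, high = (quality_adjusted_cost(cost, f) for f in DISCOUNT_FACTORS)
    print(f"{name}: ${low:.2f} to ${high:.2f} per quality-adjusted hour")
```

Under these assumptions the range works out to roughly $0.33 to $33 per quality-adjusted hour for AI 2027 and $1.17 to $117 for MechaHitler.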

Qualitative Feedback

AI 2027

  • The channels of Jacob Collier (famous musician) and Ali Abdaal (a YouTuber with 6.5M subscribers) are now subscribed to ours (Ali Abdaal is someone we know, so that one is less surprising)
  • Some great feedback from exciting people on AI 2027
    • Got good feedback from Yoshua Bengio and John Leaver
    • “You’re really good on camera, you killed it” - from a previous primary producer on Veritasium
    • Other people I won’t name here but it was really exciting that they liked it!
  • Two people in the Constellation Slack said the AI 2027 video is what they'd recommend if a friend were going to read or watch just one thing about AI importance / risk
    • Someone interviewing important people said AI 2027 is the one thing that cut through
  • The YouTube response on AI 2027 is very very positive (97.3% likes vs dislikes)
  • Generally lots of positive comments, especially about production and Aric e.g.
    • “Amazing video. Lighting is insane on that earth shot on the table. Leagues beyond everyone else in production value. Subbed”
    • “Absolutely loved this! One of the best videos on YouTube that I have seen in recent years.”

MechaHitler

  • By far the craziest thing that happened for MechaHitler is that Hank Green really liked it. He made a community post about it on his channel and said “Watch this video and then watch it a second time so that YouTube knows to show it to every person on the internet.”
    • He hasn’t posted there about anyone else’s channel in the last year

YouTube commenters like:

  • The writing and research: "Peak journalism, ngl. Loved the content, research, accuracy and production of course."
  • The production value: "Amazing video. Lighting is insane on that earth shot on the table. Leagues beyond everyone else in production value."
  • Aric: "Your gentle, well informed voice broke my heart in ways I didn't expect before clicking on."

People like to guess that AI 2027 / Aric are AI generated because they’re too perfect

What the comments don’t like:

| AI 2027 | MechaHitler |
| US vs. China framing | Too soft on Musk ("Being very generous to Musk here - possibly for legal reasons?") |
| Too speculative | Too hard on Musk / made it "political" / "woke" |
| Fear-mongering / alarmism | Fear-mongering / alarmism |
| Too credulous on AI capabilities (LLMs are just "prediction tools" and can't reach superintelligence) | Gave too much benefit of the doubt (disagreed that "no one had the intention" to make Grok problematic) |

Matt Reardon also wrote a critique of AI 2027 here, mostly focused on it not making an argument. (He feels we did better on MechaHitler).

Qualitative Analysis

Why we think AI 2027 did well

  1. Working with Phoebe Brooks, our project manager, producer and editor, who made the video look really professional
  2. Topic: AI 2027 seems to do well by default, and this is probably a major factor. See "AI2027: Is this how AI might destroy humanity?" from BBC World Service, which is an outlier success for the BBC, and YouTuber Mithuna's "AI Made a Movie About Its Own Future", which was 8.1x as successful as her average video
  3. It’s a story, so people are compelled to keep watching, and people can sit back and listen
  4. It’s “cinematic” - Phoebe comes from the film world, and Aric is a former actor
    1. This made it stand out in a big way from the rest of YouTube
    2. This was surprising - we weren’t exactly aiming for “cinematic”. We thought we would aim more for a Vox explainer style, so kind of a lucky accident
  5. High production value
    1. Good graphics and editing
  6. A sense of getting in early on an up-and-coming channel that feels like it's going to be a big deal.
  7. Strong hook
    1. Someone who works on another big channel said “I actually watched the video yesterday, it's incredible. I showed the opening to my colleague because I thought it was genius”
  8. Emotional weight at the end
  9. Aric is very charming and a good presenter
    1. I think it’s been really valuable that Aric has such a deep understanding of the topics, and has genuinely formed his own takes. The nuance, understanding and earnestness comes through.
  10. Credibility signals - citing serious researchers
  11. 80,000 Hours Newsletter bump (sent to 500,000 people)
    1. (Note that the click-through rate went up over time, which indicates that the people that YouTube found to watch it liked it even more than the first set of people, many of whom came from the newsletter.)
  12. Continued iteration on thumbnails and titles using YouTube’s A/B testing feature (this was crucial)
  13. Community posts and videos from other creators pointing people to us
    1. youtube.com/post/UgkxYgZB2pvvAhBUJ751uRhtp2sURVpOaEDT?si=bkcaLsWsvSYbapDU
    2. https://www.youtube.com/post/UgkxbAn2C9RIN4h984GYosVzKfwn_nd_zYhG
    3. I Tricked ChatGPT Into Committing Crimes
  14. Tweeting about it
    1. One of my tweets about it got 34k views (the tweet, not the video)

Why MechaHitler did less well (but still well)

We're genuinely uncertain, but some guesses:

  • Less shareable: AI 2027 got shared in group chats and on social media more (4% external traffic vs. 1.1% for MechaHitler)
  • Less emotional climax: The story structure "trails off" somewhat in the second half
  • Less novel: The second video from a new channel is inherently less exciting than the first

That said, MechaHitler had higher average view duration (35% vs. 29%) and higher click-through rates. We also think we were able to hit the nuance well on a really tricky topic. The comments seemed balanced between thinking we were too hard on Musk and not hard enough, and very few people seemed angry at us despite covering a politically charged topic.

Lessons Learned

Overall what we think matters

  • Topic choice
  • Cinematic visuals
  • Extremely high production value
  • Personality-driven content
  • Thumbnails and titles
  • Strong hook
  • Aric being very charming, a good presenter, and coming off as earnest and neutral
    • Aric seems very knowledgeable and well-researched
  • Story-based videos
  • Emotional weight and heft
  • Credibility signals
  • Share impulse

Our guess at what’s less important (though we’re certainly unsure, maybe if we nailed these, we’d get more success)

  • A completely satisfying arc
  • Good links from other people (the Hank Green callout didn’t help MechaHitler’s views noticeably as far as I can tell, though it’s a little hard to tell. My guess is it did help subscriptions, given how many people remarked that they subscribed because of him, though there’s no obvious bump in the graph)
    • Though this kind of thing could have a bunch of diffuse impacts.
  • Highly produced interviews
  • Upload frequency/consistency
  • Watch time - it seems like ours is e.g. lower than the 80k channel’s
  • Subscriber count
  • Aric’s face in thumbnail
  • Title curiosity gap - it’s one way to approach titles but not the only way
    • Grabby words like “dark reign” don’t necessarily do better

How our production works

Each video follows a roughly 9-11 week production cycle, though we’ve only done two, so of course it might change. For our third video we’ve experimented with a much longer writing time. Here's what that looks like in practice.

The timeline

Note, these overlap:

  • Weeks 1-2: Rest. The team recovers from the previous launch.
  • Weeks 2-3: Ideation. We generate and evaluate topic ideas.
  • Weeks 4-7: Scripting. The biggest phase, and our biggest bottleneck to making more videos.
  • Week 8: Shooting. Typically 2-3 days of principal photography.
  • Weeks 9-13: Interviews, reshoots, editing and launch. Multiple revision cycles, some reshooting, filming interviews, prepping for launch, then release.

Ideation

We maintain a running document with dozens of potential topics (as well as ideas, interesting nuggets, good papers, etc that we’d like excuses to talk about). We look for stories we can attach important technical points to, rather than pure explainers.

During ideation weeks, we ask people in our network for ideas. We flesh out our top ideas with potential titles, hooks, main beats, interesting "nuggets" we could include, and arguments for and against doing those topics, and then pick.

Scripting

This is consistently our biggest bottleneck on getting more videos out the door. Good writing is hard and it takes a long time. A good script for us needs good backing research, a strong hook, good structure, a storyline, and interesting nuggets and technical elements as sidebars or woven throughout. I think it’s been really valuable that Aric has such a deep understanding of the topics, and has genuinely formed his own takes.

We're still iterating on our scriptwriting process. Both videos had us working until and through the last minute on the script, which added some stress and the need for reshoots and voiceovers. We're actively hiring writers and experimenting with writing structures to address this.

We get feedback from others at 80k, external volunteers, and people we ask to be our technical advisors.

Shooting

We rent studio space (usually on Peerspace) and hire a professional crew: director of photography, gaffer, sound operator, and assistants. For MechaHitler, this meant renting a large industrial studio in the Bay Area and sourcing props from Amazon and a prop house. The cinematic look comes from lighting, set design, and shooting on professional cinema cameras.

Interviews happen separately, often via video call. We recommend Riverside, not Zoom.

Reshoots / Voiceover

Sometimes we notice in the editing that something is missing, so we’ll do voiceover or reshoots to land a technical point, emotional climax, hook or ending.

Editing

Editing involves many revision cycles. A typical round might involve 200+ comments ranging from "this cut feels fast" to "should we reshoot this section?" We maintain multiple versions simultaneously: one for internal feedback, one for graphics work, one for external review.

Launch

Launch involves

  • Generating and preparing thumbnails and titles. We do most of our thumbnails in house (thank you Nik Mastroddi! She’s spectacular) but have experimented with working with folks on e.g. Upwork as well.
  • Getting necessary sign-offs, e.g. from technical advisors
  • Preparing social media posts
  • Writing a newsletter
  • Creating the landing page
  • Preparing cutdowns for different platforms
  • Coordinating all of these elements across teams
  • Launch
  • Iterating heavily on titles and thumbnails in the days and weeks afterwards

We've learned to aim for mid-week launches (ideally Wednesday) to leave time for thumbnail and title iteration. Coordinating across teams for launch is something we're actively working to improve.

What we’re still figuring out

  • Impact measurement: We track views and watch hours, but we don't have good data on who's watching or whether it changes behavior. We're not sure how to value a YouTube watch hour compared to other interventions.
  • Scriptwriting process: This remains our biggest bottleneck. We're actively hiring writers and experimenting with different structures.
  • CTAs: We want viewers to do something with what they've learned, but we aren’t sure what.

Closing thoughts

It's extremely exciting to be working on video at a time when so many people are getting excited about this. I'm extremely grateful to people like Petr Lebedev, John Leaver, Drew Spartz, and Liam from Siliconversations (and many more) for the amount of support and advice they've given us as we start this.

I also just want to give tremendous appreciation to the video team, our contractors and others at 80k who have invested so much in making this program a success: Aric Floyd, Phoebe Brooks, Daniel Recinto, Sam Watkins, Bella Forristal and many others.

Two videos in, we’re excited about the house style and approach we've landed on and want to keep making videos like this. We're also excited about thinking about opportunities for growth, other venues, other channels, etc.

If you're interested in following along, subscribe to AI in Context. If you're interested in working with us (especially as a scriptwriter), let me know!


Comments

Hey Chana, 

This post is very useful, among other reasons for showing how thoughtfully the work came about, what went into it, why you think it worked, and what metrics and data you look at.

Thanks for getting it out the door when you did. It meant I actually had time to read it.

At ACE we have this Better for Animals resource that reviews evidence for various interventions. It also covers documentaries briefly. Pro-animal docs' lessons might not be particularly applicable to AI safety ones, but maybe it gets you thinking about those CTAs.

Because that's the main question I have after reading this post, what are you evaluating for? 

It is very impressive what you all did and I envy/admire the results. You made a beautiful product that people respect and many, many people consumed. But that does not equal changing the path we're on. Or maybe it does if awareness is the thing that you think shifts our future.

I'm not suggesting you didn't achieve your purpose, just that it's not crystal clear from this retrospective what that was and how you measure getting there. Maybe in the next evaluation report, you can share the purpose and outcomes you set for the video project, or the theory of change.

If it's primarily awareness that you're going for, then the currently listed data points seem great. If you want public discussion, then optimizing for those comments might be the thing to try, and the CTA can be something around asking people to share in the comments (or when they share the video) what worries/questions they have. If you need people to talk to their representatives, then your CTA and content maybe need another slight adjustment to prime them for that. (IDK if public discussion or political activism are useful steps on your path to impact. Again, I know embarrassingly little about this cause area.)

I admit that I'm curious in large part because I hope to learn from it for my own work. There are so many animal welfare documentaries and quite a few voices (and some funders) saying we need more of them based on their personal experience or intuition. But we don't have clear evidence that these are a cost-effective method to help animals.

Now, the goals and context for AI safety are very different from those for animal welfare so it might work and work differently for you. But if I better understand what you're trying to do, then I can more easily see what the animal advocacy community should (not) copy.

Separate but somewhat related: how do you think about the use of videos as a path to change against different AI timelines? Do short timelines mean that time-consuming, high-production-value videos are less or more useful than some other alternative, e.g. a high volume of low-cost, quick posts in strategic spaces? This is not my area of expertise so I'm honestly just wondering. (Again partially so I can learn from it to help animals better.)

Anyway, here's that resource and thanks again for sharing these insights:

https://docs.google.com/document/d/1O0ylEEQJMQMTBlHDHcNZwvTgifi7TNyd6GpabC_4VT0/edit?usp=drivesdk

One more caveat: some of the studies we looked at might be based on a time when media was consumed very differently.

Basically you're right about all of the above, and we don't yet have an end metric we've figured out, so we don't know how much impact we're having! It's the biggest open question for our team right now, and I hope to get clarity on it in the next quarter or two and share that. 

Re: Timelines - we don't have super strong views on timelines, but the amount of weight I put on short timelines does, I think, mean that lead times longer than an average of 3 months per video seem bad to me. Whether we should try to go for e.g. 1/week, I'm not yet sure.

Thanks so much for your comment!

Many creators act as though YouTube's algorithm disfavors content that refers to graphic acts of sex and violence, e.g. by bleeping words like 'kill' or 'suicide' or referring to these in very circuitous ways. I would guess these are incomplete methods of avoidance and that YT tries to keep up by detecting these workarounds. Seems like a potential issue for the MechaHitler video.

Yeah, we have mixed evidence on this - we didn't get on the hype leaderboard despite having enough votes, which suggests disfavoring, but we also just did get a lot of views and no weird ratios of e.g. views/comments so I don't personally think there was a lot going on there. 

Regarding the average view duration %: I think it makes sense for longer videos to have somewhat lower percentages. Fewer people are willing to sit through a long video than a short one, so it is logical for more people to drop off over the course of the video. But if you measure the average view duration in minutes rather than percentages, your results are really impressive: AI 2027 and MechaHitler have AVDs of ca. 10 minutes and 13.5 minutes, respectively.
