This is a special post for quick takes by MathiasKB. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

Does Claude-3 push capabilities?

I think it can be a fun exercise to interpret CEOs' statements literally and see what they imply.

If Dario Amodei claims they don't want to push capabilities, I think an interesting question to ask is in what sense releasing the world's best LLM isn't pushing capabilities.

One possibility is that they no longer consider releasing improved LLMs to meaningfully push the frontier. If Claude-3 spurs OpenAI to release GPT-4.5 sooner, this would not be an issue, as releasing ever more refined LLMs doesn't meaningfully increase any specific risk.

In that case they would still be true to their word: Anthropic just no longer considers LLMs on their own to be the frontier of capabilities. This seems reasonable to me?

It certainly does seem to push capabilities, although one could argue about whether the extent of it is very significant or not.

Being confused and skeptical about their adherence to their stated philosophy seems justified here, and it is up to them to explain their reasoning behind this decision.

On the margin, this should probably update us towards believing they don't take their stated policy of not advancing the SOTA too seriously.

I think the answer is 'yes' for a general layperson's understanding of 'pushing capabilities', but the emerging EA discourse on this seems to be at risk of conflating several questions:

  1. Has Claude-3 shown better capability than other models? Yes, under certain specific conditions and benchmarks.
  2. Do those benchmarks matter/actually capture performance of interest? No, in my opinion. I'd recommend reading Melanie Mitchell's takes on this.
  3. Do Claude-3's extra capabilities make it more likely to cause an x-risk event? No, or at least the probability that the current frontier AI model will cause an x-risk event has gone from ~epsilon to ~epsilon.
  4. Will Claude-3's release increase or decrease x-risk? Very difficult to say. I don't know how people get over cluelessness objections to these questions.

So I guess in your post 'frontier' is covering two separate concepts: the 'frontier' in terms of published benchmarks and the 'frontier' in terms of marginal x-risk increase. In my opinion, Claude-3 may be an interesting case where these come apart.

I've noticed a decrease in the quality and accuracy of communication among people and organizations advocating for pro-safety views in the AI policy space. More often than not, I'm seeing people go with the least charitable interpretations of various claims made by AI leaders.

Arguments are increasingly looking like soldiers to me.

Take the following twitter thread from Dr. Peter S. Clark describing his new paper co-authored with Max Tegmark.

The authors use game theory to justify a slew of normative claims that don't follow. The choice of language makes refutations difficult and pollutes the epistemic commons. For example, they choose terms such as 'pro-human' and include parameters such as 'naivety'.

These are rhetorical sleights of hand. Arguing for the benefits from automation to the consumer is now anti-human! You wouldn't want to be naive and anti-human now, would you?

I don't want this to be an attack on those who are against further AI development. PauseAI is a great example of what open and honest advocacy can look like. Being a vocal advocate for a cause is fine; disguising opinion as fact is not!

Any suggestions for improving this?

Why is it that I must return from 100% of EAGs with either covid or a cold?

Perhaps my immune system just sucks or it's impossible to avoid due to asymptomatic cases, but in case it's not: If you get a cold before an EAG(x), stay home!

For those who do this already, thank you! 

If it's just a cold, or you're testing negative for COVID but still have mild symptoms, I think it should be okay to attend wearing a mask indoors and distanced outside, and eating outside or alone. I did this once for an EAG on the advice of the team, with only mild symptoms and multiple negative tests. I also checked with each of my 1-on-1s whether they were still okay meeting and how, and (maybe excessively, and probably not what the team expected) skipped almost all group events and talks I had originally planned to attend. Part of the reason I skipped group events and talks was that I wouldn't be able to check with everyone whether they were comfortable with me attending.

That being said, I felt pretty self-conscious attending, which was unpleasant, but I also had good 1-on-1s, as well as good interactions outside of the formal events.

I would strongly urge people to err on the side of attendance. The value of the connections made at EAGs and EAGxs far exceeds the risks posed by most communicable diseases, especially if precautions are taken, such as wearing a mask. 

If you take seriously the value of connections, many of them could very well exceed the cost to save a life. Would you say that your avoiding a cold is worth the death of someone in the developing world, for instance? I think your request fails to take seriously the value of making connections within the EA community.

The minute suffering I experience from the cold is not the real cost!

I'm probably an outlier, given that a lot of my work is networking, but I have had to cancel attending an event where I was invited to speak and likely would have met at least a few people who would have been relevant to know for my work, cancel an in-person meeting (though I will likely get a chance to meet them later), and reschedule a third.

The cold probably hit at the best possible time (right after two meetings in parliament). Had it come sooner, it would have really sucked.

Additionally, my roommate, who is an EA, has now gotten sick, affecting his work as well.

(Come to think of it, I think it's actually more likely I got sick from somewhere other than EAGx this time, and EAGx just happened to be last weekend.)

Do a large proportion of people come back from EAGs infected with a variant of COVID, relative to other large gatherings?

Why you should buy a desk treadmill

I've wanted to do a 'things I recommend you buy' list for a while, but I think my purchase of a desk-treadmill has been so much higher value than any other purchase I've made that I instead will make the case for only this item.

I've never liked standing desks. When I stand, my feet get so restless that it becomes distracting, and I have to sit back down if I want to focus. Half a year ago, I got to try a desk treadmill and I immediately bought one for myself.

Since February I've been walking on this desk treadmill while working. I went from getting an average of <3000 steps a day, to an average of 17,000 daily steps (my day record is 46,000). It's been the single easiest improvement to my health I've made, and I highly recommend you get one as well.

I was fortunate to try it out before buying, but my biggest initial worries were:

  • Whether I would actually use it, or it would just sit and collect dust
  • Whether it would affect my productivity

Since buying it my home office desk has been permanently raised and I haven't used my office chair since. If I'm working or playing video games, I'm walking. When my feet need to rest, I take my laptop to the kitchen.

I also found no negative effects on my productivity. Unlike desk exercise bikes, where people tell me they forget to pedal, effectively making it an uncomfortable chair, with a desk treadmill my brain just goes 'oh I'm walking now' and then I entirely stop thinking about it until I'm out of flow state and I notice I've walked 5000 steps and could use a quick break.

I pushed my brother into buying one, and he's been very happy with his as well.

Tips and tricks:

  • I put mine on a homemade tennis-ball platform, to avoid my downstairs neighbour hearing my footsteps.
  • You can't hear the engine through walls or while wearing noise-canceling headphones, but I suspect it would be too loud for an open office.
  • I would get a bit dizzy after stepping off the treadmill the first week, but this quickly went away.
  • If you walk all day, you will be really sore after a few days. This doesn't happen to me anymore, but I'm also 25, so your experience may differ.
  • I switch from running shoes to sandals once a day to avoid my feet blistering.

I also found no negative effects on my productivity.

This makes it sound to me like you think most of the value comes from the health/fitness benefits of generally being less sedentary during a working day, and little to no value comes from potential benefits to focus or productivity (except insofar as they're downstream of being healthier). Is that a fair summary?

Yes entirely in my case, no noticeable benefits to productivity.

Interesting suggestion, thanks for sharing. Do you think it would be comfortable barefoot?

I've walked barefoot, but I've found that my feet get tired much quicker than when I wear running shoes. Sandals are a nice medium for me.

The main concern I would have walking barefoot in the long run, would be damaging my knees.

I got a cheap-ish one off Amazon a few years back, but I noticed it was melting holes in the plasticky mat I put it on. I didn’t try to fix this. Does yours get very hot on the underside?

Haven't noticed anything like that. It gets warm near the engine, but not remotely hot enough to melt anything.

I can see this working extremely well for some tasks like reading and meetings, but significantly less so for stuff like typing. What's your experience with those types of tasks? Does walking on a treadmill disrupt them a lot or make them substantially more difficult? Or do you exclusively write and respond to emails when you're sitting down?

I don't find this to be much of an issue. Generally, the higher the walking speed the more difficult things become.

At 3.5 km/h, which is what I've configured to be the default speed, the only thing I've found to be an issue is reading text that is small. I solve this by just zooming in. My ability to accurately write and click things is not affected.

If I turn the speed down to 2.5 km/h, reading small text is also no problem for me.

There has been a tremendous amount of discussion and conflict in the past months over the state of Effective Altruism as a movement. For good reason too. SBF, someone I was once proud to highlight as a shining example of what EA had to bring, looks to have committed one of history's largest instances of fraud. I would be concerned if we weren't in heated debates over what lessons to take from this!

After spending a few too many hours (this week has not been my most productive) reading through much of the debate, I noticed something: I'm still proud to call myself an effective altruist and my excitement is as high as ever.

If all of EA disappeared tomorrow, I would continue on my merry way trying to make things better. I would continue spending my free time trying to build a community in Denmark of people interested in spending their time and resources to do good as effectively as possible.

What brought me to EA was an intrinsic motivation to do as much good as possible, and nothing any other effective altruist can do is going to change this motivation.

I'm happy to consider anyone who shares that objective to be a friend, even if we don't agree on the specifics of how exactly one should go about trying to do the most good. "Doing the most good" is a pretty nebulous concept after all. I would find it pretty weird if we all agreed completely on what that implies.

Remember:

- We're all on the same team
- We're all just human beings trying to do the best we can
- We're all acting on imperfect information

Spreadsheets are in many ways a force-multiplier of all other work that one does. For that reason I am very happy to have invested significant time into becoming good at utilizing spreadsheets in the work I do.

Over the past months, I’ve increasingly started using GPT in my workflow and am starting to see it as a tool that, similarly to spreadsheets, can make one better across a wide variety of tasks.

It wasn’t immediately useful however! It was only with continuous practice that it started generating actual value.

It took me a while to get good at noticing when some task I was doing could be sped up by involving GPT, but especially for brainstorming or listing things, it does in seconds what would take me hours. I highly recommend investing the time it takes to get it into your workflow. It takes time to build an intuition of what it can and cannot do well.

For example, my org spent some hours creating a list of organizations that currently attempt to influence aid spending in our target country. I asked GPT what organizations we had missed and in seconds was able to add 15 organizations we had overlooked to the list.

The number of tasks we can outsource to AI will only increase going forward, and I think those who invest time into getting good at utilizing the new wave of AI tools will be able to multiply their productivity significantly and will be at an advantage over those who don't.

Can you share any other examples of what you've asked?  Feeling somewhat uncreative on how to apply LLMs to day-to-day work!

Sure. Unfortunately GPT-4 doesn't seem to save the chat histories properly, but here are the most recent three from memory (topics obfuscated):

Write out paragraph showing how <intervention> will help <target country> <target org's priorities>.

Failure: GPT replies with bloated text that makes the argument but is too weasel-worded. It would be more work to rewrite than to just do it from scratch.

 

Format following into list with:
<name> <occupation>

[messy content I had copied from website including the names and occupations along with other html stuff between]

Success: GPT replied with all names in the right format, easy to copy-paste into Google Sheets.
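For comparison, when the copied markup is regular enough, the same cleanup can also be scripted. What follows is only a minimal sketch under stated assumptions: the sample HTML, the class names in it, and the helper `names_and_occupations` are all hypothetical, and it assumes the visible text alternates strictly between a name and an occupation. Real pages are usually messier, which is exactly where GPT was convenient.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the non-empty visible text chunks of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def names_and_occupations(html: str) -> list[str]:
    """Return 'name<TAB>occupation' rows.

    Assumption: text chunks alternate name, occupation, name, occupation, ...
    """
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    chunks = collector.chunks
    # Pair every even-indexed chunk (name) with the following chunk (occupation).
    return [f"{name}\t{job}" for name, job in zip(chunks[0::2], chunks[1::2])]


# Hypothetical sample input, not the content from the original chat.
sample = (
    '<div class="person"><h3>Jane Doe</h3><span> Economist </span></div>'
    '<div class="person"><h3>John Roe</h3><span>Journalist</span></div>'
)
rows = names_and_occupations(sample)
print("\n".join(rows))
```

Tab-separated rows like these should generally paste into a spreadsheet as two columns.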

 

What are top ten newspapers in <target country> ranked by political influence

Success: GPT replied with a reasonable-looking top ten list, including a description of each paper's political orientation.

 

 

One I often find myself asking and getting great answers to is:

Write sheets function that <does thing I need to do>

I also often use GPT to get brainstorms started.

My org is trying to achieve <thing>, list ten ways we could go about this.

Signal boosting my twitter poll, which I am very curious to have answered:

https://twitter.com/BondeKirk/status/1758884801954582990

Basically the question I'm trying to get at is whether having hands-on experience training LLMs (proxy for technical expertise) makes you more or less likely to take existential risks from AI seriously.

Does anyone have advice on getting rid of material desire?

Unlike many I admire, I seem to have a much larger desire to buy stuff I don't need. For example, I currently feel an overpowering urge to spend $100 on a go board, despite the fact that I have little need for one.

I'm not arguing that I have some duty to live frugally due to EA, I just would prefer to be a version of myself that doesn't feel the need to spend money on as much stupid stuff.

If spending a bit of money is ok, you can implement the policy of throwing away things you don't need. Then after a few cycles of buy thing -> receive thing -> throw away thing you'll be deconditioned from buying useless things.

Most purchases that, on reflection, I would prefer not to make are ones where what I receive is worth much more than nothing but still less than the asking price, so I would never actually be compelled to throw out the superfluous stuff I buy.

Many times the purchase would even be worth more than the asking price, but I would like for my preferences to change such that it no longer would be the case.

If a bhikkhu monk can be content owning next to nothing, surely I can be happy owning less than I currently do. The question is how I change my preferences to become more like that of the monk.

The underlying desire of most addictive tendencies in our production/consumption culture is the desire to feel more connected with a tribe (Maslow’s love and belonging). We are—at our core—social creatures. Our ancestors reinforced connections with tribe mates every day, and they clearly knew the values they shared with the tribe. They were living life within the parameters in which we evolved to thrive.

In our society the tribes have been disbanded in favor of a more interconnected world, and likewise values have become diffuse, making it harder for individuals to know what they truly believe in. Just as throwing 20k chickens into a barn causes them to go crazy and peck one another to death because their brains can’t handle a pecking order that big, so it is with humans, who are not able to operate instinctively in a vastly more complex and relationally fluid world where the environment has changed so radically from tribal days.

Invest in a few (3-5) deeply intimate relationships where you know you are equals and can be there unconditionally and without judgment for each other. As Robin Dunbar says in his excellent book “Friends”:

It was the social measures that most influenced your chances of surviving… The best predictors were those that contrasted high versus low frequencies of social support and those that measured how well integrated you were into your social network and your local community. Scoring high on these increased your chances of surviving by as much as 50 per cent… it is not too much of an exaggeration to say that you can eat as much as you like, drink as much alcohol as you want, slob about as much as you fancy, fail to do your exercises and live in as polluted an atmosphere as you can find, and you will barely notice the difference… You will certainly do yourself a favor by eating better, taking more exercise and popping the pills they give you, but you’ll do considerably better just by having some friends.

Also see Robert Waldinger’s TED talk on the Grant study.

Has anyone managed to get any use out of gpt-4 integrations yet? I've tried to set up integrations into my private spreadsheets with Zapier, but the painfully slow speed at which gpt-4 writes and needing to click a link to confirm every action makes any small ask slower than just doing it myself.

So far I've been pretty disappointed, but maybe I'm just staring myself blind at tasks that it's currently not well suited for.

I really really (and I cannot emphasize this enough) really dislike writing applications. It gives me a feeling of despair and inadequacy about my career and life choices. Due to this I write far fewer applications than I should, and spend too much time and energy on the few I do send.

I generally feel confident about myself, but writing applications for some reason really messes me up.

Has anyone here dealt with anxiety when writing applications? If so, how did you overcome it?

I think a lot of people feel this way, and it's something I've experienced. I don't have any great solutions but I generally do two things:

  1. Set reasonable expectations. The application process has a lot of randomness, and almost all applications will get ignored even if they're good, so I should expect any particular application to have a very low chance of getting a response.
  2. Spend less time on individual applications; apply to a lot of things; use commonalities across applications to copy/paste things I wrote on previous applications.

I believe AI will significantly decrease the cost of overregulation, and make many policies attractive that previously were too costly to administer.

Read why I believe this in my new substack, which I'm trying to start so I have a place to write about non-EA stuff!

https://mkbaio.substack.com/p/ai-will-make-regulation-much-less

I wrote down a list of all the things I could spend one hour every day doing. Among high scorers was teaching myself Mandarin.

Has anyone looked into the value of learning Mandarin, for the average person disinterested in China?

Some thoughts here on how quick it is to learn: https://80000hours.org/articles/china-careers/#learn-chinese-in-china

In there, I guess that 6-18 months of full-time study in the country is enough to get to conversational fluency.

I've seen other estimates that it takes a couple of thousand hours to get fluent e.g. here: https://linguapath.com/how-many-hours-learn-language/

My guess is that it's more efficient to study full time while living in the country. I think living there increases motivation, means you learn what you actually need, means you learn a bunch 'passively', and lets you practice conversation a lot, which is better than most book learning, and you learn more of the culture. So, I'd guess someone would make more progress living there for a year compared to doing an hour a day for ~4 years, and enjoy it more.

That said, if you use the hour well, you could learn a lot of vocab and grammar. You could then get a private tutor to practice conversation, or you could go to China (or Taiwan) later, building on that base.
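As a rough back-of-the-envelope check on that comparison, here is a small Python sketch. The full-time schedule (40 hours a week for 48 weeks) is purely an illustrative assumption, not a figure from either linked article, and the function name is made up for the example.

```python
def total_hours(hours_per_day: float, days: float) -> float:
    """Total study hours for a fixed daily amount over a number of days."""
    return hours_per_day * days


# One hour a day for roughly four years.
hour_a_day_4_years = total_hours(1, 4 * 365)

# Assumed full-time schedule: 40 hours/week for 48 weeks (illustrative only).
full_time_year = 40 * 48

print(hour_a_day_4_years)  # 1460
print(full_time_year)      # 1920
```

Under these assumptions, both routes land in the same ballpark as the "couple of thousand hours" estimate, so the guess above is about the quality of the hours rather than their number.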

My guess is that it's more efficient to study full time while living in the country. I think living there increases motivation, means you learn what you actually need, means you learn a bunch 'passively', and lets you practice conversation a lot, which is better than most book learning, and you learn more of the culture.

+1

Being there definitely increased my motivation to learn the language, even though I didn't know any Chinese beforehand and wasn't intending to learn any.

Why would you learn Mandarin if you're disinterested in China? What made it high scoring?

Triplebyte is a company that interviews and vets software developers, identifying their strengths and weaknesses. Triplebyte can cut down the time spent on draining interviews significantly. More importantly it makes it easy for firms to find candidates and vice-versa.

Would it be useful to have a similar service for EA organisations?

It seems to me that the skills EA organisations look for are harder to generalize than software development skills. This means centralized interviews are much less valuable.

What does seem useful is reducing the friction that arises from matching companies with candidates.

Less well-known orgs could more easily find the labor they need, and people interested in direct work at EA orgs could devote their full focus to their current occupation, knowing they will be visible to potential employers.

It seems the 80k job-board is already accomplishing much of this. Does anyone reckon there would be demand for an expanded version of it?

At what point do you feel with ~90% certainty that you would have done more good by donating to animal charities than the harm you've caused by consuming a regular meat-filled diet?

It would be nice to know the numbers I have in my head somewhat conform to what smart people think.