All of Oliver Sourbut's Comments + Replies

A nit

lifestyle supports the planet, rather than taking from it

appeals to me, and I'm sure to some others, but (I sense) it could come across with a particular political-tribal flavour, which you might want to try neutralising. (Or not, if that'd detract from the net appeal!)

2
Myles Stremick
This reminds me of when I pitched this idea to my brother and he said "I don't like the framing that a human life is negative and needs to be made up for." and I clarified "I think on the whole our lives are good and nothing to be ashamed about, but there are particular areas of our life that cause harm that we can make up for. Not that the whole life needs to be made up for."  I do think having some but not an overwhelming amount of guilt-based messaging is useful here, but I'll reconsider this line and see if it's too much. 

On point 1 (space colonization), I think it's hard and slow! So the same issue as with bio risks might apply: AGI doesn't get you this robustness quickly for free. See other comment on this post.

I like your point 2 about chancy vs merely uncertain. I guess a related point is that when the 'runs' of the risks are in some way correlated, having survived once is evidence that survivability is higher. (Up to and including the fully correlated 'merely uncertain' extreme?)

For clarity, you're using 'important' here in something like an importance x tractability x neglectedness factoring? So yes more important (but there might be reasons to think it's less tractable or neglected)?

2
Toby_Ord
Yeah, I mean 'more valuable to prevent', before taking into account the cost and difficulty.

I've been meaning to write something about 'revisiting the alignment strategy'. Section 5 here ('Won't AGI make post-AGI catastrophes essentially irrelevant?') makes the point very clearly:

On this view, a post-AGI world is nearly binary—utopia or extinction—leaving little room for Sisyphean scenarios.

But I think this is too optimistic about the speed and completeness of the transition to globally deployed, robustly aligned "guardian" systems.

without making much of a case for it. Interested in Will and reviewers' sense of the space and literature here.

2
Toby_Ord
I've often been frustrated by this assumption over the last 20 years, but don't remember any good pieces about it. It may be partly from Eliezer's first alignment approach being to create a superintelligent sovereign AI, where if that goes right, other risks really would be dealt with.

Yep, definitely for me 'big civ setbacks are really bad' was already baked in from the POV of setting bad context for pre-AGI-transition(s) (as well as their direct badness). But while I'd already agreed with Will about post-AGI not being an 'end of history' (in the sense that much remains uncertain re safety), I hadn't thought through the implication that setbacks could force a rerun of the most perilous transition(s), which does add some extra concern.

A small aside: some put forth interplanetary civilisation as a partial defence against both total destruction and 'setback'. But reaching the milestone of having a really robustly interplanetary civ might itself take quite a long time after AGI - especially if (like me) you think digital uploading is nontrivial.

(This abstractly echoes the suggestion in this piece that bio defence might take a long time, which I agree with.)

3
William_MacAskill
I agree with this. One way of seeing that is to ask: how many doublings of energy consumption can civilisation have before it needs to move beyond the solar system? The answer is about 40 doublings. Which, depending on your views on just how fast explosive industrial expansion goes, could be a pretty long time, e.g. decades.
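A rough sanity check on that figure, as a sketch with assumed round numbers (roughly 20 TW of current use and the Sun's full output as the ceiling - not figures taken from the comment):

```python
# Back-of-envelope: how many doublings of energy use fit within the solar system,
# assuming the ceiling is capturing the Sun's entire output (Dyson-swarm style)?
# Both figures below are assumptions for illustration.
import math

current_use_w = 2e13      # ~20 TW: rough current civilisational energy use
solar_output_w = 3.8e26   # total solar luminosity

doublings = math.log2(solar_output_w / current_use_w)
print(f"~{doublings:.0f} doublings")  # ~44, the same ballpark as 'about 40'
```

With a different choice of ceiling (e.g. only the sunlight a smaller swarm intercepts), the answer shifts by a handful of doublings, but the ballpark is robust.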

Some gestures which didn't make the cut as they're too woolly or not quite the right shape:

  • adversarial exponentials might force exponential expense per gain
    • e.g. combatting replicators
    • e.g. brute forcing passwords
  • many empirical 'learning curve' effects appear to consume exponential observations per increment
    • Wright's Law (which is the more general cousin of Moore's Law) requires exponentially many production iterations per incremental efficiency gain (sketched below)
    • Deep learning scaling laws appear to consume exponential inputs per incremental gain
    • AlphaCode and A
... (read more)
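To make the Wright's Law bullet above concrete, here is a minimal sketch; the power-law form and the constants are illustrative assumptions, not empirical fits:

```python
# Wright's Law: unit cost falls as a power law of cumulative production,
#   cost = a * N**(-b)
# so each further halving of cost requires multiplying cumulative production
# by a constant factor 2**(1/b) -- exponentially many units per increment.

def units_needed_for_cost(target_cost: float, a: float = 100.0, b: float = 0.3) -> float:
    """Invert cost = a * N**(-b) to get the cumulative units N for a target cost."""
    return (a / target_cost) ** (1.0 / b)

cost = 50.0
for halving in range(1, 5):
    cost /= 2
    print(f"halving {halving}: cumulative units ~ {units_needed_for_cost(cost):.2e}")
# Each successive halving multiplies the required cumulative production by
# 2**(1/b), here roughly 10x.
```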

This is lovely, thank you!

My main concern would be that it takes the same very approximating stance as much other writing in the area, conflating all kinds of algorithmic progress into a single scalar 'quality of the algorithms'.

You do moderately well here, noting that the most direct interpretation of your model regards speed or runtime compute efficiency, yielding 'copies that can be run' as the immediate downstream consequence (and discussing in a footnote the relationship to 'intelligence'[1] and the distinction between 'inference' and training compute... (read more)

Glad to hear it! Any particular thoughts or suggestions? (Consider applying, or telling colleagues and friends you think would be a good fit!)

On this note, the Future of Life Foundation (headed by Anthony Aguirre, mentioned in this post) is today launching a fellowship on AI for Human Reasoning.

Why? Whether you expect gradual or sudden AI takeoff, and whether you're afraid of gradual or acute catastrophes, it really matters how well-informed, clear-headed, and free from coordination failures we are as we navigate into and through AI transitions. Just the occasion for human reasoning uplift!

12 weeks, $25-50k stipend, mentorship, and potential pathways to future funding and impact. Applications close June 9th.

(cross-posted on LW)

Love this!

As presaged in our verbal discussion, my top conceptual complement would be to emphasise exploration/experimentation as central to the knowledge production loop - the cycle of 'developing good taste to plan better experiments to improve taste (and planning model)' is critical (indispensable?) to 'produce new knowledge which is very helpful by the standards of human civilization' (on any kind of meaningful timescale).

This is because just flailing, or even just 'doing stuff', gets you some novelty of observations, but directedly see... (read more)

I like this decomposition!

I think 'Situational Awareness' can quite sensibly be further divided up into 'Observation' and 'Understanding'.

The classic control loop of 'observe', 'understand', 'decide', 'act'[1], is consistent with this discussion, where 'observe'+'understand' here are combined as 'situational awareness', and you're pulling out 'goals' and 'planning capacity' as separable aspects of 'decide'.
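As a toy illustration of that four-stage loop (purely a sketch: a thermostat stands in for the agent, everything here is assumed for illustration, and 'decide' folds together a goal and a one-step plan):

```python
# observe -> understand -> decide -> act, as a runnable toy.
import random


def observe(true_temp: float) -> float:
    """Noisy sensor reading (observe)."""
    return true_temp + random.gauss(0.0, 0.5)


def understand(estimate: float, reading: float, alpha: float = 0.3) -> float:
    """Fold the reading into a running state estimate (understand)."""
    return (1 - alpha) * estimate + alpha * reading


def decide(estimate: float, goal_temp: float) -> str:
    """Choose an action given the goal and the estimated situation (decide)."""
    if estimate < goal_temp - 0.5:
        return "heat"
    if estimate > goal_temp + 0.5:
        return "cool"
    return "idle"


def act(true_temp: float, action: str) -> float:
    """Apply the action to the world (act)."""
    return true_temp + {"heat": 0.4, "cool": -0.4, "idle": 0.0}[action]


true_temp, estimate, goal = 15.0, 15.0, 20.0
for _ in range(30):
    reading = observe(true_temp)
    estimate = understand(estimate, reading)
    true_temp = act(true_temp, decide(estimate, goal))
print(f"final estimate ~{estimate:.1f} vs goal {goal}")
```

In this factoring, 'situational awareness' corresponds to observe + understand, while goals and planning capacity live inside decide.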

Are there some difficulties with factoring?

Certain kinds of situational awareness are more or less fit for certain goals. And further, the important 're... (read more)

A little followup:

I took part in the inaugural SERI MATS programme in 2021-2022 (where incidentally I interacted with Richard), and started an AI Safety PhD at Oxford in 2022.

I've been working for the AI Safety Institute (UK Gov) since Jan 2024 as a hybrid technical expert, utilising my engineering and DS background alongside AI/ML research and threat modelling. Likely to continue such work, there or elsewhere. Unsure if I'll finish my PhD in the end, as a result, but I don't regret it: I produced a little research, met some great collaborators, and had fun w... (read more)

FWIW I work at the AI Safety Institute UK and we're considering a range of both misuse and misalignment threats, and there are a lot of smart folks on board taking things pretty seriously. I admit I... don't fully understand how we ended up in this situation and it feels contingent and precious, as does the tentative international consensus on the value of cooperation on safety (e.g. the Bletchley declaration). Some people in government are quite good, actually!

Sure, take it or leave it! I think for the field-building benefits it can look more obviously like an externality (though I-the-fundraiser would in fact be pleased and not indifferent, presumably!), but the epistemic benefits could easily accrue mainly to me-the-fundraiser (of course they could also benefit other parties).

How much of this is lost by compressing to something like: virtue ethics is an effective consequentialist heuristic?

I've been bought into that idea for a long time. As Shaq says, 'Excellence is not a singular act, but a habit. You are what you repeatedly do.'

We can also make analogies to martial arts, music, sports, and other practice/drills, and to aspects of reinforcement learning (artificial and natural).

6
Stefan_Schubert
It doesn't just say that virtue ethics is an effective consequentialist heuristic (if it says that) but also has a specific theory about the importance of altruism (a virtue) and how to cultivate it. There's not been a lot of systematic discussion on which specific virtues consequentialists or effective altruists should cultivate. I'd like to see more of it. @Lucius Caviola and I have written a paper where we put forward a specific theory of which virtues utilitarians should cultivate. (I gave a talk along similar lines here.) We discuss altruism but also five other virtues.

Simple, clear, thought-provoking model. Thanks!

I also faintly recall hearing something similar in this vicinity: apparently some volunteering groups get zero (or less!?) value from many/most volunteers, but engaged volunteers dominate donations, so it's worthwhile bringing in volunteers and training them! (citation very much needed)

Nitpick: are these 'externalities'? I'd have said, 'side effects'. An externality is a third-party impact from some interaction between two parties. The effects you're describing don't seem to be distinguished by being third-party per se (I can imagine glossing them as such but it's not central or necessary to the model).

2
Larks
Interesting argument about 'side effects' vs 'externalities'. I was assuming that organizations/individuals were being 'selfishly' rational, and assuming that a relatively small fraction of things like the field-building effects would benefit the specific organization doing the field-building. But 'side effects' does seem like it might be more accurate, so possibly I should adjust the title.

Yeah. I also sometimes use 'extinction-level' if I expect my interlocutor not to already have a clear notion of 'existential'.

Point of information: at least half the funding comes from Schmidt Futures (not OpenAI), though OpenAI are publicising and administering it.

Another high(er?) priority for governments:

  • start building multilateral consensus and preparations on what to do if/when
    • AI developers go rogue
    • AI leaked to/stolen by rogue operators
    • AI goes rogue

I think this is a good and useful post in many ways, in particular laying out a partial taxonomy of differing pause proposals and gesturing at their grounding and assumptions. What follows is a mildly heated response I had a few days ago, whose heatedness I don't necessarily endorse but whose content seems important to me.

Sadly this letter is full of thoughtless remarks about China and the US/West. Scott, you should know better. Words have power. I recently wrote an admonishment to CAIS for something similar.

The biggest disadvantage of pausing for a long

... (read more)

I think that the best work on AI alignment happens at the AGI labs

Based on your other discussion e.g. about public pressure on labs, it seems like this might be a (minor?) loadbearing belief?

I appreciate that you qualify this further in a footnote

This is a controversial view, but I’d guess it’s a majority opinion amongst AI alignment researchers.

I just wanted to call out that I weakly hold the opposite position, and also opposite best guess on majority opinion (based on safety researchers I know). Naturally there are sampling effects!

This is a margi... (read more)

1
AnonResearcherMajorAILab
Yes, if I changed my mind about this I'd have to rethink my position on public advocacy. I'm still pretty worried about the other disadvantages so I suspect it wouldn't change my mind overall, but I would be more uncertain.

This is an exemplary and welcome response: concise, full-throated, actioned. Respect, thank you Aidan.

Sincerely, I hope my feedback was all-considered good from your perspective. As I noted in this post, I felt my initial email was slightly unkind at one point, but I am overall glad I shared it - you appreciate my getting exercised about this, even over a few paragraphs!

It’s important to discuss national AI policies which are often explicitly motivated by goals of competition without legitimizing or justifying zero-sum competitive mindsets which can unde

... (read more)

(Prefaced with the understanding that your comment is to some extent devil's advocating and this response may be too)

both the US and Chinese governments have the potential to step in when corporations in their country get too powerful

What is 'step in'? I think when people are describing things in aggregated national terms without nuance, they're implicitly imagining govts either already directing, or soon/inevitably appropriating and directing (perhaps to aggressive national interest plays). But govts could just as readily regulate and provide guidance... (read more)

Thanks Ben!

Please don't take these as endorsements that this thinking is correct, just that it's what I see when I inspect my instincts about this

Appreciated.

These psychological (and real) factors seem very plausible to me for explaining why mistakes in thinking and communication are made.

maybe we can think of the US companies as simultaneously closer friends and closer enemies with each other?

Mhm, this seems less lossy as a hypothetical model. Even if they were only 'closer friends', though, I don't think it's at all clearcut enough for it to be a... (read more)

Just in case we're out of sync, let's briefly refocus on some object details

China has made several efforts to preserve their chip access, including smuggling, buying chips that are just under the legal limit of performance, and investing in their domestic chip industry.

Are you aware of the following?

  • the smuggling was done by... smugglers
  • the buying of chips under the limit was done by multiple suppliers in China
  • the selling of chips under the limit was done by Nvidia (and perhaps others)
  • the investment in China's chip industry was done by the CCP

If... (read more)

1
Gerald Monroe
What makes it a foregone conclusion is that race dynamics are powerfully convergent. Actions that would cause a party to definitely lose a race have feedback. Over time, multiple competing agents will choose winning strategies, and others will copy those, leading to strategy mirroring. Certain forms of strategy (like nationalizing all the AI labs) are also convergent and optimal. And say a party fails to play optimally: they then observe they are losing, and are forced to choose optimal play in order to lose less. So my seeming overconfidence is because I am convinced the overall game will force all these disparate, uncoordinated individual events to converge on what it must.

I expect there are several views, but let's look at the bioweapon argument for a second. In what computers can the "escaped" AI exist? There is no biosphere of computers. You need at least (1600 Gb x 2 / 80 x 2) = 80 H100s to host a GPT-4 instance. The real number is rumored to be about 128. And that's a subhuman AGI at best, without vision and other critical features. How many cards will a dangerous ASI need to exist? I won't go into the derivation here, but I think the number is > 10,000, and they must be in a cluster with high-bandwidth interconnects.

As for the second part, "how are we going to use it as a stick": simple. If you are unconcerned with the AI "breaking out", you train and try a lot of techniques, and only use "in production" (industrial automation, killer robots etc.) the most powerful model you have that is measurably reliable and efficient and doesn't engage in unwanted behavior. None of the bad AIs ever escape the lab; there's nowhere for them to go.

Note that might be a different story in 2049, which is when Moore's law would put a single GPU at the power of 10,000 of today's. It likely can't continue that long - exponentials stop - but maybe computers built with computronium printed off a nanoforge. But we don't have any of that, and won
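Unpacking the "(1600 Gb x 2 / 80 x 2) = 80" arithmetic above as a sketch (the figures are the commenter's rumoured numbers, and reading "1600 Gb" as roughly 1.6 trillion parameters is my assumption):

```python
# Rough GPU count to host a large model: parameters * bytes-per-parameter,
# divided by per-GPU memory, with an overhead factor for activations / KV cache.
# All inputs are rumoured or assumed, not confirmed specs.

params_billions = 1600     # assumed ~1.6T parameters
bytes_per_param = 2        # fp16 / bf16 weights
gpu_memory_gb = 80         # H100 memory
overhead_factor = 2        # rough allowance for activations, KV cache, etc.

weights_gb = params_billions * bytes_per_param        # ~3200 GB of weights
gpus = weights_gb / gpu_memory_gb * overhead_factor   # ~80 GPUs
print(f"~{gpus:.0f} H100s")
```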

Interesting. I'd love to know if you think the crux schema I outlined is indeed important? I mean this:

How quickly/totally/coherently could US gov/CCP capture AI talent/artefacts/compute within its jurisdiction and redirect them toward excludable destructive ends? Under what circumstances would they want/be able to do that?

Correct me at any point if I misinterpret: I read that, on the basis of answers to something a bit like these, you think an international competition/race is all but inevitable? Presumably that registers as terrifically dangerous for... (read more)

-7
Gerald Monroe

Thanks for this thoughtful response!

this tendency leads to analysis that assumes more coordination among governments, companies, and individuals in other countries than is warranted. When people talk about "the US" taking some action... more likely to be aware of the nuance this ignores... less likely to consider such nuances when people talk about "China" doing something

This seems exactly right and is what I'm frustrated by. Though, further than you give credit (or un-credit) for, frequently I come across writing or talking about "US success in AI", "... (read more)

3
Daniel_Eth
  I'm pretty sure what most (educated) people think is they are part of the US (in the sense that they are "US entities", among other things), that they will pay taxes in the US, will hire more people in the US than China (at least relative to if they were Chinese entities), will create other economic and technological spillover effects in greater amount in the US than in China (similar to how the US's early lead on the internet did), will enhance the US's national glory and morale, will provide strategically valuable assets to the US and deny these assets to China (at least in a time of conflict), will more likely embody US culture and norms than Chinese culture and norms, and will be subject to US regulation much more than Chinese regulation. Most people don't expect these companies will be nationalized (though that does remain a possibility, and presumably more so if they were Chinese companies than US companies, due to the differing economic and political systems), but there are plenty of other ways that people expect the companies to advantage their host country['s government, population, economy, etc].
2
Gerald Monroe
Yes. In the end, all the answers to your questions are yes.

The critical thing to realize is that until basically EOY 2022, AI didn't exist. It was narrow and expensive and essentially non-general - a cool party trick, but the cost to build a model for anything and get to useful performance levels was high. Self-driving cars were endlessly delayed; recsys worked, but its techniques for correlating fields of user data with preferences are only a little better with neural networks than with older, cheaper methods; for most other purposes AI was just a tech demo. You need to think in terms of "what does it mean that AI works now, and how are decisions going to be different". With that said, governments won't nationalize AI companies until they develop a lot stronger models.

Imagine the Manhattan Project never happened, but GE and a few other US companies kept tinkering with fission. Eventually they would have built critical devices, and EOY 2022 is the "Chicago Pile" moment - there's a nuclear reactor, and we can plot out the yield for a nuke, but the devices have not yet been built. Around the time GE is building nuclear bombs for military demos, at some point the US government has to nationalize it all. It's too dangerous.

As for the rest of your post, I don't see how "not framing a competition as a competition" is very useful. It's not the media. We live on a finite planet with finite resources, and the only reason there are different countries is that the most powerful winners have not found a big enough club to conquer everyone else. You know nations used to be way smaller, right? Why do you think they are so large now? In each case someone found a way to depose all the other feudal kings and lords. AGI may be that club, and whoever builds it fastest and bestest may in fact just be able to crush everyone. Even if they can't, each superpower has to assume that they can.

Great read, and interesting analysis. I like encountering models for complex systems (like community dynamics)!

One factor I don't think was discussed (maybe the gesture at possible inadequacy of f(N) encompasses this) is the duration of scandal effects. E.g. imagine some group claiming to be the Spanish Inquisition, the Mongol Horde, or the Illuminati tried to get stuff done. I think (assuming they were taken seriously) they'd encounter lingering reputational damage more than one year after the original scandals! Not sure how this models out; I'm not planning to d... (read more)

2
Ben_West🔸
Thanks Oliver! It seems basically right to me that this is a limitation of the model, in particular f(N), like you say.

OpenAI as a whole, and individuals affiliated with or speaking for the org, appear to be largely behaving as if they are caught in an overdetermined race toward AGI.

What proportion of people at OpenAI believe this, and to what extent? What kind of observations, or actions or statements by others (and who?) would change their minds?

Great post. I basically agree, but in a spirit of devil's advocating, I will say: when I turn my mind to agent foundations thinking, I often find myself skirting queasily close to concepts which feel also capabilities-relevant (to the extent that I have avoided publicly airing several ideas for over a year).

I don't know if that's just me, but it does seem that some agent foundations content from the past has also had bearing on AI capabilities - especially if we include decision theory stuff, dynamic programming and RL, search, planning etc. which it's arg... (read more)

Thank you for sharing this! Especially the points about relevant maps and Meta/FAIR/LeCun.

I was recently approached by the UK FCDO as a technical expert in AI with perspective on x-risk. We had what I think were very productive conversations, with an interesting convergence of my framings and the ones you've shared here - that's encouraging! If I find time I'm hoping to write up some of my insights soon.

1
Oliver Sourbut
I wrote a little here about unpluggability (and crossposted on LessWrong/AF)

I've given a little thought to this hidden qualia hypothesis but it remains very confusing for me.

To what extent should we expect to be able to tractably and knowably affect such hidden qualia?

3
Adam Shriver
Here's the report on conscious subsystems: https://forum.effectivealtruism.org/posts/vbhoFsyQmrntru6Kw/do-brains-contain-many-conscious-subsystems-if-so-should-we 

This is beautiful and important Tyler, thank you for sharing.

I've seen a few people burn out (and come close myself), and I have made a point of gently socially making and reinforcing this sort of point (far less eloquently) myself, in various contexts. 

I have a lot of thoughts about this subject.

One thing I always embrace is silliness and (often self-deprecating) humour, which are useful antidotes to stress for a lot of people. Incidentally, your tweet thread rendition of the Egyptian spell includes

I am light heading for light. Even in the dark, a fi

... (read more)
7
tyleralterman
Agree so much with the antidote of silliness! I’m happy to see that EA Twitter is embracing it. Excited to read the links you shared, they sound very relevant. Thank you, Oliver. May your fire bum into the distance.

Seconded/thirded on Human Compatible being near that frontier. I did find its ending 'overly optimistic' in the sense of framing it like 'but lo, there is a solution!', while other similar resources like Superintelligence and especially The Alignment Problem seem more nuanced, presenting uncertain proposals for paths forward not as oven-ready but as preliminary and speculative.

I think it's a staircase? Maybe like climbing upwards to more good stuff. Plus some cool circles to make it logo ish.

2
Zach Stein-Perlman
“Abstract stairs” was my best guess too. It doesn’t work for me, and I don’t get the second circle.

I'm intrigued by this thread. I don't have an informed opinion on the particular aesthetic or choice of quiz questions, but I note some superficial similarities to Coursera, Khan Academy, and TED-Ed, which are aimed at mainly professional age adults, students of all ages, and youth/students (without excluding adults) respectively.

Fun/cute/cartoon aesthetics do seem to abound these days in all sorts of places, not just for kids.

My uninformed opinion is that I don't see why it should put off teenagers (talented or otherwise) in particular, but I weakly agree that if something is explicitly pitched at teenagers, that might be offputting!

I've considered a possible pithy framing of the Life Despite Suffering question as a grim orthogonality thesis (though I'm not sure how useful it is):

We sometimes point to the substantial majority's revealed preference for staying alive as evidence of a 'life worth living'. But perhaps 'staying-aliveness' and 'moral patient value' can vary more independently than that claim assumes. This is the grim orthogonality thesis.

An existence proof for the 'high staying-aliveness x low moral patient value' quadrant is the complex of torturer+torturee, which quite cl... (read more)

I'm shocked and somewhat concerned that your empirical finding is that so few people have encountered or thought about this crucial consideration.

My experience is different, with maybe 70% of AI x-risk researchers I've discussed with being somewhat au fait with the notion that we might not know the sign of future value conditional on survival. But I agree that it seems people (myself included) have a tendency to slide off this consideration or hope to defer its resolution to future generations, and my sample size is quite small (a few dozen maybe) and quit... (read more)

9
Jacy
This is helpful data. Two important axes of variation here are:

  • Time, where this has fortunately become more frequently discussed in recent years
  • Involvement, where I speak a lot with artificial intelligence and machine learning researchers who work on AI safety but not global priorities research; often their motivation was just reading something like Life 3.0. I think these people tend to have thought through crucial considerations less than, say, people on this forum.

My anecdata is also that most people have thought about it somewhat, and "maybe it's okay if everyone dies" is one of the more common initial responses I've heard to existential risk.

But I agree with OP that I more regularly hear "people are worried about negative outcomes just because they themselves are depressed" than "people assume positive outcomes just because they themselves are manic" (or some other cognitive bias).

Typo hint:

"10<sup>38</sup>" hasn't rendered how you hoped. You can use <dollar>10^{38}<dollar> which renders as

1
Oliver Sourbut
It looks like I got at least one downvote on this comment. Should I be providing tips of this kind in a different way?
2
Fai
Maybe another typo? : "Bostrom argues that if humanizes could colonize the Virgo supercluster", should that be "humanity" or "humans"?
1
Jacy
Whoops! Thanks!

Got it, I think you're quite right on one reading. I should have been clearer about what I meant, which is something like

  • there is a defensible reading of that claim which maps to some negative utilitarian claim (without necessarily being a central example)
  • furthermore I expect many issuers of such sentiments are motivated by basically pretheoretic negative utilitarian insight

E.g. imagine a minor steelification (which loses the aesthetic and rhetorical strength) like "nobody's positive wellbeing (implicitly stemming from their freedom) can/should be cel... (read more)

5
abrahamrowe
That makes sense to me. Yeah, I definitely think that many people from left-leaning spaces who come to EA also become sympathetic to suffering-focused work in my experience, which seems consistent with this.

Minor nitpick: "nobody's free until everyone is free" is precisely a (negative) utilitarian claim (albeit with unusual wording)

9
abrahamrowe
That doesn't seem quite right - negative utilitarians would still prefer marginal improvements even if all suffering didn't end (or in this case, a utilitarian might prefer many become free even if all didn't become free). The sentiment is interesting because it doesn't acknowledge marginal states that utilitarians are happy to compare against ideal states, or worse marginal states.

It's possible the selection bias is high, but I don't have good evidence for this besides personal anecdata. I don't know how many people are relevantly similar to me, and I don't know how representative we are of the latest EA 'freshers', since dynamics will change and I'm reporting with several years' lag.

Here's my personal anecdata.

Since 2016, around when I completed undergrad, I've been an engaged (not sure what counts as 'highly engaged') longtermist. (Before that point I had not heard of EA per se but my motives were somewhat proto EA and I wanted to... (read more)

1
Anonymous_EA
Appreciate the anecdata! I agree that probably there are at least a good number of people like you who will go under the radar, and this probably biases many estimates of the number of non-community-building EAs downward (esp estimates that are also based on anecdata, as opposed to e.g. survey data).

I just wanted to state agreement that it seems a large number of people largely misread Death with Dignity, at least according to what seems to me the most plausible intended message: mainly about the ethical injunctions (which are very important as a finitely-rational and prone-to-rationalisation being), as Yudkowsky has written of in the past.

The additional detail of 'and by the way this is a bad situation and we are doing badly' is basically modal Yudkowsky schtick and I'm somewhat surprised it updated anyone's beliefs (about Yudkowsky's beliefs, and th... (read more)

I wrote something similar (with more detail) about the Gato paper at the time.

I don't think this is any evidence at all against AI risk though? It is maybe weak evidence against 'scaling is all you need' or that sort of thing.

Thanks Rohin, I second almost all of this.

Interested to hear more about why long-term credit assignment isn't needed for powerful AI. I think it depends how you quantify those things and I'm pretty unsure about this myself.

Is it because there is already loads of human-generated data which implicitly embody or contain enough long-term credit assignment? Or is it that long-term credit assignment is irrelevant for long-term reasoning? Or maybe long-term reasoning isn't needed for 'powerful AI'?

3
Rohin Shah
We're tackling the problem "you tried out a long sequence of actions, and only at the end could you tell whether the outcomes were good or not, and now you have to figure out which actions were responsible". Some approaches to this that don't involve "long-term credit assignment" as normally understood by RL practitioners:

  • Have humans / other AI systems tell you which of the actions were useful. (One specific way this could be achieved is to use humans / AI systems to provide a dense reward, kinda like in summarizing books from human feedback.)
  • Supervise the AI system's reasoning process rather than the outcomes it gets (e.g. like chain-of-thought prompting but with more explicit supervision).
  • Just don't even bother; do regular old self-supervised learning on a hard task. In order to get good performance, maybe the model has to develop "general intelligence" (i.e. something akin to the algorithms humans use in order to do long-term planning; after all, our long-term planning doesn't work via trial and error).

I think it's also plausible that (depending on your definitions) long-term reasoning isn't needed for powerful AI.
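A minimal sketch of the first bullet above (dense feedback in place of long-horizon credit assignment); the trajectory and the stand-in labeller are invented purely for illustration:

```python
# Instead of one sparse reward at the end of a long trajectory, a (human or AI)
# labeller scores each step, so no signal has to be propagated back across the
# whole sequence. Toy example only.

trajectory = ["draft outline", "cite wrong paper", "fix citation", "write summary"]

# Sparse, outcome-only signal: a single number for the whole trajectory.
sparse_rewards = [0.0, 0.0, 0.0, 1.0]

# Dense signal: a stand-in labeller scores each action as it happens.
def labeller(action: str) -> float:
    return -1.0 if "wrong" in action else 1.0

dense_rewards = [labeller(a) for a in trajectory]
print(dense_rewards)  # [1.0, -1.0, 1.0, 1.0]: credit assigned per step
```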