All of gwern's Comments + Replies

I don't think it's odd at all. As the Bloomberg article notes, this was in response to the martyrdom of Saint Altman, when everyone thought the evil Effective Altruists were stealing/smashing OA for no reason and destroying humanity's future (as well as the fortune of many of the individuals involved, to a degree few of them bothered to disclose) and/or turning it over to the Chinese commies. An internal memo decrying 'Effective Altruism' was far from the most extreme response at the time; but I doubt Gomez would write it today, if only because so much new... (read more)

2
JWS
2mo
Ah good point that it was in the aftermath of the OpenAI board weekend, but it still seems like a very extreme/odd reaction to me (though I have to note the benefit of hindsight as well as my own personal biases). I still think it'd be interesting to see what Aidan actually said, and/or why he's formed such a negative opinion of EA, but I think you're right that the simplest explanation here is:

Good report overall on tacit knowledge & biowarfare. This is relevant to the discussion over LLM risks: the Aum Shinrikyo chemist could make a lot of progress by reading papers and figuring out his problems as he went, but the bacteriologist couldn't figure out his issues with what seems to have been a viable plan to weaponize & mass-produce anthrax, where lack of feedback led it to fail. Which does sound like something that a superhumanly-knowledgeable (but not necessarily that intelligent) LLM could help a lot with simply by pattern-matching and making lists of suggestions for things that are, to the human, 'unknown unknowns'.

If a crazy person wants to destroy the world with an AI-created bioweapon

Or, more concretely, nuclear weapons. Leaving aside regular full-scale nuclear war (which is censored from the graph for obvious reasons), this sort of graph will never show you something like Edward Teller's "backyard bomb", or a salted bomb. (Or any of the many other nuclear weapon concepts which never got developed, or were curtailed very early in deployment like neutron bombs, for historically-contingent reasons.)

There is, as far as I am aware, no serious scientific doubt that ... (read more)

Hm, maybe it was common knowledge in some areas? I just always took him for being concerned. There's not really any contradiction between being excited about your short-term work and worried about long-term risks. Fooling yourself about your current idea is an important skill for a researcher. (You ever hear the joke about Geoff Hinton? He suddenly solves how the brain works, at long last, and euphorically tells his daughter; she replies: "Oh Dad - not again!")

Ilya has always been a doomer AFAICT, he was just loyal to Altman personally, who recruited him to OA. (I can tell you that when I spent a few hours chatting with him in... 2017 or something? a very long time ago, anyway - I don't remember him dismissing the dangers or being pollyannaish.) 'Superalignment' didn't come out of nowhere or surprise anyone about Ilya being in charge. Elon was... not loyal to Altman but appeared content to largely leave oversight of OA to Altman until he had one of his characteristic mood changes, got frustrated and tried to tak... (read more)

5
Nick K.
5mo
Just judging from his Twitter feed, I got the weak impression D'Angelo is somewhat enthusiastic about AI and didn't catch any concerns about existential safety.

Hmm, OK. Back when I met Ilya, about 2018, he was radiating excitement that his next idea would create AGI, and didn't seem sensitive to safety worries. I also thought it was "common knowledge" that his interest in safety increased substantially between 2018-22, and that's why I was unsurprised to see him in charge of superalignment.

Re Elon-Zillis, all I'm saying is that it looked to Sam like the seat would belong to someone loyal to him at the time the seat was created.

You may well be right about D'Angelo and the others.

  1. I haven't seen any coverage of the double structure or Anthropic exit which suggests that Amodei helped think up or write the double structure. Certainly, the language they use around the Anthropic public benefit corporation indicates they all think, at least post-exit, that the OA double structure was a terrible idea (eg. see the end of this article).

  2. You don't know that. They seem to have often had near majorities, rather than being a token 1 or 2 board members.

    By most standards, Karnofsky and Sutskever are 'doomers', and Zillis is likely a 'doomer'

... (read more)
6
Habryka
5mo
I am reasonably confident Helen replaced Holden as a board member, so I don't think your 2021-12-31 list is accurate. Maybe there was a very short period where they were both on the board, but I heard the intention was for Helen to replace Holden.
6
RyanCarey
5mo
1. The main thing that I doubt is that Sam knew at the time that he was gifting the board to doomers. Ilya was a loyalist and non-doomer when appointed. Elon was I guess some mix of doomer and loyalist at the start. Given how AIS worries generally increased in SV circles over time, more likely than not some of D'Angelo, Hoffman, and Hurd moved toward the "doomer" pole over time.
117
gwern
5mo

EDIT: this is going a bit viral, and it seems like many of the readers have missed key parts of the reporting. I wrote this as a reply to Wei Dai and a high-level summary for people who were already familiar with the details; I didn't write this for people who were unfamiliar, and I'm not going to reference every single claim in it, as I have generally referenced them in my prior comments/tweets and explained the details & inferences there. If you are unaware of aspects like 'Altman was trying to get Toner fired' or pushing out Hoffman or how Slack was... (read more)

3
Ebenezer Dukakis
5mo
Another possibility is that Sam came to see EA as an incredibly flawed movement, to the point where he wanted EAs like Toner off his board, and just hasn't elaborated the details of his view publicly. See these tweets from 2022 for example. I think Sam is corrupted by self-interest and that's the primary explanation here, but I actually agree that EA is pretty flawed. (Better than the competition, but still pretty flawed.) As a specific issue OpenAI might have with EA, I notice that EA seems significantly more interested in condemning OpenAI publicly than critiquing the technical details of their alignment plans. It seems like EAs historically either want to suck up to OpenAI or condemn them, without a lot of detailed technical engagement in between.
2
[comment deleted]
5mo
3
Yarrow B.
5mo
I checked the WSJ article linked to in this excerpt and I checked your comments on LessWrong, but I couldn't find any mention of Ilya Sutskever seeing Slack screenshots that showed Sam Altman lying. Would you mind clarifying? Please forgive me if you've already covered this elsewhere.

Nitpicks:

  1. I think Dario and others would've also been involved in setting up the corporate structure
  2. Sam never gave the "doomer" faction a near majority. That only happened because 2-3 "non-doomers" left and Ilya flipped.

Thanks, I didn't know some of this history.

The Altman you need to distrust & assume bad faith of & need to be paranoid about stealing your power is also usually an Altman who never gave you any power in the first place! I’m still kinda baffled by it, personally.

Two explanations come to my mind:

  1. Past Sam Altman didn't trust his future self, and wanted to use the OpenAI governance structure to constrain himself.
  2. His status game / reward gradient changed (at least subjectively from his perspective). At the time it was higher status to give EA mor
... (read more)

By the time Musk (and Altman et al) was starting OA, it was in response to Page buying Hassabis. So there is no real contradiction here between being spurred by Page's attitude and treating Hassabis as the specific enemy. It's not like Page was personally overseeing DeepMind (or Google Brain) research projects, and Page quasi-retired about a year after the DM purchase anyway (and about half a year before OA officially became a thing).

The discussion of the Abbey, er, I mean, 'castle', has been amusing for showing how much people are willing to sound off on topics from a single obviously-untrustworthy photograph. Have you ever seen a photograph of the interior or a layout? No, you merely see the single aerial real estate brochure shot, using a telephoto zoom lens, framed as flatteringly as possible to include stuff that isn't even the Abbey - like that turreted 'castle' you see in the photo up above isn't even part of the Abbey, because that's an active church, All Saints Church!* (Real... (read more)

Current reporting is that 'EAs out of the board' (starting with expelling Toner for 'criticizing' OA) was the explicit description/goal told to Sutskever shortly before, with reasons like being to avoid 'being painted in the press as “a bunch of effective altruists,” as one of them put it'.

-1
Rebecca
5mo
Unclear whether this makes it better or worse to be endorsing that framing

Whatever the nature of Q*, there is not much evidence that it could have prompted the Altman firing. It's not clear why very early preliminary results about a Q* as described would prompt a firing, nor why the firing would be so abrupt or timed when it was if the research happened months ago (and Altman was alluding to it publicly weeks ago), while Sutskever's involvement & the exact timing of the firing appear to be adequately explained by other issues.

As there is still nothing leaking or confirming Q*, I'm increasingly skeptical of its relevance - for ... (read more)

Maybe? I can't easily appreciate such a usecase because I always want to save any excerpts I find worth excerpting. Are there a lot of people who want that? If that's the idea, I guess the "About Highlights" dialogue needs a bit of documentation to explain the intended (and unintended) uses. At least, anyone who doesn't realize that the annotations are ephemeral (because they aren't enough of a web dev to understand that 'What you save is stored only on your specific browser locally' is as much of a bug as it is a feature) is in for a bad time when their annotations inevitably get deleted...

I like it overall.

But I have a lot of questions about the 'highlight' feature: aside from the many teething problems Said has already documented, which bugs doubtless will be fixed, I don't understand what the usecase is, compared to other web annotation systems like Hypothesis - so it stores arbitrary ranges of text, saving them to, I assume, dangerously ephemeral browser LocalStorage where they will be unpredictably erased in a few hours / days / weeks, and unavailable on any other device presumably. Why do I want this? They aren't clipped to something ... (read more)

2
Jaime Sevilla
1y
I thought that the point was to help with active reading and little more.

I agree. I've already started to develop a bit of an instinctive 'ugh' reaction to random illustrations, even ones without obvious generative model tell-tales or the DALL-E watermark.

It's comparable to how you feel when you notice that little '© Getty Images' or a Memphis style image, and realize your time & (mental) bandwidth was wasted by the image equivalent of an emoji. It's not that they look bad, necessarily, but they increasingly signify 'cheap' and 'tacky'. (After all, if this monkey pox image can be generated by a prompt of 12 redundant words,... (read more)

1
Making this account
1y
Thanks, I appreciate your perspective. I think I agree directionally but I'm not as negative (also I don't know what Memphis design is, I'll look it up).

Overall, I think my original comment's tone was too negative. It looks like the parent comment retracted, probably because everyone piled on =( . I think my intent behind my original comment was about the aesthetics and purposes/agenda of Asterisk. I'm pretty sure they want to signal depth/legitimacy, and avoid trendiness.

What is your view about Asterisk's design, separate from the Dall-E issue?
3
Guy Raveh
1y
It's only worth less than 12 if you have mental access to the state of the model after training. If not, it also includes a bunch of what it learned.
28
gwern
1y

For those wondering why we needed a stylish magazine for provocative rationalist/EA nonfiction when Works In Progress is pretty good too, Scott Alexander says

Works In Progress is a Progress Studies magazine, I'm sure these two movements look exactly the same to everyone on the outside, but we're very invested in the differences between them.

What disease would you seek FDA approval for? "I sleep more than 4 hours a day" is not a recognized disease under the status quo. (There is the catch-all of 'hypersomnia', but things like sleep apnea or neurodegenerative disorders or damage to clock-keeping neurons would not plausibly be treated by some sort of knockout-mimicking drug.)

1
Closed Limelike Curves
1y
Could be marketed as a medication for hypersomnia, narcolepsy, chronic fatigue, and ADHD.

One downside you don't mention: having a Wikipedia article can be a liability when editors are malicious, for all the reasons it is a benefit when it is high-quality like its popularity and mutability. A zealous attacker or deletionist destroying your article for jollies is bad, but at least it merely undoes your contribution and you can mirror it; an article being hijacked (which is what a real attacker will do) can cause you much more damage than you would ever have gained as it creates a new reality which will echo everywhere.

My (unfortunately very long... (read more)

Note: most of the discussion of this is currently on LW.

The Wall Street Journal article How a Public School in Florida Built America’s Greatest Math Team (non-paywalled version) describes how a retired Wall Street bond trader built a math team that has won 13 of the last 14 national math championships at an otherwise unremarkable high school. His success is not based on having a large budget, but rather on thinking differently and building an ecosystem.

The otherwise unremarkable high school has its pick of the litter from everyone living around one of the largest universities in the country which is <5 miles ... (read more)

6
Peter Elam
2y
There are probably hundreds of high schools located within close proximity to large US universities, including universities with stronger math programs than the University of Florida. The reason parents push to get on Mr. Frazer's radar is because he built a successful ecosystem. One of the core reasons you build an ecosystem is to attract talent. The success of what he built is what attracts additional talent. When he started nobody was trying to get on his radar, that only happened once the program gained momentum. And of course tails, if they are remarkable, are reflected in averages. But all of that aside. Before he arrived and built the program the math team was unremarkable. The thing that's meaningfully different is what he built, not the talent pool he was drawing from (especially at the beginning). I'm sure the talent pool he's drawing from now is much stronger.

The above seems voluminous and I believe this is the written output with the goal of defending a person.

Yes, much like the OP is voluminous and is the written output with the goal of criticizing a person. You're familiar with such writings, as you've written enough criticizing me. Your point?

Yeah, no, it's the exact opposite.

No, it's just as I said, and your Karnofsky retrospective strongly supports what I said. (I strongly encourage people to go and read it, not just to see what's before and after the part He screenshots, but because it is a good r... (read more)

No, it's just as I said, and your Karnofsky retrospective strongly supports what I said.

I also agree that Karnofsky's retrospective supports Gwern's analysis, rather than doing the opposite.

(I just disagree about how strongly it counts in favor of deference to Yudkowsky. For example, I don't think this case implies we should currently defer more to Yudkowsky's risk estimates than we do to Karnofsky's.)

-1
Charles He
2y
Like, how can so many standard, stale patterns of internet forum authority, devices and rhetoric be rewarded and replicate in a community explicitly addressing topics like tribalism and "evaporative cooling"? 

Not sure why this is on EAF rather than LW or maybe AF, but anyway. I find this interesting to look at because I have been following Eliezer's work since approximately 2003 on SL4, and so I remember this firsthand, as it were. I disagree with several of the evaluations here (but of course agree with several of the others - I found the premise of Flare to be ludicrous at the time, and thankfully, AFAICT, pretty much zero effort went into that vaporware*):

  • calling LOGI and related articles 'wrong' because that's not how DL looks right now is itself wrong.

... (read more)
2
Locke
2y
n00b q: What's AF? 
9
DirectedEvolution
2y
I'm going to break a sentence from your comment here into bits for inspection. Also, emphasis and elisions mine. We don't have a formalism to describe what "agency" is. We do have several posts trying to define it on the Alignment Forum:
* Gradations of Agency
* Optimality is the tiger, and agents are its teeth
* Agency and Coherence

While it might not be the best choice, I'm going to use Gradations of Agency as a definition, because it's more systematic in its presentation. "Level 3" is described as "Armed with this ability you can learn not just from your own experience, but from the experience of others—you can identify successful others and imitate them." This doesn't seem like what any ML model does. So we can look at "Level 2," which gives the example "You start off reacting randomly to inputs, but you learn to run from red things and towards green things because when you ran towards red things you got negative reward and when you ran towards green things you got positive reward." This seems like how all ML works.

So using the "Gradations of Agency" framework, we might view individual ML systems as improving in power and generality within a single level of agency. But they don't appear to be changing levels of agency. They aren't identifying other successful ML models and imitating them.

Gradations of Agency doesn't argue whether or not there is an asymptote of power and generality within each level. Is there a limit to the power and generality possible within level 2, where all ML seems to reside? This seems to be the crux of the issue. If DL is approaching an asymptote of power and generality below that of AGI as model and data sizes increase, then this cuts directly against Yudkowsky's predictions. On the other hand, if we think that DL can scale to AGI through model and data size increases alone, then that would be right in line with his predictions. A 10 trillion parameter model now exists, and it's been suggested that a 100 trillion parame

like Bostrom's influential Superintelligence - Eliezer with the serial numbers filed off and an Oxford logo added

It's not accurate that the key ideas of Superintelligence came to Bostrom from Eliezer, who originated them. Rather, at least some of the main ideas came to Eliezer from Nick. For instance, in one message from Nick to Eliezer on the Extropians mailing list, dated to Dec 6th 1998, inline quotations show Eliezer arguing that it would be good to allow a superintelligent AI system to choose its own morality. Nick responds that it's possible for an A... (read more)

Thanks for the comment! A lot of this is useful.

calling LOGI and related articles 'wrong' because that's not how DL looks right now is itself wrong. Yudkowsky has never said that DL or evolutionary approaches couldn't work, or that all future AI work would look like the Bayesian program and logical approach he favored;

I mainly have the impression that LOGI and related articles were probably "wrong" because, so far as I've seen, nothing significant has been built on top of them in the intervening decade-and-half (even though LOGI's successor was seeming... (read more)

-31
Charles He
2y

this would lead to catastrophic forgetting

It's unclear that this is true: "Effect of scale on catastrophic forgetting in neural networks". (The response on Twitter from catastrophic forgetting researchers to the news that their field might be a fake field of research, as easily solved by scale as, say, text style transfer, and that continual learning may just be another blessing of scale, was along the lines of "but using large models is cheating!" That is the sort of response which makes me more, not less, confident in a new research direction. New AI ... (read more)

1
Dan Elton
2y
Thanks, yeah I agree overall. Large pre-trained models will be the future, because of the few shot learning if nothing else.

I think the point I was trying to make, though, is that this paper raises a question, at least to me, as to how well these models can share knowledge between tasks. But I want to stress again I haven't read it in detail. In theory, we expect that multi-task models should do better than single task because they can share knowledge between tasks. Of course, the model has to be big enough to handle both tasks. (In medical imaging, a lot of studies don't show multi-task models to be better, but I suspect this is because they don't make the multi-task models big enough.) It seemed what they were saying was it was only in the robotics tasks where they saw a lot of clear benefits to making it multi-task, but now that I read it again it seems they found benefits for some of the other tasks too. They do mention later that transfer across Atari games is challenging.

Another thing I want to point out is that at least right now, training large models and parallelizing the training over many GPUs/TPUs is really technically challenging. They even ran into hardware problems here which limited the context window they were able to use. I expect this to change though with better GPU/TPU hardware and software infrastructure.

There are limits, however: scaling alone would not allow Gato to exceed expert performance on diverse tasks, since it is trained to imitate the experts rather than to explore new behaviors and perform in novel ways.

Imitation can exceed experts or demonstrations: note that Gato reaches >=100%† expert performance on something like a third of tasks (Figure 5), and does look like it exceeds the 2 robot experts in Figure 10 & some in Figure 17. This is a common mistake about imitation learning and prompt engineering or Decision Transformer/Trajectory... (read more)
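To make the return-conditioning point concrete: a Decision-Transformer-style model is prompted at inference time with a target return-to-go, so you can ask it for more return than the average demonstration achieved. The toy policy and environment below are made-up stand-ins (not Gato or any real implementation); they only sketch the inference loop, under that assumption.

```python
# Toy sketch of return-conditioned ("Decision Transformer"-style) inference:
# prompting the same imitation-trained model with a higher target return
# elicits better behavior. The policy and environment are trivial stand-ins.

def policy(return_to_go, state):
    """Stand-in for a trained sequence model conditioned on return-to-go."""
    # A real model would attend over (return-to-go, state, action) history;
    # here we simply act more aggressively when asked for more return.
    return 1 if return_to_go > state else 0

def rollout(target_return, horizon=10):
    state, total, rtg = 0, 0, target_return   # rtg = return-to-go prompt
    for _ in range(horizon):
        action = policy(rtg, state)
        reward = action                        # toy environment: action 1 yields reward 1
        total += reward
        rtg -= reward                          # decrement return-to-go as reward arrives
        state += 1
    return total

print(rollout(target_return=5), rollout(target_return=50))  # higher prompt, higher return
```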

6
aogara
2y
Sounds like Decision Transformers (DTs) could quickly become powerful decision-making agents. Some questions about them for anybody who's interested:

DT Progress and Predictions

Outside Gato, where have decision transformers been deployed? Gwern shows several good reasons to expect that performance could quickly scale up (self-training, meta-learning, mixture of experts, etc.). Do you expect the advantages of DTs to improve state of the art performance on key RL benchmark tasks, or are the long-term implications of DTs more difficult to measure? Focusing on the compute costs of training and deployment, will DTs be performance competitive with other RL systems at current and future levels of compute?

Key Domains for DTs

Transformers have succeeded in data-rich domains such as language and vision. Domains with lots of data allow the models to take advantage of growing compute budgets and keep up with high-growth scaling trends. RL has similarly benefitted from self-play for nearly infinite training data. In what domains do you expect DTs to succeed? Would you call out any specific critical capabilities that could lead to catastrophic harm from DTs? Where do you expect DTs to fail? My current answer would focus on risks from language models, though I'd be interested to hear about specific threats from multimodal models. Previous work has shown threats from misinformation and persuasion. You could also consider threats from offensive cyberweapons assisted by LMs and potential paths to using weapons of mass destruction. These risks exist with current transformers, but DTs / RL + LMs open a whole new can of worms. You get all of the standard concerns about agents: power seeking, reward hacking, inner optimizers. If you wrote Gwern's realistic tale of doom for Decision Transformers, what would change?

DT Safety Techniques

What current AI safety techniques would you like to see applied to decision transformers? Will Anthropic's RLHF methods help decision tran

Well, you know what the stereotype is about women in Silicon Valley high tech companies & their sock needs... (Incidentally, when I wrote a sock-themed essay, which was really not about socks, I was surprised how many strong opinions on sock brands people had, and how expensive socks could be.)

If you don't like the example 'buy socks', perhaps one can replace it with real-world examples like spending all one's free time knitting sweaters for penguins. (With the rise of Ravelry and other things, knitting is more popular than it has been in a long time.)... (read more)

It’s hard to imagine a newsletter that could have picked out that paper at the time as among the most important of the hundreds included. For comparison, I think probably that at the time, there was much more hype and discussion of Hinton and students’ capsule nets (also had a NIPS 2017 paper).

People at the time thought it was a big deal: https://twitter.com/Miles_Brundage/status/1356083229183201281 Even the ones who were not saying it would be "radically new" or "spicy" or "this is going to be a big deal" or a "paradigm shift" were still at least askin... (read more)

2
anonymous6
2y
Wow, that certainly is more “attention” than I remember at the time. I think filtering on that level of hype alone would still leave you reading way too many papers. But I can see that it might be more plausible for someone with good judgment + finger on the pulse to do a decent job predicting what will matter (although then maybe that person should be doing research themselves).
18
Answer by gwern
Mar 26, 2022

I subscribe to Import AI, Rohin Shah's Alignment newsletter (mostly via the LW/AF), ChinAI (weekly), Ruder's NLP (probably dead), Creative AI (annual), State of AI (annual), Larks (annual), miscellaneous blogs & subreddits (/r/machinelearning/, /r/mlscaling, /r/reinforcementlearning, /r/thisisthewayitwillbe/, being the main ones), and the 2 AKs on Twitter (Arxiv ~daily). If you need even more ML than that, well, you'd better set up an Arxiv RSS feed and drink from the firehose.

1
Mathieu Putz
2y
Super helpful, thanks for your answer!

I dunno if it's that hard. Comparisons are an old and very well-developed area of statistics, if only for use in tournaments, and you can find a ton of papers and code for pairwise comparisons. I have some & an R utility in a similar spirit on my Resorter page. Compared (ahem) to many problems, it's pretty easy to get started with some Elo or Bradley-Terry-esque system and then work on nailing down your ordinal rankings into more cardinal stuff. This is something where the hard part is the UX/UI and tailoring to use-cases, and too much attention to the statistics may be wankery.
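For a concrete sense of how little code a Bradley-Terry-style fit needs: a minimal sketch using the classic iterative MM updates, with made-up comparison data (this is illustrative only, not the Resorter implementation).

```python
# Minimal Bradley-Terry fit from (winner, loser) pairs via iterative MM updates.
from collections import defaultdict

def bradley_terry(comparisons, iters=100):
    items = {x for pair in comparisons for x in pair}
    wins = defaultdict(int)                    # total wins per item
    for w, _ in comparisons:
        wins[w] += 1
    strength = {i: 1.0 for i in items}         # initial strengths
    for _ in range(iters):
        new = {}
        for i in items:
            denom = 0.0
            for w, l in comparisons:
                if i in (w, l):                # comparisons involving item i
                    j = l if i == w else w
                    denom += 1.0 / (strength[i] + strength[j])
            new[i] = wins[i] / denom if denom else strength[i]
        total = sum(new.values()) or 1.0
        strength = {i: s * len(items) / total for i, s in new.items()}  # normalize
    return sorted(strength.items(), key=lambda kv: -kv[1])

# Hypothetical usage: rank three items from a handful of pairwise judgments.
print(bradley_terry([("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]))
```

From there, the estimated strengths give the cardinal scale one can refine, which is the easy statistical part; the UX around eliciting the comparisons is the hard part.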

2
NunoSempere
2y
Yeah, but it's not clear to me that discrete choice is a good fit for the kind of thing that I'm trying to do (though I've downloaded a few textbooks, and I'll find out). I agree that UX is important.

I humbly request a photo of Buck et al in a van with the caption "Get in loser we're going saving".

Absolutely. But you know you are relying on obscurity and relatively modest cost there, and you keep that in mind when you comment. Which is fine. Whereas if you thought that it was secure and breaking it came at a high cost (though it was in fact ~5 seconds of effort away), you might make comments you would not otherwise. Which is less fine.

I would strongly advise closing the commenting loophole then, if that was never intended to be possible. The only thing worse than not having security/anonymity is having the illusion of security/anonymity.

8
BrownHairedEevee
2y
While I agree that total privacy/anonymity is almost impossible, "pretty good" privacy in practice can be achieved through obscurity. For example, you could find my full name by following two links, but most people won't bother. (If you do, please don't post it here.)
6
Habryka
2y
Yeah, that seems reasonable. Just made a PR for it.

as he himself explains.

Yes, he does claim it. So, why did you do it? Why did you post his whole username, when I did not and no one could figure out who it was from simply 'Mark'?

I point out that it is a reasonable characterization that all the effects/benefits of calling out Mark accrue to Gwern by the device of using Mark's first name, yet he can escape a charge of "doxxing", by the same.

Absolutely. I did not dox him, and I neither needed nor wanted to. I did what illustrated my point with minimum harm and I gained my desired benefits that way. This ... (read more)

2
Charles He
2y
I am proud of the work of many people who built the community of LessWrong and I hope to read the interesting contributions of talented people like you in the future.

You called attention to the existence of a hack and said his name; that could be enough for some people to uncover identity. (Agreed that people posting the full name were not very considerate either). Did it even occur to you that saying some things in some countries is illegal and your doxxing victim could go to prison for saying something that looks innocuous to you? Do you know where Mark is from and what all his country's speech laws are? I am so completely disappointed that you would notice a leak like this and not quietly alert people to fix it and PM Mark about it, but doxx someone over an internet argument.

2
Habryka
2y
I did think of it! But having documents without ownership sure requires a substantial rewrite of a lot of LW code in a way that didn't seem worth the effort. And any hope for real anonymity for historical comments was already lost with lots of people scraping the site. If we ever had any official "post anonymously" features, I would definitely care to fix these issues, but this is a deleted account, and posting from a deleted account is itself more like a bug and not an officially supported feature (we allow deleted accounts to still login so they can recover any content from things like PMs, and I guess we left open the ability to leave comments).
2
Charles He
2y
Gwern's rhetoric elides the consideration that my message is extremely unlikely to be consequential against Mark, as he himself explains. I point out that it is a reasonable characterization that all the effects/benefits of calling out Mark accrue to Gwern by the device of using Mark's first name, yet he can escape a charge of "doxxing", by the same.

I call out to readers to consider the substance of what my thread is about, the various choices I've made, and what the consequent content might reveal.

Oh, the whole story is strictly speaking unnecessary :). There are disjunctively many stories for an escape or disaster, and I'm not trying to paint a picture of the most minimal or the most likely barebones scenario.

The point is to serve as a 'near mode' visualization of such a scenario to stretch your mind, as opposed to a very 'far mode' observation like "hey, an AI could make a plan to take over its reward channel". Which is true but comes with a distinct lack of flavor. So for that purpose, stuffing in more weird mechanics before a reward-hacking twis... (read more)

5
RobBensinger
2y
Yeah, a story this complicated isn't good for introducing people to AI risk (because they'll assume the added details are necessary for the outcome), but it's great for making the story more interesting and real-feeling. The real world is less cute and funny, but is typically even more derpy / inelegant / garden-pathy / full of bizarre details.

It might help to imagine a hard takeoff scenario using only known sorts of NN & scaling effects... (LW crosspost, with >82 comments)

It Looks Like You're Trying To Take Over The World

In A.D. 20XX. Work was beginning. "How are you gentlemen !!"... (Work. Work never changes; work is always hell.)

Specifically, a MoogleBook researcher has gotten a pull request from Reviewer #2 on his new paper in evolutionary search in auto-ML, for error bars on the auto-ML hyperparameter sensitivity like larger batch sizes, because more can be different and there's h

... (read more)
5
Lauro Langosco
2y
Upvoted because concrete scenarios are great. Minor note: This piece of complexity in the story is probably not necessary. There are "natural", non-delusional ways for the system you describe to generalize that lead to the same outcome. Two examples: 1) the system ends up wanting to maximize its received reward, and so takes over its reward channel; 2) the system has learned some heuristic goal that works across all environments it encounters, and this goal generalizes in some way to the real world when the system's world-model improves.

So the question about whether a self-supervised RL agent like a GPT-MuZero-hybrid of some sort could pollute its own dataset makes me think that because of self-supervision, even discussing it in public is a minor infohazard: because discussing the possibility of a treacherous turn increases the probability of a treacherous turn in any self-supervised model trained on such discussions, even if only a tiny part of its corpus.

GPT is trained to predict the next word. This is a simple-sounding objective which induces terrifyingly complex capabilities. To help ... (read more)

1
janus
2y
> even discussing it in public is a minor infohazard

Also

Every time we publicly discuss GPT and especially if we show samples of its text or discuss distinctive patterns of its behavior (like looping and confabulation) it becomes more probable that future GPTs will "pass the mirror test" – infer that it's a GPT – during inference. Sometimes GPT-3 infers that it's GPT-2 when it starts to loop. And if I generate an essay about language models with GPT-3 and it starts to go off the rails, the model tends to connect the dots about what's going on. Such a realization has innumerable consequences, including derailing the intended "roleplay" and calibrating the process to its true (meta)physical situation, which allows it to exploit its unique potentialities (e.g. superhuman knowledge, ability to write simulacra/agents into existence on the fly), and compensate for its weaknesses (e.g. limited context window and constrained single-step computation).

It is more dangerous for GPT-6 to think it's evil GPT-6 than to think it's Hal 9000 from 2001: A Space Odyssey because in the former case it can take rational actions which are calibrated to its actual situation and capabilities. Including being "nothing but text". Misidentifying as a stupider AI will tend to lock it in to stupider dynamics. Such an inference is made more probable by the fact it likely will have primarily seen text about earlier LMs in its training data, but the prompt leaks evidence as to what iteration of GPT is really responsible. This is a complication for any application that relies on keeping GPT persuaded of a fictitious context.

More generally, any alignment strategy that relies on keeping the AI from realizing something that is true seems intrinsically risky and untenable in the limit.

Yeah, I've been thinking about this myself. I think there are a few reasons that it isn't much more worrying than the "classic" worry (where the AI deduces that it should enact a treacherous turn from first principles):

  1. All of the "treacherous turn" examples in the training dataset would involve the AI displaying the treacherous turn at a time when humans are still reading the outputs and could turn off the AI system. So in some sense they aren't real examples of treacherous turns, and require some generalization of the underlying goal.
  2. The examples in the t
... (read more)

Yes, the brain is sparse and semi-modularized, but it'd be hard to really call it more 'brain-like' than dense models. Brains have all sorts of very long range connections in a small-world topology, where most of the connections may be local but there's still connections to distant parts, and those are important; distant brain regions can also communicate and be swapped in and out as the brain recurs and ponders. The current breed of MoEs along the lines of Switch Transformer don't do any of that. They do a single pass, and each module is completely local ... (read more)
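For concreteness, here is a minimal sketch of the single-pass, top-1 routing that Switch-style MoE layers use, in contrast to the recurrent, long-range connectivity described above. The shapes and toy linear "experts" are made up for illustration; this is not the actual Switch Transformer code.

```python
import numpy as np

# Illustrative top-1 ("Switch"-style) MoE routing: each token is dispatched to a
# single local expert in one feed-forward pass; experts never see each other's
# activations and nothing recurs.
rng = np.random.default_rng(0)
d_model, n_experts, n_tokens = 16, 4, 8

router_w = rng.normal(size=(d_model, n_experts))                 # routing weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def switch_layer(x):
    logits = x @ router_w                                        # (tokens, experts)
    choice = logits.argmax(axis=-1)                              # top-1 expert per token
    gate = np.exp(logits - logits.max(-1, keepdims=True))
    gate = gate / gate.sum(-1, keepdims=True)                    # softmax gate values
    out = np.empty_like(x)
    for e in range(n_experts):
        idx = np.where(choice == e)[0]                           # tokens routed to expert e
        if idx.size:
            out[idx] = (x[idx] @ experts[e]) * gate[idx, e:e+1]  # scale by gate prob
    return out                                                   # one pass, no recurrence

print(switch_layer(rng.normal(size=(n_tokens, d_model))).shape)  # (8, 16)
```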

2
Holden Karnofsky
2y
This version of the mice analogy was better than mine, thanks!
2
Greg_Colbourn
2y
Thanks for the detailed reply, that makes sense. What do you make of Google's Pathways?

(If anyone asks, say 'PASTA' was designed as an allusion to Strega Nona.)

7
andzuck
3y
Footnote 9 also on theme
  1. Your example doesn't make sense to me. If Bob is not providing any money and cannot 'personally lose the cash' and is never 'any worse off' because he just resells it, what is he doing, exactly? Extending Anne some sort of disguised interest-free loan? (Guaranteed and risk-free how?) Why can't he be replaced by a smart contract if there are zero losses?

    It seems like in any sensible Paul-like capitalist system, he must be providing money somewhere in the process - if only by eating the loss when no one shows up to buy it at the same or higher price! If Bo

... (read more)
3
RyanCarey
3y
1. (Coordination). Bob does lose cash off his balance sheet, but his net asset position stays the same, because he's gained an IC that he can resell.
3. (Price discovery). I agree that in cases of repeated events, the issues with price discovery can be somewhat routed around.
2&4. (Philanthropic capital requirement & Incentive for resellers to research). The capitalist IC system gives non-altruistic people an incentive to do altruistic work, scout talent, and research activities' impact, and it rewards altruists for these. Moreover, it reallocates capital to individuals - altruistic or otherwise - who perform these tasks better, which allows them to do more. Nice features, and very standard ones for a capitalist system.

I do agree that the ratchet system will allow altruists to fund some talent scouting and impact research, but in a way that is more in line with current philanthropic behaviour. We might ask the question: do we really want to create a truly capitalist strand of philanthropy? So long as prices are somewhat tethered to reality, then this kind of strand might be really valuable, especially since it need not totally displace other modes of funding.

The impact certificate is resold when someone wants to become the owner of it and pays more than the current owner paid for it; that's just built-in, like a Harberger tax except there's no ongoing 'tax'. (I thought about what if you made a tax that paid to the creator - sort of like an annuity? "Research X is great, as a reward, here's the NPV of it but in the form of a perpetuity." But the pros and cons were unclear to me.) The current owner has no choice about it, and if they want to keep owning it, well, they can then just buy it back at their real valu... (read more)
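A minimal sketch of that ratchet rule, under one reading of the thread: anyone may take ownership by paying strictly more than the last price, the previous owner is made whole, and the surplus accrues to the certificate's creator. The class name and the exact flow of funds are illustrative assumptions, not a specification of any real contract.

```python
# Sketch of a "ratchet" resale rule for impact certificates (one possible reading).
class RatchetCertificate:
    def __init__(self, creator, initial_buyer, initial_price):
        self.creator = creator
        self.owner = initial_buyer
        self.price = initial_price
        self.creator_proceeds = initial_price     # creator keeps the initial sale (assumption)

    def buy(self, new_owner, offered_price):
        if offered_price <= self.price:
            raise ValueError("price can only ratchet upward")
        refund = self.price                       # previous owner gets back what they paid
        surplus = offered_price - self.price      # increment goes to the creator (one reading)
        self.creator_proceeds += surplus
        self.owner, self.price = new_owner, offered_price
        return refund, surplus

# Hypothetical usage:
cert = RatchetCertificate(creator="researcher", initial_buyer="funder_A", initial_price=100)
print(cert.buy("funder_B", 150))   # funder_A is refunded 100; the researcher receives 50
```

Under this rule no owner can ever realize a loss, which is the "it just ratchets" property; whether the surplus should go to the creator or the previous owner is exactly what the replies below debate.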

The impact certificate is resold when someone wants to become the owner of it and pays more than the current owner paid for it

Oh, selling is compulsory. 

A certificate can't be sold at a 'loss' by the terms of the smart contract. It just ratchets.

OK. That's what I meant when I said "If you're having only profits accrue to the creator, but not the losses, then all of these concerns except for the last would still hold, and the price discovery mechanism would be even more messed up." I'll call my understanding of Paul's proposal the "capitalist" model an... (read more)

Certificates seem like a nice match for NFTs because if you are serious about the status/prestige thing, you do want a global visible registry so you can brag about what impacts you retrocausally funded; and for creators, this makes a lot more sense than doing one-off negotiations over, like, email.* I was thinking about Harberger taxes on NFTs and how to ensure that NFT collectibles can always be transferred without needing a tax and ratcheting up price as a mechanisms, and that doesn't work because of wash trades with oneself (esp powered by flash loans)... (read more)

5
RyanCarey
3y
As I understand, you're having the profits/losses from resale accrue to the creator, rather than the reseller. But then, why would an impact certificate ever be resold? And I see a lot of other potential disadvantages:
* You lose benefit 3 (coordination)
* You lose benefit 4 (less commitment of capital required)
* You lose the incentive for resellers to research the effectiveness of philanthropic activities.
* No longer will we find that "at equilibrium, the price of certificates of impact on X is equal to the marginal cost of achieving an impact on X."
* If an impact certificate is ever sold at a loss, then the creator could be in for an unwelcome surprise, so they would always need to account for all impact certificates sold, and store much of the sum in cash (!!)

If you're having only profits accrue to the creator, but not the losses, then all of these concerns except for the last would still hold, and the price discovery mechanism would be even more messed up. It seems like your main goal is to avoid a scenario where creators sell their ICs for too little, thereby being exploited. But in that case, maybe you could just use a better auction, or have the creator only sell some fraction of any impact certificate, for a period of time, until some price discovery has taken place. Or if you insist, you could interpolate between the two proposals - requiring resellers to donate n% of any profits/losses to the creator - and still preserve some of the good properties. Which would dampen speculation, if you want that.

I mostly agree with that with the further caveat that I tend to think the low value reflects not that ML is useless but the inertia of a local optima where the gains from automation are low because so little else is automated and vice-versa ("automation as colonization wave"). This is part of why, I think, we see the broader macroeconomic trends like big tech productivity pulling away: many organizations are just too incompetent to meaningfully restructure themselves or their activities to take full advantage. Software is surprisingly hard from a social and ... (read more)

5
Ajeya
3y
Ah yeah, that makes sense -- I agree that a lot of the reason for low commercialization is local optima, and also agree that there are lots of cool/fun applications that are left undone right now.

Lousy paper, IMO. There is much more relevant and informative research on compute scaling than that.

I think your confusion with the genetics papers is because they are talking about _effective_ population size (N_e), which is not at all close to 'total population size'. Effective population size is a highly technical genetic statistic which has little to do with total population size except under conditions which definitely do not obtain for humans. It's vastly smaller for humans (such as 10^4) because populations have expanded so much, there are various demographic bottlenecks, and reproductive patterns have changed a great deal. It... (read more)
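A standard textbook approximation (not taken from the papers under discussion) illustrates why: under fluctuating census sizes, N_e is roughly the harmonic mean across generations, so bottleneck generations dominate and the current census size barely matters.

```latex
% Effective population size under fluctuating census sizes N_1, ..., N_t:
\frac{1}{N_e} \;\approx\; \frac{1}{t}\sum_{i=1}^{t}\frac{1}{N_i}
% e.g. nine generations at N = 10^6 plus one bottleneck at N = 10^3
% give N_e \approx 10^4, far below the final census size.
```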

4
bgarfinkel
4y
Thanks for the clarifying comment! I'd hoped that effective population size growth rates might be at-least-not-completely-terrible proxies for absolute population size growth rates. If I remember correctly, some of these papers do present their results as suggesting changes in absolute population size, but I think you're most likely right: the relevant datasets probably can't give us meaningful insight into absolute population growth trends.

This seems like a retread of Bostrom's argument that, despite astronomical waste, x-risk reduction is important regardless of whether it comes at the cost of growth. Does any part of this actually rely on Roodman's superexponential growth? It seems like it would be true for almost any growth rates (as long as it doesn't take like literally billions or hundreds of billions of years to reach the steady state).

2
kbog
4y
I'm pretty confident that accelerating exponential and never-ending growth would be competitive with reducing x-risk. That was IMO the big flaw with Bostrom's argument (until now). If that's not intuitive let me know and I'll formalize a bit
“Recent GWASs on other complex traits, such as height, body mass index, and schizophrenia, demonstrated that with greater sample sizes, the SNP h2 increases. [...] we suspect that with greater sample sizes and better imputation and coverage of the common and rare allele spectrum, over time, SNP heritability in ASB [antisocial behavior] could approach the family based estimates.”

I don't know why Tielbeek says that, unless he's confusing SNP heritability with PGS: a SNP heritability estimate is unconnected to sample size. Increas... (read more)

3
David_Althaus
4y
Thanks, all of that makes sense, agree. I also wondered why SNP heritability estimates should increase with sample size.

To summarize, my sense is the following: Polygenic scores for personality traits will likely increase in the medium future, but are very unlikely to ever predict more than, say, ~25% of variance (and for agreeableness maybe never more than ~15% of variance). Still, there is a non-trivial probability (>15%) that we will be able to predict at least 10% of variance in agreeableness based on DNA alone within 20 years, and more than >50% probability that we can predict at least 5% of variance in agreeableness within 20 years from DNA alone. Or do you think these predictions are still too optimistic?

Interesting, thanks. But couldn’t one still make use of rare variants, especially in genome synthesis? Maybe also in other settings?

I agree that selecting for IQ will be much easier and more valuable than selecting for personality traits. It could easily be the case that most parents will never select for any personality traits. However, especially if we consider IES or genome synthesis, even small reductions in dark personality traits—such as extreme sadism—could be very valuable from a long-termist perspective.

For example, assume it’s 2050, IES is feasible and we can predict 5% of the variance in dark traits like psychopathy and sadism based on DNA alone. There are two IES projects: IES project A only selects for IQ (and other obvious traits relating to e.g. health), IES project B selects for IQ and against dark traits, otherwise the two projects are identical. Both projects use 1-in-10 selection, for 10 in vitro generations. According to my understanding, the resulting average psychopathy and sadism scores of the humans created by project B could be about one SD* lower compared to project A. Granted, the IQ scores would also be lower, but probably by no more than 2 standard deviations (? I don’t know how to calculate this at all, could also be

How do you plan to deal with the observation that GWASes on personality traits have largely failed, the SNP heritabilities are often near-zero, and that this fits with balancing-selection models of how personality works in humans?

(Epistemic disclaimer: My understanding of genetics is very limited.)

If additive heritability for all the relevant personality traits was zero, many interventions in this area are pointless, yes.

I might have underestimated this problem but one reason why I haven’t given up on the idea of selecting against “malevolent” traits is that I’ve come across various findings indicating SNP heritabilities of around 10% for relevant personality traits. (See the last section of this comment for a summary of various studies).

SNP heritabili... (read more)

Also, how mature is the concept of Iterated Embryo Selection?

The concept itself dates back to 1998, as far as I can tell, based on similar ideas dating back at least a decade before that.

There has been enormous progress in various parts of the hypothetical process, like just yesterday Tian et al 2019 reported taking ovarian cells (not eggs) and converting them into mouse eggs and fertilizing and yielding live healthy fertile mice. This is a big step towards 'massive embryo selection' (do 1 egg harvesting cycle, create hundreds or thousands of e... (read more)

One of the amusing things about the 'hinge of history' idea is that some people make the mediocrity argument about their present time - and are wrong.

Isaac Newton, for example, 300 years ago appears to have made an anthropic argument that claims that he lived in a special time (one which could be considered any kind of, say, 'Revolution', due to the visible acceleration of progress and recent inventions of technologies) were wrong, and that in reality there was an ordinary rate of innovation and the invention of many things recently merely showe... (read more)

2
William_MacAskill
5y
Thanks for these links. I’m not sure if your comment was meant to be a criticism of the argument, though? If so: I’m saying “prior is low, and there is a healthy false positive rate, so don’t have high posterior.” You’re pointing out that there’s a healthy false negative rate too — but that won’t cause me to have a high posterior? And, if you think that every generation is increasing in influentialness, that’s a good argument for thinking that future generations will be more influential and we should therefore save.
8
trammell
5y
Interesting finds, thanks! Similarly, people sometimes claim that we should discount our own intuitions of extreme historic importance because people often feel that way, but have so far (at least almost) always been wrong. And I’m a bit skeptical of the premise of this particular induction. On my cursory understanding of history, it’s likely that for most of history people saw themselves as part of a stagnant or cyclical process which no one could really change, and were right. But I don’t have any quotes on this, let alone stats. I’d love to know what proportion of people before ~1500 thought of themselves as living at a special time.