I can see how this gets you an estimate for each item, but not the correlations between items. One of the advantages Ozzie raises is the possibility of keeping track of correlations in value estimates, which requires more than the marginal expectations.
So constructing a value ratio table means estimating a joint distribution of values from a subset of pairwise comparisons, then sampling from the distribution to fill out the table?
In that case, I think estimating the distribution is the hard part. Your example is straightforward because it features independent estimates, or simple functional relationships.
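To make the "sample from a joint distribution to fill out the table" idea concrete, here is a minimal sketch (in Python rather than Squiggle, with a made-up correlated lognormal model; the items and parameters are purely illustrative):

```python
import math
import random
import statistics

random.seed(0)
n = 10_000
samples = []
for _ in range(n):
    shared = random.gauss(0, 1)  # common factor inducing correlation in a, b
    a = math.exp(0.0 + 0.8 * shared + 0.3 * random.gauss(0, 1))
    b = math.exp(0.5 + 0.8 * shared + 0.3 * random.gauss(0, 1))
    c = math.exp(1.0 + 0.3 * random.gauss(0, 1))  # independent of a and b
    samples.append({"a": a, "b": b, "c": c})

items = ["a", "b", "c"]
# Fill the ratio table entry-by-entry from the *joint* samples, so
# correlations between items are respected rather than discarded.
ratio_table = {
    (i, j): statistics.median(s[i] / s[j] for s in samples)
    for i in items
    for j in items
}
# b/a cancels the shared factor, so its median ratio is about exp(0.5):
print(round(ratio_table[("b", "a")], 2))
```

The point of the toy model is that the b/a ratio is much tighter than you would infer from the two marginal distributions alone, which is exactly the information a joint distribution preserves.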
The only piece of literature I had in mind was von Neumann and Morgenstern’s representation theorem. It says: if you have a set of probability distributions over a set of outcomes, and for each pair of distributions you have a preference (one is better than the other, or you are indifferent), and if this relation satisfies the additional requirements of transitivity, continuity and independence, then you can represent the preferences with a utility function unique up to positive affine transformation.
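In symbols (a standard textbook statement of the theorem, not something from this thread): writing \(p \succsim q\) for "p is at least as good as q",

```latex
% Completeness, transitivity, continuity and independence imply:
\exists\, u : X \to \mathbb{R} \quad\text{such that}\quad
p \succsim q \;\iff\; \sum_{x \in X} p(x)\,u(x) \;\ge\; \sum_{x \in X} q(x)\,u(x),
% and u is unique up to positive affine transformation:
% u'(x) = a\,u(x) + b, \; a > 0, represents the same preferences.
```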
Given that this is a foundational result for expected ...
Because we are more likely to see no big changes than to see another big change.
If the risk is usually quite low (e.g. 0.001 % per century), but sometimes jumps to a high value (e.g. 1 % per century), the cumulative risk (over all time) may still be significantly below 100 % (e.g. 90 %) if the magnitude of the jumps decreases quickly, and risk does not stay high for long.
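As a sanity check, a small simulation of this "transient deviation" picture (all numbers illustrative, including the assumption that jumps recur once per 100 centuries with geometrically decaying magnitude):

```python
# Illustrative-only numbers: risk sits at a low baseline, but jumps to a
# high value once per 100 centuries (an assumption), and each successive
# jump is half the size of the last, so total risk converges.
baseline = 0.00001      # 0.001% per century
first_jump = 0.01       # 1% per century when the first jump occurs
decay = 0.5             # each later jump is half as large as the previous
centuries = 10_000

survival = 1.0
for t in range(centuries):
    risk = baseline
    if t % 100 == 0:
        risk += first_jump * decay ** (t // 100)
    survival *= 1 - risk

cumulative_risk = 1 - survival
print(f"cumulative risk over {centuries} centuries: {cumulative_risk:.3f}")
```

With these made-up parameters the jump contributions form a convergent series, so cumulative risk ends up nowhere near 100 % despite the occasional spikes.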
I would call this model “transient deviation” rather than “random walk” or “regular oscillation”
We can still get H4 if the amplitude of the oscillation or random walk decreases over time, right?
The average needs to fall, not the amplitude. If we're looking at risk in percentage points (rather than, say, logits, which might be a better parametrisation), small average implies small amplitude, but small amplitude does not imply small average.
Only if the sudden change has a sufficiently large magnitude, right?
The large magnitude is an observation - we have seen risk go from quite low to quite high over a short period of time. If we expect such large magnitude changes to be rare, then we might expect the present conditions to persist.
FWIW I think the general kind of model underlying what I’ve written is a joint distribution that models value something like
Thought about this some more. This isn't a summary of your work, it's an attempt to understand it in my terms. Here's how I see it right now: we can use pairwise comparisons of outcomes to elicit preferences, and people often do, but they typically choose to insist that each outcome has a value representable as a single number and use the pairwise comparisons to decide which number to assign each outcome. Insisting that each outcome has a value is a constraint on preferences that can allow us to compute which outcome is preferred between two outcomes for w...
I don't think it's all you are doing, that's why I wrote the rest of my comment (sorry to be flippant).
The point of bringing up binary comparisons is that a table of binary comparisons is a more general representation than a single utility function.
If all we are doing is binary comparisons between a set of items, it seems to me that it would be sufficient to represent relative values as a binary - i.e., is item1 better, or item2? Or perhaps you want a ternary function - you could also say they're equal.
Using a ratio instead of a binary indicator for relative values suggests that you want to use the function to extrapolate. I'm not sure that this approach helps much with that, though. For example,
...costOfp001DeathChance = ss(10 to 10k) // Cost of a 0.001% chance of death, in dollars
AFAIK the official MIRI solution to AI risk is to win the race to AGI but do it aligned.
Part of the MIRI theory is that winning the AGI race will give you the power to stop anyone else from building AGI. If you believe that, then it’s easy to believe that there is a race, and that you sure don’t want to lose.
It cannot both be controllable because it’s weak and also uncontrollable.
That said, I expect more advanced techniques will be needed for more advanced AI; I just think control techniques probably keep up without sudden changes in control requirements.
Also LLMs are more controllable than weaker older designs (compare GPT4 vs Tay).
I’d love to hear from people who don’t “have adhd”. I have a diagnosis myself but I have trouble believing I’m all that unusual. I tried medication for a while, but I didn’t find it that helpful with regard to the bottom line outcome of getting things done, and I felt uncomfortable with the idea of taking stimulants regularly for many years. I’d certainly benefit from being more able to finish projects, though!
People will continue to prefer controllable to uncontrollable AI and continue to make at least a commonsense level of investment in controllability; that is, they invest as much as naively warranted by recent experience and short term expectations, which is less than warranted by a sophisticated assessment of uncertainty about misalignment, though the two may converge as “recent experience” involves more and more capable AIs. I think this minimal level of investment in control is very likely (99%+).
Next, the proposed sudden/surprising phase transitio...
I'm writing quickly because I think this is a tricky issue and I'm trying not to spend too long on it. If I don't make sense, I might have misspoken or made a reasoning error.
One way I thought about the problem (quite different to yours, very rough): variation in existential risk rate depends mostly on technology. At a wide enough interval (say, 100 years of tech development at current rates), change in existential risk with change in technology is hard to predict, though following Aschenbrenner and Xu's observations it's plausible that it tends to some eq...
I don't see this. First, David's claim is that a short time of perils with low risk thereafter seems unlikely - which is only a fraction of hypothesis 4, so I can easily see how you could get H3+H4_bad:H4_good >> 10:1
I don't even see why it's so implausible that H3 is strongly preferred to H4. There are many hypotheses we could make about time varying risk:
- Monotonic trend (many varieties)
- Oscillation (many varieties)
- Random walk (many varieties)
- ...
If we aren't trying to carefully consider technological change (and ignori...
Fair overall. I talked to some other people, and I think I missed the oscillation model when writing my original comment, which in retrospect is a pretty large mistake. I still don't think you can buy that many 9s on priors alone, but sure, if I think about it more maybe you can buy 1-3 9s. :/
First, David's claim is that a short time of perils with low risk thereafter seems unlikely.
Suppose you were put to cryogenic sleep. You wake up in the 41st century. Before learning anything about this new world, is your prior really[1] that the 41st centur...
When I read your scripts and Rob is interviewing, I like to read Rob’s questions at twice the speed of the interviewees’ responses. Can you accommodate that with your audio version?
Thanks for the suggestion David! We're discussing adding this as a premium feature — perhaps activated only for Giving What We Can members.
I have children, and I would precommit to enduring the pain without hesitation, but I don’t know what I would do in the middle of experiencing the pain. If pain is sufficiently intense, “I” am not in charge any more, and I don’t know very well how whatever part of me is then in charge would act.
I have the complete opposite intuition: equal levels of pain are harder to endure for equal time if you have the option to make them stop. Obviously I don’t disagree that pain for a long time is worse than pain for a short time.
This intuition is driven by experiences like: the same level of exercise fatigue is a lot easier to endure if giving up would cause me to lose face. In general, exercise fatigue is more distracting than pain from injuries (my reference points being a broken finger and a cup of boiling water in my crotch - the latter being about as d...
Conditional on AGI being developed by 2070, what is the probability that humanity will suffer an existential catastrophe due to loss of control over an AGI system?
Requesting a few clarifications:
I think journalists are often imprecise and I wouldn't read too much into the particular synonym of "said" that was chosen.
Does it make more sense to think about all probability distributions that offer a probability of 50% for rain tomorrow? If we say this set represents our epistemic state, then we're saying something like "the probability of rain tomorrow is 50%, and we withhold judgement about rain on any other day".
I think this question - whether it's better to take 1/n probabilities (or maximum entropy distributions or whatever) or to adopt some "deep uncertainty" strategy - does not have an obvious answer.
Perhaps I’m just unclear what it would even mean to be in a situation where you “can’t” put a probability estimate on things that does as well as or better than pure 1/n ignorance.
Suppose you think you might come up with new hypotheses in the future which will cause you to reevaluate how the existing evidence supports your current hypotheses. In this case probabilistically modelling the phenomenon doesn’t necessarily get you the right “value of further investigation” (because you’re not modelling hypothesis X), but you might still be well advised to hol...
Fair enough, she mentioned Yudkowsky before making this claim and I had him in mind when evaluating it (incidentally, I wouldn't mind picking a better name for the group of people who do a lot of advocacy about AI X-risk if you have any suggestions)
I skimmed from 37:00 to the end. It wasn't anything groundbreaking. There was one incorrect claim ("AI safetyists encourage work at AGI companies"), I think her apparent moral framework that puts disproportionate weight on negative impacts on marginalised groups is not good, and overall she comes across as someone who has just begun thinking about AGI x-risk and so seems a bit naive on some issues. However, "bad on purpose to make you click" is very unfair.
But also: she says that hyping AGI encourages races to build AGI. I think this is true! Large languag...
I think it's quite sensible that people hoping to have a positive impact in biosecurity should become well-informed first. However, I don't think this necessarily means that radical positions that would ban a lot of research are necessarily wrong, even if they are more often supported by people with less detailed knowledge of the field. I'm not accusing you of saying this, I just want to separate the two issues.
...Many professionals in this space are scared and stressed. Adding to that isn’t necessarily building trust and needed allies. The professional
I do worry about it. Some additional worries I have are 1) if AI is transformative and confers strong first mover advantages, then a private company leading the AGI race could quickly become similarly powerful to a totalitarian government and 2) if the owners of AI depend far less on support from people for their power than today’s powerful organisations, they might be generally less benevolent than today’s powerful organisations
I'm not sold on how well calibrated their predictions of catastrophe are, but I think they have contributed a large number of novel & important ideas to the field.
The main point I took from the video was that Abigail is kinda asking the question: "How can a movement that wants to change the world be so apolitical?" This is also a criticism I have of many EA structures and people.
I think it's surprising that EA is so apolitical, but I'm not convinced it's wrong to make some effort to avoid issues that are politically hot. Three reasons to avoid such things: 1) they're often not the areas where the most impact can be had, even ignoring constraints imposed by them being hot political topics 2) being hot political topics ma...
Is the reason you don’t go back and forth about whether ELK will work in the narrow sense Paul is aiming for (a) that you’re seeking areas of disagreement, and you both agree it is difficult, or (b) that you both agree it is likely to work in that sense?
My intuition for why "actions that have effects in the real world" might promote deception is that maybe the "no causation without manipulation" idea is roughly correct. In this case, a self-supervised learner won't develop the right kind of model of its training process, but the fine-tuned learner might.
I think "no causation without manipulation" must be substantially wrong. If it was entirely correct, I think one would have to say that pretraining ought not to help achieve high performance on a standard RLHF objective, which is obviously false. It still ...
I think your first priority is promising and seemingly neglected (though I'm not familiar with a lot of work done by governance folk, so I could be wrong here). I also get the impression that MIRI folk believe they have an unusually clear understanding of risks, would like to see risky development slow down and are pessimistic about their near-term prospects for solving technical problems of aligning very capable intelligent systems and generally don't see any clearly good next steps. It appears to me that this combination of skills and views positions the...
If a model is deceptively aligned after fine-tuning, it seems most likely to me that it's because it was deceptively aligned during pre-training.
How common do you think this view is? My impression is that most AI safety researchers think the opposite, and I’d like to know if that’s wrong.
I’m agnostic; pretraining usually involves a lot more training, but also fine-tuning might involve more optimisation towards “take actions with effects in the real world”.
All of these comments are focused on my third core argument. What do you think of the other two? They all need to be wrong for deceptive alignment to be a likely outcome.
Yeah, this is just partial feedback for now.
Recall that in this scenario, the model is not situationally aware yet, so it can't be deceptive. Why would making the goal long-term increase immediate-term reward? If the model is trying to maximize immediate reward, making the goal longer-term would create a competing priority.
I think I don't accept your initial premise. Maybe a mod...
Gradient descent can only update the model in the direction that improves performance hyper-locally. Therefore, building the effects of future gradient updates into the decision making of the current model would have to be advantageous on the current training batch for it to emerge from gradient descent.
I think the standard argument here would be that you've got the causality slightly wrong. In particular: pursuing long term goals is, by hypothesis, beneficial for immediate-term reward, but pursuing long term goals also entails considering the effects of f...
I think your title might be causing some unnecessary consternation. "You don't need to maximise utility to avoid domination" or something like that might have avoided a bit of confusion.
and I would urge the author to create an actual concrete situation that doesn't seem very dumb in which a highly intelligent, powerful and economically useful system has non-complete preferences
I'd be surprised if you couldn't come up with situations where completeness isn't worth the cost - e.g. something like, to close some preference gaps you'd have to think for 100x as long, but if you close them all arbitrarily then you end up with intransitivity.
I wonder if it is possible to derive expected utility maximisation type results from assumptions of "fitness" (as in, evolutionary fitness). This seems more relevant to the AI safety agenda - after all, we care about which kinds of AI are successful, not whether they can be said to be "rational". It might also be a pathway to the kind of result AI safety people implicitly use - not that agents maximise some expected utility, but that they maximise utilities which force a good deal of instrumental convergence (i.e. describing them as expected utility ...
...Fixing the “I pit my evidence against itself” problem is easy enough once I’ve recognized that I’m doing this (or so my visualizer suggests); the tricky part is recognizing that I’m doing it.
One obvious exercise for me to do here is to mull on the difference between uncertainty that feels like it comes from lack of knowledge, and uncertainty that feels like it comes from tension/conflict in the evidence. I think there’s a subjective difference, that I just missed in this case, and that I can perhaps become much better at detecting, in the wake of this hars
That being said, polyamory/kink is very often used as a tool of social pressure by predators, forcing women into a bad choice between a situation they would not otherwise have agreed to and being called “close-minded” and potentially having social/career opportunities withheld.
Are such threats believable? Is there a broader culture where people feel that they’re constantly under evaluation such that personal decisions like this are plausibly taken into account for some career opportunities, or is this something that arises mainly where the career opportunities are within someone’s personal fiefdom?
What you're saying here resonates with me, but I wonder if there are people who might be more inclined to assume they're missing something and consequently have a different feeling about what's going on when they're in the situation you're trying to describe. In particular, I'm thinking about people prone to imposter syndrome. I don't know what their feeling in this situation would be - I'm not prone to imposter syndrome - but I think it might be different.
I would have thought that "all conjectures" is a pretty natural reference class for this problem, and Laplace is typically used when we don't have such prior information - though if the resolution rate diverges substantially from the Laplace rule prediction I think it would still be interesting.
I think, because we expect the resolution rates of different conjectures to be correlated, this experiment is a bit like a single draw from a distribution over annual resolution probabilities, rather than many draws from such a distribution (if you can forgive a little frequentism).
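For reference, the Laplace rule prediction mentioned above is just the rule of succession (a standard formula; the example numbers below are made up):

```python
def laplace_rule(successes: int, trials: int) -> float:
    """Laplace's rule of succession: posterior mean of a Bernoulli rate
    under a uniform prior, after `successes` out of `trials`."""
    return (successes + 1) / (trials + 2)

# e.g. if none of 99 observed conjecture-years ended in a resolution,
# the rule still assigns a small probability to resolution next year:
print(laplace_rule(0, 99))
```

The correlation point above is exactly why this can mislead here: the rule treats each conjecture-year as an independent draw, whereas a shared underlying resolution rate makes the observations much less informative than their count suggests.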
I think to properly model Ord’s risk estimates, you have to account for the fact that they incorporate uncertainty over the transition rate. Otherwise I think you’ll overestimate the rate at which risk compounds over time, conditional on no catastrophe so far.
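One way to see this effect (a toy illustration with made-up numbers, not Ord's actual model): if the per-period rate r is uncertain, expected survival is E[(1-r)^T], which by Jensen's inequality exceeds (1-E[r])^T, so plugging in the mean rate overstates compounded risk:

```python
# Toy numbers: the per-century risk rate is either very low or fairly high,
# each with probability 1/2, with mean 0.1.
T = 10                               # number of periods
rates = [0.001, 0.199]               # equally likely candidate rates (made up)
mean_rate = sum(rates) / len(rates)  # 0.1

survival_uncertain = sum((1 - r) ** T for r in rates) / len(rates)
survival_point = (1 - mean_rate) ** T

# Surviving many periods is evidence for the low-rate world, so risk
# compounds more slowly than the point estimate suggests.
print(survival_uncertain > survival_point)
```

Conditioning on no catastrophe so far shifts posterior weight toward the low-rate hypothesis, which is the mechanism behind the overestimate described above.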
I think Gary Marcus seems to play the role of an “anti-AI-doom” figurehead much more than Timnit Gebru. I don’t even know what his views on doom are, but he has established himself as a prominent critic of “AI is improving fast” views and seemingly gets lots of engagement from the safety community.
I also think Marcus’ criticisms aren’t very compelling, and so the discourse they generate isn’t terribly valuable. I think similarly of Gebru’s criticism (I think it’s worse than Marcus’, actually), but I just don’t think it has as much impact on the safety community.
Some quick thoughts: A crude version of the vulnerable world hypothesis is “developing new technology is existentially dangerous, full stop”, in which case advanced AI that increases the rate of new technology development is existentially dangerous, full stop.
One of Bostrom's solutions is totalitarianism. This seems to imply something like “new technology is dangerous, but this might be offset by reducing freedom proportionally”. Accepting this hypothesis seems to say that either advanced AI is existentially dangerous, or it accelerates a political transition to totalitarianism, which seems to be its own kind of risk.
What sort of substantial value would you expect to be added? It sounds like we either have a different belief about the value-add, or a different belief about the costs.
I'd be very surprised if the actual amount of big-picture strategic thinking at either organisation was "very little". I'd be less surprised if they didn't have a consensus view about big-picture strategy, or a clearly written document spelling it out. If I'm right, I think the current content is misleading-ish. If I'm wrong and actually little thinking has been done - there's some chance t...
I would take the proposal to be AI->growth->climate change or other negative growth side effects