Flodorner


Comments

UK's new 10-year "National AI Strategy," released today

Huh? I did not like the double-page style for the non-mobile pdf, as it required some manual rescaling on my PC.

And the mobile version has the main table cut between two pages in a pretty horrible way. I think I would have much preferred a single pdf in the mobile/single page style that is actually optimized for that style, rather than this.

Maybe I should have used the HTML version instead?

UK's new 10-year "National AI Strategy," released today

More detailed action points on safety from page 32: 

The Office for AI will coordinate cross-government processes to accurately assess long term AI safety and risks, which will include activities such as evaluating technical expertise in government and the value of research infrastructure. Given the speed at which AI developments are impacting our world, it is also critical that the government takes a more precise and timely approach to monitoring progress on AI, and the government will work to do so. 


The government will support the safe and ethical development of these technologies as well as using powers through the National Security & Investment Act to mitigate risks  arising from a small number of potentially concerning actors. At a strategic level, the National Resilience Strategy will review our approach to emerging technologies; the Ministry of Defence will set out the details of the approaches by which Defence AI is developed and used; the National AI R&I Programme’s emphasis on AI theory will support safety; and central government will work with the national security apparatus to consider narrow and more general AI as a top-level security issue.

When pooling forecasts, use the geometric mean of odds

I don't think I get your argument for why the approximation should not depend on the downstream task. Could you elaborate? 

I am also a bit confused about the relationship between spread and resiliency: a larger spread of forecasts does not seem to necessarily imply weaker evidence. For a relatively rare event about which some forecasters could acquire insider information, a large spread might even give you stronger evidence.

Imagine the event $E$ is about the future enactment of a quite unusual government policy, and one of your forecasters is a high-ranking government official. Then, if all of your forecasters are relatively well calibrated and have sufficient incentive to report their true beliefs, a 90% forecast for $E$ by the government official and a 1% forecast by everyone else should likely shift your beliefs a lot more towards $E$ than a 10% forecast by everyone.
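To make the contrast concrete, here is a minimal sketch that pools two hypothetical panels like the ones in this example with the geometric mean of odds (the rule discussed in the post). The panel size of five and the helper names `logit` and `geo_mean_odds` are my own assumptions for illustration.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


def geo_mean_odds(probs: list[float]) -> float:
    """Pool probabilities by averaging their log-odds (geometric mean of odds)."""
    mean_log_odds = sum(logit(p) for p in probs) / len(probs)
    return 1 / (1 + math.exp(-mean_log_odds))


# Hypothetical panel: one insider at 90%, four other forecasters at 1%.
spread_panel = [0.90, 0.01, 0.01, 0.01, 0.01]
# Hypothetical panel: five forecasters who all say 10%.
uniform_panel = [0.10] * 5

print(f"spread panel:  {geo_mean_odds(spread_panel):.3f}")
print(f"uniform panel: {geo_mean_odds(uniform_panel):.3f}")
```

With these numbers, the pooled forecast for the panel containing the insider comes out at roughly 3.8%, below the 10% of the uniform panel. Because the geometric mean of odds treats all forecasters symmetrically, it cannot encode that one of them is the official; that information would presumably have to enter through unequal weights or a different model.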

 

When pooling forecasts, use the geometric mean of odds

This seems to connect to the concept of $f$-means (quasi-arithmetic means): If the utility for an option is proportional to $f(p)$, then the expected utility of your mixture model is equal to the expected utility using the $f$-mean of the experts' probabilities $p_1$ and $p_2$, defined as $f^{-1}\left(\frac{f(p_1)+f(p_2)}{2}\right)$, as the $f$ in the utility calculation cancels out the $f^{-1}$. If I recall correctly, all aggregation functions that fulfill some technical conditions on a generalized mean can be written as an $f$-mean.
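A minimal sketch of the cancellation argument, assuming two experts who get equal weight in the mixture and a utility exactly equal to $f(p)$. The choice $f(p) = 1/p$ and the probabilities 0.2 and 0.5 are arbitrary placeholders, and `f_mean` is a hypothetical helper name; any strictly monotone $f$ with a known inverse should behave the same way.

```python
import math
from typing import Callable


def f_mean(probs: list[float],
           f: Callable[[float], float],
           f_inv: Callable[[float], float]) -> float:
    """Quasi-arithmetic (f-)mean: apply f_inv to the average of f over the inputs."""
    return f_inv(sum(f(p) for p in probs) / len(probs))


# Placeholder utility shape: f(p) = 1/p, with inverse 1/u.
def f(p: float) -> float:
    return 1 / p


def f_inv(u: float) -> float:
    return 1 / u


p1, p2 = 0.2, 0.5  # hypothetical expert probabilities

# Mixture model with equal weights: expected utility is the average of f(p_i).
mixture_utility = (f(p1) + f(p2)) / 2
# Utility of acting on the pooled f-mean: the outer f cancels the f_inv inside f_mean.
pooled_utility = f(f_mean([p1, p2], f, f_inv))

print(mixture_utility, pooled_utility, math.isclose(mixture_utility, pooled_utility))
```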

In the first example, $f$ is just linear, such that the $f$-mean is the arithmetic mean. In the second example, $f(p)$ is equal to the expected lifespan $\frac{1}{p}$, which yields the harmonic mean. As such, the geometric mean would correspond to the mixture model if and only if utility was logarithmic in $p$, as the geometric mean is the $f$-mean corresponding to $f = \log$.
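These special cases can be checked with one generic helper. This is a sketch: `f_mean` is a hypothetical name and the probabilities are placeholders. The last case pools with $f$ equal to the log-odds, which is exactly the geometric mean of odds from the post's title.

```python
import math


def f_mean(probs, f, f_inv):
    """Quasi-arithmetic (f-)mean of a list of probabilities."""
    return f_inv(sum(f(p) for p in probs) / len(probs))


def logit(p):
    """Log-odds of p."""
    return math.log(p / (1 - p))


def sigmoid(u):
    """Inverse of logit."""
    return 1 / (1 + math.exp(-u))


probs = [0.2, 0.5]  # hypothetical expert probabilities

arithmetic = f_mean(probs, lambda p: p, lambda u: u)        # f linear
harmonic = f_mean(probs, lambda p: 1 / p, lambda u: 1 / u)  # f(p) = 1/p (expected lifespan)
geometric = f_mean(probs, math.log, math.exp)               # f = log
geo_odds = f_mean(probs, logit, sigmoid)                    # f = log-odds

for name, value in [("arithmetic", arithmetic), ("harmonic", harmonic),
                    ("geometric", geometric), ("geometric mean of odds", geo_odds)]:
    print(f"{name}: {value:.4f}")
```

For these inputs the geometric mean of the probabilities (about 0.316) and the geometric mean of the odds (about 0.333) differ, so the two should not be conflated.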

For a binary event with "true" probability $q$, the expected log-score for a forecast of $p$ is $q\log p + (1-q)\log(1-p)$, which equals $\frac{1}{2}\log\big(p(1-p)\big)$ for $q = \frac{1}{2}$. So the geometric mean of odds would yield the correct utility for the log-score according to the mixture model if all the events we forecast were essentially coin tosses (which seems like a less satisfying synthesis than I hoped for).
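A quick numerical check of the expected log-score for a binary event, writing $q$ for the true probability and $p$ for the forecast (my choice of symbols; natural logarithms assumed). At $q = \frac{1}{2}$, i.e. a coin toss, the score should match $\frac{1}{2}\log\big(p(1-p)\big)$.

```python
import math


def expected_log_score(q: float, p: float) -> float:
    """Expected log-score of forecast p for a binary event with true probability q."""
    return q * math.log(p) + (1 - q) * math.log(1 - p)


for p in (0.1, 0.3, 0.5):
    coin_toss = expected_log_score(0.5, p)
    closed_form = 0.5 * math.log(p * (1 - p))
    # Both expressions should agree up to floating-point error.
    print(p, round(coin_toss, 4), math.isclose(coin_toss, closed_form))
```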

Further questions that might be interesting to analyze from this point of view:

  • Is there some kind of approximate connection between the Brier score and the geometric mean of odds that could explain the empirical performance of the geometric mean on the Brier score? (There might very well not be anything, as the mixture model might not be the best way to think about aggregation).
  • What optimization target (under the mixture model) does extremization correspond to? Edit: As extremization is applied after the aggregation, it cannot be interpreted in terms of mixture models (if all forecasters give the same prediction, any $f$-mean has to have that value, but extremization yields a more extreme prediction).
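To illustrate the point in the edit, here is a minimal sketch assuming one common form of extremization, which multiplies the pooled log-odds by a factor $d > 1$. The comment does not specify a particular extremizing rule, `d = 2` is arbitrary, and the function names are hypothetical. Identical forecasts pass through the pooling step unchanged but not through the extremizing step.

```python
import math


def logit(p):
    """Log-odds of p."""
    return math.log(p / (1 - p))


def sigmoid(u):
    """Inverse of logit."""
    return 1 / (1 + math.exp(-u))


def pool_geo_odds(probs):
    """Geometric mean of odds, i.e. the f-mean with f = logit."""
    return sigmoid(sum(logit(p) for p in probs) / len(probs))


def extremize(p, d=2.0):
    """Assumed extremizing step: scale the pooled log-odds by d > 1."""
    return sigmoid(d * logit(p))


same = [0.7, 0.7, 0.7]  # all forecasters agree
pooled = pool_geo_odds(same)
print(round(pooled, 4), round(extremize(pooled), 4))
```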

Note: After writing this, I noticed that UnexpectedValue's comment on the top-level post essentially points to the same concept. I decided to still post this, as it seems more accessible than their technical paper while (probably) capturing the key insight.

Edit: Replaced "optimize" by "yield the correct utility for" in the third paragraph. 

More undergraduate or just-graduated students should consider getting jobs as research techs in academic labs

I wanted to flag that many PhD programs in Europe might require you to have a Master's degree, or to essentially complete the coursework for a Master's degree during your PhD (as seems to be the case in the US), depending on the kind of undergraduate degree you hold. Obviously, the arguments regarding funding might still partially hold in that case.

What are the EA movement's most notable accomplishments?

Do you have a specific definition of AI Safety in mind? From my (biased) point of view, it looks like a large fraction of the work that is explicitly branded "AI Safety" is done by people who are at least somewhat adjacent to the EA community. But this becomes a lot less true if you widen the definition to include all work that could be called "AI Safety" (so anything that could conceivably help with avoiding any kind of dangerous malfunction of AI systems, including small-scale and easily fixable problems).

AMA: The new Open Philanthropy Technology Policy Fellowship

Relatedly, what is the likelihood that future iterations of the fellowship might be less US-centric, or include visa sponsorship?

Apply to the new Open Philanthropy Technology Policy Fellowship!

The job posting states: 

"All participants must be eligible to work in the United States and willing to live in Washington, DC, for the duration of their fellowship. We are not able to sponsor US employment visas for participants; US permanent residents (green card holders) are eligible to apply, but fellows who are not US citizens may be ineligible for placements that require a security clearance."

So my impression is that it would be pretty difficult for non-US citizens who do not already live in the US to participate.

What previous work has been done on factors that affect the pace of technological development?

https://en.wikipedia.org/wiki/Technological_transitions might be relevant.

The Geels book cited in the article (Geels, F.W., 2005. Technological transitions and system innovations. Cheltenham: Edward Elgar Publishing.) has a bunch of interesting case studies, which I read a while ago, and an (I think popular) framework for technological change, but I am not sure the framework is sufficiently precise to be very predictive (and thus empirically testable).

I don't have any particular sources on this, but the economic literature on the effects of regulation might be quite relevant. In particular, I do remember attending a lecture arguing that limited liability played an important role for innovation during the industrial revolution.

Is there evidence that recommender systems are changing users' preferences?

Facebook has at least experimented with using deep reinforcement learning to adjust its notifications, according to https://arxiv.org/pdf/1811.00260.pdf. Depending on which exact features they used for the state space (i.e. whether they are causally connected to preferences), the trained agent would at least theoretically have an incentive to change users' preferences.

The fact that they use DQN rather than a bandit algorithm seems to suggest that what they are doing involves at least some short term planning, but the paper does not seem to analyze the experiments in much detail, so it is unclear whether they could have used a myopic bandit algorithm instead. Either way, seeing this made me update quite a bit towards being more concerned about the effect of recommender systems on preferences. 
