Dan Elton

109Joined Oct 2021


AMA: Ought

There are, and have been, a lot of startups working on similar things (AI to assist researchers), going back to IBM's ill-fated Watson. Your demo makes it look very useful and is definitely the most impressive I've seen. I'm deeply suspicious of demos, however.

How can you test if your system is actually useful for researchers?

[One (albeit imperfect) way to gauge utility is to see if people are willing to pay money for it and keep paying money for it over time. However, I assume that is not the plan here. I guess another thing would be to track how much people use it over time or see if they fall away from using it. Another of course would be an RCT, although it's not clear how it would be structured.]

Announcing Future Forum - Apply Now

There's also a retreat being run for ACX and rationality meetup organizers (https://www.rationalitymeetups.org/new-page-4) July 21 - 24, and a lot of pre-events and after parties are planned for EAG SF. (I can send people GDocs with lists if anyone is interested. I'm not sure if the organizers want them shared publicly.)

Announcing Future Forum - Apply Now

There should be other options for 2FA you can set up going forward, like an authenticator app or the Gmail app on your phone. Some cell providers allow calls / SMS over wifi now, too. There also might be a way to use backup codes.

Stuff I buy and use: a listicle to boost your consumer surplus and productivity

I use all of these, except for corn bulbs! (Although technically I use QC35s instead of the QC45, and for fish oil I use 2:1 or 3:1 ratio capsules.)

You didn't mention the M1 chip in the 2021+ MacBook Pros, which is amazing.

Also worth noting: I found the Bose QC35/45 to be better than the NC700 for a variety of reasons, the two main ones being that the volume can go lower and the switches are more tactile.


A laundry list of anxieties about launching my blog: any feedback appreciated!

Just realized I never posted this comment, which I wrote about a week ago:

What you're saying is not uncommon. People on EA Forum are very smart and love to engage in criticism, no doubt. Sometimes, it comes off as harsh, too. It is a bit intimidating. 

Overall I like getting criticism on my blog posts, although it can be tough seeing that you made a stupid error and now someone is upset at you. If you're worried about losing standing in the EA community because you accidentally publish something wrong, that's something I worry about too. I'm not sure how much worry about that is appropriate, though. As far as social consequences go, a lot of people will be impressed just to see you publishing something, and fewer will be aware you wrote something wrong unless you really screw things up. A retraction or correction can go a long way to remedy things, too.

Kidney stone pain as a potential cause area


I'm just seeing your comment now for some reason. This is super helpful. 

Regarding your first point (pain vs suffering), that's pretty interesting and makes sense. I would just note that the degree to which people can detach from painful experiences varies. Regarding suffering from operations and stents, I have heard the same thing about stents, and I think that is something we would have to factor in to a Fermi estimate of the amount of suffering that could be alleviated with early interventions for kidney stones. I wonder if someone could invent a stent that actually diffuses a bit of anesthetic around it while it is in (my understanding is that the stents are typically only in place temporarily).

Regarding the second point, "Regarding small stones being downplayed": after writing this I looked into it a bit further because I was interested in whether an AI application assisting in early detection might be high value. The idea that radiologists miss tiny stones is only my personal guess. I have only seen 1-2 examples of this, when I was running a system I developed for stone detection and it found stones in CT colonography scans that were not mentioned in the report. But those 1-2 examples only surfaced after running on over 6,000 scans.

Regarding what happens with tiny stones: data on this subject is very scarce, but it seems most tiny stones resolve on their own without major symptoms (?). It's really not very clear. I found one paper which covers this question, although it doesn't directly study it. Looking at CT scans for CT colonography, they found that 7.8% of patients (all middle aged adults) had asymptomatic stones. They then found that only 10% of patients with asymptomatic stones were later recorded as having symptoms over a variable follow-up interval that extended to a maximum of 10 years. So it seems the tiny stones mostly don't cause symptoms, but maybe it takes longer than 10 years before they start to manifest symptoms. Probably having a tiny stone puts you at massively higher risk for a symptomatic stone event later in life. There's very little data on this question or about stone growth dynamics across the lifespan in general. It's not something that's very easy to study. Basically, scanning people with CT just to monitor their stone size is absurd, so to study this we have to mine historical scans and then try to find follow-up scans to see if the same stones are still there, and compare their volume (which is a bit tricky to do accurately when the scan parameters change). This is actually an application for deep learning based automated stone segmentation algorithms, to assist in doing such a study. We have a conference paper under review that does this, although I have to say it's technically and logistically challenging to do.
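As a rough back-of-the-envelope check of the two percentages quoted above, here is a minimal Python sketch. The cohort size of 10,000 is a hypothetical round number chosen for illustration (it is not from the paper), and the calculation ignores the variable follow-up interval, so treat the output as an order-of-magnitude guide only.

```python
# Figures quoted in the comment above (from the CT colonography paper).
ASYMPTOMATIC_RATE = 0.078   # fraction of screened adults with asymptomatic stones
LATER_SYMPTOMATIC = 0.10    # fraction of those later recorded as having symptoms


def expected_counts(n_screened: int) -> tuple[float, float]:
    """Return (expected patients with asymptomatic stones,
    expected patients later recorded as symptomatic)."""
    with_stones = n_screened * ASYMPTOMATIC_RATE
    symptomatic = with_stones * LATER_SYMPTOMATIC
    return with_stones, symptomatic


# Hypothetical cohort size, assumed for illustration only.
with_stones, symptomatic = expected_counts(10_000)
print(f"{with_stones:.0f} with asymptomatic stones, "
      f"{symptomatic:.0f} later recorded as symptomatic")
```

Under these assumptions, fewer than 1 in 100 screened adults would be both found with an asymptomatic stone and later recorded with symptoms during follow-up.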

Regarding the third point, "Radiation stigma": I agree, and I think the way you are thinking about this is pretty in line with the risk-benefit calculus as far as I understand it. I should have elaborated a bit more in my post. I was not thinking of doing a screening CT only to screen for kidney stones. I've been working with Prof. Pickhardt at UW. One of the things we've been researching is the utility of a low-dose "screening CT" in middle age. The screening CT would cover many things, including stones. Prof. Pickhardt is working on assembling data to support this idea, mainly focusing on the value of scanning just the abdomen (not the chest). People currently get a coronary calcium score ("CAC") CT scan to screen for cardiovascular risk. The abdomen also contains biomarkers (like aortic plaque) that can gauge cardiovascular risk, plus we can look for a lot of other stuff in the abdomen, including kidney stones.

I agree the preventative approach is probably the most promising (identifying at-risk patients using genetics, blood tests, and maybe other factors, not screening CT), especially given how safe and cheap potassium citrate is.

Where would we set up the next EA hubs?

I live near Boston (Somerville, just north of Cambridge).

Perhaps this is obvious, but it's worth noting that the urban core of Boston is denser than a lot of other cities, which makes it easier to get around, whether by walking, biking, or public transit. The public transportation is very good (although only by American standards) and will get better with the Green Line extension opening (hopefully by Fall 2022). They also seem to be doing a really good job with urban planning / construction here generally compared to other cities (they are actually allowing lots of new housing to be built to meet rising demand).

I'm excited about the EA co-working (https://forum.effectivealtruism.org/posts/cCrMqacEhFRnoHthF/do-you-want-to-work-in-the-new-boston-ea-office-at-harvard) and biosecurity hub projects here. 

The main downside is the cold weather, which is exacerbated by moist air from the ocean. However, if you know how to dress properly it shouldn't be too much of an issue. Another downside, if you're young, is high turnover among people in their 20s, since many are just here for school.

DeepMind’s generalist AI, Gato: A non-technical explainer

Thanks, yeah I agree overall. Large pre-trained models will be the future, because of the few-shot learning if nothing else.

I think the point I was trying to make, though, is that this paper raises a question, at least to me, as to how well these models can share knowledge between tasks. But I want to stress again I haven't read it in detail. 

In theory, we expect that multi-task models should do better than single-task models because they can share knowledge between tasks. Of course, the model has to be big enough to handle both tasks. (In medical imaging, a lot of studies don't show multi-task models to be better, but I suspect this is because they don't make the multi-task models big enough.) It seemed what they were saying was that it was only in the robotics tasks where they saw a lot of clear benefits to making it multi-task, but now that I read it again it seems they found benefits for some of the other tasks too. They do mention later that transfer across Atari games is challenging.

Another thing I want to point out is that, at least right now, training large models and parallelizing the training over many GPUs/TPUs is really technically challenging. They even ran into hardware problems here which limited the context window they were able to use. I expect this to change, though, with better GPU/TPU hardware and software infrastructure.

We Ran an AI Timelines Retreat

"The retreat lasted from Friday evening to Sunday afternoon and had 12 participants from UCLA, Harvard, UCI, and UC Berkeley.  There was a 1:3 ratio of grad students to undergrads" 

So it was 9 undergrads and 3 grads interested in AI safety? This sounds like a biased sample. Not one postdoc, industry researcher, or PI? 

To properly evaluate timelines, I think you should have some older more experienced folks, and not just select AI safety enthusiasts, which biases your sample of people towards those with shorter timelines.  

How many participants have actually developed AI systems for a real-world application? How many have developed an AI system for a non-trivial application? In my experience many people working in AI safety have very little experience with real-world AI development, and many I have seen have none whatsoever. That isn't good when it comes to gauging timelines, I think. When you get into the weeds and learn "how the sausage is made" to create AI systems (i.e. true object-level understanding), I think it makes you more pessimistic on timelines for valid reasons. For one thing, you are exposed to weird unexplainable failure modes which are not published or publicized.

DeepMind’s generalist AI, Gato: A non-technical explainer

Note: I haven't studied any of this in detail!!!

This review is nice but it is a bit too vague to be useful, to be honest. What new capabilities, that would actually have economic value, are enabled here? It seems this is very relevant to robotics and transfer between robotic tasks. So maybe that?

Looking at figure 9 in the paper the "accelerated learning" from training on multiple tasks seems small. 

Note that the generalist agent, I believe, has to be trained on all things combined at once; it can't be trained on things in serial (this would lead to catastrophic forgetting). Note this is very different from how humans learn and is a limitation of ML/DL. When you want the agent to learn a new task, I believe you have to retrain the whole thing from scratch on all tasks, which could be quite expensive.

It seems the 'generalist agent' is generally not better than the specialized agents in terms of performance. Interestingly, the generalist agent can't use text based tasks to help with image based tasks. Glancing at figure 17, it seems training on all tasks hurt the performance on the robotics task (if I'm understanding it right). This is different from a human - a human who has read a manual on how to operate a forklift, for instance, would learn faster than a human who hasn't read the manual. Are transformers like that? I don't think we know, but my guess is probably not, and the results of this paper support that.

So I can see an argument here that this points towards a future that is more like comprehensive AI services rather than a future where research is focused on building monolithic "AGIs", which would lower x-risk concerns, I think. To be clear, I think the monolithic AGI future is much more likely, personally, but this paper makes me update slightly away from that, if anything.
