
1. What experience do you have that you think could be relevant to technical AI safety research? (e.g. programming, coursework, internships, self-study).  

Since June 2021, I have conducted research with collaborators from the MIT Energy Initiative, Carnegie Mellon University, and Aalborg University on applying AI to energy device design and condition evaluation. This work has taken me through the full research pipeline: theoretical study, physical experiments, data collection, bad-data elimination, model construction, programming and debugging, data visualization, human-computer interface design, error analysis, paper writing and revision, and conference communication. Along the way, peers have repeatedly asked me questions like "How can you prove that your results are credible?", which made me realize how universal and important such questions are.

In more detail:

a. My partners and I used neural networks to study collected data on film capacitors, predicting capacitance from material properties and, inversely, suggesting material types for a target capacitance.
(https://www.researchgate.net/publication/365978460_Artificial_Intelligence_Aided_Design_for_Film_Capacitors)
b. I worked on object annotation in a machine vision project for supermarket product identification.
(https://www.researchgate.net/publication/359709838_Unitail_Detecting_Reading_and_Matching_in_Retail_Scene)
c. I was involved in the programming and debugging of the "prediction of polymer dielectric properties" project.
(https://www.researchgate.net/publication/361826459_Prediction_of_high-temperature_polymer_dielectrics_using_a_Bayesian_molecular_design_model)
d. I participated in the feasibility discussion of the project "prediction of electrical performance of composite dielectrics using CNN".
(https://www.researchgate.net/publication/366932419_Prediction_on_the_relative_permittivity_of_energy_storage_composite_dielectrics_using_convolutional_neural_networks_A_fast_and_accurate_alternative_to_finite-element_method)

2. In your own words, please describe the alignment problem. Why might a future AI system have objectives that differ from our own and why should we be concerned about this? Feel free to reference the resources section on this page: https://www.serisummerfellowship.org/ai-safety

When applying AI systems to the real world to help humans solve complex problems, we need a reliable way to align the AI system's goals with human goals.

The mathematical mapping from an AI system's inputs to its outputs is difficult to translate into the kinds of explanations humans use for decision-making or prediction. If people specify goals that poorly represent the intended behavior, the AI system will learn the wrong behaviors. And if people's goals are inherently complex, AI systems are prone to misbehavior that is difficult for us to understand.
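A toy illustration of this specification gap (the whole scenario, including the puddle gridworld and both reward functions, is my own hypothetical construction, not taken from any real system): an agent optimizes a proxy reward that only partially captures the designer's intent, so the path that is optimal under the proxy diverges from the intended behavior.

```python
# Hypothetical toy example of goal misspecification: the designer wants
# the agent to reach the goal cell WITHOUT crossing a puddle, but the
# proxy reward only pays for reaching the goal quickly, so the path that
# is optimal under the proxy happily crosses the puddle.

def proxy_return(path):
    # Proxy objective: +10 for reaching the goal, -1 per step taken.
    # Puddles are ignored -- this is the specification gap.
    reward = 10 if path[-1] == "goal" else 0
    return reward - (len(path) - 1)

def true_return(path):
    # Intended objective: same, but -100 for each puddle crossed.
    return proxy_return(path) - 100 * path.count("puddle")

short_risky = ["start", "puddle", "goal"]           # 2 steps, crosses puddle
long_safe = ["start", "grass", "grass", "goal"]     # 3 steps, avoids it

# Under the proxy, the risky path looks better...
assert proxy_return(short_risky) > proxy_return(long_safe)
# ...but under the intended objective it is far worse.
assert true_return(short_risky) < true_return(long_safe)
```

The point of the sketch is only that a small omission in the reward specification is enough to flip which behavior is "optimal".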

At the same time, I see many researchers trying to deploy AI systems in critical fields including energy, the economy, and environmental protection. In these areas, it would be dangerous if a flawed system failed to behave as expected because we neglected the alignment problem in its design.

3. Choose one problem from the document linked below (or identify your own). Come up with a concrete sub-question you could answer that would be helpful for solving this problem, describe why you think this question is important, and briefly generate some steps you might take to go about answering it. 

https://docs.google.com/document/d/1KyVUzMcv2ZUBLZka3Ftyungq_Bzlab4sCYbHFdDufQ0/edit?usp=sharing 

The problem I chose is "How can we design systems that learn hard-to-specify objectives?".

My sub-question is: "How can human input be used to help AI learn hard-to-specify goals?"

During the image recognition projects I was involved in, I saw that human input could help models train better, and I have seen other examples of improving AI performance by introducing human feedback. Another reason I think this question is important is that similar needs and phenomena are common across domains. Finally, I believe there is large untapped potential in using human influence to help AI learn hard-to-specify objectives.

The steps I will take are as follows:

a. Study how human feedback (which could take the form of natural language) affects the model's weights and final outputs during the iterative training process.

b. Analyze how the amount of human feedback, the expertise of the humans providing it, and the ambiguity of their evaluations influence whether the AI achieves the expected goals.

c. Build a human-AI interaction interface that displays the outputs of the current training state. Through this interface, humans can influence the training process by evaluating those outputs, for example by telling the AI whether its current behavior meets human goals.

d. Explore ways to reduce the amount of human feedback required while preserving its effect, and measures to improve the quality of human influence on the AI.
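Steps a–c could be sketched as a minimal human-in-the-loop training loop. The example below is a toy illustration of the idea, not any specific system: a simulated "human" approves or disapproves each model output against a hidden rule the designer cannot write down as an explicit loss, and the model's weights are nudged only on disapproval (a perceptron-style update standing in for richer feedback mechanisms). All names, the hidden rule, and the update rule are assumptions for illustration.

```python
import random

def human_feedback(x, score):
    # Stand-in for a human evaluator judging a hard-to-specify goal:
    # the output is approved only when it agrees with a hidden rule
    # (here, the sign of 2*x0 - x1). This is a hypothetical toy setup.
    hidden = 2.0 * x[0] - 1.0 * x[1]
    return 1 if (hidden > 0) == (score > 0) else -1

def train_with_feedback(steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0]  # linear model standing in for a trained network
    for _ in range(steps):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        score = w[0] * x[0] + w[1] * x[1]
        if human_feedback(x, score) < 0:
            # On disapproval, nudge the decision for x to the other
            # side -- the update is driven purely by the feedback
            # signal, never by direct access to the hidden rule.
            direction = 1.0 if score <= 0 else -1.0
            w[0] += lr * direction * x[0]
            w[1] += lr * direction * x[1]
    return w

weights = train_with_feedback()
```

In this sketch, varying `steps`, the evaluator's error rate, or the ambiguity of its judgments would correspond to the analyses proposed in step b.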

4. If you have a specific AI safety project idea that you are interested in working on this summer, please describe it. Note that you do NOT need one, and that if you do have one, you may end up working on something else because we prioritize research questions that our mentors think are better learning opportunities.

How can humans help AI improve its performance in designing energy devices or evaluating their state?

5. For the separate track: what specific AI risks or failure-modes are you most concerned about and why?

I hope to study machine learning robustness, assurance, and specification, and apply the resulting theory to my research topic: the application of Artificial Intelligence (AI) in energy device design and state evaluation.
Energy devices such as film capacitors act as energy nodes in the energy system. Researchers have made a number of promising attempts to use AI methods to optimize the design of energy devices, evaluate their state during operation, and predict their future performance. However, many gaps remain in AI safety research on these topics.
The quality and state of energy devices determine the quality and state of the energy supply. If we cannot establish robustness, assurance, and specification when applying AI here, failures could be disastrous for the energy system, and even for the economy and society.

6. Describe a nontrivial strength or weakness of the reading that isn’t explicitly mentioned in the reading itself. (Existential Risk from Power-Seeking AI (Joe Carlsmith) - YouTube)

I think we need researchers who approach AI safety from different perspectives, because the problems AI faces are also problems we humans face. However, if the author's conclusion that "warning shots are helpful, but not sufficient" rests on analogies or thought experiments, it may not convince people with more engineering experience (especially AI experts). In the claim "By 2070, it will become possible and financially feasible to build APS systems. 65%", some readers will not know where the rough figure of 65% comes from and will doubt its reliability; at minimum, more evidence, discussion, and explanation are needed to reduce the skepticism of readers who take it with a grain of salt. I would suggest that the author collaborate more with specialists whose professional field is AI. They could provide calculations based on mathematical models and data from physical tests or experiments. Such results would be important references for the probability estimates, and the combined conclusions would be more comprehensive than conclusions drawn only from intuition and personal estimates.

7. Describe a nontrivial strength or weakness of the reading that isn’t explicitly mentioned in the reading itself. (https://drive.google.com/file/d/1p4ZAuEYHL_21tqstJOGsMiG4xaRBtVcj/view?usp=sharing)

Although the author gives a comprehensive argument that natural selection creates incentives for AI agents to act against human interests, the theory of evolution has its own controversies [1], and there are also voices questioning its derivative theories [2]. This raises a new question: to what extent is AI "evolution" actually governed by evolutionary theory? Even though there is an attempt to support the role of evolution in deep learning [3], the author still needs stronger evidence: at minimum, more results, reasoning, and arguments from mathematical models and simulations. During the modeling process, other influencing factors such as humans and nature could be added to strengthen the work's persuasiveness.

From Section 3 to Appendix A, although the author gives a comprehensive introduction to "natural selection favors selfish AIs" and "counteracting Darwinian forces", supplemented with many examples, statements such as "Since AIs would be able to do things more efficiently without humans…", "This is because altruism to similar individuals can be seen as a form of selfishness, from the information-centered view", and "However AIs have no such reasons to cooperate with us by default, so they have reason to compete and conflict with us" still need more evidence or references. For example, the experimental results of reference [4] could be added to support statements like "If AIs are much more capable than humans (this has been experimentally verified in certain respects [4]), they would learn more from interacting with one another than from interacting with humans." Adding viewpoints [5] that contradict the "meme" framing in this paper would also strengthen its rigor.

[1] Fowler T B, Kuebler D. The evolution controversy: A survey of competing theories[M]. Baker Academic, 2007. 

[2] Confer J C, Easton J A, Fleischman D S, et al. Evolutionary psychology: Controversies, questions, prospects, and limitations[J]. American Psychologist, 2010, 65(2): 110. 

[3] Kaznatcheev A, Kording K P. Nothing makes sense in deep learning, except in the light of evolution[J]. arXiv preprint arXiv:2205.10320, 2022.

[4] Fogel D B. Artificial intelligence through simulated evolution[M]. Wiley-IEEE Press, 1998.

[5] Husbands P, Harvey I, Cliff D, et al. Artificial evolution: A new path for artificial intelligence?[J]. Brain and Cognition, 1997, 34(1): 130-159.

8. How can humans harness super AI that exceeds their intelligence level?

There are two classic figures in Chinese history: Liu Bei and Zhuge Liang. Zhuge Liang was smarter than Liu Bei, yet he remained loyal to him, because he was drawn to Liu Bei's ideals and personal charisma. Liu Bei described their bond as the relationship between fish and water. Perhaps we can learn something from this relationship model for the management of super AI.
