
1. What experience do you have that you think could be relevant to technical AI safety research? (e.g. programming, coursework, internships, self-study).  

Since June 2021, I have conducted research with collaborators from the MIT Energy Initiative, Carnegie Mellon University, and Aalborg University on applying AI to energy device design and condition evaluation. This work has taken me through the full research pipeline: theoretical study, physical experiments, data collection, bad-data elimination, model construction, programming and debugging, data visualization, human-computer interface design, error analysis, paper writing and revision, and conference communication. Along the way, peers have repeatedly asked me questions like "How can you prove that your results are credible?", which made me realize how universal and important such questions are.

In more detail:

a. My partners and I used neural networks to study collected data on film capacitors, predicting capacitance from material properties and, inversely, suggesting material types for a target capacitance.
(https://www.researchgate.net/publication/365978460_Artificial_Intelligence_Aided_Design_for_Film_Capacitors)
b. I worked on object annotation in a machine vision project for supermarket product identification.
(https://www.researchgate.net/publication/359709838_Unitail_Detecting_Reading_and_Matching_in_Retail_Scene)
c. I was involved in the programming and debugging of the "prediction of polymer dielectric properties" project.
(https://www.researchgate.net/publication/361826459_Prediction_of_high-temperature_polymer_dielectrics_using_a_Bayesian_molecular_design_model)
d. I participated in the feasibility discussion of the project "prediction of electrical performance of composite dielectrics using CNN".
(https://www.researchgate.net/publication/366932419_Prediction_on_the_relative_permittivity_of_energy_storage_composite_dielectrics_using_convolutional_neural_networks_A_fast_and_accurate_alternative_to_finite-element_method)

2. In your own words, please describe the alignment problem. Why might a future AI system have objectives that differ from our own and why should we be concerned about this? Feel free to reference the resources section on this page: https://www.serisummerfellowship.org/ai-safety

When applying AI systems to the real world to help humans solve complex problems, we need a reliable way to align the AI system's goals with human goals.

The mathematical mapping from an AI system's inputs to its outputs is difficult to translate into the kinds of explanations humans use for decision-making or prediction. If people specify goals that poorly represent the intended behavior, the AI system will learn the wrong behaviors. And if people's goals are inherently complex, AI systems are prone to misbehavior that is difficult for us to understand.
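A toy illustration of this specification gap (the whole scenario, including the puddle gridworld and both reward functions, is my own hypothetical construction, not taken from any real system): an agent optimizes a proxy reward that only partially captures the designer's intent, so the path that is optimal under the proxy diverges from the intended behavior.

```python
# Hypothetical toy example of goal misspecification: the designer wants
# the agent to reach the goal cell WITHOUT crossing a puddle, but the
# proxy reward only pays for reaching the goal quickly, so the path that
# is optimal under the proxy happily crosses the puddle.

def proxy_return(path):
    # Proxy objective: +10 for reaching the goal, -1 per step taken.
    # Puddles are ignored -- this is the specification gap.
    reward = 10 if path[-1] == "goal" else 0
    return reward - (len(path) - 1)

def true_return(path):
    # Intended objective: same, but -100 for each puddle crossed.
    return proxy_return(path) - 100 * path.count("puddle")

short_risky = ["start", "puddle", "goal"]           # 2 steps, crosses puddle
long_safe = ["start", "grass", "grass", "goal"]     # 3 steps, avoids it

# Under the proxy, the risky path looks better...
assert proxy_return(short_risky) > proxy_return(long_safe)
# ...but under the intended objective it is far worse.
assert true_return(short_risky) < true_return(long_safe)
```

The point of the sketch is only that a small omission in the reward specification is enough to flip which behavior is "optimal".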

At the same time, I see many researchers trying to deploy AI systems in critical fields including energy, the economy, and environmental protection. In these areas, it would be dangerous if a flawed system failed to behave as expected because we neglected the alignment problem in its design.

3. Choose one problem from the document linked below (or identify your own). Come up with a concrete sub-question you could answer that would be helpful for solving this problem, describe why you think this question is important, and briefly generate some steps you might take to go about answering it. 

https://docs.google.com/document/d/1KyVUzMcv2ZUBLZka3Ftyungq_Bzlab4sCYbHFdDufQ0/edit?usp=sharing 

The problem I chose is "How can we design systems that learn hard-to-specify objectives?".

My sub-question is: "How can human input be used to help AI learn hard-to-specify goals?"

During the image recognition projects I was involved in, I saw that human input could help models train better, and I have seen other examples of improving AI performance by introducing human feedback. Another reason I think this question is important is that similar needs and phenomena are common across domains. Finally, I believe there is large untapped potential in using human influence to help AI learn hard-to-specify objectives.

The steps I will take are as follows:

a. Study how human feedback (which could take the form of natural language) affects the model's weights and final outputs during the iterative training process.

b. Analyze how the amount of human feedback, the expertise of the humans providing it, and the ambiguity of their evaluations influence whether the AI achieves the expected goals.

c. Build a human-AI interaction interface that displays the outputs of the current training state. Through this interface, humans can influence the training process by evaluating those outputs, for example by telling the AI whether its current behavior meets human goals.

d. Explore ways to reduce the amount of human feedback required while preserving its effect, and measures to improve the quality of human influence on the AI.
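Steps a–c could be sketched as a minimal human-in-the-loop training loop. The example below is a toy illustration of the idea, not any specific system: a simulated "human" approves or disapproves each model output against a hidden rule the designer cannot write down as an explicit loss, and the model's weights are nudged only on disapproval (a perceptron-style update standing in for richer feedback mechanisms). All names, the hidden rule, and the update rule are assumptions for illustration.

```python
import random

def human_feedback(x, score):
    # Stand-in for a human evaluator judging a hard-to-specify goal:
    # the output is approved only when it agrees with a hidden rule
    # (here, the sign of 2*x0 - x1). This is a hypothetical toy setup.
    hidden = 2.0 * x[0] - 1.0 * x[1]
    return 1 if (hidden > 0) == (score > 0) else -1

def train_with_feedback(steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0]  # linear model standing in for a trained network
    for _ in range(steps):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        score = w[0] * x[0] + w[1] * x[1]
        if human_feedback(x, score) < 0:
            # On disapproval, nudge the decision for x to the other
            # side -- the update is driven purely by the feedback
            # signal, never by direct access to the hidden rule.
            direction = 1.0 if score <= 0 else -1.0
            w[0] += lr * direction * x[0]
            w[1] += lr * direction * x[1]
    return w

weights = train_with_feedback()
```

In this sketch, varying `steps`, the evaluator's error rate, or the ambiguity of its judgments would correspond to the analyses proposed in step b.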

4. If you have a specific AI safety project idea that you are interested in working on this summer, please describe it. Note that you do NOT need one, and that if you do have one, you may end up working on something else because we prioritize research questions that our mentors think are better learning opportunities.

How can humans help AI improve its performance in designing energy devices or evaluating their state?

5. For the separate track: what specific AI risks or failure-modes are you most concerned about and why?

I hope to study machine learning robustness, assurance, and specification, and apply the resulting theory to my research topic: the application of Artificial Intelligence (AI) in energy device design and state evaluation.
Energy devices such as film capacitors act as energy nodes in the energy system. Researchers have made a number of promising attempts to use AI methods to optimize the design of energy devices, evaluate their state during operation, and predict their future performance. However, many gaps remain in AI safety research on these topics.
The quality and state of energy devices determine the quality and state of the energy supply. If we cannot establish robustness, assurance, and specification when applying AI here, failures could be disastrous for the energy system, and even for the economy and society.

6. Describe a nontrivial strength or weakness of the reading that isn’t explicitly mentioned in the reading itself. (Existential Risk from Power-Seeking AI (Joe Carlsmith) - YouTube)

I think we need researchers who approach AI safety from different perspectives, because the problems AI faces are also problems we humans face. However, if the author's conclusion that "warning shots are helpful, but not sufficient" rests on analogies or thought experiments, it may not convince people with more engineering experience (especially AI experts). In the claim "By 2070, it will become possible and financially feasible to build APS systems. 65%", some readers will not know where the rough figure of 65% comes from and will doubt its reliability; at minimum, more evidence, discussion, and explanation are needed to reduce the skepticism of readers who take it with a grain of salt. I would suggest that the author collaborate more with specialists whose professional field is AI. They could provide calculations based on mathematical models and data from physical tests or experiments. Such results would be important references for the probability estimates, and the combined conclusions would be more comprehensive than conclusions drawn only from intuition and personal estimates.

7. Describe a nontrivial strength or weakness of the reading that isn’t explicitly mentioned in the reading itself. (https://drive.google.com/file/d/1p4ZAuEYHL_21tqstJOGsMiG4xaRBtVcj/view?usp=sharing)

Although the author gives a comprehensive argument that natural selection creates incentives for AI agents to act against human interests, the theory of evolution has its own controversies [1], and there are also voices questioning its derivative theories [2]. This raises a new question: to what extent is AI "evolution" actually governed by evolutionary theory? Even though there is an attempt to support the role of evolution in deep learning [3], the author still needs stronger evidence: at minimum, more results, reasoning, and arguments from mathematical models and simulations. During the modeling process, other influencing factors such as humans and nature could be added to strengthen the work's persuasiveness.

From Section 3 to Appendix A, although the author gives a comprehensive introduction to "natural selection favors selfish AIs" and "counteracting Darwinian forces", supplemented with many examples, statements such as "Since AIs would be able to do things more efficiently without humans…", "This is because altruism to similar individuals can be seen as a form of selfishness, from the information-centered view", and "However AIs have no such reasons to cooperate with us by default, so they have reason to compete and conflict with us" still need more evidence or references. For example, the experimental results of reference [4] could be added to support statements like "If AIs are much more capable than humans (this has been experimentally verified in certain respects [4]), they would learn more from interacting with one another than from interacting with humans." Adding viewpoints [5] that contradict the "meme" framing in this paper would also strengthen its rigor.

[1] Fowler T B, Kuebler D. The evolution controversy: A survey of competing theories[M]. Baker Academic, 2007. 

[2] Confer J C, Easton J A, Fleischman D S, et al. Evolutionary psychology: Controversies, questions, prospects, and limitations[J]. American Psychologist, 2010, 65(2): 110. 

[3] Kaznatcheev A, Kording K P. Nothing makes sense in deep learning, except in the light of evolution[J]. arXiv preprint arXiv:2205.10320, 2022.

[4] Fogel D B. Artificial intelligence through simulated evolution[M]. Wiley-IEEE Press, 1998.

[5] Husbands P, Harvey I, Cliff D, et al. Artificial evolution: A new path for artificial intelligence?[J]. Brain and Cognition, 1997, 34(1): 130-159.

8. How can humans harness super AI that exceeds their intelligence level?

There are two classic figures in Chinese history: Liu Bei and Zhuge Liang. Zhuge Liang was smarter than Liu Bei, yet he remained loyal to him, because he was drawn to Liu Bei's ideals and personal charisma. Liu Bei described their bond as the relationship between fish and water. Perhaps we can learn something from this relationship model for the management of super AI.
