[English] AI Safety Guide for TRUE Beginners by TRUE beginners

Karime Pacheco; Rous Polanco; Saralet_cr

Have you ever wondered what structure lies behind what we know today as “Artificial Intelligence”? How do language models like ChatGPT, Gemini, Claude, or DeepSeek work? What happens every time one of them receives a prompt? Under what security protocols do they operate? What applications exist around us? How can we get the greatest benefit from them? What is AI Safety, and why is it so important?^[1]

An AI is not just an application that responds to user questions, nor is it only a collection of information gathered over time. Rather, it is an entire structure, operating under security protocols, that was carefully built to convert words into a mathematical language (vectors in a vector space) that the model can compute with.

Us working on our program RAISE (Research Alignment Internship for Safety Education)

We are Data Engineering students in Mérida, Yucatán, México, and as part of our university internships we decided to work through the ARENA 3.0 material.

However, this post was born from some uncomfortable questions we asked ourselves as a group: How can we delve into this world while we are still students or beginners? Where can we start? What prior knowledge lays the foundation for better understanding?^[2]

These questions were only the beginning of what was coming for us. Like many, we felt overwhelmed when we started the course and saw so many concepts that we did not know or had overlooked. To answer them, we focused on the chapter “1.1 Transformers from scratch”, which aims to show how GPT-2 works internally by building it from scratch in PyTorch, and to link that technical knowledge with topics in AI safety and alignment.

AI goes beyond what we thought possible, with surprisingly fast adoption and great potential for use. For this reason, this post is aimed at everyone who is interested in entering this world but does not yet know where to start.

But the million-dollar question is: how can we give an opinion on AI without understanding the basis of how it works structurally (transformers)?

AI Safety is a field focused on making Artificial Intelligence systems safe and, at the same time, aligned with human values, so that they act in a beneficial way and avoid causing accidental or unforeseen harm. AI Safety rests on 4 key aspects, which are indispensable once one delves into the structure itself. These pillars are:

Alignment: It refers to the AI's objectives matching human values and purposes.
Robustness: It refers to the AI functioning correctly even when new or strange situations are presented.
Interpretability: It is the process in which a human analyzes the cause of a decision made by the AI.
Monitoring & Control: It refers to the security mechanisms that supervise the behavior of the AI in real time to intervene if necessary.

When starting out in AI, it is easy to confuse some concepts, so it is worth making clear the difference between AI Safety and AI Ethics: the first focuses on avoiding incidents that could be catastrophic, while the second focuses on establishing principles, values, and moral norms for the design, development, and deployment of AI.

Now, to start answering our questions: what happens when we give a prompt to the AI? To answer this first question, we must keep in mind that an AI currently does not think for itself; it processes each word you give it. This is very important, because it is where the breakdown of its structure and architecture begins.

Tokenization: It is the process of breaking text into sub-units called tokens (not necessarily full words or individual letters). Each token is assigned a number that works as its identifier, and these numbers are what the model actually receives. Splitting words this way helps the model deal with rare or unfamiliar words. We must remember that the AI does not process words, only numbers; this is why words are converted into tokens.
Example: “tokenize” -> “token” + “ize” (one possible split; real tokenizers may divide the word differently)
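
To make the idea more concrete, here is a minimal sketch in Python that imitates the kind of split shown in the example above. It is only an illustration under stated assumptions: the vocabulary `VOCAB`, its ID numbers, and the greedy longest-match rule are made up for this example and are not how any particular model tokenizes text.

```python
# Toy vocabulary: every entry and every ID here is made up for illustration.
VOCAB = {"token": 0, "ize": 1, "izar": 2, "un": 3, "safe": 4, "<unk>": 5}


def tokenize(word, vocab=VOCAB):
    """Greedy longest-match split of `word` into sub-units found in `vocab`."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest possible piece first, then shorter ones.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                pieces.append(piece)
                start = end
                break
        else:
            # No known piece starts here: mark it as unknown and skip one character.
            pieces.append("<unk>")
            start += 1
    return pieces


def encode(word, vocab=VOCAB):
    """Map each sub-unit to its integer ID, which is what a model would receive."""
    return [vocab[piece] for piece in tokenize(word, vocab)]


print(tokenize("tokenize"))  # ['token', 'ize']
print(encode("tokenize"))    # [0, 1]
```

Real tokenizers learn their vocabulary from large amounts of text instead of using a hand-written dictionary, but the final step is the same: text goes in, a list of numbers comes out.
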
Transformer: Most modern language models have this architecture in common. The Transformer takes the input and predicts what the next token could be. It processes all the tokens of the text in parallel rather than one at a time, which lets it take the whole input into account when producing an answer.
Self-Attention: A very important part of the Transformer is the Self-Attention mechanism, which weighs the importance of the other sub-units of the input sequence while processing each element of it. This lets the model relate each token to the tokens before or after it, so it captures the global context instead of just the local one. (Not to be confused with causal attention. The difference lies in which tokens can be accessed: unrestricted self-attention can “see” any token, while causal attention can only access the current token and the previous ones. All causal attention is a form of self-attention, but not all self-attention is causal.)
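
The difference between unrestricted and causal attention can be sketched in a few lines of plain Python. This is a simplified illustration, not a full attention layer: the relevance scores in `SCORES` are invented numbers, whereas a real Transformer computes them from learned query and key vectors. The sketch only shows how scores become attention weights and how a causal mask hides later tokens.

```python
import math


def softmax(values):
    """Turn raw scores into positive weights that add up to 1."""
    highest = max(values)
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(scores, causal=False):
    """Convert a square matrix of scores into attention weights.

    scores[i][j] says how relevant token j looks to token i.
    With causal=True, token i may only look at positions j <= i.
    """
    weights = []
    for i, row in enumerate(scores):
        if causal:
            # Hide later tokens by giving them a score of minus infinity.
            row = [s if j <= i else -math.inf for j, s in enumerate(row)]
        weights.append(softmax(row))
    return weights


# Invented scores for a 3-token input (assumption for illustration only).
SCORES = [
    [1.0, 2.0, 0.5],
    [0.3, 1.0, 2.0],
    [2.0, 0.1, 1.0],
]

full = attention_weights(SCORES)
causal = attention_weights(SCORES, causal=True)

print(causal[0])  # [1.0, 0.0, 0.0]: the first token can only see itself
for row in full:
    print([round(w, 3) for w in row])
```

In the unrestricted case every row spreads its weight over all three tokens; in the causal case each row puts zero weight on the tokens that come after it.
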
Network: Neural networks are machine learning models loosely inspired by the human brain, and they are used to recognize complex patterns. A network has several components that matter for AI; some of them, with their functions, are:
1. Layers: There are different types of layers, each with its own role in the process. The main ones are the Input Layer, the Hidden Layers, and the Output Layer, which are responsible for receiving the information, analyzing it, and delivering the final prediction, respectively.
2. Neurons: Each neuron receives the information from the previous layer, weights it, and combines it to highlight patterns.
3. Weights: Weights are the parameters that determine the importance of the connection between two neurons; in other words, they set how much one neuron's output influences the next.
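
The three components above can be put together in a very small sketch. Everything in it is an assumption chosen for illustration: the network size (2 inputs, 2 hidden neurons, 1 output), the weight and bias values, and the ReLU activation. A real model has far more neurons, and its weights are learned during training rather than written by hand.

```python
def relu(x):
    """A common activation: keep positive values, replace negatives with 0."""
    return max(0.0, x)


def neuron(inputs, weights, bias, activate=True):
    """Weighted sum of the inputs plus a bias, optionally passed through ReLU."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return relu(total) if activate else total


def layer(inputs, weight_rows, biases, activate=True):
    """A layer is a group of neurons that all read the same inputs."""
    return [neuron(inputs, w, b, activate) for w, b in zip(weight_rows, biases)]


# Hand-written weights and biases (made up for this example).
HIDDEN_WEIGHTS = [[0.5, -1.0], [1.0, 1.0]]  # 2 hidden neurons, 2 inputs each
HIDDEN_BIASES = [0.0, -0.5]
OUTPUT_WEIGHTS = [[1.0, 2.0]]               # 1 output neuron, 2 inputs
OUTPUT_BIASES = [0.1]


def forward(x):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(x, HIDDEN_WEIGHTS, HIDDEN_BIASES)
    return layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES, activate=False)


print(forward([1.0, 2.0]))
```

Changing a single number in `HIDDEN_WEIGHTS` changes the prediction, which is what it means for a weight to set how important a connection is.
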
Output: When it comes to the output, it is important to keep in mind that the AI does not know the answer the way a human being does, because it does not “think” on its own. Instead, it calculates the answer from probabilities and mathematical predictions based on the information it has been given, and from those it arrives at a result.
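
As a rough sketch of what "calculating the answer based on probabilities" can look like, the example below turns raw scores for a few candidate next tokens into probabilities and picks the most likely one. The candidate words and their scores are invented for illustration; a real model scores every token in its vocabulary, and it often samples from the probabilities instead of always taking the top one.

```python
import math


def to_probabilities(scores):
    """Softmax: convert raw scores into probabilities that add up to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def most_likely(candidates, scores):
    """Return the candidate token with the highest probability."""
    probs = to_probabilities(scores)
    return candidates[probs.index(max(probs))]


# Invented candidates and scores for a prompt like "The capital of France is".
CANDIDATES = ["Paris", "London", "banana"]
RAW_SCORES = [4.0, 2.0, -1.0]

for token, prob in zip(CANDIDATES, to_probabilities(RAW_SCORES)):
    print(token, round(prob, 3))

print(most_likely(CANDIDATES, RAW_SCORES))  # Paris
```

Note that "banana" still receives a small probability. The model never rules an answer out completely, which is one reason its answers should always be checked.
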
Safety Layer: This is the last stage of the process. Before a final answer is shown, many deployed systems check it against their assigned security protocols and block or change it if it does not comply. This check acts as a control mechanism; in practice it is often a separate filter placed around the mathematical model, and part of the safe behavior also comes from how the model was trained.
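
The idea of checking an answer before it is shown can be sketched as follows. This is a deliberately naive illustration and not how any real product works: the names `BLOCKED_PHRASES`, `REFUSAL`, `is_allowed`, and `respond` are hypothetical, and a simple phrase list stands in for the trained classifiers and training methods that real systems rely on.

```python
# Hypothetical phrase list and refusal message (assumptions for illustration).
BLOCKED_PHRASES = ("build a weapon", "steal a password")
REFUSAL = "Sorry, I can't help with that."


def is_allowed(answer, blocked=BLOCKED_PHRASES):
    """Return True if the answer contains none of the blocked phrases."""
    lowered = answer.lower()
    return not any(phrase in lowered for phrase in blocked)


def respond(candidate_answer):
    """Show the model's answer only if it passes the check."""
    return candidate_answer if is_allowed(candidate_answer) else REFUSAL


print(respond("Here is a recipe for bread."))
print(respond("Sure, here is how to steal a password."))
```

A filter like this is easy to get around by rephrasing, which is part of why robustness and monitoring are treated as pillars of their own.
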

At this point you may already feel overwhelmed by the amount of information; exactly that happened to us as students, even though this is only a summary of what makes up the architecture and functioning of AI and AI Safety. Nonetheless, it is vital to take in all these small concepts and explanations. We consider them the foundation for understanding the field, for having a pleasant and enriching study experience, and for avoiding the haze and stress that come with the amount of information that begins to unfold.

Now that we have made clear what the process inside an AI involves and what its components are, we can answer the question “What applications exist around us?”.

This question was very interesting because, although the examples were probably in plain sight, we had never paid proper attention to them; we only began to see them clearly while we were studying.

Some AI applications, beyond ChatGPT, Gemini, etc., are the recommendation algorithms on platforms like TikTok. The platform's model estimates which video we might like based on the videos the user has previously watched or liked, and on which videos have the most views. Another example is the Google search engine autocompleting a possible search based on the user's previous searches, or the predictive keyboard on a phone. A final example comes from our city: public transport has an AI system that is programmed and trained to announce the next stops depending on the route you take, and it is linked to a GPS system for greater precision.

RAISE is one of the first programs of AI Safety UPY (Universidad Politécnica de Yucatán)

At this point, we began to notice small elements of daily life that go unnoticed but can be linked to an AI.

Assuming we now have the basic knowledge from the concepts explained above, we can answer the question “How to obtain the greatest benefit?”. One way to get good results when using AI is to improve the prompts we give it. Since we already understand how Self-Attention works, we know that giving more detail improves the context, which leads to more precise, correct answers with a lower probability of hallucinations. But we must not forget that AI calculates probabilities, so it can make mistakes; it is therefore important to always read and verify its answers, no matter how good your prompt is.

As Data Engineering students, our biggest question was how and where to start. Our best advice to everyone interested in venturing into this new field is not to force yourself to learn every mathematical formula from the beginning, since that only creates a bottleneck in taking in information. We found that it is best to first understand the theory and logic of the data, the processes, and AI Safety, as well as the key concepts; this significantly lightens the load without diminishing the importance of its purpose.

At the end of this journey, we can say that AI Safety is of utmost importance because it is the bridge, and the safeguard, between technology that keeps advancing and humanity. Without the key elements of AI Safety (alignment, robustness, interpretability, and monitoring and control), which are also part of the structure of AI, we might have an Artificial Intelligence that is very capable of “solving” any problem but has no limit by which to measure right and wrong, making it a potential danger. As beginners writing for beginners, our best advice is: “Do not be afraid of AI, but rather seek to understand its structure.” The future of data is not only about creating it; it is about taking care of it and understanding it.

^[1] This post was written by members of the group AI Safety UPY.
^[2] Thanks to Rous Polanco and Saralet Chan for the writing!

Effective Altruism Forum