
The Physics of Generative AI - How AI models use physics to generate novel data

Company
AssemblyAI

Date published
Jan. 25, 2024

Transcript

Distinct fields often cross-pollinate important concepts, which helps drive their progress. Concepts from mathematics lie at the foundation of progress in physics, and concepts from physics often inspire frameworks in economics. In this video, we're going to take a look at how AI has joined these ranks, pulling in concepts from physics to create state-of-the-art AI models. First, we'll give a general overview of how physics-inspired AI models work. Then we'll take a look at two specific cases. The first uses the field of electrostatics, where treating data points as electrons helps us generate novel images. The second uses the field of thermodynamics, where treating data points as if they were atoms in a gas similarly helps us create images. So let's start off with some general concepts. Generative AI works by sampling from a data distribution. If we want to generate a sample of humans considering only height, we can't just pick the heights at random, because this doesn't reflect reality. Instead, we have to account for the fact that some heights are more likely than others. We can approximate the true distribution by fitting a Gaussian curve to some training data, and then sample from this learned distribution to generate a realistic sample. We'll often impose assumptions on the underlying distribution to make the problem tenable. In the last example, we assumed that human heights follow a Gaussian distribution. Now, there may be good reasons to assume that data is Gaussian, like the central limit theorem. But some data distributions are much more complicated than simple Gaussians. Consider a distribution of images, say, human faces. For human heights, we have only one dimension, the height. But for a 512 by 512 pixel image, we have over 260,000 dimensions. The distribution is extremely complicated, so learning it directly, like we might for human heights, is simply not feasible in this case. How does generative AI get around this problem?
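For the one-dimensional height example, this fit-and-sample recipe takes only a few lines. Here's a minimal sketch, assuming made-up height data in centimeters (the numbers are purely illustrative):

```python
import numpy as np

# Hypothetical "training data": 1,000 observed human heights in cm
# (drawn from a made-up ground-truth distribution for illustration).
rng = np.random.default_rng(0)
heights = rng.normal(loc=170.0, scale=8.0, size=1000)

# "Learn" the distribution by fitting a Gaussian: estimate its mean and std
mu, sigma = heights.mean(), heights.std()

# Generate realistic new samples by drawing from the learned distribution
new_samples = rng.normal(loc=mu, scale=sigma, size=5)
print(mu, sigma)
```

The fitted parameters land close to the ground truth, and anything sampled from the fit looks like a plausible height rather than a random number.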
Remember, our end goal is just to generate data by sampling from the data distribution. If we can figure out some way to do this without learning the distribution, then it should still work. Instead of learning the data distribution and sampling from it directly, we'll instead sample from a simpler distribution and then map that into our complicated data distribution. At first glance, this doesn't seem much easier. We've just replaced the problem of finding this complicated data distribution with finding a mapping to that distribution. But this is where physics comes into play. The fundamental observation is that physics itself maps data to simple distributions. For example, consider a rod of metal that's heated on one end. Over time, the temperature in the rod becomes constant everywhere. So we've turned this more complicated distribution of heat into a more uniform one. In fact, we could do this with really any initial distribution of heat, and the end result would be the same: over time, the temperature becomes uniform everywhere. If we can figure out how to reverse this physical process, then we have our reverse mapping from a simple distribution into a complicated one. This is how these physics-inspired models work. We observe a case in nature where something is mapped to a simple distribution. We cast our problem in this language and then teach a model how to reverse this physical process, which is equivalent to learning the reverse mapping from the simple distribution into the complicated one. To generate data, we sample from the simple distribution and then pass it through the reverse mapping to obtain a sample from the complicated distribution, say images. Now that we have a general understanding of this process, let's see how it works in practice. We'll start off with Poisson flow generative models, or PFGMs. PFGMs treat data points as electrons and exploit the electric field that these data points generate.
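The heated-rod intuition is easy to verify numerically. Here's a minimal sketch, a toy 1-D finite-difference heat simulation with made-up parameters, showing that an arbitrary initial heat profile relaxes to a uniform one:

```python
import numpy as np

# Toy 1-D heat-flow simulation (explicit finite differences, made-up parameters):
# whatever the initial temperature profile, the insulated rod relaxes to a
# uniform temperature over time.
n_cells, alpha, n_steps = 50, 0.4, 20_000  # alpha < 0.5 keeps the scheme stable

rng = np.random.default_rng(0)
temp = rng.uniform(0.0, 100.0, size=n_cells)  # arbitrary initial heat distribution
initial_mean = temp.mean()
initial_spread = temp.max() - temp.min()

for _ in range(n_steps):
    # Pad with edge values: insulated ends, so total heat is conserved
    padded = np.pad(temp, 1, mode="edge")
    temp = temp + alpha * (padded[:-2] - 2.0 * temp + padded[2:])

final_spread = temp.max() - temp.min()
print(initial_spread, final_spread)  # the spread collapses toward zero
```

Run it with any starting profile you like; the spread between the hottest and coldest cells collapses toward zero while the average temperature stays fixed, which is exactly the "complicated distribution to simple distribution" mapping described above.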
Consider some sort of two dimensional data distribution, like the height and weight of humans. Now imagine this data distribution is actually a charge distribution, where points that are more probable are considered to have more electric charge. What would the electric field of this distribution look like? Well, generally, it's going to be very complicated and may have high curvature around the charge distribution itself. But as we zoom out, the electric field gets more and more regular. At very far distances, the charge distribution would look like a point charge, because the distance between different points in the distribution is negligible compared to the distance to the distribution itself. The electric field of a point charge is very simple, just pointing out radially in every direction. Since the electric field is continuous, the complicated field around the charge distribution must eventually connect smoothly to this radial distribution at far distances. Remember, our goal is to map from a complicated distribution to a simple one. In this case, the electric field itself provides this mapping. If we follow the electric field lines, our complicated 2D charge distribution will transform into this circular one at far distances. To generate data, then, we can just sample simple spherical data and travel backwards along these electric field lines to yield new data points from our data distribution. So what this comes down to in practice is learning the electric field generated by a data distribution, because the electric field is our mapping. In reality, we learn an approximate field by using training data sampled from the data distribution. If we have a data set of images, for example, we learn the field it generates at every point in the high dimensional space by simply adding the electric fields generated by each image treated as a charged particle. The total field is equal to the sum of these individual field contributions.
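The superposition sum just described can be sketched in a few lines. This is a toy illustration with made-up 2-D data, not the actual PFGM implementation (which works in high dimensions with an augmented variable), but it shows how summing per-point contributions yields a field that looks radial far from the data:

```python
import numpy as np

def empirical_field(x, data, eps=1e-6):
    """Electric-field-style vector at point x, summing a contribution from
    each training point treated as a unit charge (superposition principle).
    In d dimensions, a point charge's field falls off as 1/r^(d-1)."""
    diffs = x - data                                   # vectors from each charge to x
    dists = np.linalg.norm(diffs, axis=1, keepdims=True) + eps
    d = data.shape[1]
    return (diffs / dists**d).sum(axis=0)              # sum of radial contributions

# Toy 2-D "data distribution": a cluster of charges near the origin
rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=(500, 2))

# Far from the cluster, the summed field points nearly radially outward,
# as if all the charge sat at a single point
far_point = np.array([100.0, 0.0])
field = empirical_field(far_point, data)
direction = field / np.linalg.norm(field)
print(direction)  # approximately [1, 0]
```

Evaluated far from the cluster, the field direction is almost exactly radial, which is the "circular distribution at far distances" the text describes.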
This is the superposition principle. We implement this electric field approximator as a neural network (a U-Net), which takes in an input vector for a point in space and returns the electric field vector at that point. If you want to see additional details on how this works, you can check out our article on PFGMs. It contains a deep dive section that includes all necessary background information. PFGMs came out at the end of last year, and a successor, PFGM++, was published more recently. The authors argue that PFGMs offer benefits over diffusion models, which power Stable Diffusion and DALL-E. Speaking of diffusion models, let's talk about our next physics-inspired paradigm. While electrostatics inspired PFGMs, thermodynamics inspired diffusion models. Many of you have probably already learned about diffusion models, so instead of going into details about how they work here, we'll focus on high-level intuition. If you want to learn more about how diffusion models actually work, you can check out our blog or video on the topic. In the case of PFGMs, we exploited the fact that the motion of charged particles along electric field lines maps to a simple circular distribution. In the case of diffusion models, we exploit the fact that the random motion of atoms maps to a Gaussian distribution. To see how this works conceptually, let's take a step back. Thermodynamics can be viewed as the study of the macroscopic results of microscopic randomness. For example, if we throw a bunch of coins on the ground, we can ask how the probability of 50% of them landing heads up compares to the probability of 100% of them landing heads up. Let's look at the case of four coins. The probability that 100%, or all four, of them land heads up is less than the probability of just 50%, or two, of them landing heads up. This is because there are six ways for only two coins to land heads up, while there is only one way for all four coins to land heads up.
So despite each coin microscopically having a 50% chance of landing heads up, we can see macroscopically that the ensemble is very unlikely to have all four coins land heads up. If we extend this thought experiment to ten coins, then 50% of the coins landing heads up is 252 times more likely than 100%. If we extend this to 50 coins, then this factor becomes over 126 trillion. What happens if we extend this concept to billions of coins? Thermodynamics treats atoms like coins and studies the consequences of this phenomenon in physical systems. For example, if a drop of food coloring is placed into a glass of water, the food coloring spreads out to eventually create a uniform color in the glass. Why is this? The uniform color is a result of the atoms of the food coloring spreading out over time. There are many more ways for the billions of atoms to be in different places than all localized in one drop. For the drop not to spread out would kind of be like dropping a billion coins on the ground and having them all land heads up. Let's formalize this a little bit more with the concept of a random walk. In a random walk, a particle starts at zero, and at each time step it takes a step either left or right with equal probability. Over time, the particle will move about randomly. We can look at the probability of this particle landing in different spots at different times. If we iterate this over four time steps, then the furthest the particle can be from the start is four, either plus four or minus four. But this case is pretty unlikely. For the particle to get all the way to four, it would have to take a right step at each time step. This is just like flipping four coins and having them all land heads up. As we saw, the 50/50 state is most likely, and this corresponds to an equal number of lefts and rights in a random walk. So this means one particle will probably end up around where it started.
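The coin-counting numbers quoted above are just binomial coefficients, which is easy to check with Python's standard library (a quick sketch):

```python
from math import comb

# Ways for exactly k of n coins to land heads up: the binomial coefficient C(n, k)
print(comb(4, 2), comb(4, 4))  # 6 ways vs. 1 way for four coins

# Ratio of "50% heads" outcomes to "100% heads" outcomes as n grows
for n in (4, 10, 50):
    print(n, comb(n, n // 2) // comb(n, n))  # the factor exceeds 126 trillion by n = 50
```

The same counting argument is what makes the all-heads (or un-spread drop of food coloring) state so astronomically unlikely at the scale of billions of particles.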
However, the potential distance that this particle can move only grows over time. So while an individual particle is likely to end up near where it started, if we do this with a bunch of particles simultaneously, we will see the final distribution of particles approach a Gaussian that gets wider and wider with time. So, simply through random motion, it becomes overwhelmingly likely that a group of particles will spread out over time, just like how food coloring spreads out over time. This process is called diffusion, and it's what inspires models like DALL-E and Stable Diffusion. Just as thermodynamics views atoms as coins, diffusion models view pixels in an image as atoms. In this schema, we effectively let the pixel values in an image go on random walks. Similarly to how the random motion of food coloring will always lead to a uniform color, under some conditions the random motion of pixels will always lead to TV static, which can be thought of as the image equivalent of uniform food coloring. By learning how the atoms diffuse for a particular drop, we can select a random atom in the uniform coloring and then go back in time to figure out where it started in the initial drop. Similarly, by training on images and learning how they diffuse, we can select a random image of Gaussian noise and go back in time to figure out where it started in the data distribution, i.e., generate a novel image. Just as we observed that electrostatics maps complicated data distributions into a simple circular distribution in PFGMs, here we observed that thermodynamics maps complicated distributions into Gaussian noise. In both cases, we use the same principle: sample from the simple distribution and map, rather than trying to sample from the data distribution directly. If you want to take a closer look at how this works mathematically, feel free to check out the introduction to diffusion models on our blog.
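The widening-Gaussian picture above is also easy to simulate. Here's a minimal sketch with made-up parameters, showing that while each walker stays near its start on average, the ensemble's width grows like the square root of time:

```python
import numpy as np

# Many particles doing independent +/-1 random walks: the ensemble spreads
# out like a Gaussian whose width grows with the square root of time.
rng = np.random.default_rng(0)
n_particles, n_steps = 20_000, 400

steps = rng.choice([-1, 1], size=(n_particles, n_steps))
positions = steps.cumsum(axis=1)  # position of every particle at every time step

# Each particle is most likely to end up near where it started...
print(abs(positions[:, -1].mean()))  # ensemble mean stays near 0
# ...but the ensemble as a whole keeps widening: std grows like sqrt(t)
print(positions[:, 99].std(), positions[:, -1].std())  # roughly 10, then roughly 20
```

Quadrupling the number of steps only doubles the spread, which is the square-root-of-time signature of diffusion that the text's food-coloring analogy describes.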
All right, I hope this video gave you an idea of how physics is inspiring these state of the art generative AI models. If you have any questions, feel free to leave them in the comments below. See you in the next video.


By Matt Makai. 2021-2024.