Diagramming System Design: Rate Limiters
All production-grade services rely on rate limiters - when a user may post at most five articles...
ChatGPT reached one million users in the first five days of product release. Needless to say, generative AI and associated tools are on top of everyone's mind.
In this article, we’ll demystify generative AI for software engineers who are newer to artificial intelligence and machine learning:
According to IBM, “Generative AI refers to deep-learning models that can generate high-quality text, images, and other content based on the data they were trained on.” That’s a mouthful. Let's unpack that definition piece by piece.
Hierarchy of the related topics. Source: Author
Let's start at the top of the hierarchy. Computer science (CS) is the field that studies how computers and computations can be used to solve problems efficiently.
Artificial intelligence (AI) is a discipline of CS that focuses on creating computer models that “imitate intelligent human behavior ... [enabling computers] to perform complex tasks in a way that is similar to how humans solve problems.”
Machine learning (ML) is a branch of AI. It involves creating computer models “that give computers the ability to learn ... [and] to program themselves through experience.”
Deep learning is a sub-branch of ML, built on top of neural networks (more on that later).
As the word generative suggests, generative AI models create new content based on the input prompt and training data. There are many types of input and output content, but the major categories are language, visual, auditory, and data. Some generative AI models are multimodal, with the capability to create content across multiple types of media – for example, from text to images or text to videos. One reason for generative AI’s popularity is its widespread applicability.
Generative AI use cases by the major categories of language, visual, auditory, and data. Source: Author, inspired by https://www.nvidia.com/en-us/glossary/data-science/generative-ai/
A model, in machine learning, refers to a set of algorithms trained on a specific dataset.
As of now, various generative AI models have been developed and funded by big companies and universities, such as OpenAI, Nvidia, Google, Meta, and UC Berkeley. Each institution decides whether to keep its model private or public. They also determine the methods of access and monetization.
OpenAI has developed generative AI models including GPT-4, Whisper, and DALL-E 2. While the models are private, the company offers API access to these models with a pricing structure. OpenAI also offers ChatGPT, a chat application built on its GPT models, as a freemium product. Additionally, Microsoft has partnered with OpenAI on Bing Image Creator, which is powered by DALL-E.
Neural networks are a subset of machine learning loosely modeled on circuits of neurons in the human brain. A neural network can be visually represented as a fully interconnected, directed graph structure organized by layers.
There are several key components in a neural network:
A diagram of a neuron and its components. Source: https://medium.com/@tiffanytha.sj/introduction-to-neural-networks-b56d8148bae2
A visual representation of the layers in a neural network. Source: https://www.ibm.com/topics/neural-networks
A backpropagation neural network diagram. Source: https://ebrary.net/190741/engineering/propagation
Initially, all of a neural network’s weights and thresholds are set to random values. As training data is successively fed through, the weights and thresholds are continuously adjusted automatically until the neural network produces expected outputs based on provided inputs. This process requires trial and error over time, similar to the human learning process.
Let's say we want to train a neural network to recognize handwritten numerical digits (0-9):
Instead of the traditional programming approach of specifying steps to find the output, the computer is given the input and expected output and allowed to figure out the details on how to get from point A to point B. The what is specified and the how is delegated.
The main differentiator in learning modes is the use of labeled datasets for training. Labeled datasets allow for high-accuracy training, but are intensive in time and labor because the labeling tasks are manually performed by humans.
Introduced in 2013, VAEs involve two neural networks: encoder and decoder. VAEs also have a third component in the middle known as the latent space.
The encoder converts input training data points into compressed representations, preserving critical information while discarding anything unnecessary. The compression process reduces the dimensionality of the dataset, allowing VAEs to deal with smaller and less complex data points while preserving relevancy.
The latent space holds all the compressed representations, also called latent representations. A latent representation is composed of latent attributes, each expressed as a probability distribution. The latent space acts as a bottleneck to distill the information from the encoder to the decoder while establishing useful correlations between the multitude of inputs. Creating latent representations prevents the VAEs from memorizing the training dataset and overfitting on it. Overfitting occurs when a model fits exactly or too closely to its training dataset, hindering it from generalizing or predicting with new unseen inputs.
The decoder reconstructs the original input data points from the compressed representations during training.
Novel content can be created by running new samplings of the latent representations through the decoder. The generation process of VAE models is typically faster than other model types but results in lower-quality output. VAEs have been typically applied in the use cases of synthetic data generation (for example, time series data such as music) and image reconstruction.
A VAE model diagram with latent space and attributes. Source: https://www.v7labs.com/blog/autoencoders-guide
Introduced in 2014, GANs pit two neural networks against each other: generator and discriminator. The generator creates new samples and the discriminator learns to distinguish whether the new samples are real or fake from existing data points. The generator and discriminator work together to continuously improve the new samples created until the discriminator can’t tell the difference between real existing data and fake new data.
Novel content can be created by the generator. The generation process of GANs is generally fast with high-quality output compared with other model types. The downside is weak sample diversity, meaning these models work better for specific domains rather than generalized use cases. Due to their capability for fast, high-quality output, GANs have been widely used in generating multimedia, including voices and images.
A simple illustration of how the generator and discriminator of a GAN model work together. Source: Author, inspired by https://developers.google.com/machine-learning/gan/gan_structure
Introduced in 2015, DDPMs involve a two-step training process: forward diffusion and reverse diffusion. The first step is forward diffusion which slowly adds random noise to the input training data points. The second step is reverse diffusion which slowly removes the noise from the output training data points to reconstruct the input samples.
Novel content can be created by running the reverse diffusion process with completely random noise. The training and generation process for DDPMs is time-consuming, but this model type generally creates the highest-quality output with strong sample diversity, being flexible for most use cases. Due to the detailed output, one common use case for diffusion models is generating superior images.
An illustration of forward and reverse diffusion in a DDPM model. Source: https://www.nvidia.com/en-us/glossary/data-science/generative-ai/
Introduced in 2017, the transformer machine learning architecture makes it possible to train models with a huge amount of training data without labeling them in advance. The reason is that transformers can process sequential input data in a non-sequential manner. Transformers combined the encoder-decoder setup with the concept of “attention.”
The word it is correctly referenced to the animal, which has the most attention. Source: https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html
The word it is correctly referenced to the street, which has the most attention. Source: https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html
Transformers can be “pre-trained” for generalized use cases with a massive amount of unlabeled training data. Later, a transformer can be specialized by fine-tuning it with a small amount of domain- or task-specific labeled training data. Such versatility makes transformers useful foundation models.
Large Language Models (LLMs) are language-based transformers with billions or even trillions of parameters allowing for advanced natural language processing. Parameters, in machine learning, are the variables present in the model during training, resembling a “model’s knowledge bank.”
There are three classes of LLMs:
The GPT-n generative AI models have been created by OpenAI. At the time of writing (July 2023), the latest released version is GPT-4, a multimodal decoder-only LLM that is capable of accepting image and text inputs and generating text outputs. It can perform advanced reasoning tasks, including code and math generation. Its predecessor, GPT-3, was pre-trained with 175 billion parameters.
ChatGPT is a chat application based on these GPT models with a focus on creating conversational responses. DALL-E is a text-to-image generative AI model based on a subset of GPT-3 and combined initially with a GAN model and later with a diffusion model.
Despite the advancements made by generative AI models, there are still limitations and challenges.
The generative AI models and capabilities we know today wouldn’t have been possible without previous advancements in the underlying machine learning technologies and deep learning models – namely, neural networks, unsupervised and semi-supervised learning modes, and VAEs, GANs, and DDPMs. The introduction of the transformer architecture paved the way for the development of modern LLMs that provide the groundwork for generative AI models. While we can use generative AI services and applications without understanding their underlying technologies and models, having this technical knowledge helps us anticipate new advances and choose the best tool for our specific needs.
Diana Cheung (ex-LinkedIn software engineer, USC MBA, and Codesmith alum) is a technical writer on technology and business. She is an avid learner and has a soft spot for tea and meows.
All production-grade services rely on rate limiters - when a user may post at most five articles...
We sat down with Codesmith alum Jenna Davis to discuss her path to Codesmith, the software...
Before attending Codesmith, Daniel King was a high school teacher. Now, he works as an Engineering...