RAG
An article containing my findings and example code for retrieval augmented generation (RAG) and related concepts. Before talking about RAG, we need to understand what embeddings, vectors and vector databases are.
What is Embedding?
Embedding is a means of representing objects like text, images and audio as points in a continuous vector space where the locations of those points in space are semantically meaningful to machine learning algorithms. Embedding is a critical tool for building text and image search engines, recommendation systems, chatbots, fraud detection systems and many other applications. In essence, embedding enables machine learning models to find similar objects using simple vector mathematics.
Most machine learning algorithms can only take numerical data as input, so other kinds of data must first be converted into a numerical format. An embedding model takes objects as input and outputs embeddings, represented as vectors. A vector is an array of numbers (e.g. 1489, 22… 3, 777), where each number indicates where an object lies along a specific dimension. The number of dimensions can reach a thousand or more depending on the complexity of the input data. A common lower bound in practice is 384 dimensions, because one of the most popular models, all-MiniLM-L6-v2, outputs 384-dimensional embeddings.
The closer an embedding is to other embeddings in this n-dimensional space, the more similar the objects they represent. Similarity is measured by the distance or angle between the vectors, using metrics such as cosine similarity or Euclidean distance. For normalized vectors, the higher the dot product of two vectors, the more similar they are.
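The idea above can be sketched in a few lines of numpy. The 4-dimensional vectors below are made-up stand-ins for real embeddings (a model such as all-MiniLM-L6-v2 would produce 384 dimensions); only the cosine-similarity math is the point here.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings"; the values are invented for illustration.
cat    = np.array([0.90, 0.80, 0.10, 0.00])
kitten = np.array([0.85, 0.75, 0.20, 0.05])
car    = np.array([0.10, 0.00, 0.90, 0.80])

print(cosine_similarity(cat, kitten))  # close to 1.0 -> semantically similar
print(cosine_similarity(cat, car))     # much lower  -> dissimilar
```

A vector is maximally similar to itself (cosine similarity 1.0), and unrelated objects score far lower, which is exactly the property that similarity search exploits.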
Benefits of Embeddings
→ Semantic Representation
→ Dimensionality Reduction
→ Improved generalization of models
→ Efficient and clearer visualizations
→ Efficient training in neural networks
What are Vectors?
In mathematics, a vector is a quantity possessing both magnitude (size) and direction. In AI computation and related fields, vectors are a subset of tensors, which in machine learning (ML) is a generic term for a group of numbers, or a grouping of groups of numbers, in n-dimensional space. Tensors function as a mathematical bookkeeping device for data. Working up from the smallest element:
- A scalar is a zero-dimensional tensor, containing a single number. For example, a system modeling weather data might represent a single day's high temperature (in Fahrenheit) in scalar form as 85.
- A vector is a one-dimensional (or first-order) tensor, containing multiple scalars of the same type of data. For example, a weather model might use the low, mean and high temperatures for a single day in vector form: (62, 77, 85). Each scalar component is a feature, that is, a dimension, of the vector, representing a feature of that day's weather.
Vector numbers can represent complex objects such as words, images, videos and audio generated by an ML model. This high-dimensional vector data, containing multiple features, is essential to machine learning, natural language processing (NLP) and other AI tasks.
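The scalar/vector/tensor hierarchy can be made concrete with numpy arrays, reusing the weather figures from the text (the third-order example extends them to a hypothetical three-day window):

```python
import numpy as np

# Rank-0 tensor (scalar): one day's high temperature in Fahrenheit.
high = np.array(85)

# Rank-1 tensor (vector): low, mean and high temperatures for a single day.
day = np.array([62, 77, 85])

# Rank-2 tensor (matrix): the same three features across three days.
week = np.array([
    [62, 77, 85],
    [60, 74, 82],
    [65, 79, 88],
])

print(high.ndim, day.ndim, week.ndim)  # 0 1 2
print(day.shape)                       # (3,) -> three features per day
```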
What are Vector Databases?
A vector database stores, manages and indexes high-dimensional vector data. Data points are stored as arrays of numbers called “vectors,” which are clustered based on similarity. This design enables low-latency queries, making it ideal for AI applications. Unlike traditional relational databases with rows and columns, data points in a vector database are represented by vectors with a fixed number of dimensions. Because they use high-dimensional vector embeddings, vector databases are better able to handle unstructured datasets.
Relational databases excel at managing structured and semistructured datasets in specific formats. Loading unstructured data sources into a traditional relational database to store, manage and prepare the data for AI is a labor-intensive process, especially with new generative use cases such as similarity search. Traditional search typically represents data by using discrete tokens or features, such as keywords, tags or metadata. Traditional searches rely on exact matches to retrieve relevant results. For example, a search for "smartphone" would return results containing the word "smartphone."
In contrast, vector search represents data as dense vectors, which are vectors with most or all elements being nonzero. Vectors are represented in a continuous vector space, the mathematical space in which data is represented as vectors. Vector representations enable similarity search. For example, a vector search for "smartphone" might also return results for "cellphone" and "mobile devices." Each dimension of the dense vector corresponds to a latent feature or aspect of the data. A latent feature is an underlying characteristic or attribute that is not directly observed but inferred from the data through mathematical models or algorithms. Latent features capture the hidden patterns and relationships in the data, enabling more meaningful and accurate representations of items as vectors in a high-dimensional space. Vector databases serve three key functions in AI and ML applications: vector storage, vector indexing and similarity search based on querying or prompting.
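The three functions named above (storage, indexing, similarity search) can be sketched as a tiny in-memory store. This is a deliberately naive sketch: the class name and 3-dimensional vectors are invented for illustration, and a real vector database would replace the brute-force linear scan with an approximate nearest-neighbor index such as HNSW.

```python
import numpy as np

class TinyVectorStore:
    """Illustrative in-memory vector store with brute-force search."""

    def __init__(self, dim: int):
        self.dim = dim                      # every vector has a fixed dimension
        self.ids: list[str] = []
        self.matrix = np.empty((0, dim))    # one row per stored vector

    def add(self, doc_id: str, vector: list[float]) -> None:
        v = np.asarray(vector, dtype=float)
        assert v.shape == (self.dim,), "all vectors share a fixed dimension"
        self.ids.append(doc_id)
        # Store unit vectors so a dot product equals cosine similarity.
        self.matrix = np.vstack([self.matrix, v / np.linalg.norm(v)])

    def search(self, query: list[float], k: int = 2) -> list[tuple[str, float]]:
        q = np.asarray(query, dtype=float)
        q = q / np.linalg.norm(q)
        scores = self.matrix @ q            # cosine similarity per stored row
        top = np.argsort(scores)[::-1][:k]  # indices of the k best matches
        return [(self.ids[i], float(scores[i])) for i in top]

# Toy 3-dimensional vectors standing in for real embeddings.
store = TinyVectorStore(dim=3)
store.add("smartphone", [0.90, 0.80, 0.10])
store.add("cellphone",  [0.85, 0.75, 0.20])
store.add("banana",     [0.00, 0.10, 0.90])

print(store.search([0.90, 0.80, 0.10]))  # "smartphone", then "cellphone"
```

Because "smartphone" and "cellphone" point in nearly the same direction, both rank above "banana" even though the query never mentions "cellphone", which is exactly the behavior keyword search cannot provide.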
Benefits of Vector Databases
→ Speed and performance
→ Scalability
→ Lower cost of ownership
→ Data management
→ Flexibility
What is Retrieval Augmented Generation (RAG)?
Retrieval augmented generation (RAG) is an architecture for optimizing the performance of an artificial intelligence (AI) model by connecting it with external knowledge bases. RAG helps large language models (LLMs) deliver more relevant, higher-quality responses by feeding them contextual information that they don't already have. Generative AI models are trained on large datasets and refer to this information to generate outputs. However, training datasets are finite, and a model is limited to the information it was trained on. A developer can access public domain works, internet articles, social media content and other publicly accessible data, but that is the limit of how much data can be gathered legally and ethically to train a model. RAG allows generative AI models to access additional external knowledge bases, such as internal organizational data, scholarly journals and specialized datasets. Gen AI models also have a knowledge cutoff, the point at which their training data was last updated; as a model ages past its knowledge cutoff, its responses lose relevance. RAG systems connect models with supplemental external data in real time and incorporate up-to-date information into generated responses.
Enterprises use RAG to equip models with specific information such as proprietary customer data, authoritative research and other relevant documents. RAG models can also connect to the internet with APIs and gain access to real-time social media feeds and consumer reviews for a better understanding of market sentiment. Meanwhile, access to breaking news and search engines can lead to more accurate responses as models incorporate the retrieved information into the text-generation process. RAG anchors LLMs in specific knowledge backed by factual, authoritative and current data. Compared to a generative model operating only on its training data, RAG models tend to provide more accurate answers within the contexts of their external data. While RAG can reduce the risk of hallucinations, it cannot make a model error-proof.
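The retrieve-then-augment flow described above can be sketched end to end. Everything here is a stand-in: the `knowledge_base` documents are invented, the bag-of-words `embed` function is a toy replacement for a real embedding model and vector database, and the final LLM call is omitted (any chat-completion API would consume the built prompt).

```python
import math
from collections import Counter

# --- Retrieval step: a toy bag-of-words retriever stands in for a real
# embedding model plus vector database. ---
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical internal documents the base model was never trained on.
knowledge_base = [
    "The refund window for hardware purchases is 30 days.",
    "Support tickets are answered within one business day.",
    "The company cafeteria serves lunch from noon to 2 pm.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(knowledge_base,
                    key=lambda d: similarity(q, embed(d)), reverse=True)
    return ranked[:k]

# --- Augmentation step: prepend the retrieved context to the prompt,
# then hand the result to an LLM (call omitted here). ---
def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("What is the refund window for hardware?"))
```

The model now answers from the retrieved document rather than from its frozen training data, which is the core mechanism behind every benefit listed below.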
Benefits of RAG
→ Cost-efficient AI implementation and AI scaling
→ Access to current domain-specific data
→ Lower risk of AI hallucinations
→ Increased user trust
→ Expanded use cases
→ Enhanced developer control and model maintenance
→ Greater data security
→ Increased flexibility to fine-tune AI responses per user
→ More contextualized responses
Bibliography
→ What is a Vector Database?
→ What is Embedding?
→ What is RAG (retrieval augmented generation)?