OpenAI Embeddings Explained: Semantic Search, RAG & Vector Databases (2026 Guide)

Vector embeddings have changed how software systems handle human language. This 2026 guide covers how to generate and optimize OpenAI's third-generation (V3) embeddings for semantic search, retrieval-augmented generation (RAG), and vector databases.

To understand how to merge vector searches with keyword lookups, read our Complete RAG Hybrid Search Guide. Or, scale up your systems with our step-by-step LangGraph Multi-Agent Tutorial.

Vector Embeddings & RAG Ingestion Pipeline

1. What are OpenAI Embeddings?

OpenAI's text embeddings measure the relatedness of text strings. In simple terms, they translate natural language into vectors (lists of floating-point numbers) whose coordinates capture semantic concepts.

The distance between two vectors measures their semantic similarity. When vectors are close together in vector space, they represent high similarity. When they are far apart, they represent low similarity.
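As a minimal sketch of what "close together" means, the snippet below computes cosine similarity for a few hand-made vectors. The three-dimensional vectors and the `cosine_similarity` helper are illustrative assumptions; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors, made up for illustration (not real model output).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Related concepts should score higher than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))
```

A score near 1 means the vectors point in almost the same direction; a score near 0 means they are close to unrelated.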

2. How Embeddings Work

To get an embedding, send your text to the embeddings API endpoint along with a model ID (e.g. text-embedding-3-small). The API returns a vector normalized to length 1 that represents the text:

import OpenAI from "openai";
const openai = new OpenAI();

const embedding = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: "Your text string goes here",
  encoding_format: "float",
});

console.log(embedding.data[0].embedding);
// Output: [-0.006929, -0.005336, -0.000045, ...]

Third-generation embedding models use the cl100k_base token encoding. Use a local library such as tiktoken to count tokens before making API calls, so that each input stays within the model's maximum input length and you can estimate cost and rate-limit usage in advance.
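One way to use token counts is to group texts into batches before sending them. The sketch below is hypothetical: `batch_by_token_budget` and the budget of 5 are invented for illustration, and the default counter simply splits on whitespace so the example runs offline. For real counts you would pass `tiktoken_token_count`, which assumes the tiktoken package is installed.

```python
def whitespace_token_count(text):
    """Rough stand-in counter: NOT the real tokenizer, only for offline demos."""
    return len(text.split())

def tiktoken_token_count(text):
    """Count tokens with tiktoken's cl100k_base encoding (needs `pip install tiktoken`)."""
    import tiktoken  # imported lazily so the rest of the sketch runs without it
    return len(tiktoken.get_encoding("cl100k_base").encode(text))

def batch_by_token_budget(texts, max_tokens_per_batch, count_tokens=whitespace_token_count):
    """Group texts into batches whose total token count stays within the budget."""
    batches, current, used = [], [], 0
    for text in texts:
        n = count_tokens(text)
        if n > max_tokens_per_batch:
            raise ValueError(f"Single text has {n} tokens, over the budget")
        # Start a new batch when adding this text would exceed the budget.
        if current and used + n > max_tokens_per_batch:
            batches.append(current)
            current, used = [], 0
        current.append(text)
        used += n
    if current:
        batches.append(current)
    return batches

docs = ["one two three", "four five", "six seven eight nine", "ten"]
print(batch_by_token_budget(docs, max_tokens_per_batch=5))
```

Each resulting batch can then be sent as one `input` list in a single embeddings request.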

3. Vector Database Integration

To search across millions of vectors at scale, you typically need a dedicated vector database. Popular choices include:

Pinecone: Managed cloud-native vector database designed for high performance.
ChromaDB: Lightweight, open-source embedded vector database for fast prototyping.
Weaviate: Open-source vector search engine with hybrid search support.
pgvector: PostgreSQL extension for indexing and searching vector data directly inside SQL.
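To make the role of a vector database concrete, here is a minimal in-memory sketch of its core operation: store vectors under IDs and return the nearest ones to a query. `TinyVectorIndex` is a hypothetical class, not the API of any product listed above, and it scans every vector on each query; production databases generally rely on approximate nearest-neighbor indexes to avoid that full scan.

```python
import heapq

class TinyVectorIndex:
    """Hypothetical in-memory index: exact search by scanning every stored vector."""

    def __init__(self):
        self._items = []  # list of (doc_id, vector) pairs

    def add(self, doc_id, vector):
        self._items.append((doc_id, vector))

    def query(self, vector, top_k=2):
        """Return the top_k (score, doc_id) pairs by dot product, best first."""
        scored = (
            (sum(q * v for q, v in zip(vector, vec)), doc_id)
            for doc_id, vec in self._items
        )
        return heapq.nlargest(top_k, scored)

# Toy unit-length vectors, made up for illustration.
index = TinyVectorIndex()
index.add("doc-cats", [1.0, 0.0])
index.add("doc-dogs", [0.8, 0.6])
index.add("doc-tax", [0.0, 1.0])

results = index.query([1.0, 0.0], top_k=2)
print([doc_id for _, doc_id in results])
```

Because the stored vectors are unit length, ranking by dot product here is the same as ranking by cosine similarity.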

4. Embeddings vs Fine-Tuning vs RAG

It is important to select the correct approach when building LLM applications. Here is how they compare:

Method	Primary Use Case	Focus
Embeddings	Semantic Similarity & Clustering	Understanding text relatedness
Fine-Tuning	Model Style, Tone, or Format	Customizing base model behavior
RAG	Dynamic Knowledge Injection	Contextual document retrieval
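The "dynamic knowledge injection" in the RAG row can be sketched as plain prompt assembly: chunks found by embedding search are pasted into the prompt at query time, and the model itself is left unchanged. The function name `build_rag_prompt`, the prompt wording, and the sample chunks are all assumptions made for illustration.

```python
def build_rag_prompt(question, retrieved_chunks, max_chunks=3):
    """Assemble a prompt that injects retrieved text as numbered context."""
    context = "\n".join(
        f"[{i}] {chunk}"
        for i, chunk in enumerate(retrieved_chunks[:max_chunks], start=1)
    )
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# In a real pipeline these chunks would come from a vector search.
chunks = ["Refunds are accepted within 30 days.", "Shipping takes 5 business days."]
prompt = build_rag_prompt("What is the refund window?", chunks)
print(prompt)
```

Updating the knowledge then means re-indexing documents rather than retraining, which is the main contrast with fine-tuning.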

5. Real Code Example: Calculating Similarity

Because OpenAI embeddings are normalized to length 1, cosine similarity reduces to a plain dot product. Let's compare "Apple iPhone" and "iPhone by Apple" in Python:

import numpy as np
from openai import OpenAI

client = OpenAI()

def get_embedding(text, model="text-embedding-3-small"):
    return client.embeddings.create(input=[text], model=model).data[0].embedding

# Generate embeddings
emb1 = np.array(get_embedding("Apple iPhone"))
emb2 = np.array(get_embedding("iPhone by Apple"))

# Calculate cosine similarity using dot product
similarity = np.dot(emb1, emb2)
print(f"Semantic Similarity: {similarity:.4f}")
# Example output (the exact value may vary): Semantic Similarity: 0.9412

6. Reducing Embedding Dimensions

A key advantage of OpenAI's V3 models is that you can shorten the vectors with the dimensions parameter while retaining most of their accuracy. For example, reducing text-embedding-3-small from its default 1536 dimensions to 512 cuts index storage in your vector database by about 66%, which lowers cost.
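The dimensions parameter does the shortening on the API side. If you instead store full-length vectors and shorten them yourself later, the shortened vector is no longer unit length and needs to be re-normalized before a dot product can stand in for cosine similarity. The sketch below shows that step plus the storage arithmetic; both helper names are hypothetical, and the four-value vector is a toy example.

```python
import math

def truncate_and_normalize(vector, dimensions):
    """Keep the first `dimensions` values, then rescale to unit length."""
    cut = vector[:dimensions]
    norm = math.sqrt(sum(x * x for x in cut))
    return [x / norm for x in cut] if norm > 0 else cut

def storage_saving(full_dims, reduced_dims):
    """Fraction of raw vector storage saved (same bytes per value assumed)."""
    return 1 - reduced_dims / full_dims

full = [0.6, 0.0, 0.8, 0.0]  # toy unit vector, not real model output
short = truncate_and_normalize(full, 2)
print(short)
print(f"{storage_saving(1536, 512):.1%}")
```

Going from 1536 to 512 dimensions saves two thirds of the raw vector storage; the effect on retrieval quality is worth measuring on your own data.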

7. Frequently Asked Questions (FAQs)

What are OpenAI embeddings?

OpenAI embeddings are lists of floating-point numbers (vectors) generated by models like text-embedding-3-small representing the semantic meaning of text strings. They are used to compare the similarity of different pieces of text.

How do OpenAI embeddings work?

You pass text to an embedding model through the API; the model processes the tokens and projects the text into a high-dimensional vector space. Similarity is calculated by measuring the distance (usually cosine similarity) between the resulting vectors.

Which OpenAI embedding model is best?

For most use cases, text-embedding-3-small is the best choice. It balances speed, accuracy (62.3% on the MTEB benchmark), and cost (about 62,500 pages per dollar). Use text-embedding-3-large for high-precision workflows that need its larger vectors (up to 3072 dimensions).

What is the difference between embeddings and fine-tuning?

Embeddings power search indexes that retrieve relevant external documents. Fine-tuning adjusts the tone, formatting, or behavior of the base model itself; it does not inject new external documents into the model's knowledge.

How much do OpenAI embeddings cost?

They are priced per input token. text-embedding-3-small costs $0.00002 per 1K tokens ($0.02 per million), which works out to about 62,500 pages of text per US dollar, making it extremely cost-effective for large-scale operations.
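The pages-per-dollar figure can be reproduced with a few lines of arithmetic. This sketch uses the per-token price quoted in this guide and assumes roughly 800 tokens per page; both constants should be checked against current pricing and your own documents.

```python
PRICE_PER_1K_TOKENS_USD = 0.00002  # text-embedding-3-small price quoted in this guide
TOKENS_PER_PAGE = 800              # assumption behind the pages-per-dollar figure

def embedding_cost_usd(total_tokens, price_per_1k=PRICE_PER_1K_TOKENS_USD):
    """Cost in US dollars to embed the given number of input tokens."""
    return total_tokens / 1000 * price_per_1k

def pages_per_dollar(tokens_per_page=TOKENS_PER_PAGE, price_per_1k=PRICE_PER_1K_TOKENS_USD):
    """How many pages of text one US dollar covers."""
    return 1 / embedding_cost_usd(tokens_per_page, price_per_1k)

print(round(pages_per_dollar()))
print(f"${embedding_cost_usd(10_000_000):.2f}")
```

At these rates, embedding a 10-million-token corpus costs about twenty cents.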
