Vector Search: Revealing the ML Science Behind a New Era of Search Engines

Organisations like MongoDB are introducing the world to a new era of faster information retrieval by leveraging Machine Learning techniques that map users' queries into a high-dimensional vector space, where results are matched by semantic context rather than exact keywords. Let's explore the underlying concepts of this search method, which is being widely adopted across the industry.



May 03, 2023




Understanding Vectors 🎼

Beginner level: ⭐️⭐️⭐️

In the world of Machine Learning, vectors play a crucial role in representing data and capturing its underlying patterns. These numeric representations encode information and context, enabling algorithms to perform complex tasks such as clustering and similarity analysis. In this article, we will explore the fascinating realm of vectors and delve into the various similarity functions used to measure the closeness of data points.

The Foundation: Vectors and Clustering Models

At the core of many similarity-based models, such as the popular k-nearest neighbours (KNN) algorithm, lie vectors. Vectors are arrays of numeric values that succinctly represent data points.

For instance, let's consider the sentence "She lives in New York City." An embedding model can transform this sentence into a vector representation like [2.3, 0.11, 4.35, 0.771, ...]. These vectors are the fundamental inputs to clustering and similarity models.
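To make this concrete, here is a minimal sketch of producing such an embedding in Python. It assumes the open-source sentence-transformers library and the all-MiniLM-L6-v2 model, neither of which the article prescribes; any embedding model would work the same way:

```python
# Minimal sketch: turning a sentence into a vector (an "embedding").
# Assumes: pip install sentence-transformers
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 is one popular open model; this choice is an assumption.
model = SentenceTransformer("all-MiniLM-L6-v2")

# encode() returns a fixed-length numeric vector (384 dimensions for this model).
embedding = model.encode("She lives in New York City.")
print(embedding.shape)  # (384,)
print(embedding[:4])    # first few components of the vector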

The Magic of Vector Space

When vectors are plotted in a high-dimensional space, an intriguing phenomenon occurs—similar vectors tend to cluster together. This clustering allows us to identify semantically similar data points and uncover hidden patterns. The concept of vector space forms the foundation for various machine learning techniques.

Understanding Similarity Functions

To measure the similarity between vectors, we employ similarity functions. Different functions capture distinct notions of closeness. Let's explore three commonly used similarity functions:

Euclidean Distance: Mapping Similarity in Images

Euclidean distance measures the straight-line distance between the tips of two vectors, i.e. between two points in the vector space. It is suitable for dense data where the magnitude of values matters. Consider an image recognition task where each image is represented by a vector; the Euclidean distance can help identify images with similar visual characteristics, aiding in tasks like image clustering and classification.

Example: Suppose we have two images represented by vectors [1.2, 3.4, 2.1] and [0.8, 2.9, 2.5]. Computing the Euclidean distance between them quantifies how alike they are: the smaller the distance, the more similar the images.
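As a quick illustration, here is a minimal sketch of this computation in Python (assuming NumPy is available; the vectors are the toy values from the example above):

```python
import numpy as np

# Hypothetical image vectors from the example above.
a = np.array([1.2, 3.4, 2.1])
b = np.array([0.8, 2.9, 2.5])

# Euclidean distance: square root of the sum of squared component differences.
distance = np.linalg.norm(a - b)
print(distance)  # ~0.755 -- a small distance suggests similar images
```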

Cosine Similarity: Capturing Text Context and Themes

Cosine similarity calculates the cosine of the angle between two vectors, focusing on their orientation rather than their length. It is well-suited for sparse data, such as text, where the presence or absence of features matters more than raw magnitudes. By measuring cosine similarity, we can identify text documents with similar themes or contextual relevance.

Example: Consider two text documents represented as vectors [0.2, 0.9, 0.5] and [0.6, 0.3, 0.8]. Computing the cosine similarity between these vectors reveals their degree of similarity: a value close to 1 indicates near-identical orientation, while a value near 0 indicates unrelated content.
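Here is the same computation as a minimal Python sketch (again assuming NumPy, with the toy vectors from the example):

```python
import numpy as np

# Hypothetical document vectors from the example above.
a = np.array([0.2, 0.9, 0.5])
b = np.array([0.6, 0.3, 0.8])

# Cosine similarity: dot product divided by the product of vector lengths.
cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_sim)  # ~0.72 -- fairly similar in orientation
```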

Dot Product: Balancing Orientation and Intensity

The dot product reflects both the angle between two vectors and their magnitudes: it equals the cosine similarity scaled by the lengths of both vectors. It is particularly useful where both the orientation and the intensity of features are crucial. By utilizing the dot product, we can capture the relationship between vectors in a way that combines direction and strength.

Example: Let's take two vectors [0.3, 0.8, 0.1] and [0.7, 0.4, 0.6]. Computing their dot product shows how both orientation and intensity contribute to the similarity score.
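And a minimal Python sketch of the dot product (same assumptions as above):

```python
import numpy as np

# Hypothetical vectors from the example above.
a = np.array([0.3, 0.8, 0.1])
b = np.array([0.7, 0.4, 0.6])

# Dot product: sum of component-wise products. Unlike cosine similarity,
# it is not normalised, so longer vectors produce larger scores.
score = np.dot(a, b)
print(score)  # 0.59
```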

Exploring Applications

Vectors and similarity functions find applications in various domains. Here are a few notable examples:

  • Recommender Systems: By analyzing the similarity between user preferences, products, or content, recommender systems suggest personalized recommendations.
  • Document Clustering: Vectors and similarity functions help group similar documents, enabling tasks like topic modeling and document retrieval.
  • Anomaly Detection: By measuring how dissimilar a data point is from the rest of the dataset, outliers can be flagged as potential anomalies (see the sketch after this list).
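To connect these ideas back to vector search, here is a minimal sketch of a brute-force nearest-neighbour lookup. It is a simplified stand-in for what a vector database does at scale (real systems use approximate indexes), and the corpus and query vectors are invented for illustration:

```python
import numpy as np

# A toy "vector database": each row is the embedding of one document.
# These values are invented for illustration.
corpus = np.array([
    [0.2, 0.9, 0.5],   # doc 0
    [0.6, 0.3, 0.8],   # doc 1
    [0.1, 0.8, 0.6],   # doc 2
])

query = np.array([0.2, 0.9, 0.5])

# Rank documents by cosine similarity to the query, highest first.
sims = corpus @ query / (np.linalg.norm(corpus, axis=1) * np.linalg.norm(query))
ranking = np.argsort(-sims)
print(ranking)  # [0 2 1] -- doc 0 is the closest semantic match
```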