Cosine similarity is a fundamental measure of how similar two vectors are, computed as the cosine of the angle between them. It is widely used in machine learning, data mining, information retrieval, and natural language processing to quantify how closely two data points align in direction in multi-dimensional space.
The cosine similarity formula is: cos(θ) = (A · B) / (|A| × |B|), where A · B is the dot product of vectors A and B, and |A| and |B| are their respective magnitudes. This formula produces a value between -1 and 1, where 1 indicates identical direction, 0 indicates orthogonality, and -1 indicates opposite directions.
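The formula translates directly into code. The sketch below is a minimal illustration, not an implementation from any particular library; the function name cosine_similarity and the use of NumPy are assumptions made here for clarity.

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = (A · B) / (|A| × |B|), assuming neither vector is all zeros."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    dot = np.dot(a, b)                              # A · B
    norms = np.linalg.norm(a) * np.linalg.norm(b)   # |A| × |B|
    return dot / norms                              # value between -1 and 1

print(cosine_similarity([1, 2, 3], [4, 5, 6]))  # ≈ 0.9746: nearly the same direction
```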
Mathematical Foundation
The mathematical foundation of cosine similarity lies in the geometric relationship between vectors in n-dimensional space. Unlike Euclidean distance, which measures the straight-line distance between vectors and is therefore sensitive to their magnitudes, cosine similarity depends solely on the angle between them, making it independent of vector magnitude.
This property makes cosine similarity particularly valuable when comparing data where magnitude differences are less important than directional similarities. For example, in text analysis, two documents might have different lengths but similar topics, making cosine similarity more appropriate than distance-based measures.
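To make the magnitude-independence concrete, the sketch below compares a short and a long "document" represented as term-count vectors; the counts are invented purely for illustration, and the longer document is simply the shorter one scaled up while keeping the same topic proportions.

```python
import numpy as np

# Hypothetical term-count vectors: long_doc is three times longer than
# short_doc but covers the same topics in the same proportions.
short_doc = np.array([2.0, 1.0, 0.0, 3.0])
long_doc = 3 * short_doc

cosine = np.dot(short_doc, long_doc) / (np.linalg.norm(short_doc) * np.linalg.norm(long_doc))
euclidean = np.linalg.norm(short_doc - long_doc)

print(cosine)     # ≈ 1.0: identical direction, so maximal cosine similarity
print(euclidean)  # ≈ 7.48: a large Euclidean distance despite the identical topic mix
```

Under a distance-based measure the two documents look very different, while cosine similarity treats them as essentially the same.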
Range and Interpretation
Cosine similarity values range from -1 to 1: a value of 1 means the vectors point in exactly the same direction (identical orientation), 0 indicates perpendicular vectors (no directional similarity), and -1 represents vectors pointing in completely opposite directions. Values closer to 1 indicate higher similarity, while values closer to -1 indicate greater dissimilarity.
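The three boundary cases can be checked directly. The vectors below are arbitrary examples, and the illustrative cosine_similarity helper from the earlier sketch is repeated so the snippet runs on its own.

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity([1, 2], [3, 6]))    # ≈  1.0: same direction (identical orientation)
print(cosine_similarity([1, 0], [0, 4]))    #    0.0: perpendicular vectors
print(cosine_similarity([1, 2], [-3, -6]))  # ≈ -1.0: completely opposite directions
```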