
A Simple Algorithm To Compute How Similar Strings Are In Python

5 min read · Nov 19, 2021

In Python, we can test whether two strings match exactly using the equality operator ==:

print("apple" == "apple") # True
print("apple" == "orange") # False

However, things get more complicated if we want to check whether two strings are similar rather than identical:

print("apple" == "apples") # False
print("orange" == "ornage") # False
print("pear" == "paer") # False

As humans, we can tell that "apple" and "apples" are very similar, that "orange" and "ornage" are very similar (likely a typo), and that the same goes for "pear" and "paer" (another typo). Unfortunately, the == operator only checks for exact equality, so Python treats each pair as simply different strings.

words = ["apple", "orange", "pear", "apples", "snapple", "zzzzz"]

Given a list of words, we want to be able to check which words are closer to one another (we’re expecting "apple", "apples" and "snapple" to be pretty close!). In this article, we’ll explore a simple method to do this.

Cosine Similarity Formula

cos(a, b) = dot(a, b) / magnitude(a) / magnitude(b)

a and b are vectors, and dot refers to the dot product, while magnitude refers to the magnitude of the vector. Don’t worry if these sound foreign to…
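To make the formula concrete, here is a minimal sketch of how it could be applied to strings. It assumes each string is turned into a vector of character counts, which is one common choice and not necessarily the exact approach this article goes on to use. The function name cosine_similarity is made up for illustration.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(a, b):
    """cos(a, b) = dot(a, b) / magnitude(a) / magnitude(b)

    Assumption: each string is represented as a vector of character
    counts, e.g. "apple" -> {a: 1, p: 2, l: 1, e: 1}.
    """
    va, vb = Counter(a), Counter(b)

    # Dot product: multiply matching character counts and sum them.
    # Counter returns 0 for characters that are missing from vb.
    dot = sum(va[ch] * vb[ch] for ch in va)

    # Magnitude: square root of the sum of squared counts.
    mag_a = sqrt(sum(n * n for n in va.values()))
    mag_b = sqrt(sum(n * n for n in vb.values()))

    # An empty string has magnitude 0; treat it as similar to nothing.
    if mag_a == 0 or mag_b == 0:
        return 0.0
    return dot / mag_a / mag_b


words = ["apple", "orange", "pear", "apples", "snapple", "zzzzz"]
for word in words:
    print(word, round(cosine_similarity("apple", word), 3))
```

With this representation, "apples" and "snapple" score much closer to "apple" than "orange" does, and "zzzzz" scores 0 because it shares no characters. One limitation of plain character counts is that they ignore letter order, so "orange" and "ornage" come out as (almost exactly) 1 even though they are not the same string.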

Written by Liu Zuo Lin
