Unlike searching for a single character, searching for a complete string is more challenging.

Suppose we search for the string `ION` in `DICTIONARY`. Starting from each character in `DICTIONARY`, we have to look for `I`, `O`, and `N` in the same order.

Throughout this article, I’ll be referring to the two strings as the following:

**search string**: The string upon which the search is executed. Example: `DICTIONARY`

**input string**: The string being searched for. Example: `ION`

A brute-force solution to substring search loops over the characters in the search string and, for each position, performs another loop over the characters in the input string to check for equality.

```
loop(index1 in search_string)
    matchFlag = true
    loop(index2 in input_string)
        if search_string[index1 + index2] != input_string[index2]
            matchFlag = false
            break
    if matchFlag == true
        return True
return False
```

The best-case scenario for this algorithm is when we search a string within itself, so the match is found at the very first position.

Since the outer loop (over the search string) is executed only once, the resulting time complexity is $O(m)$, where $m$ is the size of the input string.

The result will also be the same if the search string has characters after `ION` (like `IONIZED`, `IONIC`, etc.) because the algorithm will exit on the first match.

The worst-case scenario in terms of performance will be when the input string is not present in the search string.

The algorithm will loop over all characters in the search string and for each character, it will also loop over every character in the input string. The worst-case time complexity will be $O((n-m) m)$ where $n$ and $m$ are lengths of the search string and input string respectively.

```
package main

import "fmt"

func bruteForceSubstringSearch(inputString string, searchString string) bool {
	// Looping over characters in searchString, time complexity: O(n)
	// Note: "<=" so that a match at the very end of searchString is found
	for indexN := 0; indexN <= len(searchString)-len(inputString); indexN++ {
		matchFlag := true
		// Looping over characters in inputString, time complexity: O(m)
		for indexM := 0; indexM < len(inputString); indexM++ {
			if searchString[indexN+indexM] != inputString[indexM] {
				matchFlag = false
				break
			}
		}
		// Exit function if a match is found
		if matchFlag == true {
			return true
		}
	}
	return false
}

func main() {
	searchString := "DICTIONARY"
	inputString := "ION"
	fmt.Println("String", inputString, "is present in", searchString, ":",
		bruteForceSubstringSearch(inputString, searchString))
	searchString = "FOOTBALL"
	fmt.Println("String", inputString, "is present in", searchString, ":",
		bruteForceSubstringSearch(inputString, searchString))
}

// Output
// String ION is present in DICTIONARY : true
// String ION is present in FOOTBALL : false
```

The major contributor to the time complexity of brute-force solution is the recurring inner loop ($O(m)$). If we can somehow reduce it to constant time, then the total time complexity of the program will be $O(n)$.

The Rabin-Karp substring search algorithm was presented by Michael O. Rabin and Richard M. Karp in their research paper *Efficient randomized pattern-matching algorithms*, published in March 1987.

In this algorithm, we calculate a *rolling* hash over every window of the search string, then we apply the same hash function to the input string.

The hash value of the input string is searched within rolling hashes of the search string. If a match is found then the characters of both strings are compared to confirm the match, as different strings could have the same hash value due to hash collisions.
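To see why this confirmation step is necessary, here is a small illustrative sketch (the helper name `hashWith` and the base of 2 are my own choices for demonstration): with a small base, two different strings can easily hash to the same value.

```go
package main

import "fmt"

// hashWith computes the polynomial hash of s for an arbitrary base
// using Horner's rule. A deliberately tiny base is used below to
// force a collision.
func hashWith(s string, base int) int {
	h := 0
	for i := 0; i < len(s); i++ {
		h = h*base + int(s[i])
	}
	return h
}

func main() {
	// With base 2: hash(BA) = 66*2 + 65 = 197
	//              hash(AC) = 65*2 + 67 = 197
	fmt.Println("hash(BA):", hashWith("BA", 2))
	fmt.Println("hash(AC):", hashWith("AC", 2))
}
```

Both calls print `197`, even though `BA` and `AC` are different strings, so a hash match alone is not proof of a substring match.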

The hash function can be defined as follows:

$$hash(input) = code(input[0])*128^{(n-1)} + code(input[1])*128^{(n-2)}+ \dots + code(input[n-1])*128^{0}$$

- $input$ is a set of characters for which the hash value is going to be calculated.
- $code()$ function converts characters to their ASCII values.
- $n$ is the length of $input$.
- The base is selected as $128$ because ASCII codes range from $0$ to $127$; with this base, no two distinct ASCII strings of the same length share a hash value.

Applying this hash function on the string `ION`, we get

$$hash(ION) = code(I)*128^{2} + code(O)*128^{1} + code(N)*128^{0}$$ $$hash(ION) = 73 \times 16384 + 79 \times 128 + 78 \times 1$$ $$hash(ION) = 1206222$$
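As a quick sanity check, the same computation can be done in Go (a throwaway snippet of my own, not part of the implementation developed later):

```go
package main

import "fmt"

func main() {
	// hash(ION) = code(I)*128^2 + code(O)*128^1 + code(N)*128^0
	// Converting a byte to int yields its ASCII code.
	hash := int('I')*128*128 + int('O')*128 + int('N')
	fmt.Println(hash) // 1206222
}
```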

Using simple arithmetic, we can avoid recomputing each rolling hash from scratch.

We can calculate the hash of the first 3 characters of `DICTIONARY` (because the length of the input string `ION` is 3).

$$ hash(DIC) = code(D)*128^2 + code(I)*128^1 + code(C)*128^0 $$

We can derive the value of the next rolling hash, $hash(ICT)$ from the hash of the first three characters, $hash(DIC)$ by performing the following operations

$$ hash(ICT) = code(I)*128^2 + code(C)*128^1 + code(T)*128^0 $$ $$ hash(ICT) = (code(I)*128^1 + code(C)*128^0)*128 + code(T)*128^0 $$ $$ hash(ICT) = (hash(DIC) - code(D)*128^2)*128 + code(T)*128^0 $$

We can repeat this operation to calculate all rolling hash values.
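This sliding-window update can be verified with a short Go snippet (`hash128` is my own name for the base-128 hash function defined above):

```go
package main

import "fmt"

// hash128 computes the base-128 polynomial hash using Horner's rule.
func hash128(s string) int {
	h := 0
	for i := 0; i < len(s); i++ {
		h = h*128 + int(s[i])
	}
	return h
}

func main() {
	hDIC := hash128("DIC")
	// Slide the window: subtract 'D' at its weight, shift, append 'T'
	hICT := (hDIC-int('D')*128*128)*128 + int('T')
	fmt.Println(hICT == hash128("ICT")) // true
}
```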

```
inputHash = hash(input_string)
loop (index in rollingHashes)
    if rollingHashes[index] == inputHash
        if match(search_string[index : index+len(input_string)], input_string)
            return True
return False
```

The best case scenario for the Rabin-Karp algorithm is the same as the brute-force algorithm i.e. the input string is the same as the search string or present at the start of the search string.

Since we are going to match the string only once, the time complexity of the best case will be $O(n+m)$ where $n$ and $m$ are the sizes of the search string and input string respectively.

The worst-case search string will have multiple substrings with the same rolling hash value as the input string while also having different characters i.e. we have to match characters for almost every character in the search string.

This is the same as running the brute-force algorithm, so the worst-case time complexity is $O((n-m)m)$.
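The worst case can be provoked deliberately. The toy experiment below (my own construction; the function names are hypothetical) counts windows whose hash collides with the input string's hash without the characters matching. A base of 1 degenerates the hash into a plain sum of character codes, so every anagram collides:

```go
package main

import "fmt"

// hashOf computes the polynomial hash of s for the given base.
func hashOf(s string, base int) int {
	h := 0
	for i := 0; i < len(s); i++ {
		h = h*base + int(s[i])
	}
	return h
}

// countSpuriousMatches counts windows of searchString whose hash
// equals inputString's hash even though the characters differ.
func countSpuriousMatches(inputString string, searchString string, base int) int {
	target := hashOf(inputString, base)
	spurious := 0
	for i := 0; i+len(inputString) <= len(searchString); i++ {
		window := searchString[i : i+len(inputString)]
		if hashOf(window, base) == target && window != inputString {
			spurious++ // hash matched but characters did not
		}
	}
	return spurious
}

func main() {
	// Every "ABA" window collides with "AAB" under base 1
	fmt.Println("spurious matches:", countSpuriousMatches("AAB", "ABABAB", 1))
}
```

Each spurious hash match forces a full character comparison, which is what drags the algorithm back toward brute-force performance.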

We can start with the implementation of the `calculateHash()` function, which takes a string and a base value (`2`, `4`, `8`, etc.) as input and returns a hash value as output.

```
func calculateHash(inputString string, base int) int {
	hashValue := 0
	// The time complexity of this function is O(m)
	// where m is the size of inputString
	for i := 0; i < len(inputString); i++ {
		multiple := math.Pow(float64(base), float64(len(inputString)-i-1))
		hashValue += int(inputString[i]) * int(multiple)
	}
	return hashValue
}
```

Now using this function we can also implement a `calculateRollingHash()` function to use on the search string.

```
func calculateRollingHash(searchString string, lenInputString int, base int) []int {
	tempHash := calculateHash(searchString[:lenInputString], base)
	var rollingHashes []int
	rollingHashes = append(rollingHashes, tempHash)
	// The time complexity of this function is O(n)
	// where n is the size of searchString
	for i := 1; i < len(searchString)-lenInputString+1; i++ {
		removedChar := searchString[i-1]
		addedChar := searchString[i-1+lenInputString]
		// Reusing tempHash to calculate values of subsequent hashes
		tempHash = tempHash - int(removedChar)*int(math.Pow(float64(base),
			float64(lenInputString-1)))
		tempHash = tempHash*base + int(addedChar)
		rollingHashes = append(rollingHashes, tempHash)
	}
	return rollingHashes
}
```

A `match()` function verifies the character-by-character match between the input string and the search string.

```
func match(string1 string, string2 string) bool {
	// Matching characters in string1 to characters in string2
	// Length of both strings is assumed to be equal
	for i := 0; i < len(string1); i++ {
		if string1[i] != string2[i] {
			return false
		}
	}
	return true
}
```

Using the helper functions implemented above, we can finally implement the `rabinKarpSearch()` function, which returns a boolean value representing the presence of the input string in the search string.

```
func rabinKarpSearch(inputString string, searchString string, base int) bool {
	// Calculate all rolling hashes for searchString
	rollingHashes := calculateRollingHash(searchString, len(inputString), base)
	// Calculate the hash value of inputString
	inputStringHash := calculateHash(inputString, base)
	for i := 0; i < len(rollingHashes); i++ {
		matchFlag := false
		if inputStringHash == rollingHashes[i] {
			// Match both strings using characters
			matchFlag = match(inputString, searchString[i:i+len(inputString)])
		}
		if matchFlag == true {
			// Exit function on the first match
			return true
		}
	}
	return false
}
```

```
package main

import (
	"fmt"
	"math"
)

func calculateHash(inputString string, base int) int {
	hashValue := 0
	// The time complexity of this function is O(m)
	// where m is the size of inputString
	for i := 0; i < len(inputString); i++ {
		multiple := math.Pow(float64(base), float64(len(inputString)-i-1))
		hashValue += int(inputString[i]) * int(multiple)
	}
	return hashValue
}

func calculateRollingHash(searchString string, lenInputString int, base int) []int {
	tempHash := calculateHash(searchString[:lenInputString], base)
	var rollingHashes []int
	rollingHashes = append(rollingHashes, tempHash)
	// The time complexity of this function is O(n)
	// where n is the size of searchString
	for i := 1; i < len(searchString)-lenInputString+1; i++ {
		removedChar := searchString[i-1]
		addedChar := searchString[i-1+lenInputString]
		// Reusing tempHash to calculate values of subsequent hashes
		tempHash = tempHash - int(removedChar)*int(math.Pow(float64(base),
			float64(lenInputString-1)))
		tempHash = tempHash*base + int(addedChar)
		rollingHashes = append(rollingHashes, tempHash)
	}
	return rollingHashes
}

func match(string1 string, string2 string) bool {
	// Matching characters in string1 to characters in string2
	// Length of both strings is assumed to be equal
	for i := 0; i < len(string1); i++ {
		if string1[i] != string2[i] {
			return false
		}
	}
	return true
}

func rabinKarpSearch(inputString string, searchString string, base int) bool {
	// Calculate all rolling hashes for searchString
	rollingHashes := calculateRollingHash(searchString, len(inputString), base)
	// Calculate the hash value of inputString
	inputStringHash := calculateHash(inputString, base)
	for i := 0; i < len(rollingHashes); i++ {
		matchFlag := false
		if inputStringHash == rollingHashes[i] {
			// Match both strings using characters
			matchFlag = match(inputString, searchString[i:i+len(inputString)])
		}
		if matchFlag == true {
			// Exit function on the first match
			return true
		}
	}
	return false
}

func main() {
	inputString := "ION"
	searchString := "DICTIONARY"
	fmt.Println("rabinKarpSearch", inputString, "in", searchString, "result:",
		rabinKarpSearch(inputString, searchString, 2))
	searchString = "FOOTBALL"
	fmt.Println("rabinKarpSearch", inputString, "in", searchString, "result:",
		rabinKarpSearch(inputString, searchString, 2))
	searchString = "UNION"
	fmt.Println("rabinKarpSearch", inputString, "in", searchString, "result:",
		rabinKarpSearch(inputString, searchString, 2))
	searchString = "IONIC"
	fmt.Println("rabinKarpSearch", inputString, "in", searchString, "result:",
		rabinKarpSearch(inputString, searchString, 2))
	searchString = "ION"
	fmt.Println("rabinKarpSearch", inputString, "in", searchString, "result:",
		rabinKarpSearch(inputString, searchString, 2))
}

// Output
// rabinKarpSearch ION in DICTIONARY result: true
// rabinKarpSearch ION in FOOTBALL result: false
// rabinKarpSearch ION in UNION result: true
// rabinKarpSearch ION in IONIC result: true
// rabinKarpSearch ION in ION result: true
```

Thank you for taking the time to read this blog post! If you found this content valuable and would like to stay updated with my latest posts consider subscribing to my RSS Feed.

- *Efficient randomized pattern-matching algorithms* by Michael O. Rabin and Richard M. Karp

**Time Complexity** ($T(n)$) is a function that estimates the execution time of an algorithm given the amount of data to be processed as its input. It is a common benchmark used to measure an algorithm’s performance.

The output of the time complexity function will be a close estimate of an algorithm’s runtime, yet it does not consider other characteristics of the data that could affect the runtime.

Some algorithms perform best when the input data is sorted in ascending order and worst when sorted in descending order. That’s why we have bounds on the time complexity function, a range starting from best-case to worst-case execution time for the same amount of input data.

The “big O” ($O$) represents the upper bound (worst case scenario) on the time complexity function i.e. for an input dataset of size $n$ the algorithm’s time complexity couldn’t be worse than $O(n)$.

The “big Omega” ($\Omega$) represents the lower bound (best case scenario) on the time complexity function i.e. for an input dataset of size $n$ the algorithm’s time complexity couldn’t be better than $\Omega(n)$.

The “big Theta” ($\Theta$) represents the case where both upper and lower bounds are at the same point (expected case scenario) i.e. for an input dataset of size $n$ the algorithm’s time complexity couldn’t get better or worse than $\Theta(n)$.

Big $O$ is the preferred time complexity function for an algorithm’s runtime analysis because it provides a conservative estimate and its result is independent of factors like hardware performance, characteristics of data, compiler optimization, etc.

The runtime of recurring patterns in programming could be represented by common time complexity functions. This helps us estimate the time complexity of the entire program.

Scaling an Algorithm with Constant Time Complexity

An algorithm has **constant time complexity** when its runtime isn’t affected by the amount of data passed as input. An example would be a function that performs addition on its two inputs.

```
package main

import "fmt"

func addition(x int, y int) int {
	// The size of x and y does not affect
	// the runtime of this function
	return x + y
}

func main() {
	x := 2000
	y := 2132
	fmt.Println("Addition of", x, "and", y, "is:", addition(x, y))
}

// Result
// Addition of 2000 and 2132 is: 4132
```

The above example is implemented in the Go Programming Language.

Scaling an Algorithm with Linear Time Complexity

For some algorithms, the execution time is directly proportional to the size of the input; such algorithms are said to have **linear time complexity**.

An example would be a loop that iterates over elements in a list and returns its sum.

```
package main

import "fmt"

func arraySum(arr []int) int {
	sum := 0
	// Time taken to complete this loop
	// will be directly proportional to the
	// size of arr
	for _, element := range arr {
		sum += element
	}
	return sum
}

func main() {
	arrayExample := []int{1, 2, 3, 2, 1, 1}
	fmt.Println("Sum of the array:", arrayExample, "will be",
		arraySum(arrayExample))
}

// Output
// Sum of the array: [1 2 3 2 1 1] will be 10
```

If we call the function `arraySum()` twice, then the time complexity of the program will be $O(2n)$. However, we can generalize it to $O(n)$ because even though the program performs two passes over the array, the growth rate of the runtime remains linear with respect to the input size.

Scaling an Algorithm with Quadratic Time Complexity

Algorithms exhibiting **quadratic time complexity** have execution times that are directly proportional to the square of the input size. They scale worse (longer execution times) than linear-time algorithms like $O(n)$, $O(2n)$, etc.

For example, a program that displays pair combinations of all elements in an array using nested loops.

```
package main

import "fmt"

func showCombinations(arr []int) {
	// The inner loop is executed n (size of arr) times.
	// Thus, the total time complexity of this function will be O(n*n)
	for _, element1 := range arr {
		// Time complexity of the inner loop
		// is directly proportional to the size of arr i.e. O(n)
		for _, element2 := range arr {
			if element1 != element2 {
				fmt.Println("Combination:", element1, "and",
					element2)
			}
		}
	}
}

func main() {
	arrayExample := []int{123, 1234, 456, 5462}
	fmt.Println("Pair combination of all elements in the array: ",
		arrayExample, "are:")
	showCombinations(arrayExample)
}

// Output
// Pair combination of all elements in the array: [123 1234 456 5462] are:
// Combination: 123 and 1234
// Combination: 123 and 456
// Combination: 123 and 5462
// Combination: 1234 and 123
// Combination: 1234 and 456
// Combination: 1234 and 5462
// Combination: 456 and 123
// Combination: 456 and 1234
// Combination: 456 and 5462
// Combination: 5462 and 123
// Combination: 5462 and 1234
// Combination: 5462 and 456
```

Scaling an Algorithm with Exponential Time Complexity

A brute-force algorithm to find the $n$th number in the Fibonacci series has exponential time complexity because every call branches into two more recursive calls.

```
package main

import "fmt"

func fibonacci(n int) int {
	if n <= 1 {
		return n
	}
	// The recursive calls will branch
	// further in two more calls
	return fibonacci(n-1) + fibonacci(n-2)
}

func main() {
	n := 10
	fmt.Println("The", n, "th Fibonacci number is:", fibonacci(n))
}

// Output
// The 10 th Fibonacci number is: 55
```

Algorithms with $O(2^n)$, $O(e^n)$, $O(10^n)$, etc. time complexities are grouped under **exponential** time. Among these, the $2^n$ function appears throughout computer science, for example in Moore’s law or in counting the number of memory addresses that can be formed with $n$ bits.

> The number of transistors in an Integrated Circuit (IC) doubles about every two years.
>
> Gordon Moore, Co-Founder of Intel, 1965
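The memory-address example can be checked with a few lines of Go (a small sketch of my own):

```go
package main

import "fmt"

func main() {
	// With n bits we can form 2^n distinct bit patterns,
	// i.e. 2^n addressable locations.
	for _, n := range []uint{8, 16, 32} {
		fmt.Println(n, "bits ->", uint64(1)<<n, "addresses")
	}
}
```

Each extra bit doubles the count, which is exactly the growth pattern that makes exponential-time algorithms impractical.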

Scaling an Algorithm with Logarithmic Time Complexity

The function $\log_2{n}$ is the inverse of $2^n$. The following example of a number-guessing game has $O(\log_2{n})$ time complexity because it cuts the search space in half on each iteration.

```
package main

import "fmt"

func guessNumber(low int, high int) int {
	// Returns the middle number in a range from low to high
	mid := (high + low) / 2
	return mid
}

func main() {
	low := 0
	high := 100
	fmt.Println("Think of a number between", low, "and", high)
	numQuestions := 0
	for {
		var answer1, answer2 int
		guessedNumber := guessNumber(low, high)
		fmt.Println("I guessed:", guessedNumber, "Is that correct?")
		fmt.Println("1) Yes")
		fmt.Println("2) No")
		fmt.Print("Enter your response (1 or 2):")
		fmt.Scan(&answer1)
		switch answer1 {
		case 1:
			fmt.Println("It took me",
				numQuestions,
				"questions to guess your number")
			// It will take a maximum of log2(n) questions to guess a number
			// where n is the size of the number range, in this case 100
			fmt.Println("Thanks for playing")
			return
		case 2:
			numQuestions += 1
			fmt.Println("Is it higher or lower than", guessedNumber)
			fmt.Println("1) Higher")
			fmt.Println("2) Lower")
			fmt.Print("Enter your response (1 or 2):")
			fmt.Scan(&answer2)
			switch answer2 {
			case 1:
				// Halving the search space
				// to exclude numbers lower than guessed
				low = guessedNumber
			case 2:
				// Halving the search space
				// to exclude numbers higher than guessed
				high = guessedNumber
			}
		}
	}
}

// Output
// Think of a number between 0 and 100
// I guessed: 50 Is that correct?
// 1) Yes
// 2) No
// Enter your response (1 or 2):2
// Is it higher or lower than 50
// 1) Higher
// 2) Lower
// Enter your response (1 or 2):2
// I guessed: 25 Is that correct?
// 1) Yes
// 2) No
// Enter your response (1 or 2):2
// Is it higher or lower than 25
// 1) Higher
// 2) Lower
// Enter your response (1 or 2):1
// I guessed: 37 Is that correct?
// 1) Yes
// 2) No
// Enter your response (1 or 2):1
// It took me 2 questions to guess your number
// Thanks for playing
```

Scaling an Algorithm with Linearithmic Time Complexity

Merge Sort, Heap Sort, and (on average) Quick Sort are some examples of algorithms with **linearithmic time complexity**.
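As an illustration, here is a minimal Merge Sort sketch in Go (the identifier names are my own):

```go
package main

import "fmt"

// mergeSort splits the slice in half (log2(n) levels of recursion)
// and merges the sorted halves back with O(n) work per level,
// giving O(n*log2(n)) overall.
func mergeSort(arr []int) []int {
	if len(arr) <= 1 {
		return arr
	}
	mid := len(arr) / 2
	left := mergeSort(arr[:mid])
	right := mergeSort(arr[mid:])
	// Merge the two sorted halves
	merged := make([]int, 0, len(arr))
	i, j := 0, 0
	for i < len(left) && j < len(right) {
		if left[i] <= right[j] {
			merged = append(merged, left[i])
			i++
		} else {
			merged = append(merged, right[j])
			j++
		}
	}
	merged = append(merged, left[i:]...)
	merged = append(merged, right[j:]...)
	return merged
}

func main() {
	fmt.Println(mergeSort([]int{5, 2, 4, 1, 3})) // [1 2 3 4 5]
}
```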

Scaling an Algorithm with Factorial Time Complexity

The brute-force solution to the Travelling Salesman Problem has factorial time complexity.
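A brute-force sketch (my own construction; the function names and the 4-city distance matrix are made up for illustration) fixes the starting city and tries every permutation of the remaining cities, which is $(n-1)!$ tours:

```go
package main

import "fmt"

// tourLength sums distances along a tour that starts and ends at city 0.
func tourLength(dist [][]int, order []int) int {
	total := 0
	prev := 0
	for _, c := range order {
		total += dist[prev][c]
		prev = c
	}
	return total + dist[prev][0]
}

// bestTour recursively tries every permutation of the remaining
// cities, giving factorial time complexity.
func bestTour(dist [][]int, remaining []int, order []int, best *int) {
	if len(remaining) == 0 {
		if l := tourLength(dist, order); l < *best {
			*best = l
		}
		return
	}
	for i := range remaining {
		next := remaining[i]
		// Copy slices so recursive calls don't share backing arrays
		rest := append(append([]int{}, remaining[:i]...), remaining[i+1:]...)
		bestTour(dist, rest, append(append([]int{}, order...), next), best)
	}
}

func main() {
	// A made-up symmetric distance matrix for 4 cities
	dist := [][]int{
		{0, 10, 15, 20},
		{10, 0, 35, 25},
		{15, 35, 0, 30},
		{20, 25, 30, 0},
	}
	best := 1 << 30
	bestTour(dist, []int{1, 2, 3}, nil, &best)
	fmt.Println("shortest tour length:", best) // shortest tour length: 80
}
```

With 4 cities this is only 6 tours, but at 20 cities it is already about $1.2 \times 10^{17}$ tours, which is why exact brute force is hopeless at scale.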

In a hypothetical scenario, we benchmark the sorting performance of different algorithms by their time complexity.

Assuming the output of the time complexity function is the algorithm’s execution time in seconds, the runtime of a sorting algorithm on 100 elements with

- $O(\log_2{ n})$ time complexity is 6.644 seconds
- $O(n)$ time complexity is 100 seconds
- $O(n \log_2 n)$ time complexity is 664.4 seconds
- $O(n^2)$ time complexity is 10000 seconds (2.7 hours)
- $O(2^n)$ time complexity is $1.26 \times 10^{30}$ seconds ($3.99 \times 10^{22}$ years)
- $O(n!)$ time complexity is $9.33 \times 10^{157}$ seconds ($2.9585 \times 10^{150}$ years)

Time Complexity Comparison of Algorithms

With the same computation power, an algorithm with $O(2^n)$ time complexity will sort 100 elements in $3.99 \times 10^{22}$ years, while an algorithm with $O(n \log_2{n})$ time complexity will take only 664.4 seconds.


- Big Theta and Asymptotic Notation Explained
- What is Moore’s Law
- Relationship between exponentials & logarithms
- The factorial function