Introduction

  • Algorithms are procedures to accomplish a specific task. They are ideas.
  • Algorithms have three main attributes: correctness, simplicity, and efficiency.
    • Correctness is tricky to prove: Many have proposed algorithms that were later proven to be incorrect.
    • By definition, an algorithm always produces a correct result; a heuristic may give a suitable answer most of the time, but offers no guarantee.

Correctness

  • Specify the input
  • Specify the output
    • Make sure output is well-defined
    • Beware of compound goals. Each part might be a good goal on its own, but trying to satisfy all of them at once can make the problem overly complicated.
  • If in doubt, restrict the problem. It will make proving correctness easier.
  • Counter-examples: the easiest way to prove incorrectness, but failing to find counter-examples does not prove correctness.
    • A counter-example should be verifiable. Show what the proposed algorithm gives as output. Then show that there is a better answer.
    • They should be simple, no extra bits.
    • Finding good counter-examples is similar to writing good tests. But finding good counter-examples is not about exploring the whole domain.
      • Think small. Confine the input space and exhaust all options. Look for weaknesses (e.g. If algorithm relies on finding the maximum, what happens if all values are equal?). Look for ties. Look at extreme cases: large, small, clustered, diffuse.
  • Proof by induction is the most common way to prove an algorithm correct, especially for recursively defined ones.
    • Specify a boundary or base case
    • Assume the claim holds for smaller cases, then show it holds in general by breaking the problem into related sub-problems.
    • Two common errors are a faulty base case and applying the inductive hypothesis to a case it does not yet cover.
  • Summations are used often in algorithm analysis
    • Arithmetic progressions
      • e.g. The sum of the first n positive integers is $n(n+1)/2$
      • In general, if the term is quadratic, the sum will have a cubic order; a cubic term will have a sum that is a quartic order, etc. If the terms have a negative exponent, the sum converges to a constant.
    • Geometric progressions
      • If the common ratio is less than one in absolute value, the summation converges to a constant. If the ratio is one or more, it diverges.
    • When proving a summation by induction, the main step is usually to separate out the largest term, so that the inductive hypothesis can be substituted into the equation.
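
As a concrete instance of the technique above, here is the induction proof of the arithmetic-sum formula from earlier, separating out the largest term so the hypothesis can be substituted:

```latex
\text{Claim: } \sum_{i=1}^{n} i = \frac{n(n+1)}{2}.
\qquad
\text{Base case } (n=1):\ \sum_{i=1}^{1} i = 1 = \frac{1(1+1)}{2}.
\]
\[
\text{Inductive step: assume the claim holds for } n. \text{ Then}
\quad
\sum_{i=1}^{n+1} i \;=\; (n+1) + \sum_{i=1}^{n} i \;=\; (n+1) + \frac{n(n+1)}{2} \;=\; \frac{(n+1)(n+2)}{2}.
```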

Modeling problems

Modeling your application in terms of well-defined data structures and algorithms is the most important step to finding a solution.

It’s essential to know some common vocabulary in order to properly describe your problem. Do this by examining the catalog of problems in the second half of the book and looking at the inputs and outputs of each problem.

  • Combinatorial objects: Permutations (orderings, sequence, tour), subsets (different-sized sets from a fixed list of values), trees, graphs, points, polygons, strings.
  • Recursive objects: pretty much any combinatorial object can be split into smaller similar objects.
    • Thinking recursively means thinking about similar objects that are related to each other.
    • There is always a base case (simplest example of the object) and rules for splitting the larger object into smaller, but similar objects.

Data Structures

Arrays

Provide O(1) time random access and a contiguous memory layout, but inserting or deleting items in the middle takes O(n) time. Dynamic arrays have amortized O(1) appends: doubling the capacity when full spreads the occasional O(n) copy across many insertions.
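
A minimal dynamic-array sketch (class name and growth factor are my choices, not from the source) showing why appends are amortized O(1):

```python
# Capacity doubles when full, so n appends perform O(n) total copy work,
# i.e. amortized O(1) per append.
class DynamicArray:
    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.data = [None] * self.capacity

    def append(self, value):
        if self.size == self.capacity:    # full: grow by doubling
            self.capacity *= 2
            new_data = [None] * self.capacity
            for i in range(self.size):    # O(n) copy, but increasingly rare
                new_data[i] = self.data[i]
            self.data = new_data
        self.data[self.size] = value      # O(1) in the common case
        self.size += 1

    def __getitem__(self, i):             # O(1) random access
        if not 0 <= i < self.size:
            raise IndexError(i)
        return self.data[i]
```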

Linked lists (single and double linked)

Fundamental problems: Adding and deleting nodes, finding the middle, reversing list.
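
Two of the fundamental problems named above, sketched for a singly linked list (helper names like `from_list` are mine): reversal by pointer rewiring and finding the middle with slow/fast pointers.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def reverse(head):
    # Walk the list once, flipping each node's next pointer.
    prev = None
    while head:
        head.next, prev, head = prev, head, head.next
    return prev

def middle(head):
    # Fast pointer moves twice as far; when it falls off, slow is mid-list.
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
    return slow

def from_list(values):
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head

def to_list(head):
    out = []
    while head:
        out.append(head.value)
        head = head.next
    return out
```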

Stacks and queues

Key operations: adding or deleting end nodes in O(1), accessing end nodes, and iterating in LIFO (stack) or FIFO (queue) order.
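
In Python, a plain list already works as a stack, and `collections.deque` gives O(1) operations at both ends, so it works as a queue:

```python
from collections import deque

# Stack: push and pop at the same end (LIFO).
stack = []
stack.append(1); stack.append(2); stack.append(3)
lifo = [stack.pop() for _ in range(len(stack))]      # [3, 2, 1]

# Queue: push at one end, pop at the other (FIFO).
queue = deque()
queue.append(1); queue.append(2); queue.append(3)
fifo = [queue.popleft() for _ in range(len(queue))]  # [1, 2, 3]
```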

Dictionaries

Best implemented with hash tables or balanced binary search trees.

Binary search trees

Tree facts:

  • A perfect binary tree with h levels has 2^h - 1 nodes

Hash tables

The key idea is transforming data into a single integer, ideally with few collisions. Hashes can be used to test file integrity or to efficiently search for substrings (Rabin-Karp).

Worst case performance is O(n) if all keys hash to the same slot. Hashing functions almost always involve a prime number.
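
A sketch of the Rabin-Karp idea mentioned above (base and modulus values are my choices): a polynomial rolling hash over a prime modulus lets the window slide one character in O(1).

```python
def rabin_karp(text, pattern, base=256, mod=1_000_003):
    n, m = len(text), len(pattern)
    if m > n:
        return -1
    high = pow(base, m - 1, mod)       # weight of the outgoing character
    ph = th = 0
    for i in range(m):                 # initial hashes of pattern and window
        ph = (ph * base + ord(pattern[i])) % mod
        th = (th * base + ord(text[i])) % mod
    for i in range(n - m + 1):
        # Hash match is only a candidate: verify to rule out collisions.
        if ph == th and text[i:i + m] == pattern:
            return i
        if i < n - m:                  # roll the window one character right
            th = ((th - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return -1
```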

Two main ways to resolve hash collisions are chaining and open addressing. Chaining builds a list of values at each slot: when inserting, append to the slot’s list; when searching, hash the key and scan the list at that slot.

Open addressing simply looks for the next open spot in the array when a collision occurs. More bookkeeping is needed when deleting, and often you’ll need to re-hash and re-insert all items.

Sorting and Searching

Sorting can be used to illustrate most algorithm design paradigms. Data structure techniques, divide-and-conquer, randomization, and incremental construction all lead to efficient sorting algorithms.

Think binary search with O(log(n)) time and quicksort or mergesort with O(n*log(n)) time. Also, bucket or radix sort for certain cases, which have O(n) time.

Many problems can be solved sufficiently (but not always optimally) by sorting values first: Searching, closest pair of numbers, unique values, frequency distribution, selection (or find the median value), convex hull.
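
Two of the sort-first solutions listed above, sketched in Python: once the values are sorted, unique values and the closest pair of numbers both fall out of a single linear scan.

```python
def unique_values(values):
    # After sorting, duplicates are adjacent, so keep each first occurrence.
    values = sorted(values)
    return [v for i, v in enumerate(values) if i == 0 or v != values[i - 1]]

def closest_pair(values):
    # After sorting, the closest pair must be adjacent; scan the gaps.
    values = sorted(values)
    return min(
        (values[i + 1] - values[i], (values[i], values[i + 1]))
        for i in range(len(values) - 1)
    )[1]
```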

Sorting is the most researched topic in computer science and many sorting algorithms exist, most having advantages in certain cases.

  • Insertion sort

    • Worst O(n^2) time, but can approach linear if the array is almost sorted
    • Stable sort
  • Selection sort

    • Worst O(n^2)
    • Minimizes swaps
  • Mergesort

    • O(n*log(n)) worst case
    • Stable sort
    • Usually needs O(n) extra space
    • Can be used to sort a linked list
    • There are log(n) “levels” of sub-lists because we halve the input until we have lists of one element. We then merge each level in O(n) time.
  • Quicksort

    • Average O(n*log(n)) time; in practice will usually beat mergesort
    • Worst is O(n^2) time if the partitioning element isn’t close to the median value or there are a lot of duplicate values
  • Radix, or bucket sort

    • O(k*n) time, where k is the key length and n is the number of keys
      • Notice that if k=n, this turns into a quadratic algorithm. Therefore, radix sort is only efficient if k<<n.
    • MSD or LSD (most- or least-significant digit first); LSD is usually more efficient
  • Binary search

    • Divide and conquer and O(log(n)) time
      • Requires sorted input
    • Problems
      • Counting frequency of elements
      • One-sided binary search
      • Bisection method for finding roots of equations
        • Faster root-finding algorithms exist, but require additional information about the functions like the derivative
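
The last two list items can be sketched together (function names are mine): a standard binary search over sorted input, plus one-sided (galloping) search, which doubles the probe distance until the key is bracketed, so the cost depends on the position found rather than on n.

```python
def binary_search(a, key, lo=0, hi=None):
    # Find the leftmost index of key in sorted list a[lo:hi], else -1.
    hi = len(a) if hi is None else hi
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    return lo if lo < len(a) and a[lo] == key else -1

def one_sided_search(a, key):
    # Gallop: double hi until a[hi] >= key, then binary search that bracket.
    hi = 1
    while hi < len(a) and a[hi] < key:
        hi *= 2
    return binary_search(a, key, hi // 2, min(hi + 1, len(a)))
```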