Log programming problems and what I learned.

Legend: ADM -> Algorithm Design Manual

LeetCode: 287. Find the Duplicate Number

Given an array of integers nums containing n + 1 integers where each integer is in the range [1, n] inclusive.

There is only one repeated number in nums, return this repeated number.

You must solve the problem without modifying the array nums and uses only constant extra space.

You can use the pigeonhole principle to prove that since we have n+1 integers, but only n distinct integers, at least one integer will occur more than one time.

# Approach 1: Sort
# O(n*lg(n)) time and O(log n) or O(n) space depending on the sorting algorithm
def findDuplicate(nums: List[int]) -> int:
        nums.sort()
        for i in range(1, len(nums)):
            if nums[i] == nums[i-1]:
                return nums[i]

# Approach 2: Set
# O(n) time and O(n) space, worst case
def findDuplicate(nums: List[int]) -> int:
        seen = set()
        for num in nums:
            if num in seen:
                return num
            seen.add(num)

# Approach 3: Flip sign at index
# For each integer, use that integer as the index.  If nums[abs(cur)] is
# negative, cur is the duplicate.  Otherwise, if the value is positive, change
# the sign to negative.
# O(n) time and O(1) space, kind of.  You do use O(n) sign bits.
def findDuplicate(nums: List[int]) -> int:
        for num in nums:
            cur = abs(num)
            if nums[cur] < 0:
                duplicate = cur
                break
            nums[cur] = -nums[cur]

        # Restore numbers
        for i in range(len(nums)):
            nums[i] = abs(nums[i])

        return duplicate

# Approach 4: Recursive hashmap
# At each step, if the value is equal to the index, we found the duplicate.
# Otherwise, put the value in the slot with the same index value and call with
# the old value that was replaced.
# O(n) time and O(n) space in the stack.
def findDuplicate(nums: List[int]) -> int:

    def store(nums: List[int], cur: int) -> int:
        if cur == nums[cur]:
            return cur
        nxt = nums[cur]
        nums[cur] = cur
        return store(nums, nxt)

    return store(nums, 0)

# Approach 5: Iterative hashmap
# O(n) time and O(1) space.  No need for additional stack space.
def findDuplicate(nums: List[int]) -> int:
    while nums[nums[0]] != nums[0]:
        nums[nums[0]], nums[0] = nums[0], nums[nums[0]]
    return nums[0]

# Approach 6: Binary search
# First approach that doesn't use extra space *and* doesn't modify the array.
# Find the smallest index such that the number of values less than or equal to
# the index is greater than the index itself.
# O(n*lg(n)) time and O(1) space. The sum requires O(n) time, but we only need
# to do it O(lg n) times.
def findDuplicate(nums: List[int]) -> int:
        # 'lo' and 'hi' represent the range of values of the target
        lo, hi = 1, len(nums) - 1
        while lo <= hi:
            cur = lo + (hi - lo) // 2
            # Count how many numbers are less than or equal to 'cur'
            count = sum(num <= cur for num in nums)
            if count > cur:
                duplicate = cur
                hi = cur - 1
            else:
                lo = cur + 1

        return duplicate

# Approach 7: Sum of Set Bits
# O(n*lg(n)) time and O(1) space.  Because the outer loop depends on the number
# of bits, it runs in O(lg(n)).  The inner loop goes through each number.
def findDuplicate(nums: List[int]) -> int:
    duplicate, n = 0, len(nums)
    bits = n.bit_length()
    for bit in range(bits):
        mask = 1 << bit
        base_count, nums_count = 0, 0
        for i in range(n):
            # If bit is set in number (i) then add 1 to base_count
            if i & mask: base_count += 1

            # If bit is set in nums[i] then add 1 to nums_count
            if nums[i] & mask: nums_count += 1

        # If the current bit is more frequently set in nums than it is in
        # the range [1, 2, ..., n] then it must also be set in the duplicate
        # number.
        if nums_count - base_count > 0:
            duplicate |= mask

    return duplicate
# Approach 8: Floyd's Tortoise and Hare (Cycle Detection)
# Boils down to finding the starting point of a cycle in a linked list.  Use
# two pointers to detect a loop and then find the starting point.
# O(n) time and O(1) space (for pointers).
def findDuplicate(nums: List[int]) -> int:
    # Find the intersection point of the two runners.
    tortoise = hare = nums[0]
    while True:
        tortoise = nums[tortoise]
        hare = nums[nums[hare]]
        if tortoise == hare:
            break

    # Find the "entrance" to the cycle.
    tortoise = nums[0]
    while tortoise != hare:
        tortoise = nums[tortoise]
        hare = nums[hare]

    return hare

LeetCode: 238. Product of Array Except Self

Given an integer array nums, return an array answer such that answer[i] is equal to the product of all the elements of nums except nums[i].

The product of any prefix or suffix of nums is guaranteed to fit in a 32-bit integer.

You must write an algorithm that runs in O(n) time and without using the division operation.

If we ignore, for a moment, the rule about forbidding division, we could compute the product of all numbers and then get the solution by dividing the total product by the current number to get the product minus the current number. We just have to do a little extra work when there are one or more zeroes in the array (most products will be zero).

However, a different solution first finds the product of each sub-array to the left of a the current value. Second, makes a pass from right to left, computing the right sub-array products and multiplying the left and right products to get the total product “minus” the current value. See code below:

def productExceptSelf(self, nums: List[int]) -> List[int]:
    n = len(nums)
    res = [1 for _ in range(n)]

    for i in range(1, n):
        res[i] = res[i-1] * nums[i-1]

    left = 1
    for i in reversed(range(n)):
        res[i] = res[i] * left
        left = left * nums[i]

    return res

LeetCode: Trie (Prefix)

A simple trie data structure can be implemented using dictionaries. Notice how each method has a similar structure. Tries are used to efficiently search for parts of strings. A dictionary or set can look for exact matches, but can’t answer questions about whether we’ve seen a word that starts with “sh”.

class Trie:

    def __init__(self):
        self.root = {}

    def insert(self, word: str) -> None:
        curr = self.root
        for c in word:
            if c not in curr:
                curr[c] = {}
            curr = curr[c]
        curr['#'] = None

    def search(self, word: str) -> bool:
        curr = self.root
        for c in word:
            if c not in curr:
                return False
            curr = curr[c]
        return '#' in curr


    def startsWith(self, prefix: str) -> bool:
        curr = self.root
        for c in prefix:
            if c not in curr:
                return False
            curr = curr[c]
        return True

Problem:

A stream of data where the first k elements are 0’s and the rest are 1’s. Find the transition point.

Solution:

Repeatedly double the index number and check if we have hit the ones. Once we do find the ones, we have two bounds: the left bound is in the zeroes and the right bound is in the ones. Do a binary search between the bounds to find the transition point. This is does at most 2*ceiling(log(k)) comparisons.

ADM: Count Occurrences

Problem:

Given a sorted array and a value, count the number times the value occurs in the array.

Solution:

One way would be to iterate through the whole list in O(n). A slightly better solution would be to do a binary search to find the value. If it exists, iterate to the left and right to find the boundaries of where chunk values ends. In the worst case where all values are equal, this is still O(n).

You can do this in O(log(n)) time by modifying the binary search to find the left boundary and doing a separate binary search to find the right boundary. The bisect module implements these searches in O(log(n)) time.

def count(arr: list[int], x: int) -> int:
    left = bisect.left(arr, x)
    right = bisect.right(arr, x)
    return right - left

Geeks for Geeks: Graph traversal

  • One source and One Destination
    • Use A* Search Algorithm (For Unweighted as well as Weighted Graphs)
  • One Source, All Destination
    • Use BFS (For Unweighted Graphs)
    • Use Dijkstra (For Weighted Graphs without negative weights)
    • Use Bellman Ford (For Weighted Graphs with negative weights)
  • Between every pair of nodes
    • Floyd-Warshall
    • Johnson’s Algorithm

Python library: sortedcontainers

This is a data structure that replaces balanced binary search trees and order statistic trees (meaning you can find the ith largest and ith smallest value; or the rank (index) of a value in O(log(n) time)). Initializing typically takes O(n*log(n)) time. Most other operations take O(log(n)) or O(n) time. It allows for fast O(log(n)) random access along with fast O(log(n)) inserts and deletes. A regular Python list would need O(n) time to keep the list sorted.

from sortedcontainers import SortedList, SortedKeyList, SortedDict, SortedSet

Geeks for Geeks: Sliding window

https://www.geeksforgeeks.org/window-sliding-technique/

The sliding window technique is used to update a value by removing something and adding something to avoid a major recalculation of the value.

For instance, what if you want to find the largest sum of k consecutive values in an array? One way would be to add k consecutive elements starting with the first element, then moving to the second and re-calculating the sum with the k next elements. This approach does O(k*(n-k)) additions.

The sliding window calculates an initial sum, arr[:k]. Instead of doing an additional k addition operations, subtract the value, arr[0], and add arr[k] to the initial sum. This cuts the addition operations from O(kn) to O(n).

CTCI: 8.3 Magic index

Problem:

Given a sorted array of distinct integers, find an index where the index equals the element. In other words, where arr[ix] == ix.

Notes:

  • Sorted input, think binary search
  • Think about recursive and iterative solutions: In this problem, the variant with duplicate values was easier to think about because we need to search both the left and the right. The iterative binary search only looks at left or the right half, not both. We would need two while loops, I think.
  • Work on a big enough example. At first, I looked at arrays with three values. This was too small to see that we needed to look in the right half of the array if arr[mid] < mid.

Solution:

The simplest approach is to just iterate over the whole array, checking if arr[ix] == ix. This runs in O(n) and doesn’t take advantage of the sorted array.

The next approach should be some sort of binary search since this will take O(log(n)) time. We still test if arr[ix] == ix, but what should we do if arr[ix] < ix? In this case, we should look in the right half of the array. It can’t be in the left half because arr[ix] is already too small and ix can only decrease by one or more.

def magic_index(arr: list[int]) -> int:
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if arr[mid] == mid:
            return mid
        elif arr[mid] < mid:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

What if we allow duplicate values? We now have to search both halves, but we can limit our search based on values of arr[ix] and ix.

def magic_index(arr: list[int]) -> int:
    return find_index(arr, 0, len(arr)-1)

def find_index(arr, lo, hi):
    if hi < lo:
        return -1

    mid = lo + (hi - lo) // 2
    if arr[mid] == mid:
        return mid

    left = find_index(arr, lo, min(mid-1, arr[mid]))
    if left >= 0:
        return left

    return find_index(arr, max(mid+1, arr[mid]), hi)

CTCI: 8.2 Robot in a grid

Problem:

Given a rectangular 2D grid with r rows and c columns, find a path from the upper-left corner to the lower-right corner. A selected number of cells in the grid may not be usable. The only possible moves from each cell are to the right or down.

Solution:

The most natural approach starts at the lower-right corner and checks each cell to see if there is a path from the current cell to the starting cell in the upper-left. The running time is O(2^r+c) because each possible path is r+c in length and there are two choices of direction at each cell. However, this approach visits multiple cells many times.

The code below does something similar, but caches points that did not lead to a valid path. This allows us to check sub-paths only once. This runs in O(rc) since we only visit each cell once.

def find_path(grid):
    path = []
    if not grid:
        return ''.join(path)

    failed_points = set()
    if get_path(grid, len(grid)-1, len(grid[0])-1, path, failed_points):
        return ''.join(path)

    return ''.join(path)

def get_path(grid, row, col, path, failed_points):
    cell_not_available = grid[row][col]
    if row < 0 or col < 0 or cell_not_available:
        return False

    p = (row, col)
    if p in failed_points:
        return False

    at_start = row == col == 0
    if at_start:
        return True
    elif get_path(grid, row, col-1, path, failed_points):
        path.append('r')
        return True
    elif get_path(grid, row-1, col, path, failed_points):
        path.append('d')
        return True

    failed_points.add(p)
    return False

StackExchange: Select random node from binary tree uniformally

Problem:

Select a random node from a binary tree uniformally.

https://cs.stackexchange.com/questions/83540/how-to-select-a-binary-tree-node-uniformly-at-random

Solution:

Given the code below, the algorithm selects a node uniformally because the probability of finding the node in the left subtree is self.left/self.size, the current node is 1/self.size, and the right subtree is the rest.

class Node:
    def __init__(self, data, left, right, size):
        self.data = data
        self.left = left
        self.right = right
        # self.size is the number of nodes in the subtree rooted at
        # self, including the self node.
        self.size = size

    def get_random_node(self):
        size = 0 if not self.left else self.left
        ix = random.randint(0, self.size)
        if ix < size:
            return left.get_random_node()
        elif ix == size:
            return self
        else:
            return right.get_random_node()

CTCI: 8.1 Triple Step

Problem:

Count the number of possible ways for a child to hop up n steps by hopping 1, 2, or 3 steps at a time.

Solution:

The easiest way to think about it is a top-down approach. How many ways are there if we are at the n-1th step? Well, how ever many ways it took to get to the n-1th step. Same goes for the n-2th step and the n-3the step. This naturally leads to a dynamic programming solution.

If top-down, use @functools.cache to save the results of each recursive call. This turns an O(3^n) function into a O(n).

A couple points. What should the base case be? If n equals zero, we are at the top. Is there one way or no ways to get here? Either definition can be used. It’s easier if n=0 is defined to have one way. If n=0 has zero ways to reach the top, you need to define three other base cases when n=={1,2,3}.

Also, keep in mind that the number of ways grows exponentially with n. In Python, integers are automatically recast to a larger integer type, but, in other languages, the result will quickly outgrow a 32-bit integer.

@functools.cache
def count_ways(n: int) -> int:
    if n < 0:
        return 0
    elif n == 0:
        return 1

    return count_ways(n-1) + count_ways(n-2) + count_ways(n-3)

LeetCode: 380. Insert Delete GetRandom O(1)

https://leetcode.com/problems/insert-delete-getrandom-o1/

Answer:

https://stackoverflow.com/questions/15993447/python-data-structure-for-efficient-add-remove-and-random-choice

Create a data structure that inserts or removes a certain value, but also supports retrieving a random value with uniform distribution. All methods should run in O(1) time.

Most of this is straightforward once you realize you need an array and a dictionary. The only other tricky part is removing values. To do it efficiently, you remove the end value in the array, which is likely not the value you want to remove. You then overwrite the information for the actual value you want to remove using the end value.

Also, random.choice picks a value from a sequence. To get k random values without replacement, sue random.sample.

import random

class RandomizedSet(object):
    def __init__(self):
        self.values = []
        self.d = {}

    def insert(self, val: int) -> bool:
        not_contains = val not in self.d
        if not_contains:
            self.values.append(val)
            self.d[val] = len(self.values) - 1

        return not_contains

    def remove(self, val: int) -> bool:
        contains = val in self.d
        if not contains:
            return contains

        last = self.values.pop()
        val_ix = self.d.pop(val)

        if val_ix != len(self.values):
            self.values[val_ix] = last
            self.d[last] = val_ix

        return contains

    def getRandom(self) -> int:
        return random.choice(self.values)

HackerRank: Max Array Sum

Input: An array of integers

Output: An integer, the largest sum of non-adjacent values


def max_sum(arr):
  incl, excl = 0, 0

  for val in arr:
    new_excl = max(incl, excl)
    incl = excl + val
    excl = new_excl

  return max(incl, excl)

The algorithm is O(n) because of the for loop. Finding the max of a bunch of values could be said to run in linear time, if we were checking a lot of values, but because we are only checking between two values, it’s O(1).

LeetCode: Range Sum Query - Immutable

My answer runs in O(n^2) time, since it needs to compute the sum every time and the range can be, at most, n. For repeated queries, a better solution is to pre-compute sums that start at the beginning and run to each position. Then, a range sum can be computed by pre[right+1] - pre[left]. This solution is O(n) for the caching and then O(1) for each query after that.

This is similar to dividing a factorial by another factorial to get rid of lower factors, 90!/85! = 90*89*88*87*86.

LeetCode: Longest Substring Without Repeating Characters

My answer used a set to keep track of what characters have already been seen and using str.find() to update the beginning index of the substring when we encountered a duplicate character.

Another answer used a dictionary where the key was the character and the value was the previous position seen. It then updated this value as needed.