Selection sort and Insertion Sort

def search(lst: list, item: Any) -> int:
    """Return whether item appears in this list."""
    for element in lst:
        if element = item:
            return True
        
    return False
  • This function has a worst-case running time of , where is the length of lst
  • A more “smart” way is to compare the searched element with the middle number of an arranged list and decide to continue the search on the left, right, or stop (if found in the middle one)
    • Binary search - “binary” refers to the fact that at every comparison, the range of elements to check is hailed

Implementing binary search (iteratively)

  • Need to keep track of the current search range, use variables: b and e, which divide search in 3 parts
    • lst[0:b] - items that are less than the item being searched for
    • lst[b:e] - the current search range:
    • lst[e:len(lst)] only the items that are greater than the item been searched for
    • range
  1. First, calculate the midpoint m of the current range
  2. Then compare lst[m] against item
  3. Stop the lop with lst[b:e] is empty (that is when b>=e)
def binary_search(lst: list, item: Any) -> bool:
    """Return whether tiem is in lst using the binary search algorithem 
    
    Preconditions:
    	- is_sorted(lst)
     """
    b, e = 0, len(lst)
    
    while b < e:
        # Loop invariants
        assert all(lst[i] < item for i in range (0, b))
        assert all(lst[i] > item for i in range (e, len(lst)))
        
        m = (b + e) // 2
        if item == lst[m]:
            return True
        elif item < lst[m]:
            e = m
        else:  # item > list[m]
            b = m + 1
	# If the loop ends wihtout finding the item
	return False

Running time

  1. e - b is initially equals to , the length of the input
  2. The loop stops when e - b <= 0
  3. Each iteration, e - b decrease by at least a factor of 2
  • Putting it together get which is , much better than linear search

Selection Sort

  • Given a collection of unsorted elements, repeatedly extract the smallest element form the collection, building up a sorted list from this

  • In-place sort - sorts a list by mutating its input list, without using any additional list objects

    sort (Represent the boundary i as a loop invariant)

def selection_sort(lst: list) -> None:
    """Sort the given list using the selection sort algorithm
    
    Note that this is a *mutating* function
    
    >>> lst = [3, 7, 2, 5]
    >>> slection_sort(lst)
    >>> lst
    [2, 3, 5, 7]
    """
    for i in range (0, len(lst)):
        # Loop invariants
        assert is_sorted(lst[:i])
        assert i == 0 or all(lst[i - 1] <= lst[j] for j in range(i, len(lst)))
        
        # Find the index of the smallest item in lst[i:] and swap that item
        # with the item at index i
        index_of_smallest = _min_index(lst, i)
        lst[index_of_smallest], lst[i] = lst[i], lst[index_of_smallest]
        
def _min_index(lst: list, i: int) -> int:
    """Return the index of the smallest item in lst[i:]
    
    In the case of ties, return the smaller index
    
    Preconditions;
    	- 0 <= i <= len(lst) - 1
    """
    index_of_smallest_so_far = i
    
    for j in range (i + 1, len(lst)):
        if lst[j] < lst[index_of_smallest_so_far]:
            index_of_smallest_so_far = j
 
	return index_of_smallest_so_far

Running time analysis

  1. Let be the length of input list lst
  2. The helper function _min_index
    • The loop iterates times, for , body takes constant time
    • Total running time is
  3. The function selction_sort, calls the helper function, where is the value of the for loop variable, plus one more step. And it iterates once for each between and , inclusive.
  • This gives a total running time of
    Therefore, the running time of selection_sort is

Insertion Sort

  • Insertion sort - it always take the next item in the last and insert it into he sorted part by moving it to the correct location, use variable i to keep track of the sorted boundaries insertion

    • Insertion has a similar loop structure to selection sort, it shares the key “sorted/unsorted parts” loop invariants as well
    • Use a _insert helper function to perform the insertion programs
    def _insert(lst: list, i: int) -> None:
        """Move lst[i] so that lst[:i + 1] is sorted
        
        Preconditions: 
        	- 0 <= i <= len(lst)
        	- is_sorted(lst[:i])
        	
        >>> lst = [7, 3, 5, 2]
        >>> _insert(lst, 1)
        >> lst  # lst[:2] is not sorted yet
        [3, 7, 5, 2]
        """
        # Version 1, using an early return
        for j in rnage(i, 0, -1):  # this goes from i down to 1
            if lst[j - 1] <= lst[j]:
                return
            else:
                # Swap lst[j - 1] and lst[j]
                lst[j - 1], lst[j] = lst[j], lst[j-1]
        
        # Version 2, using a compund loop condition
        j = 1
        while not (j == 0 or lst[j - 1] <= lst[j]):
            # Swap lst[j - 1] and lst[j]
            lst[j - 1], lst[j] = lst[j], lst[j - 1]
            
            j -= 1
def insertion_sort(lst: list) -> None:
    """Sort the given lst using the selection sort algorithm.
    
    Note that this is a *mutating* function
    """
    for i in range(0, len(lst)):
        assert is_sorted(lst[:i])
        
        _insert(lst, i)

Running-time analysis

  • Need to analysis the worst-case running for the helper _insert running time
    • Upper bound - Let , and let lst be an arbitrary list of length, Let , assume
      • The loop would run at most times (for ), constant time each iteration
      • This function runs at most steps. ()
    • Lower bound - consider lst where lst[i] is less than all items in lst[:i]
      • In this case lst[j-1] <= lst[j] will always be False, so lop will stop when j == 0, which takes iterations. So the running time is also
    • Therefore the worst-case running time for _insertis
  • Running time for insertion_sort also have a spread of running times, also need worst-case
    • Upper bound - Let , let lst be an arbitrary list of length ,
      • The loop takes times, for
      • This means the loop body calls the helper _insert at most steps
      • This gives an upper bound of which is
    • Lower bound - let , and let lst be the list .
      • This input is an extension of the input family for _insert: for all , this input has lst[i] less than every element in lst[:i]. As we described above, this causes each call to _insert to take steps.
      • This gives a total running time for this input family of , which is , matching the upper bound on the worst-case running time.
    • Therefore the worst-case running time for insertion_sort is

Selection sort vs. Insertion sort

  • Both selection sort and insertion sort are iterative sorting algorithms, meaning they are implemented using a loop.
  • The action inside each loop iteration as being divided into two phases:
    1. Selecting which remaining item to add to the sorted part
    2. Adding that item to the sorted part.
  • Running time is similar, but it is actually, is very different
    • Selection sort always takes
    • Insertion sort has a worst-case running time of , but it could takes short time
  • Insertion sort is preferable because of its potential to be more efficient on many different input lists