Binary Search Trees
- Define a new abstract data type (ADT) that is an extension of the Set ADT that allows for duplicates
- Multiset
- Data: an unordered collection of values, allowing duplicates
- Operations: get size, insert a value, remove one occurrence of a specified value, check membership in the multiset
- Using
listto implement the Multiset ADT- Searching would have the of if simply append new items to the list
- Using binary search algorithm would improve the to
Binary Search Tree definitions
-
Binary tree is a tree in which every item has two (possibly empty) subtrees, which are labelled its left and right subtrees.
-
Binary search tree property - its value all items of left subtree, and all items of right subtree
-
Binary search tree (BST) - every item in the tree satisfies the binary search tree property
- As a “sorted tree”, efficient at insertion and deletion while painting the softness
- Empty case: Tree class (
_rootis None, subtrees is []), BST class (_root,_left,_rightare all None) - For non-empty BST, start by assigning at the
_root, then sort into_leftor_right- Leaf branches refer to empty BST (NOT refer to None)
- For BST, the only case for any of the attributes to be None is for it to be empty
class BinarySearchTree:
"""Binary Search Tree class.
Representation Invariants:
- (self._root is None) == (self._left is None)
- (self._root is None) == (self._right is None)
- (BST Property) if self._root is not None, then
all items in self._left are <= self._root, and
all items in self._right are >= self._root
"""
# Private Instance Attributes:
# - _root:
# The item stored at the root of this tree, or None if this tree is empty.
# - _left:
# The left subtree, or None if this tree is empty.
# - _right:
# The right subtree, or None if this tree is empty.
_root: Optional[Any]
_left: Optional[BinarySearchTree]
_right: Optional[BinarySearchTree]-
Based on the
Treeclass, the initializer and is_empty function:class BinarySearchTree: def __init__(self, root: Optional[Any]) -> None: """Initialize a new BST containing only the given root value. If <root> is None, initialize an empty BST. """ if root is None: self._root = None self._left = None self._right = None else: self._root = root self._left = BinarySearchTree(None) # self._left is an empty BST self._right = BinarySearchTree(None) # self._right is an empty BST def is_empty(self) -> bool: """Return whether this BST is empty. """ return self._root is None
Search for BST
- For BSTs the initial comparison to the root tells you which subtree you need to check.
- Check if its less than the root, search on left; greater than root, search right
class BinarySearchTree:
def __contains__(self, item: Any) -> bool:
"""Return whether <item> is in this BST.
"""
if self.is_empty():
return False
elif item == self._root:
return True
elif item < self._root:
return self._left.__contains__(item)
else:
return self._right.__contains__(item)Mutating BST
Insertion
- If BST is empty, make the new item the root of the tree
- Otherwise, use binary search tree property, insert into either left or right subtree as a leaf
Deletion
-
Use recursion to find the item (similar to the
containsfunction)class BinarySearchTree: def remove(self, item: Any) -> None: """Remove *one* occurrence of <item> from this BST. Do nothing if <item> is not in the BST. """ if self.is_empty(): pass elif self._root == item: self._remove_root() # refer to helper function elif item < self._root: self._left.remove(item) else: self._right.remove(item) -
Now as the root of a subtree (possibly a leaf), delete this root with a helper function
- Set
self._root = Noneif the tree consist just the root (no subtrees) - If at least one of the subtrees is empty, then “promote” the other subtree up
- If both subtrees are non-empty, replace the root with another value and remove the replaced value from else where
- This replacement value is either the max of the left or the min of the right subtree
- Extract the replacement value using:
BinarySearchTree._extract_max
class BinarySearchTree: def _remove_root(self) -> None: """...""" if self._left.is_empty() and self._right.is_empty(): self._root = None self._left = None self._right = None elif self._left.is_empty(): # "Promote" the right subtree. self._root, self._left, self._right = \ self._right._root, self._right._left, self._right._right elif self._right.is_empty(): # "Promote" the left subtree. self._root, self._left, self._right = \ self._left._root, self._left._left, self._left._right else: self._root = self._left._extract_max() # helper function - Set
-
Keep going to the right to bigger and bigger values until the end, then remove it. (Max has at most one child, therefore the process would be much easier)
class BinarySearchTree: def _extract_max(self) -> Any: """Remove and return the maximum item stored in this tree. Preconditions: - not self.is_empty() """ if self._right.is_empty(): max_item = self._root # Version 1: Like remove_root, "promote" the left subtree. self._root, self._left, self._right = \ self._left._root, self._left._left, self._left._right # Version: 2: # self._delete_root() return max_item else: return self._right._extract_max()
Running time analysis (__contains__)
- Start with analyzing the non-recursive running time of the method
- Assume that each recursive call takes constant time
- The
self.is_empty()anditem==self._rootanditem < self._rootand the return statements are constant time - Therefore, the non-recursive cost is 1 step per recursive call
- Analyzing the recursive calls
- BST’s contains method only recurse on one of its subtrees, so the number of recursive calls differences depending on the values stored in the tree and the searching value
- So
__contains__has a spread of running times, so Worst-case is focused here - The total number of recursive calls is at most the height of the BST plus 1 (longest possible search path is equal to the height of the BST, plus 1 is for recursing into an empty subtree)
- Since each non-recursive call takes 1 step,T this gives a total of steps, where is the height of the BST. Thus worst-case running time is
BST height vs size
- The height of a BST can be much smaller than its size
- A BST with items, its height can be as large as . Or as small as
- The height of tree can have size at most items. If store item in a BST, need to store all of them
- If it can be guaranteed that BST always have height roughly , then in fact all three BST’s operations (search, insert, delete) have a worst-case running time of . This is better than sorted list (search of , with insert and delete at
- Under this assumption, BST are more efficient than Multiset ADT