Design and implement a self-balancing binary search tree that supports insertion, search, deletion, and range queries.

Medium
9 months ago

Let's explore binary search trees. Consider that you are given a stream of numerical data. Your task is to design and implement a self-balancing binary search tree (BST) data structure that efficiently supports the following operations:

  1. Insertion: Insert a new number into the BST while maintaining its self-balancing property. The balancing mechanism should ensure that the height of the tree remains logarithmic in the number of nodes. Implement the insertion logic and explain the balancing technique used (e.g., AVL trees, Red-Black trees).

  2. Search: Given a number, determine if it exists in the BST. Implement an efficient search algorithm that leverages the BST property to locate the number or confirm its absence.

  3. Deletion: Remove a number from the BST while maintaining its self-balancing property. Handle different deletion scenarios, such as deleting a leaf node, a node with one child, and a node with two children. Describe how the balancing is preserved after deletion.

  4. Range Query: Given a range (min, max), return all numbers within the BST that fall within this range, in sorted order. Optimize the range query to avoid traversing unnecessary parts of the tree.

For example, if the stream of numbers is [10, 5, 15, 3, 7, 12, 18], the BST should dynamically adapt to maintain balance as each number is inserted. Demonstrate how your implementation handles operations like inserting 6, deleting 10, and performing a range query for numbers between 6 and 16. Discuss the time complexity of each operation in terms of the number of nodes in the tree (n) and the height of the tree (h). Pay close attention to edge cases and error handling in your implementation.

Sample Answer

Self-Balancing Binary Search Tree Implementation

This response provides a detailed implementation of a self-balancing binary search tree (BST) with insertion, search, deletion, and range query operations. It utilizes the AVL tree balancing technique to ensure logarithmic time complexity for these operations.

1. Insertion

Naive Solution

The naive solution involves simply inserting the node into the BST without any balancing. This can lead to unbalanced trees in the worst case, resulting in O(n) time complexity for search, insertion, and deletion.

Optimal Solution (AVL Tree)

AVL trees are self-balancing BSTs where the height difference between the left and right subtrees of any node is at most 1. After each insertion, we check if the AVL property is violated and perform rotations to rebalance the tree.

class AVLNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1

class AVLTree:
    def __init__(self):
        self.root = None

    def _height(self, node):
        if not node:
            return 0
        return node.height

    def _update_height(self, node):
        node.height = 1 + max(self._height(node.left), self._height(node.right))

    def _balance_factor(self, node):
        if not node:
            return 0
        return self._height(node.left) - self._height(node.right)

    def _rotate_right(self, z):
        y = z.left
        T3 = y.right

        y.right = z
        z.left = T3

        self._update_height(z)
        self._update_height(y)

        return y

    def _rotate_left(self, z):
        y = z.right
        T2 = y.left

        y.left = z
        z.right = T2

        self._update_height(z)
        self._update_height(y)

        return y

    def insert(self, key):
        self.root = self._insert(self.root, key)

    def _insert(self, node, key):
        if not node:
            return AVLNode(key)

        if key < node.key:
            node.left = self._insert(node.left, key)
        else:
            node.right = self._insert(node.right, key)

        self._update_height(node)

        balance = self._balance_factor(node)

        # Left Left Case
        if balance > 1 and key < node.left.key:
            return self._rotate_right(node)

        # Right Right Case
        if balance < -1 and key > node.right.key:
            return self._rotate_left(node)

        # Left Right Case
        if balance > 1 and key > node.left.key:
            node.left = self._rotate_left(node.left)
            return self._rotate_right(node)

        # Right Left Case
        if balance < -1 and key < node.right.key:
            node.right = self._rotate_right(node.right)
            return self._rotate_left(node)

        return node

Explanation

  1. AVLNode Class: Represents a node in the AVL tree, storing the key, left child, right child, and height.
  2. AVLTree Class: Represents the AVL tree, with a root node.
  3. _height(node): Returns the height of a node. If the node is None, it returns 0.
  4. _update_height(node): Updates the height of a node based on the heights of its children.
  5. _balance_factor(node): Calculates the balance factor of a node (height of left subtree - height of right subtree).
  6. _rotate_right(z): Performs a right rotation on node z.
  7. _rotate_left(z): Performs a left rotation on node z.
  8. insert(key): Inserts a new key into the AVL tree.
  9. _insert(node, key): Recursive helper function for insertion. It inserts the key and then checks for balance. Rotations are performed if necessary to maintain the AVL property.

2. Search

Naive Solution

A linear search through an unsorted data structure would take O(n) time.

Optimal Solution

Leveraging the BST property, we can perform an efficient search. The BST property dictates that for any node, all keys in its left subtree are less than the node's key, and all keys in its right subtree are greater than the node's key.

    def search(self, key):
        return self._search(self.root, key)

    def _search(self, node, key):
        if not node:
            return False

        if key == node.key:
            return True
        elif key < node.key:
            return self._search(node.left, key)
        else:
            return self._search(node.right, key)

Explanation

  1. search(key): Searches for a key in the AVL tree.
  2. _search(node, key): Recursive helper function for searching. It checks if the key matches the current node's key. If not, it recursively searches the left or right subtree based on the key's value.

3. Deletion

Naive Solution

Finding and deleting a node without rebalancing can lead to an unbalanced tree, similar to insertion.

Optimal Solution (AVL Tree)

Deletion in an AVL tree also requires maintaining the AVL property. After deleting a node, we need to check the balance factors of the affected nodes and perform rotations if necessary.

    def delete(self, key):
        self.root = self._delete(self.root, key)

    def _delete(self, node, key):
        if not node:
            return node

        if key < node.key:
            node.left = self._delete(node.left, key)
        elif key > node.key:
            node.right = self._delete(node.right, key)
        else:
            # Node with only one child or no child
            if not node.left:
                return node.right
            elif not node.right:
                return node.left

            # Node with two children: Get the inorder successor
            # (smallest in the right subtree)
            node.key = self._min_value_node(node.right).key

            # Delete the inorder successor
            node.right = self._delete(node.right, node.key)

        if not node:
            return node # Node was a leaf and is now deleted

        self._update_height(node)

        balance = self._balance_factor(node)

        # Left Left Case
        if balance > 1 and self._balance_factor(node.left) >= 0:
            return self._rotate_right(node)

        # Right Right Case
        if balance < -1 and self._balance_factor(node.right) <= 0:
            return self._rotate_left(node)

        # Left Right Case
        if balance > 1 and self._balance_factor(node.left) < 0:
            node.left = self._rotate_left(node.left)
            return self._rotate_right(node)

        # Right Left Case
        if balance < -1 and self._balance_factor(node.right) > 0:
            node.right = self._rotate_right(node.right)
            return self._rotate_left(node)

        return node

    def _min_value_node(self, node):
        current = node
        while current.left:
            current = current.left
        return current

Explanation

  1. delete(key): Deletes a key from the AVL tree.
  2. _delete(node, key): Recursive helper function for deletion. It handles three cases:
    • Node with no children or one child: Simply remove the node.
    • Node with two children: Replace the node with its inorder successor (minimum value in the right subtree) and then delete the inorder successor.
  3. _min_value_node(node): Finds the node with the minimum value in a subtree. This is used to find the inorder successor.
  4. After deletion, the balance factor is checked and rotations are performed as needed.

4. Range Query

Naive Solution

A naive solution involves traversing the entire tree and checking if each node's key falls within the given range. This would take O(n) time.

Optimal Solution

To optimize the range query, we can leverage the BST property to avoid traversing unnecessary parts of the tree. We only traverse the subtrees that potentially contain nodes within the range.

    def range_query(self, min_val, max_val):
        result = []
        self._range_query(self.root, min_val, max_val, result)
        return result

    def _range_query(self, node, min_val, max_val, result):
        if not node:
            return

        if min_val < node.key:
            self._range_query(node.left, min_val, max_val, result)

        if min_val <= node.key <= max_val:
            result.append(node.key)

        if max_val > node.key:
            self._range_query(node.right, min_val, max_val, result)

Explanation

  1. range_query(min_val, max_val): Performs a range query to find all keys within the given range.
  2. _range_query(node, min_val, max_val, result): Recursive helper function for the range query. It traverses the tree, adding keys within the range to the result list. It prunes the search by only traversing the left subtree if min_val < node.key and the right subtree if max_val > node.key.

Example Usage

tree = AVLTree()
numbers = [10, 5, 15, 3, 7, 12, 18]
for num in numbers:
    tree.insert(num)

tree.insert(6)

print("Search for 7:", tree.search(7))
print("Search for 20:", tree.search(20))

tree.delete(10)

print("Range query (6, 16):", tree.range_query(6, 16))

Time Complexity Analysis

  • Insertion: O(log n) - Due to the self-balancing nature of AVL trees.
  • Search: O(log n) - The BST property allows efficient searching.
  • Deletion: O(log n) - Maintaining balance after deletion requires rotations.
  • Range Query: O(k + log n) - where k is the number of nodes in the range. The log n factor comes from finding the start of the range, and the k factor comes from traversing the nodes within the range.

Space Complexity Analysis

  • AVL Tree: O(n) - where n is the number of nodes in the tree. This is because, in the worst-case scenario, the tree will need to store n nodes. The height of the tree is guaranteed to be logarithmic (log n) due to self-balancing. Recursive calls in functions such as insert, delete, and search would take O(log n) space in the call stack.

Edge Cases and Error Handling

  • Empty Tree: Handle cases where the tree is empty when performing search, deletion, or range queries.
  • Duplicate Keys: Decide how to handle duplicate keys during insertion (e.g., allow them, disallow them, or store a count).
  • Invalid Range: For range queries, handle cases where min_val > max_val (e.g., return an empty list or raise an exception).
  • Key Not Found: In deletion, gracefully handle the case where the key to be deleted is not found in the tree (e.g., do nothing or raise an exception).

Summary

This response provides a comprehensive implementation of a self-balancing binary search tree using the AVL tree technique. It covers insertion, search, deletion, and range query operations, along with detailed explanations of the algorithms and their time/space complexities. The implementation also addresses edge cases and error handling to ensure robustness.