How do you find the union of multiple sets?

Medium
5 months ago

Write a function that finds the union of multiple sets.

Your function should take a list of sets as input and return a new set containing all unique elements from all input sets.

For example, given the following sets:

  • set1 = {1, 2, 3}
  • set2 = {3, 4, 5}
  • set3 = {5, 6, 7}

Your function should return a set containing {1, 2, 3, 4, 5, 6, 7}.

Here's another example:

  • set1 = {10, 20}
  • set2 = {20, 30, 40}
  • set3 = { }

Your function should return a set containing {10, 20, 30, 40}.

Consider edge cases such as empty sets, sets with duplicate elements, and a large number of input sets. Ensure your solution is efficient, particularly when dealing with a substantial number of elements across the input sets. How would you optimize for space complexity if memory is constrained? Discuss the time and space complexity of your solution.

Sample Answer

Finding the Union of Multiple Sets

This problem involves creating a function that efficiently finds the union of multiple sets. We will start with a naive solution and then optimize it for both time and space complexity.

Problem Description

Given a list of sets, the goal is to return a new set containing all unique elements from all input sets. The function should handle edge cases such as empty sets, sets with duplicate elements, and a large number of input sets.

Naive Solution

The simplest approach is to iterate through the list of sets and add each element to a new set. This ensures that the resulting set contains only unique elements.

def union_of_sets_naive(list_of_sets):
    result = set()
    for s in list_of_sets:
        for element in s:
            result.add(element)
    return result

# Example usage
set1 = {1, 2, 3}
set2 = {3, 4, 5}
set3 = {5, 6, 7}

print(union_of_sets_naive([set1, set2, set3]))  # Output: {1, 2, 3, 4, 5, 6, 7}

set1 = {10, 20}
set2 = {20, 30, 40}
set3 = {}

print(union_of_sets_naive([set1, set2, set3]))  # Output: {10, 20, 30, 40}

Optimal Solution

Python's set data structure provides a built-in union method that can efficiently compute the union of multiple sets. This approach is more concise and often faster than the naive solution.

def union_of_sets_optimal(list_of_sets):
    result = set()
    for s in list_of_sets:
        result = result.union(s)
    return result

# Example usage
set1 = {1, 2, 3}
set2 = {3, 4, 5}
set3 = {5, 6, 7}

print(union_of_sets_optimal([set1, set2, set3]))  # Output: {1, 2, 3, 4, 5, 6, 7}

set1 = {10, 20}
set2 = {20, 30, 40}
set3 = {}

print(union_of_sets_optimal([set1, set2, set3]))  # Output: {10, 20, 30, 40}

Alternatively, we can use the union method with *args to pass all sets at once:

def union_of_sets_optimal_args(list_of_sets):
    return set().union(*list_of_sets)


# Example usage
set1 = {1, 2, 3}
set2 = {3, 4, 5}
set3 = {5, 6, 7}

print(union_of_sets_optimal_args([set1, set2, set3]))

set1 = {10, 20}
set2 = {20, 30, 40}
set3 = {}

print(union_of_sets_optimal_args([set1, set2, set3]))

Big O Run-Time Analysis

Naive Solution

  • Time Complexity: O(N), where N is the total number of elements across all sets. This is because we iterate through each element in each set and add it to the result set. The add operation for a set is O(1) on average.

Optimal Solution

  • Time Complexity: O(N), where N is the total number of elements across all sets. The union method is generally optimized and performs similarly to iterating through all elements. The *args version is also O(N).

Big O Space Usage Analysis

Naive Solution

  • Space Complexity: O(M), where M is the number of unique elements across all sets. In the worst case, all elements are unique, and the result set will store all of them.

Optimal Solution

  • Space Complexity: O(M), where M is the number of unique elements across all sets. Similar to the naive solution, the space required is proportional to the number of unique elements.

Edge Cases

  1. Empty Sets:

    • If the input list contains empty sets, the function should still work correctly, as an empty set does not contribute any new elements to the union.
  2. Sets with Duplicate Elements:

    • The set data structure inherently handles duplicate elements, so no special treatment is needed.
  3. Large Number of Input Sets:

    • The optimal solution using the union method is efficient even with a large number of input sets because it's optimized internally.
  4. Sets with mixed data types

    • Sets can contain mixed data types (e.g., integers, strings), and the union operation should still function correctly, provided the elements are hashable.

Optimizing for Space Complexity

If memory is highly constrained, consider these approaches:

  1. In-place Union (if applicable): If modifying the original sets is acceptable and one of the sets is significantly larger, you could perform the union in-place to reduce the additional space needed.

  2. Iterative Union: Instead of creating a new set to hold all elements, process the sets one at a time and clear the elements from the sets once processed. This can reduce the space used at any given time, though it might complicate the logic and is only applicable if you don't need to retain the original sets.

  3. Using Generators (for extremely large sets): If the sets are extremely large and dynamically generated, consider using generators to process the sets in chunks, further reducing memory usage. This will depend on how the sets are being generated.