Rabi Siddique
983 words
5 minutes
Union-Find

Union-Find, also known as Disjoint Set Union (DSU), is a set of algorithms used to solve the dynamic connectivity problem. This problem involves maintaining connections or relationships between a set of N objects. Two fundamental operations in Union-Find are:

  • Union: This operation connects two objects.
  • Find: This operation checks whether there exists a path connecting two objects.

Problem Statement#

Consider a collection of objects and various connections between them. Union-Find helps us determine if two objects are connected directly or indirectly through a chain of connections.

Consider these Graphs:

Graphs

connection(A, F): The connection query returns False as there is no direct or indirect connection between A and F.

connection(C, E) The connection returns True. Although there C and E are not connected directly. But there exist a path from C to E.

union(A, C): This will create a connection between A and C. Giving us this graph:

Graphs

Applications of Union Find#

  • Social Networks: Modeling connections between friends or users in a social network.
  • Image Processing: Managing pixel connectivity in digital photos for tasks like image segmentation.
  • Symbol Tables: Keeping track of variable names in a program.

When programming, its convenient to name objects from 0 to N-1. Then we use integers as array index.

Mathematical Properties Maintained by Union-Find#

Union-Find maintains the following mathematical properties for the "is connected to" relationship, treating it as an equivalence relation:

  • Reflexive: Any object is connected to itself (p is connected to p).
  • Symmetric: If object p is connected to object q, then object q is also connected to object p.
  • Transitive: If p is connected to q and q is connected to r, then p is connected to r, forming a transitive relationship.

Connected Components#

Connected components are a set of objects that are mutually connected. When any two objects within a connected component are connected, it implies that there are no external objects or connections linking to them. In other words, connected components represent self-contained groups of objects with no connections extending beyond these groups.

Implementation#

Design a Data Strucutre for Union Find#

  • Number of objects N can be huge.
  • Number of operations M can be huge.
  • Find Queries and Union commands may be intermixed.
class UF:
    # Initialize UF data structure with N Objects(0 to N-1)
    def __init__(self, N: int):

    # Add connection between p and q
    def union(self, p: int, q: int) -> None:
        pass

    # Are p and q in the same component
    def connected(self, p: int, q: int) -> bool:
        pass

    # Component identifier for p(0 to N-1)
    def find(self, p: int) -> int:
        pass

    # Number of components
    def count() -> int:
        pass

Quick Find Implementation#

Data Structure#

  • Integer array of size N.
  • Interpretation: p and q are connected if and only if they have the same id.

Consider these graphs representing our 10 objects:

Graphs

This is how our ID array will look like:

0123456789
0118800189

Based on the array:

  • 0, 5, and 6 are a connected component.
  • 1, 2, and 7 are a connected component.
  • 3, 4, and 8 are a connected component.
  • 9 is its own component.

Find#

Check if p and q have the same ID. If yes, they are a connected component.

Union#

To merge components containing p and q, change all entries whose ID equals id[p] to id[q]. This can be a problem if we have large number of objects.

Code in Python for Quick Find#

class QuickFind:

    def __init__(self, N):
        # Set ID of each object to itself
        self.id = []

        for i in range(N):
            id[i] = i

    # Check if p and q are in the same component
    def isConnected(self, p, q):
        return self.id[p] == self.id[q]

    # Change all entries with id[p] to id[q]
    def union(self, p, q):

        pid = self.id[p]
        qid = self.id[q]

        for i in range(len(self.id)):
            if self.id[i] == pid:
                self.id[i] = qid

Cost Model for Quick Find#

Number of array access(for read or write):

  • Initialize: N
  • Union: N
  • find: 1

Quick-find defect: Union is too expensive.

Example Takes N^2 array access to process of N union commands on N objects It’s quadratic. Quadratic algorithms don’t scale with technology.

Quick Union Implementation#

Data Structure#

  • Integer array of size N.
  • Interpretation: id[i] is the parent of i.
  • Root of i is id[id[id[.....id[i]....]]].

Consider these graphs:

Graphs Graphs

This is how our ID array will look like:

0123456789
0194966789

Find Operation#

Check if elements p and q have the same root. To find the root of an element, like 3, you would use: id[id[id[3]]], which results in 9.

If the roots are the same, p and q are part of the same connected component.

Union Operation#

To merge components containing p and q, set the id of p’s root to the id of q’s root. This operation involves changing one entry in the array, unlike Quick Find, which requires changing all entries where the root equals p.

Cost Model for Quick Union#

Number of array accesses (for read or write):

  • Initialize: N
  • Union: N
  • Find: N

Defect of Quick-Union: Trees can get tall, making the find operation potentially expensive (up to N array accesses).

Improvements in Quick Union ==> Weighted Quick Union#

  • Modify Quick Union to avoid tall trees by keeping track of the size of each tree (number of objects).
  • Balance by linking the root of the smaller tree to the root of the larger tree.

Data Structure#

  • Uses the same structure as quick union but maintains an additional size array to count the number of objects in the tree rooted at each node.
  • Find operation remains the same as in quick union.
  • Modify union operation to:
    • Link the root of the smaller tree to the root of the larger tree.
    • Update the size array.

Runtime#

  • Find: Time proportional to the depth of p and q.
  • Union: Constant time given the roots. Depth of any node is at most logN.

Examples of operation counts:

  • For N = 1,000: ~10 operations.
  • For N = 1,000,000: ~20 operations.
  • For N = 1,000,000,000: ~30 operations.

Quick Find:

  • Union runtime: N
  • Connected runtime: Constant

Quick Union:

  • Union runtime: N
  • Connected runtime: N

Weighted Quick Union:

  • Union runtime: LogN
  • Connected runtime: LogN

Further Performance Improvement ==> Path Compression#

After computing the root of p, set the id of each examined node to point directly to that root.

  • Two-pass implementation: Add a second loop to the root method to set the id[] of each examined node to the root.
  • Simple one-pass variant: Make every other node in the path point to its grandparent, thereby halving the path length.
Union-Find
https://rabisiddique.com/posts/union-find/
Author
Rabi Siddique
Published at
2023-10-24