iTranslated by AI

The content below is an AI-generated translation. This is an experimental feature, and may contain errors. View original article
🛠️

[Line Segment Intersection] Data Structures and Algorithms for Building GIS [Computational Geometry]

に公開

Overview

In the previous article, we implemented "Convex Hull calculation," one of the fundamental functions of GIS. I hope it provided a glimpse into the beauty of algorithms for geometric information developed within the field of computational geometry.

From this article onward, we will consider processes tailored to more practical GIS functions. First up is "Map Overlay."

When using GIS, we often overlay several sets of map data (layers) to create new data or perform analysis. For example, one might overlay hazard maps of different disasters to examine the disaster risk of the area or town where they live.

Below is an example of the "Overlay Hazard Map" published by the Ministry of Land, Infrastructure, Transport and Tourism.

Hazard map for floods and inland waters

Hazard map for tsunamis

I intend to implement these "Map Overlay" processes.
A key technique involves representing feature polygons using a data structure called a "Doubly Connected Edge List" and using an algorithm called the "Plane Sweep Method" to efficiently solve various processes related to "Map Overlay."

In this article, I will explain the "Plane Sweep Method."
Again, I will refer to "Computational Geometry: Algorithms and Applications, 3rd Edition".

https://www.kindaikagaku.co.jp/book_list/detail/9784764903883/

Map Overlay

Let's first consider the inputs and outputs of the algorithm while looking at a simple example of "Map Overlay."
Suppose we want to implement an overlay process using shape P and shape Q as shown in the figure below.

Overlaying involves processes such as "Union," "Intersection," and "Difference." You can see that the resulting shapes change depending on the purpose.[1]

What we want to consider here is how to implement overlay processes like "Union," "Intersection," and "Difference."
Before thinking about "how to process," let's consider what kind of "output" would fit the objective.

The point is that, just as with the "Convex Hull" covered in the previous article, the shape (= region) after "overlaying" can be represented by a set of points.

Let's re-number the vertices of shapes P and Q, and let the intersection points of the two shapes be A and B.
Then, the input and output of the "overlay" process can all be expressed as point sets as follows:

  • Input: Shape P=[p_1,p_2, ..., p_5], Shape Q=[q_1,q_2,...,q_6]
  • Output:
    • Union: Shape [p_1,p_2,A, q_2, ..., B, p_4,p_5]
    • Intersection: Shape [q_1,A, p_3, B]
    • Difference: Shape [A, q_2, ..., q_6, B, p_3]

Please note that regardless of the "overlay" method, the intersection points A and B of the two shapes may appear. In overlay processes, intersection points between different shapes are crucial information.

Furthermore, since intersection information is not yet included in the input to the process, it is necessary to find intersection points at some stage of the "overlay" process.

This article will focus on the method for finding those "intersection points."
To make it easier to understand, let's consider the algorithm by viewing shapes as sets of multiple line segments, where intersection points occur due to intersections between line segments belonging to different shapes.

Line Segment Intersection

The Straightforward Way

In the previous section, I mentioned that in "Map Overlay" processing, it is necessary to find intersection points between different shapes (including determining whether they intersect at all).

A simple method is to determine intersections and calculate intersection coordinates for all combinations of line segments. This is what is known as the Brute Force method.

Needless to say, because this method is too simple, the computational cost for large-scale data is astronomical. This is because the complexity of the brute force method is of the order of O(n^2).

Map data can easily consist of thousands or tens of thousands of line segments. Therefore, an algorithm that finds them in O(n^2) will likely be unbearable.

If you think about it, cases where line segments intersect shouldn't be that frequent. For any given segment, the number of other segments it could possibly intersect with should be very small to begin with, and we should be able to significantly reduce the scope of calculation.

What we want here is an algorithm whose execution time depends not only on the number of input segments but also on the number of intersection points. Such an algorithm is called an output-sensitive algorithm. We want to find a calculation method that eliminates as much waste as possible, and the hint is hidden in the "geometric characteristics."

Plane Sweep Algorithm

Speaking of "geometric characteristics," the first thing that comes to mind is "distance" derived from coordinates. Couldn't we think that... for line segments to intersect, they must be close to each other in the first place?

As a simple method, if we try projecting the length of the line segments—that is, their two endpoints—onto the y-axis, the pairs of line segments that could potentially intersect start to narrow down (see the figure below).

In the figure below, segments A and B overlap on the y-axis, so there's a possibility of intersection![2] Conversely, segments A and C do not overlap on the y-axis, so there's likely no possibility of intersection from the start.

By projecting each segment onto the y-axis, it seems we can limit the pairs of segments that might intersect in the first place! In other words, we only need to examine pairs of segments whose intervals on the y-axis overlap—that is, pairs of segments for which a horizontal line exists that intersects both segments.

Therefore, we introduce a sweep line l that moves from a position higher than all segments to a lower position, and consider a process that investigates pairs of segments whose intervals on the y-axis overlap. This type of algorithm is called the Plane Sweep algorithm.

Specifically, the sweep line l maintains a set of line segments intersecting with it as a state (called the sweep-line status) and performs intersection determination only among the segments contained in that set.

Since the sweep line l moves along the y-axis, the sweep-line status is also updated sequentially. However, the timing of updates does not need to be continuous; it only needs to occur when it coincides with the endpoints of segments. These update points are called event points.[3]

To visualize this with the figure above, the following two event points are important for updating the sweep-line status:

  1. When the sweep line passes the upper endpoint of a segment.
  2. When the sweep line passes the lower endpoint of a segment.

In case 1), this segment is added to the sweep-line status and used for intersection determination with existing segments in the status. In case 2), this segment is removed from the sweep-line status and will no longer be used for intersection determination.

You might think this method will work well 🧐, but there are actually tricky cases where it loses its advantage over the brute-force method. For example, as shown on the left in the figure below, there are cases where segments parallel to the y-axis share ranges moderately. Even though there isn't a single intersection point, a significant number of intersection tests are required.

To address this, further improvements are made to consider horizontal proximity in intersection determination. In other words, by only checking for intersections between line segments that are adjacent in the x-axis direction, redundant processing is further reduced.

Consequently, the sweep-line status is changed from a simple set of segments to one that can maintain the adjacency relationships between segments.

And for the timing of updating the sweep-line status, the following three event points are important:

  1. When the sweep line passes the upper endpoint of a segment.
  2. When the sweep line passes the lower endpoint of a segment.
  3. (New) When the sweep line passes an intersection point.

Note that we are now considering the third one, the "intersection point," as an event point.

Specifically: in 1), this segment is added to the status, the adjacency relationships with existing segments are updated, and then it is used for intersection determination. In 2), this segment is removed from the status, and adjacency relationships among the remaining segments are re-updated. In 3), since the left-right relationship of the two segments forming the intersection point flips after passing the point, the adjacency relationships in the status are updated.

This seems like it will work. Let's organize these and prepare the necessary data structures and algorithms.

Data Structures and Algorithms

Event Queue and Sweep-line Status Structure

Now, let's define the necessary data structures.
As mentioned, the timing for updating the sweep-line status in the plane sweep method resides in the event points. In other words, the event queue stores these event points and manages the triggers for updating the sweep-line status.

Queue Q stores all the endpoints of the line segments as event points. At this time, it is convenient to automatically sort these endpoints according to the direction of the sweep line.

For example, if the sweep line moves from the positive y-direction to the negative y-direction, queue Q is managed so that endpoints with larger y-coordinates come first. When y-coordinates are the same, let's clearly define the ordering, such as prioritizing smaller x-coordinates.

Furthermore, event points may increase during the processing of the plane sweep algorithm—specifically, when an intersection is discovered. As previously noted, intersection points carry important weight as event points.
Therefore, we need a mechanism to add or update intersection points in queue Q as event points. Naturally, we also implement a process to remove handled endpoints or intersection points from the queue.

In the queue, let's manage not only the point coordinates, but also which line segment the point belongs to, and whether that point is a start point, end point, or intersection point. This information will be crucial for the operations of the status structure tree defined later.

In Python, I recommend using the heapq library, which allows for easy implementation of a priority queue. I'll leave the detailed explanation of the heapq library itself to the following articles.

https://qiita.com/ell/items/fe52a9eb9499b7060ed6
https://qiita.com/T_Wakasugi/items/dac6eb77a3cace54f95e

Event Queue Implementation Example
from heapq import heappush, heappop, heapify

class EmptyQueueException(Exception):
    def __str__(self):
        return "Cannot pop an empty queue"

class PriorityQueue:
    '''
    A queue that manages each event point and supports sequential processing.
    '''
    def __init__(self):
        self.queue = []
        self.dup = set()

    def __str__(self):
        return '[' + ' '.join([str(i) for i in self.queue]) + ']'

    def isEmpty(self):
        return len(self.queue) == 0

    def pushAll(self, l):
        for a in l:
            self.push(a)

    # For inserting an element in the queue
    def push(self, e):
        if e.status != "int": # If it's not an intersection, simply add it to the queue
            heappush(self.queue, e)

        elif e not in self.dup: # If it's an intersection and not yet registered, add it (avoiding duplicate registration of the same event point from different segments)
            heappush(self.queue, e)
            self.dup.add(e)

    def __len__(self):
        return len(self.queue)

    def pop(self):
        if self.isEmpty():
            raise EmptyQueueException()
        else:
            return heappop(self.queue)

    def __iter__(self):
        """
        Do ascending iteration for TreeSet
        """
        for element in self.queue:
            yield element

    def clear(self):
        self.queue = []

Next is the status structure T, which records the adjacency relationships of the line segments.
As the term adjacency relationship suggests, we need to record two pieces of information: the segment to the left and the segment to the right of a given segment. For this, a balanced binary search tree is useful.

The timing for updating this tree is the same as when an event point is retrieved from the event queue Q; the processing branches based on the meaning of the retrieved event point.

  1. When the event point is the start point of a segment: Add the segment s starting at that point to the status structure T.
  2. When the event point is the end point of a segment: Remove the segment s ending at that point from the status structure T.
  3. When the event point is an intersection between segments: Swap the positions of the two segments forming that intersection within the status structure T.

In any of these processes, the adjacency relationships in the tree change, and we perform intersection tests between the segment related to the event point and its newly adjacent segments. If an intersection is found, it is stored in the event queue Q.

For the balanced binary search tree, using the bisect library in Python is a good choice.
Again, I'll leave the library explanation to the following articles.
https://qiita.com/seigot/items/5091499d75a11cc77e41
https://qiita.com/T_Wakasugi/items/c979e977f56531942de4

Status Structure Implementation Example
import bisect

class TreeSet(object):
    """
    A binary tree data structure using bisect.
    Manages the relative order of each segment.
    """
    def __init__(self): # Initialization
        self._treeset    = []
        self.pop_number  = 0
        self.push_number = 0

    def isEmpty(self):  # Check if empty
        return len(self) == 0

    def addAll(self, elements): # Add elements if they are not in the treeset
        for element in elements:
            if element in self: continue
            self.add(element) 

    def add(self, element):
        if self.isEmpty() or element not in self:
            bisect.insort_right(self._treeset, element)
            self.push_number += 1

    def add_high_low(self, element): # Add a new segment to the tree
        if element not in self: # If not yet registered
            index = bisect.bisect_right(self._treeset, element)
            self._treeset.insert(index, element) # New insertion

            low = self._treeset[index - 1] if index > 0 else None
            high = self._treeset[index + 1] if index < len(self._treeset) - 1 else None
            self.push_number += 1 # Update tree size
            return low, high # Return the adjacent segments

        return None

    def push(self, element):
        self.add(element)

    def pushAll(self, l):
        for a in l:
            self.push(a)

    def pop(self):
        if not self.isEmpty():
            res = self._treeset[0]
            self._treeset = self._treeset[1:]
            return res
        return None

    def higher(self, e): # Return the smallest element in the treeset that is larger than the given value (i.e., segment e's neighbor)
        index = bisect.bisect_right(self._treeset, e)
        if index < self.__len__():
            return self[index]
        else:
            return None

    def lower(self, e): # Return the largest element in the treeset that is smaller than the given value (i.e., segment e's neighbor)

        index = bisect.bisect_left(self._treeset, e)
        if index > 0:
            return self[index - 1]
        else:
            return None

    def swap(self, e1, e2): # Swap elements within the treeset
        i1                = self._treeset.index(e1)
        i2                = self._treeset.index(e2)
        self._treeset[i1] = e2
        self._treeset[i2] = e1

    def __getitem__(self, num): # treeset[i]
        return self._treeset[num]

    def __len__(self):
        return len(self._treeset)

    def clear(self):
        """
        Empty the tree
        """
        self._treeset     = []

    def remove(self, element):
        """
        Remove the specified element from the tree
        """
        try:
            self._treeset.remove(element)
        except ValueError:
            return False
        return True

    def __iter__(self):
        """
        Iteration process
        """
        for element in self._treeset:
            yield element

    def __str__(self):
        return '[' + ' '.join([str(i) for i in self._treeset]) + ']'

    def __eq__(self, target):
        if isinstance(target, TreeSet):
            return self._treeset == target._treeset
        elif isinstance(target, list):
            return self._treeset == target

    def __contains__(self, e): # Process for 'e in self'
        """
        Fast attribution judgment by bisect
        """
        try:
            return e == (self._treeset[bisect.bisect_left(self._treeset, e)])
        except:
            return False

Pseudocode


Algorithm: FindIntersections(S)
Input: A set S of line segments on a plane.
Output: The set of intersection points among the line segments in S.

  1. Initialize an empty event queue Q. Next, insert the endpoints of the line segments into Q. When inserting the upper endpoint of a segment, store the corresponding segment along with it.
  2. Initialize an empty status structure T.
  3. while Q is not empty:
  4. ---- do Determine the next event point p in Q and delete it.
  5. ---- HandleEventPoint(p)

Algorithm: HandleEventPoint(p)

  1. Let U(p) be the set of segments whose upper endpoint is p; these segments are stored along with the event point p.
  2. Find all segments containing p in T. These are adjacent in T. Let L(p) be the set of segments found so far whose lower endpoint is p, and let C(p) be the set of segments found so far that contain p in their interior.
  3. if L(p) \cup U(p) \cup C(p) contains more than one segment:
  4. ---- then Report p as an intersection point, and output L(p), U(p), C(p) as well.
  5. Remove the segments in L(p) \cup C(p) from T.
  6. Insert the segments in U(p) \cup C(p) into T. The order of segments in T must correspond to the order in which they intersect the sweep line immediately below p.
  7. (Deleting and then re-inserting the segments in C(p) reverses their order)
  8. if U(p) \cup C(p) = 0
  9. ---- then Let s_l, s_r be the segments immediately to the left and right of p in T.
  10. -------- FindNewEvent(s_l, s_r, p)
  11. ---- else Let s' be the leftmost segment in U(p) \cup C(p) in T.
  12. -------- Let s_l be the segment to the left of s' in T.
  13. -------- FindNewEvent(s_l, s', p)
  14. -------- Let s'' be the rightmost segment in U(p) \cup C(p) in T.
  15. -------- Let s_r be the segment to the right of s'' in T.
  16. -------- FindNewEvent(s'', s_r, p)

Algorithm: FindNewEvent(s_l, s_r, p)

  1. if s_l and s_r intersect below the sweep line, or on the sweep line to the right of the current event point p, and the intersection point is not already in Q:
  2. ---- then Insert this intersection point as an event point into Q.

Concrete Example

Finally, let's visualize how the event queue Q and the status structure T change within the plane sweep algorithm.

We will find the intersection points for the set of line segments shown on the left in the figure below. The contents of the event queue Q and the status structure T when the sweep line is at positions l, l', l'' are shown on the right.

At sweep line position l, which is the stage before the process begins, the event queue Q stores the endpoints of the line segments that make up the segment set as event points.[4]

When moving to sweep line position l', endpoints 4 and 6 come out of the event queue Q, and in the status structure T, the adjacency relationship among the segment set s_1, s_2, s_5, s_6, s_7 containing these endpoints is constructed on the tree. After that, intersection tests are performed between adjacent segments. As a result, intersection point 9 is found between segments s_1 and s_2 and stored in the event queue Q.

What is important here is that at this stage, segments s_1 and s_5 are not yet adjacent, so intersection point 10 has not been found yet.

Subsequently, when moving to sweep line position l'', the relative positions of segments s_1 and s_2 swap, making segments s_1 and s_5 adjacent, and intersection point 10 is discovered.

Conclusion

In this article, I explained the "Plane Sweep Algorithm," an algorithm for finding the intersection points necessary for "Map Overlay" processing. In the next article, we will use the "Plane Sweep Algorithm" to finally extend it to full "Map Overlay" processing.

P.S.
I plan to release the program code if necessary. I avoided including it in the article because it is quite long. In any case, things have already become quite advanced by Chapter 2, and it's getting tough 🥹.

脚注
  1. These are also familiar in drawing tools like Illustrator. ↩︎

  2. Actually, they don't intersect, but it's fine to keep them as candidates for possibility. ↩︎

  3. Thus, rather than moving the sweep line l itself at a constant interval, it is actually a discrete movement where the sweep line l "warps" to the height of each segment's endpoint. ↩︎

  4. Note that the IDs of the stored endpoints are sorted according to the sweep line's position, and endpoints belonging to multiple segments are stored multiple times. Even with the same endpoint ID, they can be distinguished because the segments they belong to and their status as start or end points are different. ↩︎

Discussion