iTranslated by AI
[Line Segment Intersection] Data Structures and Algorithms for Building GIS [Computational Geometry]
Overview
In the previous article, we implemented "Convex Hull calculation," one of the fundamental functions of GIS. I hope it provided a glimpse into the beauty of algorithms for geometric information developed within the field of computational geometry.
From this article onward, we will consider processes tailored to more practical GIS functions. First up is "Map Overlay."
When using GIS, we often overlay several sets of map data (layers) to create new data or perform analysis. For example, one might overlay hazard maps of different disasters to examine the disaster risk of the area or town where they live.
Below is an example of the "Overlay Hazard Map" published by the Ministry of Land, Infrastructure, Transport and Tourism.
Hazard map for floods and inland waters
Hazard map for tsunamis
I intend to implement these "Map Overlay" processes.
A key technique involves representing feature polygons using a data structure called a "Doubly Connected Edge List" and using an algorithm called the "Plane Sweep Method" to efficiently solve various processes related to "Map Overlay."
In this article, I will explain the "Plane Sweep Method."
Again, I will refer to "Computational Geometry: Algorithms and Applications, 3rd Edition".
Map Overlay
Let's first consider the inputs and outputs of the algorithm while looking at a simple example of "Map Overlay."
Suppose we want to implement an overlay process using shape

Overlaying involves processes such as "Union," "Intersection," and "Difference." You can see that the resulting shapes change depending on the purpose.[1]
What we want to consider here is how to implement overlay processes like "Union," "Intersection," and "Difference."
Before thinking about "how to process," let's consider what kind of "output" would fit the objective.
The point is that, just as with the "Convex Hull" covered in the previous article, the shape (= region) after "overlaying" can be represented by a set of points.

Let's re-number the vertices of shapes
Then, the input and output of the "overlay" process can all be expressed as point sets as follows:
- Input: Shape
, ShapeP=[p_1,p_2, ..., p_5] Q=[q_1,q_2,...,q_6] - Output:
- Union: Shape
[p_1,p_2,A, q_2, ..., B, p_4,p_5] - Intersection: Shape
[q_1,A, p_3, B] - Difference: Shape
[A, q_2, ..., q_6, B, p_3]
- Union: Shape
Please note that regardless of the "overlay" method, the intersection points
Furthermore, since intersection information is not yet included in the input to the process, it is necessary to find intersection points at some stage of the "overlay" process.
This article will focus on the method for finding those "intersection points."
To make it easier to understand, let's consider the algorithm by viewing shapes as sets of multiple line segments, where intersection points occur due to intersections between line segments belonging to different shapes.
Line Segment Intersection
The Straightforward Way
In the previous section, I mentioned that in "Map Overlay" processing, it is necessary to find intersection points between different shapes (including determining whether they intersect at all).
A simple method is to determine intersections and calculate intersection coordinates for all combinations of line segments. This is what is known as the Brute Force method.
Needless to say, because this method is too simple, the computational cost for large-scale data is astronomical. This is because the complexity of the brute force method is of the order of
Map data can easily consist of thousands or tens of thousands of line segments. Therefore, an algorithm that finds them in
If you think about it, cases where line segments intersect shouldn't be that frequent. For any given segment, the number of other segments it could possibly intersect with should be very small to begin with, and we should be able to significantly reduce the scope of calculation.
What we want here is an algorithm whose execution time depends not only on the number of input segments but also on the number of intersection points. Such an algorithm is called an output-sensitive algorithm. We want to find a calculation method that eliminates as much waste as possible, and the hint is hidden in the "geometric characteristics."
Plane Sweep Algorithm
Speaking of "geometric characteristics," the first thing that comes to mind is "distance" derived from coordinates. Couldn't we think that... for line segments to intersect, they must be close to each other in the first place?
As a simple method, if we try projecting the length of the line segments—that is, their two endpoints—onto the y-axis, the pairs of line segments that could potentially intersect start to narrow down (see the figure below).
In the figure below, segments
By projecting each segment onto the y-axis, it seems we can limit the pairs of segments that might intersect in the first place! In other words, we only need to examine pairs of segments whose intervals on the y-axis overlap—that is, pairs of segments for which a horizontal line exists that intersects both segments.

Therefore, we introduce a sweep line
Specifically, the sweep line
Since the sweep line
To visualize this with the figure above, the following two event points are important for updating the sweep-line status:
- When the sweep line passes the upper endpoint of a segment.
- When the sweep line passes the lower endpoint of a segment.
In case 1), this segment is added to the sweep-line status and used for intersection determination with existing segments in the status. In case 2), this segment is removed from the sweep-line status and will no longer be used for intersection determination.
You might think this method will work well 🧐, but there are actually tricky cases where it loses its advantage over the brute-force method. For example, as shown on the left in the figure below, there are cases where segments parallel to the y-axis share ranges moderately. Even though there isn't a single intersection point, a significant number of intersection tests are required.

To address this, further improvements are made to consider horizontal proximity in intersection determination. In other words, by only checking for intersections between line segments that are adjacent in the x-axis direction, redundant processing is further reduced.
Consequently, the sweep-line status is changed from a simple set of segments to one that can maintain the adjacency relationships between segments.
And for the timing of updating the sweep-line status, the following three event points are important:
- When the sweep line passes the upper endpoint of a segment.
- When the sweep line passes the lower endpoint of a segment.
- (New) When the sweep line passes an intersection point.
Note that we are now considering the third one, the "intersection point," as an event point.
Specifically: in 1), this segment is added to the status, the adjacency relationships with existing segments are updated, and then it is used for intersection determination. In 2), this segment is removed from the status, and adjacency relationships among the remaining segments are re-updated. In 3), since the left-right relationship of the two segments forming the intersection point flips after passing the point, the adjacency relationships in the status are updated.
This seems like it will work. Let's organize these and prepare the necessary data structures and algorithms.
Data Structures and Algorithms
Event Queue and Sweep-line Status Structure
Now, let's define the necessary data structures.
As mentioned, the timing for updating the sweep-line status in the plane sweep method resides in the event points. In other words, the event queue stores these event points and manages the triggers for updating the sweep-line status.
Queue
For example, if the sweep line moves from the positive y-direction to the negative y-direction, queue
Furthermore, event points may increase during the processing of the plane sweep algorithm—specifically, when an intersection is discovered. As previously noted, intersection points carry important weight as event points.
Therefore, we need a mechanism to add or update intersection points in queue
In the queue, let's manage not only the point coordinates, but also which line segment the point belongs to, and whether that point is a start point, end point, or intersection point. This information will be crucial for the operations of the status structure tree defined later.
In Python, I recommend using the heapq library, which allows for easy implementation of a priority queue. I'll leave the detailed explanation of the heapq library itself to the following articles.
Event Queue Implementation Example
from heapq import heappush, heappop, heapify
class EmptyQueueException(Exception):
def __str__(self):
return "Cannot pop an empty queue"
class PriorityQueue:
'''
A queue that manages each event point and supports sequential processing.
'''
def __init__(self):
self.queue = []
self.dup = set()
def __str__(self):
return '[' + ' '.join([str(i) for i in self.queue]) + ']'
def isEmpty(self):
return len(self.queue) == 0
def pushAll(self, l):
for a in l:
self.push(a)
# For inserting an element in the queue
def push(self, e):
if e.status != "int": # If it's not an intersection, simply add it to the queue
heappush(self.queue, e)
elif e not in self.dup: # If it's an intersection and not yet registered, add it (avoiding duplicate registration of the same event point from different segments)
heappush(self.queue, e)
self.dup.add(e)
def __len__(self):
return len(self.queue)
def pop(self):
if self.isEmpty():
raise EmptyQueueException()
else:
return heappop(self.queue)
def __iter__(self):
"""
Do ascending iteration for TreeSet
"""
for element in self.queue:
yield element
def clear(self):
self.queue = []
Next is the status structure
As the term adjacency relationship suggests, we need to record two pieces of information: the segment to the left and the segment to the right of a given segment. For this, a balanced binary search tree is useful.
The timing for updating this tree is the same as when an event point is retrieved from the event queue
- When the event point is the start point of a segment: Add the segment
starting at that point to the status structures .T - When the event point is the end point of a segment: Remove the segment
ending at that point from the status structures .T - When the event point is an intersection between segments: Swap the positions of the two segments forming that intersection within the status structure
.T
In any of these processes, the adjacency relationships in the tree change, and we perform intersection tests between the segment related to the event point and its newly adjacent segments. If an intersection is found, it is stored in the event queue
For the balanced binary search tree, using the bisect library in Python is a good choice.
Again, I'll leave the library explanation to the following articles.
Status Structure Implementation Example
import bisect
class TreeSet(object):
"""
A binary tree data structure using bisect.
Manages the relative order of each segment.
"""
def __init__(self): # Initialization
self._treeset = []
self.pop_number = 0
self.push_number = 0
def isEmpty(self): # Check if empty
return len(self) == 0
def addAll(self, elements): # Add elements if they are not in the treeset
for element in elements:
if element in self: continue
self.add(element)
def add(self, element):
if self.isEmpty() or element not in self:
bisect.insort_right(self._treeset, element)
self.push_number += 1
def add_high_low(self, element): # Add a new segment to the tree
if element not in self: # If not yet registered
index = bisect.bisect_right(self._treeset, element)
self._treeset.insert(index, element) # New insertion
low = self._treeset[index - 1] if index > 0 else None
high = self._treeset[index + 1] if index < len(self._treeset) - 1 else None
self.push_number += 1 # Update tree size
return low, high # Return the adjacent segments
return None
def push(self, element):
self.add(element)
def pushAll(self, l):
for a in l:
self.push(a)
def pop(self):
if not self.isEmpty():
res = self._treeset[0]
self._treeset = self._treeset[1:]
return res
return None
def higher(self, e): # Return the smallest element in the treeset that is larger than the given value (i.e., segment e's neighbor)
index = bisect.bisect_right(self._treeset, e)
if index < self.__len__():
return self[index]
else:
return None
def lower(self, e): # Return the largest element in the treeset that is smaller than the given value (i.e., segment e's neighbor)
index = bisect.bisect_left(self._treeset, e)
if index > 0:
return self[index - 1]
else:
return None
def swap(self, e1, e2): # Swap elements within the treeset
i1 = self._treeset.index(e1)
i2 = self._treeset.index(e2)
self._treeset[i1] = e2
self._treeset[i2] = e1
def __getitem__(self, num): # treeset[i]
return self._treeset[num]
def __len__(self):
return len(self._treeset)
def clear(self):
"""
Empty the tree
"""
self._treeset = []
def remove(self, element):
"""
Remove the specified element from the tree
"""
try:
self._treeset.remove(element)
except ValueError:
return False
return True
def __iter__(self):
"""
Iteration process
"""
for element in self._treeset:
yield element
def __str__(self):
return '[' + ' '.join([str(i) for i in self._treeset]) + ']'
def __eq__(self, target):
if isinstance(target, TreeSet):
return self._treeset == target._treeset
elif isinstance(target, list):
return self._treeset == target
def __contains__(self, e): # Process for 'e in self'
"""
Fast attribution judgment by bisect
"""
try:
return e == (self._treeset[bisect.bisect_left(self._treeset, e)])
except:
return False
Pseudocode
Algorithm:
Input: A set
Output: The set of intersection points among the line segments in
- Initialize an empty event queue
. Next, insert the endpoints of the line segments intoQ . When inserting the upper endpoint of a segment, store the corresponding segment along with it.Q - Initialize an empty status structure
.T - while
is not empty:Q - ---- do Determine the next event point
inp and delete it.Q - ----
HandleEventPoint(p)
Algorithm:
- Let
be the set of segments whose upper endpoint isU(p) ; these segments are stored along with the event pointp .p - Find all segments containing
inp . These are adjacent inT . LetT be the set of segments found so far whose lower endpoint isL(p) , and letp be the set of segments found so far that containC(p) in their interior.p - if
contains more than one segment:L(p) \cup U(p) \cup C(p) - ---- then Report
as an intersection point, and outputp as well.L(p), U(p), C(p) - Remove the segments in
fromL(p) \cup C(p) .T - Insert the segments in
intoU(p) \cup C(p) . The order of segments inT must correspond to the order in which they intersect the sweep line immediately belowT .p - (Deleting and then re-inserting the segments in
reverses their order)C(p) - if
U(p) \cup C(p) = 0 - ---- then Let
be the segments immediately to the left and right ofs_l, s_r inp .T - --------
FindNewEvent(s_l, s_r, p) - ---- else Let
be the leftmost segment ins' inU(p) \cup C(p) .T - -------- Let
be the segment to the left ofs_l ins' .T - --------
FindNewEvent(s_l, s', p) - -------- Let
be the rightmost segment ins'' inU(p) \cup C(p) .T - -------- Let
be the segment to the right ofs_r ins'' .T - --------
FindNewEvent(s'', s_r, p)
Algorithm:
- if
ands_l intersect below the sweep line, or on the sweep line to the right of the current event points_r , and the intersection point is not already inp :Q - ---- then Insert this intersection point as an event point into
.Q
Concrete Example
Finally, let's visualize how the event queue
We will find the intersection points for the set of line segments shown on the left in the figure below. The contents of the event queue
At sweep line position
When moving to sweep line position
What is important here is that at this stage, segments
Subsequently, when moving to sweep line position

Conclusion
In this article, I explained the "Plane Sweep Algorithm," an algorithm for finding the intersection points necessary for "Map Overlay" processing. In the next article, we will use the "Plane Sweep Algorithm" to finally extend it to full "Map Overlay" processing.
P.S.
I plan to release the program code if necessary. I avoided including it in the article because it is quite long. In any case, things have already become quite advanced by Chapter 2, and it's getting tough 🥹.
-
These are also familiar in drawing tools like Illustrator. ↩︎
-
Actually, they don't intersect, but it's fine to keep them as candidates for possibility. ↩︎
-
Thus, rather than moving the sweep line
itself at a constant interval, it is actually a discrete movement where the sweep linel "warps" to the height of each segment's endpoint. ↩︎l -
Note that the IDs of the stored endpoints are sorted according to the sweep line's position, and endpoints belonging to multiple segments are stored multiple times. Even with the same endpoint ID, they can be distinguished because the segments they belong to and their status as start or end points are different. ↩︎


Discussion