Problem: Given two sorted arrays of numbers, determine the median of the two sorted arrays. The overall runtime complexity must be O(log(min(m,n))).
If we were to remove the runtime complexity requirement, there would be several solutions to this problem that could work. A first thought might be to merge the two arrays, then find the median of the merged array. The issue with this strategy is that, although determining where to insert into the array is O(log(n)) with a binary search, the actual act of inserting into an array is O(n), so this goes over our required time complexity.
Seeing this, we now understand that we can’t combine the arrays in any way while maintaining an O(log(min(m+n))) time complexity. In order to solve this problem, we will need to think a little more creatively.
First, let’s think about exactly what a median of two arrays is giving us. A median is a midpoint in a set of numbers, such that half the numbers are on one side, and the other half are on the other side. More importantly, with sorted arrays, a median will have the property that everything on the left-hand side will be smaller than everything on the right-hand side. This is because if a median splits an array in half, and the array is sorted, naturally, all the smaller elements will be on one side, and all the larger will be on the other side.
This fact is the key to the solution we need for this problem. Let’s consider a simple example of two arrays:
A1 = [1,2,3,4,10]
A2 = [5,6,7,8,9]
If we wanted to find the median of the array, we would need to first get the array into an order such that all elements are sorted. This would give us a result like this:
A = [1,2,3,4,5,6,7,8,9,10]
From here, we calculate the median. Since it is an even number of elements, we use the formula (5+6)/2 = 5.5. We are just taking the two elements in the middle, 5 and 6, and taking their average. If the array was of odd size, we could just pick the middle element.
From this example, we can take a few facts that will help us solve the problem.
- We need to take n elements from the first array A1. This means that there would be (len(A1) + len(A2) + 1)/2 — n elements to pick from the second array. This formula is just representing the number of elements required to create an equally sized array, so that our median uses all the elements available.
- We need to ensure that the left-hand side of the median is less than the right-hand side of the median. Since the arrays are sorted, we can do this by comparing the largest value on the left to the smallest value on the right. If the property is satisfied for these values, then our median choice is valid.
- If the property for 2 is not satisfied, we simply pick a new point to partition, and try again.
Let’s take a look at the code to get an intuition on how we can program this, and how we can derive the time complexity from the algorithm.
To start, we want a1 to be holding the shortest array out of the two. This is done to minimize the area that we need to search.
In addition to this, we also set up variables to hold the values of the length of each array, as well as a low which will start at the beginning of the array, and a high which is the length of the smallest array. From here, we will iterate while the low is less than or equal to the high.
We are going to calculate our partition as discussed earlier. The partition of the first array will be the (low + high)/2, as the midpoint of the first array. The partition of the second array will be (len1 + len2 + 1)/2 — paritition1, to gather the remaining elements from the first partition.
From here, we are going to gather the values at the minimum and maximums of the arrays after the partition. To find the max of the first array, we look at the value that exists at the end of the partition, as long as the partition is bigger than 0. Recall that the first array is holding all values smaller than the median, and we want the largest of these values to be smaller than the smallest of array two’s values.
We then find the maximums and minimums in a similar fashion, either taking the start of the array for minimums, or the end of the array for maximums. In any instance where we can’t find a largest or smallest values due to the partition, we set a default value as the smallest or largest value.
From here, we just need to check the maximums and minimums that we found to determine if we need to partition again, or have a solution.
If we have a situation where the max of array 1 is smaller than the min of array 2, and the min of array 1 is larger than the max of array 2, we have a solution. From here we just calculate the median based on the partition and we are done. Otherwise, we move partition 1 to the left and right of its current position and start again.
This code gives us a good solution to the problem, so the final thing to discuss is the time complexity. I’ve asserted that this satisfies the condition of a O(log(min(m,n))) solution, where m and n are the length of the first and second array respectively. Let’s prove that this is true.
Consider that the search space for this problem is either the length of m or the length of n, depending on which is smaller. This means that in a typical case, we would iterate min(m,n) times, depending on which array is selected. Now consider how we increment low and high, to try to meet the condition that low <= high. To do this, we will either set high = partition1–1 or low = partition1 + 1. This means that on average, we are moving our current index by (low + high)/2. Since our search space is halving each time, we end up in the typical example of logarithmic time complexity. Each iteration halves the search space of min(m,n), therefore we have a time complexity of O(log(min(m,n))).