I am working on two large data sets, and my question is as follows.

Suppose I have two lists:

`list1 = [A,B,C,D]`

`list2 = [B,D,A,G]`

How can I efficiently find the matching index, using Python, other than O(n^{2}) searching? The result should look like:

`matching_index(list1,list2) -> [(0,2),(1,0),(3,1)]`

**Solution:**

## Without duplicates

If your objects are hashable and your lists have no duplicates, you can create an inverse index of the first list and then traverse the second list. This traverses each list only once and thus is `O(n)`

.

```
def find_matching_index(list1, list2):
inverse_index = { element: index for index, element in enumerate(list1) }
return [(index, inverse_index[element])
for index, element in enumerate(list2) if element in inverse_index]
find_matching_index([1,2,3], [3,2,1]) # [(0, 2), (1, 1), (2, 0)]
```

## With duplicates

You can extend the previous solution to account for duplicates. You can keep track of multiple index with a `set`

.

```
def find_matching_index(list1, list2):
# Create an inverse index which keys are now sets
inverse_index = {}
for index, element in enumerate(list1):
if element not in inverse_index:
inverse_index[element] = {index}
else:
inverse_index[element].add(index)
# Traverse the second list
matching_index = []
for index, element in enumerate(list2):
# We have to create one pair by element in the set of the inverse index
if element in inverse_index:
matching_index.extend([(x, index) for x in inverse_index[element]])
return matching_index
find_matching_index([1, 1, 2], [2, 2, 1]) # [(2, 0), (2, 1), (0, 2), (1, 2)]
```

Unfortunately, this is no longer *O(n)*. Consider the case where you input `[1, 1]`

and `[1, 1]`

, the output is `[(0, 0), (0, 1), (1, 0), (1, 1)]`

. Thus by the size of the output, the worst case is `O(n^2)`

.

Although, this solution is still `O(n)`

if there are not duplicates.