Find Missing Numbers, Really Fast

Philip Treacy

February 19, 2015

A while ago I wrote a routine using the .Find method to find missing numbers in a sequence. So if we had 1, 3, 5, the missing numbers are 2 and 4.

I had some feedback that as the sequence grew longer, the .Find method became increasingly slower, to the point that your hair grew quicker.

I am surprised to hear this as I’d expect a routine provided by Microsoft to be as least as fast as anything us mortals could write.

I chose to use .Find really for its ease of use, I wouldn’t have to write any loops to check each number. But when the length of time it takes to list missing numbers becomes frustratingly long, it’s time to look for another way.

One of our readers provided some code (thanks Bryan) where he compared two ways of listing missing numbers in a sequence, the first using .Find and the second using loops to check a collection created from the number sequence.

What the code revealed was that by the time you got to 10,0000 numbers in the sequence, the difference between the two approaches was already noticeable .Find was taking 50+ seconds, whereas looping through a collection took less than 2 seconds.

I then received a message from another blog reader, Dennis, saying he was trying to find missing numbers in a 90,000+ long sequence of numbers, with .Find taking over half an hour

So, I thought I’d better check this out myself and thanks to the code Bryan provided I came up with this.

What The Code Does

  • Find missing numbers (integers) in a sequence
  • The lowest number in the sequence is the starting point, the highest number is the end point
  • Can include negative integers
  • Numbers can be repeated
  • No sorting of the sequence is required
  • The sequence (input range) can contain empty cells
  • The input range can be non-contiguous

Testing

The purpose of this testing was to compare the .Find method against an approach looping through a collection created from the range of cells containing the number sequence.

Whether or not this approach is totally scientific isn’t really important, what is important is that the two approaches are compared in the same way. Apples with apples.

The most important thing is what you, or the user, experiences.

If one approach is lightning fast and the other is slower than a snail going uphill, then we don’t really need to make sure our testing is accurate down to the microsecond.

That said, the testing routines use a function called MicroTimer, provided by Microsoft for accurate timing.

It uses Windows API calls to the computer's high-resolution timer and is faster and more accurate than the VBA Time function. It can measure intervals down to microseconds.

The results of these tests will be different each time you run them. Things like the number of missing numbers, whether you are using Debug.Print or not, and other processes running on your computer will mean each test will be slightly quicker or slower than the last one.

But to give you an idea of the variation I’m talking about, in four runs of the Collection routine with 10,000 numbers the code executed in 1.73s, 1.78s, 1.75s and 1.70s. So unless you are The Flash, you won’t even notice this.

The testing code was run in Excel 2013 and 2010, and each routine was run four times. Firstly with a sequence of 10 numbers, then 100, 1 000, 10 000 and finally 100 000 numbers in the sequence. The execution times for the 4 runs was averaged.

Actually that’s a bit of a lie. By the time I got to 100,000 numbers in the sequence the .Find method was taking so long I just ran it twice. But I did test the others four times each.

The Testing Routines

The testing code uses the RANDBETWEEN function to randomly create numbers between 1 and the largest number in the sequence (10, 100, 1 000, 10 000 or 100 000) in a column starting from A1.

Because RANDBETWEEN is volatile, if we wrote the missing numbers to the worksheet, RANDBETWEEN would recalculate each time we did this, changing the numbers in our sequence and this would mess up our test.

So the missing numbers, and any other output we want, is written to the Immediate window in the VB editor using Debug.Print. To show the Immediate window in the VB Editor either press CTRL+G or go to the View menu ->Immediate Window

Results

With only 10, 100 or 1000 numbers in the sequence, the difference between .Find and the collection/loops routine isn’t perceptible, all the results coming back in under a second.

But with 10,000 numbers in the sequence, .Find takes about 51 seconds, the collection/loop code finds all the missing numbers in less than 2 seconds!

With 100,000 the difference is even more staggering with .Find taking more than an hour and 20 minutes to find all the missing numbers in the sequence. The collection/loop code did it in under 9 seconds.

I was amazed.

I don’t know what is going on in the .Find method but beyond 1000 numbers in a sequence, it seems to be incredibly slow at this particular job. .Find is very good if you want to do a one-off location of a single thing within a range, but for this particular application, looping is by far the better approach.

Below is a table of results (measured in seconds) for each of the tests starting with 10 numbers in a sequence and working through to 100 000 numbers.

 .FindCollection/Loop
 2010201320102013
10 0.0025 0.0054 0.0026 0.0020
100 0.029 0.039 0.013 0.016
1 000 0.664 0.769 0.237 0.173
10 000 50.87 50.38 1.74 1.76
100 000 5037.68 5072.39 8.83 8.15

The Finished Code

After the testing I had to modify the code so that it gave us results we could use in the workbook, not just printing to the Immediate Window. All the code can be found in this workbook.

When you open the VB Editor (ALT+F11) you’ll see 2 modules named Testing_Code (guess what’s in there!), and FindMissingNumbers.

VBA Code Modules

Testing_Code contains the MicroTimer function, and two subs called Using_Collection_Loops and Using_Find. You can run either of these subs to do your own testing.

The array lngNumValues is set to hold five values which are how many numbers in the sequence you test, i.e. 10, 100, 1000, 10 000 or 100 000. You can change these values to whatever you like.

Specifying which value in the array lngNumValues to use in testing determines how many numbers are tested. For example lngMax = lngNumValues(4) sets lngMax to 10 000, so the code will use a random sequence of 10 000 numbers.

When you run either of the testing subs, you’ll see the missing numbers displayed in the Immediate window, followed by a duration for the code execution.

Debug.Print in Immediate Window

How To Use the FindMissingNumbers Sub In Your Workbook

In the FindMissingNumbers module is the FindMissingNumbers sub.

On the first sheet of the workbook I’ve added a big green button, click this and the macro will run. You can also run the macro in a number of other ways. Read this article to learn how to run a macro using a shape, shortcut keys, an icon on your Ribbon, or the Quick Access Toolbar.

When the macro starts the first thing you need to do is specify the input range. You can either use your mouse to select the range, or type it in

Select Input Range

Then you specify where you want the missing numbers to be listed. You can select a single cell and the results will be output into that column.

If you select more than one cell across a row, the results will be output across that row.

Enter your email address below to download the sample workbook.

By submitting your email address you agree that we can email you our Excel newsletter.

Disclaimer – Please test the code yourself, it may not work in your environment. All code provided as is without any warranty


Option Explicit
'**********************************************************
' Find missing integer numbers in a range
' Author : Philip Treacy, MyOnlineTrainingHub
' With thanks to Bryan upon whose code this is based
' For more info go to https://www.myonlinetraininghub.com/find-missing-numbers-really-fast
'**********************************************************
Sub FindMissingNumbers()
    
    Dim InputRange As Range, OutputRange As Range
    Dim LowerVal As Single, UpperVal As Single, count_i As Single, count_j As Single
    Dim NumRows As Long, NumColumns As Long
    Dim Horizontal As Boolean
    Dim rng As Range
    Dim col As Collection
    Dim varDummy As Variant

    
    'Default is to output the results into a column
    Horizontal = False
        
    On Error GoTo ErrorHandler
    
    'Ask for the range to check
    Set InputRange = Application.InputBox(Prompt:="Select a range to check :", _
        Title:="Find missing values", _
        Default:=Selection.Address, Type:=8)
            
    'Find the lowest and highest values in the range/sequence
    LowerVal = WorksheetFunction.Min(InputRange)
    UpperVal = WorksheetFunction.Max(InputRange)
    
        
    'Ask where the output is to go
    Set OutputRange = Application.InputBox(Prompt:="Select where you want the result to go :", _
        Title:="Select cell for Results", _
        Default:=Selection.Address, Type:=8)
    
    'Check the number of rows and columns in the output range
    NumRows = OutputRange.Rows.Count
    NumColumns = OutputRange.Columns.Count
    
    'If there are more columns selected than rows, output is to go horizontally
    If NumRows < NumColumns Then
    
        Horizontal = True
        'Reset the number of rows to 1 so that output is into first row
        NumRows = 1
        
    Else
    
        'Reset the number of columns to 1 so that output is into first column
        NumColumns = 1
        
    End If
    
    
      Set col = New Collection
      
      ' First add all items to collection (ignore errors when duplicates are added)
      For Each rng In InputRange
      
        On Error Resume Next
        
          col.Add rng.Value2, CStr(rng.Value2)
          
        On Error GoTo 0
        
      Next rng
      
      count_j = 1
      ' Loop through every possible nubmer in range (error returned when number not found)
      On Error Resume Next
        For count_i = LowerVal To UpperVal
        
          varDummy = col(CStr(count_i))
          
          If Not Err.Number = 0 Then
            
            'Output the missing number to the sheet
            If Horizontal Then
            
                OutputRange.Cells(NumRows, count_j).Value = count_i
                count_j = count_j + 1
            
            Else
                
                OutputRange.Cells(count_j, NumColumns).Value = count_i
                count_j = count_j + 1
            
            End If
            
            Err.Clear
            
          End If
          
        Next count_i
        
      On Error Resume Next
          
    Exit Sub
    
ErrorHandler:

    If InputRange Is Nothing Then
    
        MsgBox "ERROR : No input range specified."
        Exit Sub
        
    End If
    
    If OutputRange Is Nothing Then
    
        MsgBox "ERROR : No output cell specified."
        Exit Sub
        
    End If
    
    MsgBox "An error has occurred. The macro will end."
    

End Sub

10 thoughts on “Find Missing Numbers, Really Fast”

  1. Goodevening,

    Thank you so much for your code and clear explanation. I was looking for this solution and this really helped me A LOT!

    At the moment I’m trying to amend your code for my needs, but do not know exactly how to proceed.

    I want to report the missing numbers but only those missing numbers that start with (1st digit) the same digit as the value in Cell C1 (for example C1 = 2)

    The numbers that need to be checked are between 10000 till 99999.

    So if the value in C1 is 2, I would like to report all missing numbers from 20000 till 29999.
    I only need 25 numbers to be reported.

    Can you help me to get this accomplished?

    Thank you for you time in advance, I really appreciate it.

    Rgds,
    MME

    Reply
    • hi MME,
      Here is the code you should try;

      Option Explicit
      '**********************************************************
      ' Find missing integer numbers in a range
      ' Author : Philip Treacy, MyOnlineTrainingHub
      ' With thanks to Bryan upon whose code this is based
      ' For more info go to https://www.myonlinetraininghub.com/find-missing-numbers-really-fast
      '**********************************************************
      Sub FindMissingNumbers()

      Dim InputRange As Range, OutputRange As Range, FirstDigitRange As Range
      Dim LowerVal As Single, UpperVal As Single, count_i As Single, count_j As Single
      Dim NumRows As Long, NumColumns As Long
      Dim Horizontal As Boolean
      Dim Rng As Range, FirstDigit As Long
      Dim col As Collection
      Dim varDummy As Variant

      'Default is to output the results into a column
      Horizontal = False

      On Error GoTo ErrorHandler

      'Ask for the range to check
      Set InputRange = Application.InputBox(Prompt:="Select a range to check :", _
      Title:="Find missing values", _
      Default:=Selection.Address, Type:=8)

      Set FirstDigitRange = Application.InputBox(Prompt:="Select the cell with first digit :", _
      Title:="Select the cell with first digit", _
      Default:=Selection.Address, Type:=8)

      'Ask where the output is to go
      Set OutputRange = Application.InputBox(Prompt:="Select where you want the result to go :", _
      Title:="Select cell for Results", _
      Default:=Selection.Address, Type:=8)

      'Find the lowest and highest values in the range/sequence
      LowerVal = FirstDigitRange.Cells(1) * 10000 + 9999
      UpperVal = WorksheetFunction.Min(99999, WorksheetFunction.Max(InputRange))

      'Check the number of rows and columns in the output range
      NumRows = OutputRange.Rows.Count
      NumColumns = OutputRange.Columns.Count

      'If there are more columns selected than rows, output is to go horizontally
      If NumRows < NumColumns Then Horizontal = True 'Reset the number of rows to 1 so that output is into first row NumRows = 1 Else 'Reset the number of columns to 1 so that output is into first column NumColumns = 1 End If Set col = New Collection On Error Resume Next ' First add all items to collection (ignore errors when duplicates are added) For Each Rng In InputRange If Rng.Value2 > LowerVal And Rng.Value2 <= UpperVal Then col.Add Rng.Value2, CStr(Rng.Value2) Next Rng On Error GoTo 0 count_j = 1 ' Loop through every possible nubmer in range (error returned when number not found) On Error Resume Next For count_i = LowerVal To UpperVal varDummy = col(CStr(count_i)) If Not Err.Number = 0 Then 'Output the missing number to the sheet If Horizontal Then OutputRange.Cells(NumRows, count_j).Value = count_i count_j = count_j + 1 Else OutputRange.Cells(count_j, NumColumns).Value = count_i count_j = count_j + 1 End If Err.Clear End If 'add exit point after first 25 numbers If count_j > 25 Then Exit For
      Next count_i

      On Error Resume Next

      Exit Sub

      ErrorHandler:

      If InputRange Is Nothing Then

      MsgBox "ERROR : No input range specified."
      Exit Sub

      End If

      If FirstDigitRange Is Nothing Then

      MsgBox "ERROR : No input range specified for first digit."
      Exit Sub

      End If

      If OutputRange Is Nothing Then

      MsgBox "ERROR : No output cell specified."
      Exit Sub

      End If

      MsgBox "An error has occurred. The macro will end."

      End Sub

      Reply
  2. On my system, the following code is more than 10 times faster than the code you posted (for 999,999 cells arranged in 3 columns)…

    Sub FindMissingNumbers()
    
      Dim InRng As Range, OutRng As Range
      Dim Data As Variant, Nums As Variant
      Dim R As Long, C As Long, LowerVal As Long, UpperVal As Long
    
      On Error Resume Next
       
      'Ask for the range to check
      Set InRng = Application.InputBox(Prompt:= _
                  "Select a range to check :", _
                  Title:="Find missing values", _
                  Default:=Selection.Address, Type:=8)
      If Err.Number Then
        MsgBox "ERROR : No input range specified."
        Exit Sub
      End If
    
      'Ask where the output is to go
      Set OutRng = Application.InputBox(Prompt:= _
                   "Select starting cell for the results.", _
                   Title:="Select cell for Results", _
                   Default:=Selection.Address, Type:=8)
      If Err.Number Then
        MsgBox "ERROR : No output cell specified."
        Exit Sub
      End If
    
      On Error GoTo 0
    
      'Find the lowest and highest values
      LowerVal = WorksheetFunction.Min(InRng)
      UpperVal = WorksheetFunction.Max(InRng)
    
      ' Dimension an array to hold all numbers
      ReDim Nums(LowerVal To UpperVal)
    
      ' Assign element number to array element
      For R = LowerVal To UpperVal
        Nums(R) = R
      Next
    
      ' Store input data in a 2-D array
      Data = InRng
    
      ' Clear any existing values from number array
      For R = 1 To UBound(Data, 1)
        For C = 1 To UBound(Data, 2)
          Nums(Data(R, C)) = ""
        Next
      Next
    
      ' Store all remaining numbers in an array
      Nums = Split(Application.Trim(Join(Nums)))
    
      ' Assign remaining numbers to output range
      OutRng(1).Resize(UBound(Nums) + 1) = Application.Transpose(Nums)
    
    
    End Sub
    
    Reply
      • I’ll be interested in your results run on your computer (the 999,999 cells process in less than 1/2 second using your code on my computer using XL2010, not the nearly 9 seconds you reported).

        Also, I forgot to mention that my code only outputs down a single column… it does not have the ability to output horizontally.

        Reply
        • Hi Rick,

          I couldn’t get your code to run when I had 3 x columns of 999,999 numbers. Each time the code generated ‘Run time error 13: Type mismatch’ on this line

          OutRng(1).Resize(UBound(Nums)) = Application.Transpose(Nums)

          I did test it on a smaller set of numbers though. It did run but I noticed an error. If you have the sequence 2,5,7 the missing numbers are 3,4 and 6. Your code outputs just 3 and 4. I tried several different sets of numbers and your code always missed out the last missing number.

          Regards

          Phil

          Reply
          • My code will only run against a single column… that is what you posted for your example, so I figured the list would only be in a single column. If three columns, which way would the “wrap” occur… down one column then to the next column and so on OR across one row and then down to the next row and so on?

            As for the 999,999 rows of data… yeah, that will overwhelm the Application.Transpose function call by quite a bit… I think 64000 may be the upper limit on the number of element that can be Transposed.

            As for the missing numbers… stupid mistake on my part. The last line of code…

            OutRng(1).Resize(UBound(Nums)) = Application.Transpose(Nums)

            should have been this…

            OutRng(1).Resize(UBound(Nums) + 1) = Application.Transpose(Nums)

          • Hi Rick,

            Your original post said that your code was for ‘999,999 cells arranged in 3 columns’.

            I’ve corrected the last line of your code above.

            Cheers

            Phil

Leave a Comment

Current ye@r *