Abstract

There are quite a few different approaches circulating in the web to count unique (different) entries only. My opinion here is blunt and simple: Use my derivation sbCountUniq of Charles Williams' COUNTU function. But let us have a look at a simple example:

COUNTU_Example

As you can see, we have 10 random integers between 1 and 5 in cells A2 thru A11 - actually, the 1 is not occurring. This means that we have only 4 different (unique) values. This result 4 is shown with five different calculations in cells A13:A17. For information purpose these four different entries and their frequencies are shown in the upper right corner in cells C2:D5. This array has been calculated with an array formula using my user defined function Pfreq.

This simple example is based on numerical inputs only. But of course we also could have encountered alphanumerical values - for example a mixture of strings and numbers. The formulas in A14 and in A15 would not be able to cope with non-numerical inputs, though.

Now let us have a look at the runtimes of these different approaches (I used Charles Williams' FastExcel © here):

COUNTU_Runtime

For less than 1,000 records or numbers it does not really matter which solution you take. But if you have more than some thousand records you should take Charles Williams' COUNTU function.

By the way: COUNTU needed 54 milliseconds to calculate the result 1,000 (=1,000 different numbers of 10,000 random numbers). My UDF Pfreq needed 68 milliseconds to list all 1000 different entries with their number of occurrences (a total of 10,000, of course)!

I think it is pretty evident that for this number of entries, simulations etc. you really should apply a VBA solution (ok, or a PowerQuery approach).

A more flexible function than COUNTU is sbCountUniq which works on ranges over more than one workbook as well as on an array:

Appendix – sbCountUniq Code

Please read my Disclaimer.

Function sbCountUniq(ParamArray v() As Variant) As Long
'Count unique values over all input areas (ranges or one array).
'Inspired by Charles Williams' COUNTU function, see:
'http://msdn.microsoft.com/en-us/library/aa730921.aspx#Office2007excelPerf_FindingPrioritizingCalculationBottlenecks
'Reverse("moc.LiborPlus.www") V0.10 10-Jan-2011

Dim colUniques As New Collection
Dim vCell As Variant
Dim vLcell As Variant
Dim j As Long

On Error Resume Next
With Application.WorksheetFunction
For j = LBound(v) To UBound(v)
   For Each vCell In v(j)
        If vCell <> vLcell Then
            If Len(CStr(vCell)) > 0 Then
                 colUniques.Add vCell, CStr(vCell)
            End If
        End If
        vLcell = vCell
   Next vCell
Next j
sbCountUniq = colUniques.Count
End With
End Function

Unique Items with Criteria

But what if you have some criteria and you do not want to re-arrange your spreadsheet so that it shows only relevant data?

Pivot Table

One option might be a Pivot table with a helper column. Personally I would prefer two subsequent Pivot tables.

Advanced Filter

Another option is using the advanced filter.

Worksheet Formulas

The two options I would like to present here are a worksheet formula approach and a VBA approach, though. This is because I think it is worth to learn from the pitfalls of the suboptimal worksheet formula.

Let us assume you have a company in 10 regions with 200 employees who are selling some goods each day. And now you want to know how many (unique) employees per region sold anything in the first quarter of 2010.

We have the sales data in sheet Data:

20101030_PB_01_Count_Unique_with_Criteria_Screen

Now we define some convenient named ranges:

20101030_PB_01_Count_Unique_with_Criteria_Names

In sheet ResultWS we can provide now T. Valko’s (Biff) clever worksheet array formula solution:

20101030_PB_02_Count_Unique_with_Criteria_Screen

The named ranges help us to phrase this formula in a neat way. We need to copy this array formula for each appearing region, and the named ranges make these formulae volatile, but that does not matter too much, right?

Hmm, let us see …

VBA Approach

The VBA approach is pretty straight forward:

20101030_PB_03_Count_Unique_with_Criteria_Screen

Appendix UniqEmployeesPerRegion Code

Please read my Disclaimer.

Option Explicit

Enum Data_Columns
    dDate = 1
    dEmployee
    dCSO
    dRegion
End Enum 'Data_Columns

Sub UniqEmployeesPerRegion()
'Reverse("moc.liborplus.www") V0.10 23-Oct-2010
Dim objEmployeesRegions As Object, objRegions As Object
Dim v As Variant
Dim lRow As Long
Dim s() As String

Const sC = "|"
Sheets("ResultVBA").Range("A2:B10001").ClearContents
Set objEmployeesRegions = CreateObject("Scripting.Dictionary")
Set objRegions = CreateObject("Scripting.Dictionary")
Sheets("Data").Select
lRow = 2
Do While Not IsEmpty(Cells(lRow, 1))
    If Cells(lRow, dDate) >= Range("Start") And _
        Cells(lRow, dDate) <= Range("End") And _
        Cells(lRow, dCSO) > 0 Then
        objEmployeesRegions.Item(Cells(lRow, dEmployee) & sC & Cells(lRow, _
            dRegion)) = 1
    End If
    lRow = lRow + 1
Loop
For Each v In objEmployeesRegions.keys
    s = Split(v, sC)
    objRegions.Item(s(1)) = objRegions.Item(s(1)) + 1
Next v
Sheets("ResultVBA").Select
Range("A2").Resize(objRegions.Count).Value = _
    Application.WorksheetFunction.Transpose(objRegions.keys)
Range("B2").Resize(objRegions.Count).Value = _
    Application.WorksheetFunction.Transpose(objRegions.items)
If CDbl(Application.Version) >= 14# Then 'Try, might not work
    'in older versions, could not test
    Range(Range("A1"), Range("B1").End(xlDown)).Select
    ActiveWorkbook.Worksheets("ResultVBA").Sort.SortFields.Clear
    ActiveWorkbook.Worksheets("ResultVBA").Sort.SortFields.Add Key:=Range( _
        Range("A2"), Range("A1").End(xlDown)), SortOn:=xlSortOnValues, _
        Order:=xlAscending, DataOption:=xlSortNormal
    With ActiveWorkbook.Worksheets("ResultVBA").Sort
        .SetRange Range(Range("A1"), Range("B1").End(xlDown))
        .Header = xlYes
        .MatchCase = False
        .Orientation = xlTopToBottom
        .SortMethod = xlPinYin
        .Apply
    End With
End If
Set objEmployeesRegions = Nothing
Set objRegions = Nothing
End Sub

For both the worksheet function approach and the VBA approach I ran simulations on my dual core laptop for 10, 50 and 100 regions and for 1000, 5000, 10000 and 20000 rows of data and I measured the runtime with FastExcel.

20101030_PB_04_Count_Unique_with_Criteria_Screen

Now you can see that the worksheet formula approach shows a quadratic runtime depending on the number of data rows. If you double the number of data rows the runtime will increase by a factor of 4! The VBA approach runtime is linear - doubling the data rows only doubles the runtime.

Please note that a Pivot table approach would presumably be even faster than the VBA approach. But with runtimes around 1s for up to 20,000 rows of data I would be able to live pretty well.