- Technology and analysis
- I pay

When working with data, you often need to know how many unique items are in a list. For example, you might want to know how many customers you have in your database, how many products you can offer to distributors, or all the countries or geographies where you sell.
The required data is often stored in frequently repeated tables or lists. So how do you "retire" replicants? Here I examine three approaches, examining the pros and cons of each.
First, let's consider my data, very suitable for the purposes of this article:
A cursory glance can help you get the information you need - upon inspection, there are six unique elements in the screenshot, not counting the header. But how to make Excel confirm this sum? I present three alternatives, all detailed in the download fileclick on excel.
Option 1: Use dynamic arrays
I can implement a dynamic array type ONLY. This would be an obvious starting point, given the title of this article. I assume the scope (outside the header) is calledExample 1.
Dynamic array types are calculations that use a function that automatically expands its range based on the number of results. This automatic extension is known aspouring outand while it potentially creates an array (a result range that can contain more than one row and more than one column), you don't need to enter it with CTRL+SHIFT+ENTER as array types are a thing of the past. past.
The irony of UNIQUE is that it does two things: it enumerates distinct elements (that is, provides all values that occur without repetition), and it can also return values that occur once and only once in the given range. This is the old function we need here.
The SINGLE function has the following syntax:
=UNIQUE(array, [per_column], [occurs_once])
It has three arguments:
•education:This is required and represents a range or array to return unique values from.
•per_column:This argument is optional. This is a binary value (TRUE/FALSE) indicating how to compare. If you want to compare by row, the argument must be FALSE or omitted (as this is the default). For column-by-column comparison, you must select TRUE.
•happens once:This argument is also optional. It also requires a boolean:
•TRUE:Returns only unique values that appear only once.
•LIE:Include all distinct values (default if omitted).
This may sound complicated, but it really isn't. Create a unique list ofExample 1it is simple:
Type,
=UNIQUE(Example 1)
is simple and intuitive to use, listing six [6] unique elements in the order they are found (they are not automatically sorted - use the SORT dynamic array function to achieve this). Once we have that, all we have to do is count the elements in the list. The COUNTA function will accomplish this by counting the number of non-blank cells in a range (see the "Using UNIQUE to count unique elements in a list" screenshot).
Using the UNIQUE function to count the unique elements in a list
Note that the SINGLE formula here is in cell F32 and spans the range F32:F37. Highlighting this range will cause Excel to display it as F32#, and the pound/hash sign (#) means that the range may vary. This is the reason that,
=ACCOUNT(F32#)
counts the spilled area coming out of cell F32, thus adding six [6] unique elements. Note that spaces will appear as "0" in the range, just like zeros, but will be treated as twodifferenceunique items, which are quite useful.
Types can be condensed (or "nested") this time:
=NUM(UNIQUE(Example 1))
Nested array types do not always produce the required results due to the way Excel's calculation engine works (this is captured by a well-recognized scientific method known as "trial and error"), but in this case it does.
This method is extremely simple and should be understandable to most Excel users. But that's not all: Pivot tables are currently only available in Excel 365 and Excel 2021, so they're not available to everyone. Call me old fashioned, but many people cringe when they see #NAME? Instead. Therefore, this solution is only useful when all end users have dynamic array types available to them.
So what alternatives can we consider?
Option 2: use pivot tables
Everyone loves a good dynamic table, right? Since table creation is firmly built into your spreadsheet software, you don't have to worry about version compatibility when using cross-references.
In this case, I will first convert the source data to an array (usingImport -> Tablefrom the ribbon or the keyboard shortcut CTRL+T). This allows you to expand the range automatically, without using those fancy dynamic arrays (see "Convert Source Data to Array" screenshot).
Convert source data to table
On the ribbon, on the contextual tabtable design(ie when you select one or more table cells) you'll notice that I named this table Example2.
I then select one or more cells in that table and selectImport -> Pivot Tablefrom the ribbon (the keyboard shortcut varies depending on the version of Excel you have, but usually starts with ALT+N+V).
I'm browsing nowPivot Table Fieldsglass. If it doesn't appear automatically, right-click on the resulting pivot table and select the final option,display list of fields(It's annoying that Excel refers to it as something else in the popup context menu.) Then just move our single box (List) I am doingScandal(see "Pivot Table Fields Table" screenshot).
This will create the following pivot table:
This list will be sorted alphabetically by default. Now we'll just count the number of non-empty elements in this list. If there was an empty item in the original list, don't worry, it will still appear in the pivot table as (empty), so they will be treated as, um, non-empty (see "Using PivotTables to Count Unique Elements in a List" screenshot).
Use pivot tables to count unique items in a list
This approach has several disadvantages:
- Note that the COUNT. You may need to include a larger range than the pivot table fills. This happens in case the range has been extended after the data has been updated. This can cause problems if end users add other data to this spreadsheet.
- If the source data changes, update the PivotTable type: COUNT. it will not necessarily give the correct answer until this action is taken. Many users forget this.
So the idea is simple. However, while it will work in all current versions of Excel, end users may forget to refresh the data if the source list is updated. So what alternative do we have?
Option 3: Use SUM PRODUCTS
Regular readers know that SUM PRODUCT is one of my favorite functions in Excel, so much so that I named our company after it.
The feature is very underrated. Consider the example in the "Example using the SUMPRODUCT function" screenshot.
An example of using the SUMPRODUCT function
Here I have different prices and the corresponding quantities sold. To calculate my total sales, I can calculate my sales by takingproductzunit priceis multiplied byHow muchlevel line by line and thenadding upof. As you can see, SUMPRODUCT does it all at once:
=SUM PRODUCTS(B3:B11,C3:C11)
But SUMPRODUCT is more powerful than that. Type
=SUMAPRODUCTS(B3:B11*C3:C11)
does exactly the same. However, think about it
=SUMPRODUCTO(B3:B11/C3:C11)
Take a look at this revised example in the "Modified SUMPRODUCT Example" screenshot.
Improved SUMPRODUCT example
Can you see how SUMPRODUCT is breaking record after record? This is powerful and this is the idea that I will use to implement our final method using our list, now called easilyExample 3(see "Using SUMPRODUCT to count unique items in a list" screenshot).
Use SUMPRODUCT to count unique items in a list
I used the pattern here
=SUMPRODUCT((Example3<>””)/COUNTIF(Example3,Example3&””))
This seems to work, although at first glance its logic may be a little less clear than the other two approaches. Original state:
(Example 3<>””)
check if the rangeExample 3contains cells that are not empty (TRUE if yes, FALSE if not). Here it doesn't have to be non-empty, it just has to beallwhich cannot be listed in this scenario. You can replace it with a[1] if you want, but I wanted to show how it could work if you wanted to exclude empty cells. This gives us:
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
The second part, COUNTIF(Example3;Example3&””)
uses one of the more unusual ways to use COUNTIF. Returns an array again, but this time each value in the array represents the number of numbers in the array using each value in the array as criteria (adding &'' just converts the value to a text string, which may be needed in some cases) . Tasks:
3, 2, 1, 3, 2, 1, 2, 2, 3, 1
That is, there are three [3] occurrences of 'a', two [2] occurrences of 'b' and so on. The numerator is then divided by the denominator element by element, giving us:
0,33, 0,5, 1, 0,33, 0,5, 1, 0,5, 0,5, 0,33, 1
In math operations, TRUE behaves like a one [1] and FALSE like a zero [0]. These results are then added to give us six [6]. Easy when you know how.
The advantage of this approach is that it does not require dynamic tables or data refresh. The problem is that the calculations are a bit opaque. Neither solution is perfect, but this last option may be the most flexible.
a word to the wise
Some of you may be surprised that I didn't use Power Query/Get & Transform as one of the options above, since deduplication is a key transform in the Power Query Editor. For any purpose of this article, I just wanted to go over the basic features and functions of Excel.
Actually, you can also use Get & Transform. This is a great method for cleaning up data where there may be excessive spaces (trimming), non-printable characters (cleaning), or an excess of random upper/lowercase letters. However, like the pivot table solution described above, this requires refreshing data to refresh, which many end users forget. Therefore, given the "behind the scenes" nature of this option, I've decided to disable it in this case. However, this Extract/Transform/Load (ETL) tool should be considered an essential tool in any modeler's arsenal. It's just a matter of knowing which one might work best at which time.
Liam Bastick, FCMA, CGMA, FCA, is the director of SumProduct, a global consultancy specializing in Excel training. He is also an Excel MVP (designated by Microsoft) and an authorIntroduction to financial modelingIThe economic model continued.Submit ideas for future Excel-related articles toliam.bastick@sumproduct.com. To comment on this article or suggest an idea for another article, please contact Oliver Rowe atOliver.Rowe@aicpa-cima.com.