Use FILTER() Instead of a Pivot Table

Introduction

I was presenting the FILTER() function in my Super Forecasting Course last week and as I was demonstrating it, one of the delegates suggested that I could do what I was doing in a Pivot Table. Yes, it was true, I could have done that. So, what is so good about the FILTER() function in Excel, then? Let's see with a real set of data.

Our World in Data: Covid-19 Database

Everyone knows about Covid-19 but not everyone will know about the fabulous web site and organisation Our World in Data, OWID, based in Oxford University and headed by Dr Max Roser. OWID has the mission of making data and research on the world's largest problems understandable and accessible. . Because of the quality of OWID's work, they are the go to web site for many organisations for Covid-19 as well as many other data sets. If I want to know about Covid-19, I go to OWID, as I have in this case.

I am using the OWID Covid-19 database that I downloaded at 0900 hours, Thailand time on 16th June 2021 and you can access the CSV for the complete dataset at https://covid.ourworldindata.org/data/owid-covid-data.csv

Why not use a Pivot Table?

Why am I writing this blog post, then? Why not use a Pivot Table?

As I was preparing this post, I was working on the OWID Covid-19 database in preparation for the second running of my Super Forecasting course being run next week. The comment of last week's delegate was still bouncing around my mind so I asked myself the question, why not use a Pivot Table? In the end, what I did was to create a series for FILTER() based formulas that I have included below and that I will leave to you to determine whether a Pivot Table would be better!

I almost created a Pivot Table to answer my own question but felt it would be better if you answered it for yourself!

The FILTER() Function

The FILTER() function is a dynamic array function that Microsoft released just a few months ago and you will see how useful it is in the examples I am about to demonstrate. I have other blog posts on the FILTER() function that you can easily find and read! For example: https://duncanwil.blogspot.com/p/filter-with-multiple-tables.html and https://duncanwil.blogspot.com/p/dynamic-array-functions-example.html

The Database

The OWID Covid-19 database, today, is 95,744 rows deep and 60 columns wide: that is 5,744,640 cells to work with. That's not a massive database and one reason I might not want to use a Pivot Table is that there are 60 different variables relating to 229 countries or territories. That's a lot of bits of data to think about.

Secondly, I considered my FILTER() based report rather than using a Pivot Table because I knew that once I had created it, I would not be pivoting my reports and graphs, except if and when I downloaded the updated version of the OWID Covid-19 database.

Here is some of my Output

I created a report using just six columns and the first part of it is shown below:


I will share my formula with you in a second but what it did was to extract data from six columns, not all contiguous and then create two graphs from there ... only one graph is shown here. You might notice that I have linked my FILTER() function to cell BS1 in which I type the name of the country of interest for my report: the United Kingdom in this case.

Here is the report for Thailand:


My Formula

The formula I created for the above report is found in cell BK2 and is this:

=FILTER(FILTER(owid_covid_data15062021[[#All],[location]:[new_deaths]],owid_covid_data15062021[[#All],[location]]=BS1),{1,1,1,1,0,1,1})

You might think that that looks a bit complicated but please note, I used Power Query to get the OWID Covid-19 data for me and that has led to some potentially long and relatively complex column and table names. I could simplify these names, of course and I probably will after I have finished this blog page!

I used a nested FILTER() in a FILTER(). If you enter just this formula, FILTER(owid_covid_data15062021[[#All],[location]:[new_deaths]],owid_covid_data15062021[[#All],[location]]=BS1), you will get a valid data set BUT it will contain one column, in this case, that I do not want to include here. 

The wrap around FILTER() function allows me to remove column five from the seven columns I identified in my initial FILTER() function: =FILTER(... ... ... ),{1,1,1,1,0,1,1}). By missing out column five, new_cases_smoothed, from my report, I am essentially working with non contiguous data and that's really good news!

For You To Do

Go and get the data from OWID and copy my formula or create your own and see how you like it!

My Second Report

I created another FILTER based report and here it is:


It is this report that I decided, once I'd prepared it, I would not change. Except that I have made the column headers, country names, dynamic so that you can call up the data from any country in that table. In this case, I created a new formula for each row but from then on, I just had to fill them right to complete the table  I put nine countries in that table but you can choose one, two, five, twenty ... as you wish.

Here are the formulas I have created for the first column, United Kingdom and notice my row headers are the column headers from the Covid-19 database and I did not change them:



Just fill right to columns two to N to complete your own table. Notice I have used cell ranges rather than structured references and, if I left them like that, I would need to update them manually every time this database was updated if I wanted to make sure they reflected all of the data in the database.

Sparklines


I created a sort of dashboard using Sparklines that I show below and it is a work in progress that I may well abandon in favour of something more formal. I do use Sparklines as the basis of a dashboard and they can look really effective but I am not at that stage with the following!



My Other Graphs


I already showed you one of my more formal graphs but here it is again with a second graph, based on data generated by my FILTER() formulas:


and 


Conclusion: is this Better than a Pivot Table?


Yes, it is better than a pivot table but then again, no it isn't!

What is better is that I have created specific reports that I will change in only very minor ways: just change the country/countries and nothing else. I think these outputs are perfect for that.
What is worse is that I have fixed my report hence, literally, I have lost the major flexibility that a pivot table can give me. I cannot pivot my data and draw comparisons and conclusions as I can with a pivot table.

All of the above is done deliberately and one of the main reasons for sharing this page is to help you to appreciate, learn and use the FILTER() function.


Excel File: follow the link to OWID and get the Covid-19 database for yourself ... it is not mine to share! Hence, there is no file to download from this page.



Duncan Williamson
16th June 2021


No comments: