Introduction
I was presenting the FILTER() function in my Super Forecasting Course last week and as I was demonstrating it, one of the delegates suggested that I could do what I was doing in a Pivot Table. Yes, it was true, I could have done that. So, what is so good about the FILTER() function in Excel, then? Let's see with a real set of data.
Our World in Data: Covid-19 Database
Everyone knows about Covid-19 but not everyone will know about the fabulous web site and organisation Our World in Data, OWID, based in Oxford University and headed by Dr Max Roser. OWID has the mission of making data and research on the world's largest problems understandable and accessible. . Because of the quality of OWID's work, they are the go to web site for many organisations for Covid-19 as well as many other data sets. If I want to know about Covid-19, I go to OWID, as I have in this case.
I am using the OWID Covid-19 database that I downloaded at 0900 hours, Thailand time on 16th June 2021 and you can access the CSV for the complete dataset at https://covid.ourworldindata.org/data/owid-covid-data.csv
Why not use a Pivot Table?
Why am I writing this blog post, then? Why not use a Pivot Table?
As I was preparing this post, I was working on the OWID Covid-19 database in preparation for the second running of my Super Forecasting course being run next week. The comment of last week's delegate was still bouncing around my mind so I asked myself the question, why not use a Pivot Table? In the end, what I did was to create a series for FILTER() based formulas that I have included below and that I will leave to you to determine whether a Pivot Table would be better!
I almost created a Pivot Table to answer my own question but felt it would be better if you answered it for yourself!
The FILTER() Function
The FILTER() function is a dynamic array function that Microsoft released just a few months ago and you will see how useful it is in the examples I am about to demonstrate. I have other blog posts on the FILTER() function that you can easily find and read! For example: https://duncanwil.blogspot.com/p/filter-with-multiple-tables.html and https://duncanwil.blogspot.com/p/dynamic-array-functions-example.html
The Database
The OWID Covid-19 database, today, is 95,744 rows deep and 60 columns wide: that is 5,744,640 cells to work with. That's not a massive database and one reason I might not want to use a Pivot Table is that there are 60 different variables relating to 229 countries or territories. That's a lot of bits of data to think about.
Secondly, I considered my FILTER() based report rather than using a Pivot Table because I knew that once I had created it, I would not be pivoting my reports and graphs, except if and when I downloaded the updated version of the OWID Covid-19 database.
Here is some of my Output
I created a report using just six columns and the first part of it is shown below:
I will share my formula with you in a second but what it did was to extract data from six columns, not all contiguous and then create two graphs from there ... only one graph is shown here. You might notice that I have linked my FILTER() function to cell BS1 in which I type the name of the country of interest for my report: the United Kingdom in this case.
Here is the report for Thailand:
My Formula
The formula I created for the above report is found in cell BK2 and is this:
=FILTER(FILTER(owid_covid_data15062021[[#All],[location]:[new_deaths]],owid_covid_data15062021[[#All],[location]]=BS1),{1,1,1,1,0,1,1})
You might think that that looks a bit complicated but please note, I used Power Query to get the OWID Covid-19 data for me and that has led to some potentially long and relatively complex column and table names. I could simplify these names, of course and I probably will after I have finished this blog page!
I used a nested FILTER() in a FILTER(). If you enter just this formula, FILTER(owid_covid_data15062021[[#All],[location]:[new_deaths]],owid_covid_data15062021[[#All],[location]]=BS1), you will get a valid data set BUT it will contain one column, in this case, that I do not want to include here.
The wrap around FILTER() function allows me to remove column five from the seven columns I identified in my initial FILTER() function: =FILTER(... ... ... ),{1,1,1,1,0,1,1}). By missing out column five, new_cases_smoothed, from my report, I am essentially working with non contiguous data and that's really good news!
For You To Do
Go and get the data from OWID and copy my formula or create your own and see how you like it!
My Second Report
I created another FILTER based report and here it is:
No comments:
Post a Comment