Introduction
In the UK over the last couple of weeks, there has been a slur campaign against family doctors. A newspaper published an article that is supposed to show how lazy our doctors are. What they did was present a biased picture of how many consultations these doctors provide.
I downloaded the GP Appointments file and carried out my own analysis. I was a bit surprised to find that it all came to 4.7 million rows of data for a year and a half's data so initially I used Power BI for my analysis. I ran into a DAX problem that is now solved. I ran into another problem to do with April 202's data that I cannot fathom.
For that reason, I came back to Excel to do my work and the Date Table is now in the M Language, not DAX. DAX wasn't the problem, it was my use of it that was wrong and I would use it again but chose M in Excel.
I was really using Excel to see if I could replicate the error that Power BI was giving me and my good news is that I could not: that error has disappeared. My conclusion is
a I did something to the data to corrupt it
b Power BI just does not like April 2020's CSV file for a reason I cannot see as yet
I will keep looking at it to see what I can find.
Enough of that: why am I here today, with this?
The Story
A GP in the UK is a family doctor, the first medical port of call for treatment and/or referral to a hospital. There are many thousands of these doctors all over the UK and they provide a sterling service, as they work with nurses and other medical practitioners in their surgeries to provide a wide range of medical care.
The newspaper article I referred to is reported as saying that the number of Friday consultations is significantly less than the number of Monday consultations so Doctors are slackers as they stop providing services on Friday. The truth is two fold
a all doctors work the same on Monday, Tuesday, Wednesday and Thursday
b the demand for consultations in less on Friday ... that is, the public don't go to see their Doctors on Friday as much as to day on the other weekdays.
The newspaper was attacking the reputation of our front line medical services.
My Excel Work
4.7 Million Rows
Knowing that the total file would take up 4.7 million rows, how can we get all of that data into Excel? Firstly, the data came to me in CSV files and Power Pivot will not work directly with them. I couldn't just copy and paste all of the files 22 files and try to combine them because Excel is limited to 1,048,576 rows and the data comprises 4,724,442 rows.
Here is how I got the data into Power Pivot.
Create a Query in Power Query: from file, from folder ... CSV files only ... choose all 22 files to be combined
Transform the data to make sure you get exactly and only the files into the Query that you want: after all I had additional CSV files, an XLSX file and a PBIX file in that folder as well and I needed to exclude them
Having set up the Query I then did this:
Close and Load to ... Only Create Connection ... Add to the Data Model
That does it: all of the 4.7 million rows stay in memory and the Query become a model in Power Pivot.
Date Table
I had a problem with my Date Table, DAX based, as I mentioned above but with help from the Power BI community, I sorted that out.
In my Excel file I created a Date Table using the M Language, as described on this page.
To make the Date Table available to me for my analysis in a Pivot Table, I needed to set up a relationship between the Query and the Date Table. My Date Table looks like this:
I used several of the columns in that Table to help my analysis, as you will see.
No comments:
Post a Comment