Dealing with Budget vs Actuals Closed Dates

This post is an extremely practical one targeted at business analysts. Most of you have a scenario that includes a general ledger and monthly budgets that have corresponding actuals. Often there are actuals in the system that you don’t want to include because they haven’t “closed” yet.

If you are somewhat new to Power BI, you are probably handling this by manually setting filters in your Power BI report and adjusting them each month. When I see report authors try to automate this, I typically see a single measure with a FILTER expression that accounts for ONLY closed actuals. The problem with this approach is that you end up writing that FILTER expression many times over if there are many measures that need to observe closed vs. unclosed actuals. This approach also doesn't adjust the axis on bar charts displaying your budget and actuals values.

In this blog post I will show an alternative option that is pretty easy and can be applied to any measure you include in your model. The PBIX file referenced below can be found on my GitHub.

In the report below, notice that my main “% Variance to Budget” measure on the left side includes unwanted actuals from 2016, a few months in 2017, and 2018. I am using the same “% Variance to Budget” measure on the right side, but it only includes months that have a budget defined (2017 only) as well as ONLY months where actuals have closed (January through October, excluding November and December).

The two visuals on the right are utilizing a calculated date table based on our “closed actual dates” instead of the full date dimension. Let's review the simple data model being utilized in this example.

I have a budget table that includes a status indicating whether each month has been “closed” or not.

Date Budget Status
01/01/2017 300 Closed
02/01/2017 500 Closed
03/01/2017 500 Closed
04/01/2017 400 Closed
05/01/2017 400 Closed
06/01/2017 500 Closed
07/01/2017 500 Closed
08/01/2017 400 Closed
09/01/2017 400 Closed
10/01/2017 400 Closed
11/01/2017 400 Open
12/01/2017 500 Open

 

Notice that it does not include any dates from 2016 or 2018, and that it shows the status of November and December as “Open”.

Now in my actuals table, you can see that the dates include 2016, November and December of 2017, and 2018.

Date Actuals
01/01/2016 800
03/01/2016 100
09/12/2016 4000
11/22/2016 250
01/04/2017 100
01/28/2017 300
02/01/2017 500
03/12/2017 200
04/05/2017 400
04/22/2017 100
05/02/2017 300
05/28/2017 100
06/29/2017 500
07/01/2017 100
07/04/2017 800
07/23/2017 200
08/09/2017 400
09/21/2017 100
09/23/2017 500
10/11/2017 300
10/20/2017 100
10/31/2017 250
11/12/2017 100
11/21/2017 200
12/01/2017 500
12/22/2017 2000
12/30/2017 100
01/02/2018 200
02/02/2018 1000

I have pulled both of these tables into my data model and have also included a Date dimension table that I typically use. However, I have also created a “calculated table” from the “New Table” icon on the Modeling tab that is going to reflect our “closed actuals dates”.

The formula for this calculated table is below:

 
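Based on the explanation that follows, the calculated table looks roughly like this (a sketch using the Budget table and column names from the sample data; I am calling the table ClosedDates, as it is referenced that way later on):

ClosedDates =
CALENDAR (
    // Start Date: the earliest budget date
    MIN ( Budget[Date] ),
    // End Date: the last day of the latest month whose budget status is "Closed"
    CALCULATE (
        EOMONTH ( MAX ( Budget[Date] ), 0 ),
        FILTER ( Budget, Budget[Status] = "Closed" )
    )
)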

The CALENDAR() DAX function expects two dates that it will use as the beginning of the date range and the end of the date range that will be produced. All dates between these two dates will be produced as a table.

The “Start Date” is pretty easy, as we will use the minimum date from the budget table, Budget[Date]. The “End Date”, however, is a bit more complex. A CALCULATE statement is used so that we can apply a FILTER to the table to ensure Budget[Status] is “Closed”. That FILTER expression is the second parameter to the CALCULATE function. The first parameter is doing two things:

  • First, it gets the MAX budget date, which in our budget table, after the filter on status is applied, would be 10/1/2017. The problem is that our budget is monthly while our actuals are daily, and the monthly budget dates are stored as the first day of the month.
  • So the second function being applied is EOMONTH, which stands for “End of Month”. This gives the last date in the month based on the date value in its first parameter and an offset (how many months from the given date) in its second; in our case we want the same month, so the offset is zero.

 

Now we have a calculated table that includes ONLY the dates we want to use when comparing actuals vs. budget. In the data model, let's connect it to the Date dimension table, as that is the table that would likely hold the relationships to the Actuals and Budget tables.

You can see from my model that I added two additional calculated columns to the ClosedDates table for Month and Year. Those formulas are below:

 
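A minimal sketch of what those two columns could look like; the exact formulas in the original file may differ (for example, you may prefer a month number or a “MMM YYYY” label):

Month = FORMAT ( ClosedDates[Date], "MMM" )   // or MONTH ( ClosedDates[Date] ) for a number
Year = YEAR ( ClosedDates[Date] )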

To test this, I used a percentage difference formula that I created from a “Quick Measure” by clicking the “…” next to the table you want to create the measure in.

I named mine “% variance to budget” which automatically produced the following DAX expression:

 
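The Percentage Difference quick measure generates roughly the pattern below. I am assuming tables named Budget and Actuals with the columns shown in the sample data above; the variable names Power BI generates for you may differ:

% Variance to Budget =
VAR __BASELINE_VALUE = SUM ( Budget[Budget] )
VAR __VALUE_TO_COMPARE = SUM ( Actuals[Actuals] )
RETURN
    IF (
        NOT ISBLANK ( __VALUE_TO_COMPARE ),
        DIVIDE ( __VALUE_TO_COMPARE - __BASELINE_VALUE, __BASELINE_VALUE )
    )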

Now that the measure is complete, add a card visual displaying it.

Initially you will get a number that uses all actuals as the comparison value to the budget. But by simply dragging the ClosedDates[Date] field into the FILTERS pane as a visual-level filter (or even a page-level filter, if everything on the page is comparing actuals against budget) and setting it to “is not blank”, it will ONLY show actuals for dates that are closed and have a budget defined.

Now your measure is only showing variance based on the dates that exist in the Closed Dates calculated table.

Because we created a calculated table, it can also be utilized in a bar chart showing our “actuals” and “budget” fields without having to wrap them in measures or manually filter out the time frames that don't have a budget associated or closed actuals.

 

Conclusion

Not the flashiest blog post, but I hope it can help solve a real, practical problem for the many business analysts trying to compare budgets vs. actuals, or even forecasts.

Working with Scatter Plots in Power BI

I really like some of the advancements that have been made in Power BI scatter plots over the last few months. I wanted to point out some capabilities you may not be using that maybe you should be.

Data Sampling Improvements

In the September 2017 release, you can now be confident that all of your outliers are being shown. No one can visually look at a plot and interpret several thousand data points at once, but you can interpret which of those points may be outliers. I decided to test this out myself between a Python scatter plot of 50k data points and Power BI.

In the test, I used a randomly generated normal distribution of 50k data points to ensure I had some outliers.

(You can see the Python notebook on my GitHub here).

Here it is in Python:

Here it is in Power BI (September desktop release)

Notice that all the outliers have been preserved. Note that in previous releases, the Power BI rendering of this would have been shown as below.

This is a great improvement. To learn more about this update, check out the official blog post on high density sampling: https://powerbi.microsoft.com/en-us/documentation/powerbi-desktop-high-density-scatter-charts/

Working with Outliers (Grouping)

Now that we know the dense sampling is preserving our outliers, we can perform some analysis on them. Power BI makes it easy to CTRL+click multiple outliers, then right-click and add a new Group.

This will create a new field in your fields list for this group of outliers and will automatically include a group for “Other” (the other 49,993 data points that weren't selected). Note that I renamed my field to “High Performers”.

As this is a random dataset with integers for x,y values, there are no dimensions here that are interesting to compare, but consider that we can now always come back to this grouping for further analysis, such as the bar chart below:

Clustering

You can also use the “…” in the upper right of the scatter chart to automatically detect clusters. Our example again is a bit uninteresting because it is a random normal distribution, but it gives you an idea of how you can cluster data that is more meaningful.

Symmetry Shading and Ratio Lines

These gems were released in the August 2017 desktop release and really help visualize the skew of your data.

Both of these can be turned on from the analytics tab.

Instead of using our sample dataset above I will use the dataset from my last blog post on Scorecards and Heatmaps.

In the below plot I took the SalesAmount field and plotted it on the y axis against the SalesAmountQuota field on the x axis. From the symmetry shading we can observe that none of our sales people are meeting their quota. From the ratio line we can see the few individuals that have a positive variance to the ratio, while most are flat or below it.

You can read more about these two features in the August Desktop blog post: https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-august-2017-feature-summary/#symmetryShading

Conclusion

These are just a few of the recently released features that I think have made the native scatter chart in Power BI a very useful visual. I have posted the PBIX file for the normal distribution data on my GitHub if you would like to download: https://github.com/realAngryAnalytics/angryanalyticsblog/tree/master/20171002-scatterplots


Performance Scorecards and Heatmaps

The new Matrix visual in Power BI is impressive and with a few tricks you can make great interactive scorecards and heat maps. At first glance, you may assume this to be easy, but you will see in the below content you have to make some deliberate decisions with formatting and also create several measures to get the job done.

I have the finalized PBIX file on my GitHub repository so use that for reference as we walk through building these visuals.

I am using the AdventureWorksDW dataset for this. I have imported the FactResellerSales table and selected the related tables. The main tables used will be FactResellerSales, DimEmployee, DimSalesTerritory, and FactSalesQuota.

I have created 5 base measures to be used for our scorecard. Note that I have manufactured the Customer Complaints value by using RAND. The required measures are below:
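As a hedged sketch only, five such base measures against AdventureWorksDW could look like the following; treat these as illustrative and refer to the PBIX file on my GitHub for the exact definitions. The Customer Complaints column is the manufactured RAND value, so that one in particular is hypothetical:

// CustomerComplaints is a hypothetical calculated column seeded with RAND, e.g.
// CustomerComplaints = ROUND ( RAND () * 2, 0 )

Sum of Sales Amount = SUM ( FactResellerSales[SalesAmount] )
Sum of Customer Complaints = SUM ( FactResellerSales[CustomerComplaints] )
Discount Percentage = DIVIDE ( SUM ( FactResellerSales[DiscountAmount] ), SUM ( FactResellerSales[SalesAmount] ) )
Total Orders = DISTINCTCOUNT ( FactResellerSales[SalesOrderNumber] )
Quota Variance = [Sum of Sales Amount] - SUM ( FactSalesQuota[SalesAmountQuota] )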

Also, one more change to make things more readable: in DimEmployee, rename EmergencyContactName to Full Name. I noticed that this is the concatenated value of First Name and Last Name… or you could create a calculated column if you prefer.

To create a scorecard as I have shown above, we need to use the “Selected Measure” trick where we will create a Measure table and use a DAX expression to populate the appropriate measure value depending on which value is selected in the table.

You can use the “Enter Data” button in the ribbon to simply enter the 5 values (and optionally an order) as shown below

Now with the Metrics table and the measures you have added above, you can create the below measure to switch between them based on a slicer or filter that has been applied
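The switching measure follows roughly this shape; the metric names must match whatever you typed into the Metrics table, and the five base measure names are the ones sketched earlier:

Selected Measure =
IF (
    ISFILTERED ( Metrics[Value] ),
    SWITCH (
        FIRSTNONBLANK ( Metrics[Value], 1 ),
        "Sales Amount", [Sum of Sales Amount],
        "Total Orders", [Total Orders],
        "Discount Percentage", [Discount Percentage],
        "Customer Complaints", [Sum of Customer Complaints],
        "Quota Variance", [Quota Variance]
    )
)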

This snippet shows a SWITCH statement used against the FIRSTNONBLANK value, applied only if the Metrics table is filtered (ISFILTERED).

You can try this out by creating a simple bar chart visual using the Selected Measure as the value and DimEmployee[Full Name] (which we renamed earlier) as the X Axis. Also add Metrics[Value] as a slicer.

This simple trick allows us to get a row for each measure on our scorecard later.

Lots of Measures

The next part is tedious. Each of our 5 measures above actually requires 5 additional measures to get the attributes found in the scorecard. This means 30 total measures to show 5 rows in our scorecard.

The opportunity here, however, is that if you are not a DAX expert, this may be a good primer for understanding how DAX really makes almost anything possible in Power BI. I will show the formulas for “Sales Amount” and a few nuances for the other metrics after that. For all the details, refer to my GitHub repository with the PBIX file.

For our scorecard, we want to be able to show a percentile and a quartile value. This makes it easy at a glance to see how an individual (or sales region) is doing in respect to their peers. To get either of these values, we must first calculate the RANK of the sales person.

We will use RANKX to calculate the RANK of the sales person for [Sum of Sales Amount]

Note the use of the ALL function over DimEmployee. Filter context is applied in a table or matrix visual, so when a row is shown for employee “Amy Alberts”, the relationships in the data model mean the only rows from the FactResellerSales table being evaluated for a measure are the ones for Amy Alberts' EmployeeKey. We need to use ALL to ensure that filter context is cleared when evaluating the RANKX expression. See the below table as an illustration.

There is one more problem with our DAX, however… DimEmployee has over 200 employees and not all of them are in Sales. The above DAX will calculate a rank for ALL employees instead of JUST the sales employees. One way to handle this is to simply set this value to BLANK() if the [Sum of Sales Amount] measure is blank. The final DAX used is below. (I used a variable for readability, but it is not necessary.)
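A sketch of that final rank measure (the measure name is illustrative):

Sales Amount Rank =
VAR Ranking =
    RANKX ( ALL ( DimEmployee ), [Sum of Sales Amount] )
RETURN
    // Blank out employees that have no sales so they don't get ranked on the scorecard
    IF ( ISBLANK ( [Sum of Sales Amount] ), BLANK (), Ranking )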

NOTE: simply using SUM(FactResellerSales[SalesAmount]) will not work here; this is why we created a measure for it.

With a Rank value, now we can calculate the “percentile” the sales person is in. This is effectively “normalizing” the ranks.
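A sketch of the percentile measure, assuming the denominator is the count of Sales department employees; depending on how you want rank 1 to read, you may invert the ratio so the top rank shows as the highest percentile:

Sales Amount Percentile =
VAR SalesPeople =
    CALCULATE (
        COUNTROWS ( DimEmployee ),
        FILTER ( ALL ( DimEmployee ), DimEmployee[DepartmentName] = "Sales" )
    )
RETURN
    DIVIDE ( [Sales Amount Rank], SalesPeople )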

To determine the denominator, we are using a FILTER to only retrieve the employees that are in the “Sales” department.

Now with the percentile, we can determine the quartile the sales person is in.
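One possible form of the quartile measure; the thresholds assume the percentile above runs from near 0 for the best rank up to 1:

Sales Amount Quartile =
IF (
    ISBLANK ( [Sum of Sales Amount] ),
    BLANK (),
    IF ( [Sales Amount Percentile] <= 0.25, 1,
        IF ( [Sales Amount Percentile] <= 0.5, 2,
            IF ( [Sales Amount Percentile] <= 0.75, 3, 4 )
        )
    )
)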

I have used the ISBLANK() function to ensure I do not get any unwanted values showing up in Quartile 4. (Note: I had an Excel savvy client tell me I could use MOD() for this but I couldn’t get it to work, so this may be more verbose than it has to be)

Lastly, let's get the average Sales Amount for the company as a comparison for the scorecard.
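A sketch of that company-wide average measure, matching the CALCULATE/AVERAGEX explanation that follows:

Sales Amount Company Avg =
CALCULATE (
    // Average of the measure evaluated per employee row
    AVERAGEX ( DimEmployee, [Sum of Sales Amount] ),
    // Ignore the employee on the current row and consider only Sales department employees
    FILTER ( ALL ( DimEmployee ), DimEmployee[DepartmentName] = "Sales" )
)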

CALCULATE applies the second argument's FILTERed table to the first argument's expression. AVERAGEX gets the average value of [Sum of Sales Amount] per DimEmployee record.

Now apply the same logic to the other 4 metrics we want to put on our scorecard. A few things to note are explained below.

When using RANK, some of the metrics, such as Discount Percentage (as we don't want to give big discounts to make sales) and Customer Complaints, need to be ranked in ascending order instead of the default descending order. So in those instances, we need to add an additional parameter to the RANKX function.
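The ascending rank for Customer Complaints looks roughly like this (measure name illustrative); the details are called out after the snippet:

Customer Complaints Rank =
VAR Ranking =
    RANKX (
        // Only rank employees that actually have a complaints value
        FILTER ( ALL ( DimEmployee ), NOT ( ISBLANK ( [Sum of Customer Complaints] ) ) ),
        [Sum of Customer Complaints],
        ,
        ASC
    )
RETURN
    IF ( ISBLANK ( [Sum of Customer Complaints] ), BLANK (), Ranking )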

We have added “ASC” as the 4th argument to RANKX for “Ascending” order. Notice the missing 3rd argument. This is to tell the rank function if you want DENSE ranking. We want the default which means if 3 values are tied for the 10th ranking, the next rank will show 13 instead of 11.

Also, in the above measure, we have added a FILTER to the ALL(DimEmployee). See the below screenshot to see the RANK without this filter:

With the FILTER expression added, we are further reducing the table in the first argument. Here we check [Sum of Customer Complaints] with ISBLANK() and then apply NOT to it, so if [Sum of Customer Complaints] is not blank, we include those rows in the ranking. This is necessary when doing an ASCENDING rank because employees in the employee table that have no sales records would otherwise still get ranked. The same is true in DESCENDING order, but there it doesn't matter: those rankings fall below the sales people, and because we apply the ISBLANK([Sum of Sales Amount]) check after the rank has occurred, they are filtered out anyway.

Just as was done with the Selected Measure before, each of these additional measures that may be displayed, such as “Quartile” and “Percentile”, needs to have the ISFILTERED condition applied to the Metrics table:
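For example, a percentile version of the switching measure might look like this, assuming per-metric percentile measures named like the one sketched earlier:

Selected Percentile =
IF (
    ISFILTERED ( Metrics[Value] ),
    SWITCH (
        FIRSTNONBLANK ( Metrics[Value], 1 ),
        "Sales Amount", [Sales Amount Percentile],
        "Total Orders", [Total Orders Percentile],
        "Discount Percentage", [Discount Percentage Percentile],
        "Customer Complaints", [Customer Complaints Percentile],
        "Quota Variance", [Quota Variance Percentile]
    )
)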

Repeat this with Quartile and Avg measures.

Formatting the Scorecard

Create a new matrix visual

Now select the Metrics[Value] as Rows and your Selected Measure measures as values.

On the screenshot above you will notice that I added an additional measure, Selected Measure Target. This follows the same pattern as shown above with the SWITCH statement and uses the sum of FactSalesQuota[SalesAmountQuota] as well as the 3 static values for Customer Complaints Target, Discount Percentage Target, and Quota Variance Target. I left Total Orders as BLANK(). The final measures are shown below.
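A sketch of that target measure; the three static numbers below are placeholders rather than the actual targets used in the PBIX file:

Selected Measure Target =
IF (
    ISFILTERED ( Metrics[Value] ),
    SWITCH (
        FIRSTNONBLANK ( Metrics[Value], 1 ),
        "Sales Amount", SUM ( FactSalesQuota[SalesAmountQuota] ),
        "Customer Complaints", 10,       // placeholder static target
        "Discount Percentage", 0.02,     // placeholder static target
        "Quota Variance", 0,             // placeholder static target
        "Total Orders", BLANK ()
    )
)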

Apply a matrix style from the formatting menu in the visuals pane. I chose Bold header.

I also increased the “Text Size” property in the Column Headers, Row Headers, and Values, all to 11.

With the July 2017 release of Power BI Desktop, you can now simply right click any of the values in the selected fields and rename them. This allows us to rename our row and column headers.

Now we have this

Right click on the Percentile in the selected fields pane and select conditional formatting -> data bars

In the below dialog, notice that I have changed the Minimum/Maximum to Number values 0 and 1. This keeps the data bars from being “relative” to what is shown and makes them span the full 0-100% range. I also changed the Positive bar color to a gray so that the bar is not distracting, but rather accents the percentile.

Now right click the Quartile and select conditional formatting -> color scales

See the selections in the above screen shot.

Format blank values – I select “Don't Format”; when a value is blank, we don't want to give the impression that it is actually good or bad.

Minimum and Maximum – I change these values from Lowest and Highest value to a Number. This is because you want every representation of quartiles 1, 2, 3, and 4 to be the exact same color. If you leave this as the default, sometimes 2 will be completely red and sometimes it will be completely green.

Diverging – This gives us a center color that helps make the 2nd quartile look light green and the 3rd quartile orange.

The hex values I use for these colors are #107C10, #FFFE00, and #A80000 respectively.

Now the scorecard is complete. Add a slicer for DimEmployee[Full Name] and you can see how each of your sales people is doing.

An insight from the above screenshot is that “Linda Mitchell” is one of our best sales people, but it may come at the expense of customer complaints and giving deeper discounts than others.

Creating the Heatmap Visual

Now we can take the skills acquired from the scorecard to create a heatmap.

Simply use a matrix visual again, set the rows to DimEmployee[Full Name], and select each individual measure you want to display as a value. Try to put all the quartiles together so there is a good heatmap effect. Include additional measures that may be valuable to sort by. Remember, the matrix visual has automatic sorting by column, so this may help with analysis. I have selected the below values.

After re-applying the Quartile conditional formatting logic from above, the heatmap should look something like this.

If you have a bunch of additional employees showing, add the DimEmployee[DepartmentName] as a visual or page level filter and that will reduce your rows.

Using a Region Hierarchy

If you are still here (most of you have already thought TL;DR by now)… we can quickly add a calculated column in the DimEmployee table for “Sales Region” so that we can create a heatmap that rolls up to the sales region.
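A minimal sketch of that calculated column, assuming the standard AdventureWorksDW relationship between DimEmployee and DimSalesTerritory (use SalesTerritoryGroup or SalesTerritoryCountry instead if you want a coarser rollup):

Sales Region = RELATED ( DimSalesTerritory[SalesTerritoryRegion] )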

Simply add the DimEmployee[Sales Region] to your Rows above your DimEmployee[Full Name]. Now you have a heatmap that can rollup to the sales region level. See below.

On the scorecard you now can see comparisons of your regions as well.

Conclusion

There are many variations of scorecards and heatmaps. I have found the above use of the new Matrix visual to be very good.


Dear Mr. CISO

“Dear Mr. CISO,

 As our Chief Information Security Officer, I know you are exhausted trying to keep us safe from security breaches, and you are ultimately responsible for any that may occur in our organization. I have started working with Power BI Desktop to get a more modern and effective view of the data that is running our business. My manager told me, though, that we can't upload any of our data to the cloud. I asked her why, and she told me it is because we haven't gone through the necessary steps to ensure our data would be safe and that it is too much of a risk. When I pushed for more information, she said it is ultimately your team's decision, and when you approve it we can use products like Power BI that have components that live in the cloud.”


In so many organizations that are starting to see the benefits of using a Software as a Service (SaaS) solution like Power BI, this is the elephant in the room. Individual departments start to use the FREE version of Power BI. At some point the organization wants to limit the use until a security review has been completed by the security team. This blog post will discuss 5 main security areas your CISO will want to know about:

  1. Compliance
  2. Authentication
  3. Authorization
  4. Data Security
  5. Auditability

Power BI is a self-service solution. It is intended to enable business users to “do more” with their data. In the data security section of this post I will talk about how data can remain on premises and only be accessed via a private pipe through Power BI, but resist the urge to restrict all data from being uploaded to the cloud; doing so will greatly reduce your ability to truly enable a data culture in your organization. Create policies that define which data is High Business Impact (HBI) and which data is Low Business Impact (LBI). To start with, you may choose to keep your HBI data on premises only while allowing your LBI data to be published to the cloud. Over time, as your comfort with a cloud-based solution grows, those policies can be modified.

Prior to discussing these 5 areas of security, I want to reference two white papers that have been extremely valuable to me. The Power BI Security White Paper is publicly available and describes in detail how Power BI as an application is architected and how data is stored in the cloud. There is also a great governance white paper published by Melissa Coates (@SQLChick) and Javier Guillén (@javiguillen) that will help you understand approaches to deploying Power BI.

Compliance

As of the time of this writing in July 2016, the below compliances are held/adhered to by Power BI. An updated list should be available in the Microsoft Trust Center.

pbicompliances

Most security conversations should start here. Hundreds of pages of security requirements are documented in these compliance standards (I will leave it to you to bing or google them)… A security conversation should not start from ground zero. It should start from the coverage of the above compliances and ask, “beyond these requirements, what are your concerns?” If Power BI meets these security requirements, why rehash them? Start with the “above and beyond” and reduce a week-long security review effort to a couple of hours.

One compliance worth pointing out is ISO 27018. This is the standard for Personally Identifiable Information (PII) in the public cloud. This is often the data that is most scrutinized for its security in the organizations that hold it: full name, birth date, SSN… no one wants to see this data leaked in a security breach. Power BI attests to holding this standard to keep this data safe.

Lastly on this topic, Microsoft strives to hold as many of the compliances that organizations frequently request as it can. So check the Trust Center link above every few months for other compliances that may be important to you if they are not yet there.

Authentication

The front door… this is where arguably 80% or more of security breaches happen: weak username/password security.

Have you heard of phishing attacks? Someone sends you an email that looks like a legitimate request with your company’s logo to reset or verify your credentials by clicking on a link. You go to the bogus page (that has your company’s logo on it) and you enter your username/password and it thanks you for your verification… it happens all the time… and sadly, it works so often…

Multi Factor Authentication (or MFA) to the rescue…

Power BI can be set up to ensure that any time someone tries to log in from outside the organization's network, a second (or even third) method of identity verification is required before the user is allowed access. Often this means the Power BI user enters their credentials and is then required to accept a phone call (or use an app) to verify their identity before being allowed into the application.

So, about that little phishing attack above: when the hacker tries to use the credentials that were naively shared with him, the real user would get a call on their cellphone asking them to verify their identity to access the system, and warning them that if they didn't just try to access the system, they should contact their IT administrator. It could even require them to enter a PIN known only to them, in case the hacker also knew the person and took their phone.

This single capability of multi-factor authentication can in most ways be just as powerful an authentication method as the badge you swipe every day to get into the building you work in.

Authorization

Once a user has validated who they are and has successfully logged in to Power BI, it then becomes a question of what data are they “authorized” to see.

auth

I previously did a blog post on how to distribute content in Power BI. This covered how to give the appropriate people access to the appropriate content. However, if you want to distribute a single report to multiple users but only allow each user to see the data they have access to, this can be achieved through the Row-Level Security (RLS) feature that has already been well documented.

Again, it is important to remember that Power BI is a self-service solution that allows the content author to have control on how data should be shared. Additional features will be added over time to restrict some capabilities and to provide further governance for enterprise solutions, but don’t get too caught up in what “is not restricted” currently as people generally do not go out of their way to find additional work.

Data Security

So now your data is outside the four walls of your data center… it is in the cloud… is it safe?

This is where I refer to the security white paper for in-depth detail on how Power BI secures data. First of all, Power BI as an application follows general 3-tier architecture principles to ensure that the web front end communicates through services to the back end. The “Data Storage and Movement” section of the white paper discusses how nearly all of the data is encrypted at rest in Power BI, with the remainder planned to be encrypted in Q3 of 2016.

If you are not quite ready to let your HBI data be uploaded to the cloud, Power BI does offer the capability to keep your data on premises so that Power BI basically becomes a web front end ONLY, just as your own data center's DMZ would be. This is a PRO feature via the use of the Enterprise Gateway and requires building reports in Direct Query mode.

directquery

Another feature of Power BI is the use of ExpressRoute. This sends Power BI traffic over an MPLS line through your network carrier that is completely private. So instead of the data movement over HTTPS going across the public internet, it can go through this private tunnel, the same way your data center likely already communicates with its disaster recovery location that is hundreds of miles away. Think of it as an extension of your own data center. For more information about ExpressRoute and Power BI, please see the link below:
https://powerbi.microsoft.com/en-us/documentation/powerbi-admin-power-bi-expressroute/

With the data security controls in place and the option to keep specific data sets on premises if necessary, you need to ask the question “what is it that you don’t trust?” These are the same controls you would put in place for your most stringent security requirements for on premises applications.

An area where Microsoft likely goes beyond what is being done in your internal data center is the use of “Red Teaming”. This blog and white paper discuss the technique in detail, but it is basically a group of security experts that try to penetrate the Microsoft cloud environment so that the threat detection and response measures can be constantly improved.

Auditability

<<UPDATE as of 8/26/2016, auditing feature is now available, see this video blog on this feature>>

Your CISO will want to know how user activity can be tracked/monitored in the Power BI environment even though it is managed by Microsoft and external to your data center.

It is likely that there are mission critical internal systems used in your organization that do not have the level of auditability that your CISO will want from a software as a service (SaaS) application. That may be understandable as it is the “fear of the unknown” as your organization does not control the environment in which Power BI is hosted.

Activities such as monitoring user logins to the system can be achieved through the O365 logging activities discussed here.

At the time of this writing there are no additional activities that can be logged for Power BI usage, such as report views and dataset usage; however, this looks to be coming very soon. See the below link on the ideas site, which has been placed in the “Started” status, meaning it should be delivered in the coming months.
https://ideas.powerbi.com/forums/265200-power-bi-ideas/suggestions/13279371-better-auditing-of-user-activity-datasets-report

It is easy to forget, with all the capabilities that Power BI has to offer, that it has been generally available for less than a year (July 24th is the anniversary date). So it is somewhat understandable that these deeper administrative functions such as audit logging are just now being delivered. It will be up to your security team whether they want to allow the use of Power BI given all of the security features I have mentioned above, or wait until the additional auditing capability is in place. IMHO this is really one of the last steps to making Power BI a solution that your CISO should feel very comfortable with.


API Strategies with Power BI

API (Application Programming Interface). You surely have heard the term from your enterprise architects telling you that all data will be accessed through “the API” and you will no longer access data from the source. They will build all the interfaces for you and you won’t need that source connectivity anymore.


Enterprise architects are really smart (I used to be one), ha. For all of the wisdom we share as architects, the confession is that making data accessible to our business analysts was never our top priority… All of the design documentation that we wrote started with “end user experience” and had a lot of content around “3-Tier Architecture”. Reporting and business intelligence usually ended up as the last few paragraphs, after a few dozen sections of content on how the application would work and interface with all the back-end systems.

I am writing this to give the perspective that more often than not, data connectivity is not forefront in most application designs. I do believe that is starting to change as data is becoming a financial currency to most organizations, but nonetheless, these “pesky” APIs still stand in between you and the data you need to access.

So, you have been given a nice long URL that starts with “https://” to access the data you need. Now what?

In this article, I will show how to connect to your APIs through Power BI in a few different ways. I will start by showing the OData protocol, but will quickly dive deep into APIs that are secured by the popular OAuth protocol, and end with how well Power BI traverses the nested JSON documents that get returned.

Connecting to ODATA Services

The API strategy best suited to connecting business intelligence tools is the OData protocol. When your organization's APIs are presented as an OData feed, there is an implied standard that Power BI can understand. OData builds on modern REST conventions, in contrast to the more traditional and still widely used SOAP standard.

There is a public ODATA feed for the Northwind sample data set here:
http://services.odata.org/northwind/northwind.svc/

To connect to ODATA feeds, simply go to “Get Data” -> “ODATA” and enter the URL.

Get Data ODATA
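For reference, the M that the query editor generates behind that step is roughly the following (a minimal sketch):

let
    Source = OData.Feed("http://services.odata.org/northwind/northwind.svc/")
in
    Source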

As this is a standard format for APIs, the benefit is that you are presented with a table list from the service just like you would be if you were connecting directly to a database.

ODATA Table Navigator

 

Most APIs will have additional request parameters that can be added to get more specific data. The below URL connects to the same service as above but returns only a single record:
http://services.odata.org/Northwind/Northwind.svc/Orders(10643)

ODATA to table

You have just successfully connected to your first API. From this point forward, OData behaves in the same way as other data sources and you can start shaping data in the “Edit Queries” window.

If you can influence your organization to adopt this standard for APIs, your life will be much easier when consuming them. However, the purpose of this blog site is to help you use technologies such as Power BI in an enterprise scenario, and it is most likely that your APIs are not OData compliant… So let's get on with the FUN stuff.

Non ODATA APIs

After you ask your architect if the API you are trying to connect to is ODATA compliant and she tells you NO, the next alternative is to use the Web data source.

connect to web

When connecting to a web URL for your internal (or external) API, it is likely going to be secured and will have a prefix of “https” instead of “http”. In our ODATA example above, I did not connect to a secured service, but had I, the below dialog would have been presented.

connect to web 2

There are 5 different authentication methods that are quite easy to connect with. It is possible that internal APIs use Basic or Windows credentials. Someone will have to provide you with a basic user/password combination for connecting to the service, or, if using Windows credentials, someone will need to verify that you have been granted access with your own user/password combination.

If you are connecting with one of these five authentication types, it is pretty straightforward and you can skip ahead to the end of this post where I show how to traverse the JSON documents that will likely be returned as a result.

Connecting to OAuth Services

Many organizations are moving towards the OAuth authentication protocol. Notice in the screenshot above that this protocol is not natively supported in Power BI. It has an extra layer of complexity, as it requires two services: one service is called with an API key and an API secret and returns a token, which is then used with all subsequent requests to the resource URL.

We will get into the weeds rather quickly in the rest of this blog post, so for the half of you that will stop reading at this point, please first go to the Power BI Ideas site and VOTE for the OAuth protocol to be added as a first-class authentication provider. Here is the link: https://ideas.powerbi.com/forums/265200-power-bi-ideas/suggestions/13195278-oauth

Also, the below content is available in a Power BI Template file here. Read the content below but feel free to download the template to save a lot of the grunt work.

Setting up a Twitter App for our OAuth API Example

First of all, Twitter connectivity via Power Query has already been blogged about a few times. The most useful post I found on it was by Chris Koester here. The script I am using later in this article is nearly identical to what I learned from his post. I will talk through the necessary script that has to be written, but also focus on how you can traverse the JSON data that gets returned from most modern APIs you will encounter.

Twitter uses OAuth2 for authentication. This will require you to set up a free developer account if you have not already.

Go to https://dev.twitter.com and sign in with your Twitter account. You will then need to create a Twitter application, which will give us the necessary API Key and API Secret that will be used to authenticate with the Twitter Search API.

In the new Twitter UI, the applications section is a little hard to find. At the bottom of https://dev.twitter.com are some links. On the right hand side you will see “Manage Your Apps”

Twitter UI

You will see a “Create New App” button that will take you through a dialog to set up your application.

Callback URL issue with OAuth based APIs

In the application creation dialog, you will notice a Callback URL entry that we will leave blank for this example.

callback url issue

In a traditional OAuth flow that has a UI associated with it, this is the URL that the flow returns user interaction to after the user agrees to allow the application to act on their behalf. This blog is not meant to go into that detail, but you can review this link for more information.

Because OAuth requires us to get an authentication token from the first service we call, there is an issue with some APIs: they only pass this token back to the Callback URL. Power BI cannot read a response returned to a callback URL, and if the API you are using requires that “authentication flow” then it can't be used with Power BI (well, that is not entirely true, but to prevent this blog from being 50 pages long, we will say it can't). For instance, the LinkedIn API requires this flow, so it is not something we can connect to from Power BI easily.

As long as the token is returned in the response from the first authentication service call (as the Twitter API does), we will be able to integrate it with Power BI.

Connecting to the Twitter Search API

Once you have entered the data to create an application, you will land on your “App details” screen. Navigate to the “Keys and Access Tokens” tab and copy the API Key and Secret so they can be used later.

Note: These should be treated like passwords and not made public

twitter access tokens

Back in Power BI, go back to the Web data source and just type in an arbitrary web URL, as we will be replacing all of this with a script. I used http://twitter.com and was presented with the navigator dialog to select a table from that page. Click “Edit” to open the Query Editor.

connect to arbitrary table

This is a good time to take advantage of a new feature that was added in the April Power BI Desktop update: Parameters. For deeper insight on this feature, check out the Power BI blog entry.

manage parameters

Create 4 new parameters as follows:
API Key: <<your API Key>>
API Secret: <<your API Secret>>
Token URL: https://api.twitter.com/oauth2/token
Search URL: https://api.twitter.com/1.1/search/tweets.json?q=Analytics&count=100

Notice the Search URL has a couple of query string parameters. In my example we will be searching on the word “Analytics”

Once the parameters are entered, you will see them in the queries list with their values, along with the arbitrary table you pulled in from the Web data source.

query editor parameters

 

Advanced Editor

Now that we have our parameters in place, navigate back to the Table Query and click on “Advanced Editor”

advanced editor

Once in the advanced editor, replace the script with the following:
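Below is a sketch that follows the Chris Koester pattern referenced earlier. Adjust the #"API Key"-style references to match the exact names you gave your parameters, and note that other OAuth services may expect a different token request body:

let
    // Build the Basic authorization header from the key and secret parameters
    AuthKey = "Basic " & Binary.ToText(
        Text.ToBinary(Uri.EscapeDataString(#"API Key") & ":" & Uri.EscapeDataString(#"API Secret")),
        BinaryEncoding.Base64
    ),

    // First Web.Contents call: POST to the token service to obtain a bearer token
    TokenResponse = Json.Document(
        Web.Contents(
            #"Token URL",
            [
                Headers = [
                    #"Authorization" = AuthKey,
                    #"Content-Type" = "application/x-www-form-urlencoded;charset=UTF-8"
                ],
                Content = Text.ToBinary("grant_type=client_credentials")
            ]
        )
    ),
    Token = TokenResponse[access_token],

    // Second Web.Contents call: use the bearer token to call the search URL
    Source = Json.Document(
        Web.Contents(
            #"Search URL",
            [Headers = [#"Authorization" = "Bearer " & Token]]
        )
    )
in
    Source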

In the above code, there are two calls to the Web.Contents function. The first calls the token service to get a token, which is then used in the second service call to actually get the results.

OAuth is a preferred authentication method because the token that is returned from the first service is only valid for a finite period of time. If you were to manually generate this token and then just call the search service with it, it would no longer be valid the next time you tried to refresh this data.

That is why the script above is so powerful as it handles the token generation so that you have a repeatable automated process. Kudos to @cjkoester for putting this script together.

A couple of other things to note in the script: our parameters are referenced with a “#” prefix, and not all OAuth services will follow the exact signature above. Use this as an example; ideally your development team can tell you what changes you need to make to the pattern for it to work.

I renamed the query to “TwitterSearch”. The result should now look similar to this.

query editor twitter search start

 

Working with the JSON response

JSON (JavaScript Object Notation) is the most popular format used for transmitting data in modern APIs. XML is also still widely used, and if your APIs return XML, it is likely many of these same steps will apply.

I am really impressed with the JSON parser that is built into the Query Editor of Power BI. For instance, when you look at the results of the JSON that gets returned from the Twitter API we just connected to, it looks like this:

JSON via Fiddler

I can count 6 levels of nesting in the small portion of the JSON response I am showing above (via Fiddler). The Power BI Query Editor handles this with no issues.

We will create two separate queries from this response, one for the statuses and the other for the hashtags.

Taking the TwitterSearch result, the two top level entities are “statuses” and “search_metadata”. Click the “List” value for “statuses” and this will expand the record set for “statuses”.

query editor to table

Notice the “To Table” option at the top. By changing this list to a table, we will be able to expand the Record to all the values. Click “OK” in the To Table dialog and you will see the magic expand button pop up in the column header.

query editor expand button

Click the button and then uncheck “Use original column name as prefix” to avoid “Column1” showing up as a prefix to every column created.

query editor choose columns 1

Click Ok.

Congratulations! You now have the last 100 statuses for the search term “Analytics” that we used in the search URL string we entered as the parameter above: https://api.twitter.com/1.1/search/tweets.json?q=Analytics&count=100

Let’s use this same result to create an additional query for hashtags. Simply right click on the “TwitterSearch” query and select “Duplicate”

query editor duplicate

Rename the query “Hashtags”.

Find the “Entities” column and click the magic expand button again.

query editor expand entities

On the column expansion dialog simply click “Ok” again to see all the results of the nested JSON document.

query editor expand hashtags

Click the magic expand button again on “hashtags”.

This will bring in the null values and the “List” links will become “Record” links.

Finally click the magic expand button one last time to be greeted by the hashtag records we wanted.

query editor hashtags expanded

Right click “text.1” and rename it “hashtag”.

Now we can use the “Choose Columns” dialog to only select a few of the columns to reduce our query to just the information important to the hashtags.

query editor choose columns

From here you can “Close and Apply” your Query Editor window and you will have two queries: one for the statuses of your Twitter search and the other for the hashtags that you can cross-filter against.

A simple viz may look like this

viz

Power BI Template

Another new feature in the April update of Power BI is the Power BI Template file. I have saved all of the above work in a template file, created by clicking the Export option from the File menu, and I am sharing it here.

template screen shot

This retains all of the work that has been completed above but doesn’t retain any of the data. So when you open this template file, it will prompt you for the parameters that were discussed above.

Here is the template file link again. Angry Analytics Twitter API Example

Conclusion

In closing, this article was intended to show you multiple ways that you can connect to APIs, as it is likely your organization has set standards around data connectivity that no longer allow connecting directly to the source.

Be an advocate for a Data Culture. Putting data in the hands of more people as quickly as possible will result in the ability to achieve more. Challenge your development teams to make data accessible. Data is currency!

Power BI Content Workflow

I work with many enterprise level organizations in the Midwestern United States. There is a lot of excitement around Power BI, however I often run into challenges regarding features that are expected with a business intelligence solution that can meet the needs of an enterprise such as governance, administration, and lifecycle management.

pendulum

In Business Intelligence, there is a pendulum that swings between self-service and governance, and it wreaks havoc on most organizations of reputable size. I can confidently say there is little doubt that, from an end-user feature set, Power BI competes well with any solution on the market. With a FREE and a PRO offering at only $9.99 a month per user, the price point is hard to beat. But what about total cost of ownership (TCO)? What about enterprise deployments of Power BI? I intend to use this blog to help answer these questions regarding Power BI, SSAS Tabular models, and an overall agile approach to business intelligence within the enterprise.

Let’s start with discussing lifecycle management and taking reports through development, testing, and production similar to how you would deploy a codebase for an application.

power bi cheapskate

First of all, we need to ensure that we are working with the PRO version of Power BI. Yes, if you want Enterprise level features, you are going to have to pay for it. Two PRO features are being used below: Power BI Group Workspaces and Content Packs.

Group Workspaces in Power BI allow for multiple users to “co-author” reports and share ownership over the Power BI content. You can easily create a group workspace from the left hand navigation bar.

power bi groups

Once a group has been created, you can assign users either “Edit” or “View Only” access. This will be an important distinction between the users in this group who will be “co-authoring” and those who will be testing and doing quality assurance.

power bi groups add user

 

Once a group has been established with distinction between the content developers and the testers, the below diagram can be followed to achieve a content management workflow.

power bi content mgmt workflow

This diagram is also provided in pdf form below:
Power BI Content Workflow

In this workflow, OneDrive for Business (ODFB) is used as a collaboration space for PBIX files. Alternatively, this could also be a source control environment or a local file share. A benefit of O365 deployments for Power BI is that when a group is created, an ODFB directory is also established for sharing.

The group workspace is used for collaboration between content authors and testers. Once development and testing iterations are complete, a Content Pack should be produced to distribute the production version of the dashboard, reports, and datasets to the rest of the organization. Content Packs are created by going to “Get Data” -> “My Organization” -> “Create Content Pack”.

power bi content pack create

IMPORTANT: It is critical in an enterprise scenario that personal workspaces are not used to publish content packs. This creates a dependency on a single author having full control over content, and scenarios will consistently pop up where that single author is not available when updates are needed. Enterprise deployment scenarios can't be achieved when content is tied to a personal workspace. Establish a group workspace structure that makes sense for your organization, as pointed out in the Pre-Requisite text on the workflow diagram.

Making Revisions

As revisions occur on the content in the group workspace, a notification appears in the upper right reminding you that there is a content pack associated with the dashboard or report being modified and that an update needs to be applied to the content pack to publish the changes.

power bi content pack changes

I have talked with some organizations that originally considered bypassing the use of content packs and just using group workspaces, with “View Only” users getting the content distribution. This would not allow for the option to iterate on changes before they are published to the end users.

Once the iterations have been completed and the next revision is ready to publish, anyone with “Edit” privileges in the group can go to Settings (the gear icon in upper right) and “View Content Packs”.

power bi content pack update

A warning icon is displayed next to the content pack name indicating there are pending changes. Once you click “Edit” you will have the option to “Update” the content pack, at which point the changes will be distributed to those that have access to it.

power bi content pack update button

Connection Strings and Separation of Duties (SoD)

The above workflow still leaves a couple of desired features without answers.

With the group workspaces, we have separated the ability to develop and test report changes from publishing them to the rest of the organization, but how do we transition from a DEV/TEST data source to a PROD data source? This has to be achieved within the Power BI Desktop file (as of the time this blog post was written). If a new calculation or set of tables is being tested, collaboration will have to occur between the development team and the testers as to when to change the connection strings from DEV/TEST to PROD sources and republish the PBIX file.

In my experience, often on the reporting and analytics side of things, production connectivity is required by the “power users” in each department that are doing the quality assurance as they can only tell if new report features will pass the “sniff test” when they are connected to data that they know. DEV/TEST environments often grow stale and don’t hold as much value on the reporting/analytics side as they do for application development. Connection string management will be enhanced in Power BI as the product matures, but even then, limited value will be achieved without the production data to verify changes.

Lastly, when we look at the fact that a group workspace only has an “Edit” and a “View” role and lacks a “Publish” role, it may be of concern that a content author can push changes for content packs without separation of duty (which may require someone else to actually publish the change). If this is very critical to your business, then separation of duty can be achieved by adding another group workspace to the mix that only publishers have access to, but I would suggest this is often overkill. Because reporting cannot “alter” any underlying data, it usually does not need controls as strict as those protecting a transactional application from malicious code being deployed. And remember, a significant portion of any enterprise organization is still running its business from Excel spreadsheets produced and managed by the “power users” in each department. They simply extract data from source systems and create Excel reports that often go all the way to the CFO or CEO of the organization. So as we come back to the initial comment in this blog regarding the pendulum of self-service vs. governance, remember that these concerns probably already exist in your organization today with much less visibility than they would have when using the proposed workflow mentioned above.

 

Feel free to leave any comments on this article or suggestions for future articles that touch on the use of Microsoft BI solutions such as Power BI for enterprise deployment.