Operations on multiple tables / datasets with Edit Queries and R in Power BI
Asked Answered
O

1

14

I have two tables tbl_A and tbl_B in a Power BI file that I'd like to transform and analyze using the Run R Script functionality in Edit Queries.

enter image description here

This would include handling missing values and joining the tables. However, when starting R, it seems I'm only able to do operations on one table at a time. This is because the Run R Script functionality only imports data from the table that is active when you click the Run R Script button. This data is then stored in the dataset variable.

enter image description here

If this is correct, it seems to me that the practical use of R` in Power BI would be very limited. I know I could join the tables before I unleash R. That would be a feasible solution for a simple case like this, but certainly not for more complex data structures. Any suggestions on how to do operations on multiple tables with R in Power BI?

Odysseus answered 4/7, 2017 at 5:45 Comment(5)
Perhaps this helpsRedeemable
Thank you for the tip! I tried it out, but I had no luck with references to multiple datasets. And just as a comment to that particular forum, I think it's a bit strange to call that suggestion a solution.Odysseus
Judging by the number of views this post has gotten so far, I'm mostly talking to myself here, but I've finally managed to find a possible solution to this. There are a few quirks to it, but it works. I've written it up as an answer, but please contribute your own answer if you're able to do it a bit more elegantly. Hopefully the PowerBI team at Microsoft will make things a bit easier in the future.Odysseus
Please do post that as an answer. So that others will benefitRedeemable
Working on it right now.Odysseus
O
14

Short version:

In Edit Queries, when inserting an R script, just add [dataset = "Renamed Columns", dataset2 = tbl_A] in the Formula bar. In this case Renamed Columns refers to the state of your table (under APPLIED STEPS) where you're inserting your R script, and tbl_A refers to another table that is available to you. And check all your settings with regards to Privacy.


Long version

Following up on my comment, here is a solution that builds on suggestions from a business intelligence blog and contributions in the PowerBI forum:

First you'll have to edit a few settings. Go to Options and Settings | Options. Under Privacy, select Always ignore Privacy Level settings. On your own risk of course...

enter image description here

Now, go to Options and Settings | Data Source Settings. Select source and click Edit permissons. Set it to Public:

enter image description here


Now we're good to go:

I'm gonna go from scratch here since I don't know what quirks any other data loading method would trigger in PowerBI. I've got two separate Excel files, each containing one worksheet called tbl_A and tbl_B, respectively. The data for the two tables look like this:

tbl_A Data

Date        Price1  Price2
05.05.2016  23,615  24,775
04.05.2016  23,58   24,75
03.05.2016  0       24,35
02.05.2016  22,91   24,11
29.04.2016  22,93   24,24

tbl_A Screenshot

enter image description here

tbl_B Data

Date        Price3  Price4
02.06.2016  19,35   22,8
01.06.2016  19      22,35
31.05.2016  19,35   22,71
30.05.2016  15,5    21,85
27.05.2016  19,43   22,52

tbl_B Screenshot

enter image description here


In the main window in PowerBI, load tbl_A using Get Data:

enter image description here

Do the same thing with tbl_B so that you end up with two separate tables under the Fields menu:

enter image description here

Click Edit Queries under the Home tab and make sure that the Formula Bar is visible. If not, you can activate it under View:

enter image description here

Depending on how your tables are loaded, PowerBI will add a few steps in the process. Those steps are visible under Query Settings:

enter image description here

Among other things, PowerBI changes the data type of dates to, you guessed it, Date. This can trigger problems later. To avoid this, we can change the data type for date in both tables to Text:

enter image description here

After you've done this for both tables, make sure tbl_B is active, and have a look at the Query Settings. You'll se that a new step Changed Type has been added in the data loading process:

enter image description here

We're going to add another step in order to keep our up-coming R script as simple as possible. In that script we're going to join the tables using the rbind() function. This will trigger an error unless the column names in the different tables are the same. So go ahead and change the names in column B from Price3 and Price4 to Price1 and Price2, respectively:

enter image description here

Now, the Applied steps under Query settings should look like this:

enter image description here

The name of the last step is crucial since you're going to have to reference Renamed Columns (or whatever else you'd like to call it) when you write your R script. And finally we can do exactly that.

Under Transform, click Run R Script. As the picture below describes, the variable dataset will contain the original data for your script. In this case, it will be tbl_B in the form of a dataframe if tbl_B was the active table when you clicked Run R Script:

enter image description here

For now, leave the script as it is, click OK, and have a look at the formula bar:

enter image description here

The picture above tells us two important things. First, we can see that the process has gone smoothly so far and that we have an empty table. Second, we can see that dataset refers to tbl_B in the state that we left it after the step Renamed Columns. And this is the part that can be confusing if you've read about these things elsewhere. In the Formula bar, you can enter a second dataset by adding , dataset2=tbl_A, so that the formula now looks like this:

enter image description here

Hit Enter

Under Query Settings, you will now see that there's a new step where you can edit your R script:

enter image description here

Click it to get back into R and add this little snippet:

df_B <- dataset
df_A <- dataset2
df_temp <- rbind(df_A, df_B)

output <- df_temp

When you click OK, this is what you'll see:

enter image description here

Nevermind that the formula bar looks like a mess, just go ahead and click Table next to output.

This is it!!

enter image description here

Go to Home and click Close & Apply to get out of the Query Editor. Now you can inspect the output from your R script under Fields, or in the Data tab like in the picture below:

enter image description here

The end result will be a version of your original tbl_B with the columns from tbl_A added to it. Not too fancy, but now that you've combined two datasets in you R script you're able to unleash a bigger part of R to your work flow.

Odysseus answered 30/8, 2017 at 13:20 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.