Normalize (reformat) cross-tab data for Tableau without using Excel
Asked Answered
Z

3

13

Tableau generally works best when input data is in "normalized" format, rather than cross-tab. This is also referred to as converting from "wide format" to "long format".

That is, converting from:

enter image description here

To:

enter image description here

Tableau provides a "reshaping tool" for Excel users, but if you don't have Excel, you're stuck.

So how can you get a spreadsheet into this format, without using Excel?

Zerk answered 6/1, 2015 at 22:25 Comment(0)
Z
5

Well, you can use this handy Google Sheets script I made.

enter image description here

/*
normalizeCrossTab: Converts crosstab format to normalized form. Given columns abcDE, the user puts the cursor somewhere in column D.
The result is a new sheet, NormalizedResult, like this:

a     b     c    Field Value
a1    b1    c1   D     D1
a1    b1    c1   E     E1
a2    b2    c2   D     D2
a2    b2    c2   E     E2
...

Author: 
Steve Bennett
[email protected]
@stevage1

Licence: Public Domain

*/

function start() {
  var html = HtmlService.createHtmlOutput(
    '<style>ol { padding-left: 1.5em; }</style>' + 
    '<script src="//code.jquery.com/jquery-1.11.2.min.js"></script>' +
    '<script>' + 
    'function allDone(msg) { ' +
    '  $("#normalizeBtn").hide();' +
    '  $("#datacols-output").html("<p>Your normalized data is in a sheet called NormalizedResult. If you run the normalization again, that sheet will be deleted and replaced.</p>");' +
    '};' +
    'function gotCols(cols) { ' + 
    '  $("#datacols-output").html(\'<p>These will be your dependent variables:</p><ul id="datacols"></ul>\'); ' + 
    '  $("#normalizeBtn").show();' +
    '  $.each(cols, function() {' + 
    '    $("#datacols").append($("<li>").text(this)); ' + 
    '  });' + 
    '  $("#datacols").after("<p>If they don\'t look right, move the cursor and press <i>Continue</i>.</p>"); ' + 
    '}' + 
    '</script>' + 
    '<link rel="stylesheet" href="https://ssl.gstatic.com/docs/script/css/add-ons.css">' +
    '<h2>Convert cross-tab</h2>'+
    '<p>This macro converts <i>cross-tab</i> data which has multiple dependent variables (eg, observations, sales figures) per row into a <i>normalized</i> format with one dependent variable per row.</p>' + 
    '<p>The name of each dependent variable becomes the value of a new column called <code>Field</code> and its value goes in a column called <code>Value</code>.</p>' +
    '<ol><li>Move <b>all independent variable columns to the left</b></li>' + 
    '    <li>Place the <b>cursor in the first dependent variable column</li></ol>'+
    '<p><button onClick="google.script.run.withSuccessHandler(gotCols).getDataColumns();">Continue</button></p>' + 
    '<p id="datacols-output"></p>' +
    '<p><button id="normalizeBtn" class="create" style="display:none;" onClick="google.script.run.withSuccessHandler(allDone).normalizeCrosstab(true);">Normalize</button></p>' + 
  '<br/><p><a target="_blank" href="http://kb.tableausoftware.com/articles/knowledgebase/denormalize-data">More information</a></p>')
      .setSandboxMode(HtmlService.SandboxMode.IFRAME)
      .setTitle('Normalize cross-tab')
      .setWidth(300);
  SpreadsheetApp.getUi() // Or DocumentApp or FormApp.
      .showSidebar(html);
}

function onOpen() {
  var ss = SpreadsheetApp.getActive();
  var items = [
    {name: 'Normalize Crosstab', functionName: 'start'},
  ];
  ss.addMenu('Normalize', items);
}


function normalizeCrosstab(really) {
  if (!really) {
    return start();
  }
  var sheet = SpreadsheetApp.getActiveSheet(); 
  var rows = sheet.getDataRange();
  var numRows = rows.getNumRows();
  var values = rows.getValues();
  var firstDataCol = SpreadsheetApp.getActiveRange().getColumn();
  var dataCols = values[0].slice(firstDataCol-1);

  var resultssheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("NormalizedResult");
  if (resultssheet != null) {
    SpreadsheetApp.getActive().deleteSheet(resultssheet);
  }
  var header = values[0].slice(0, firstDataCol - 1);

  var newRows = [];

  header.push("Field");
  header.push("Value");
  newRows.push(header);

  for (var i = 1; i <= numRows - 1; i++) {
    var row = values[i];
    for (var datacol = 0; datacol < dataCols.length; datacol ++) {
      newRow = row.slice(0, firstDataCol - 1); // copy repeating portion of each row
      newRow.push(values[0][firstDataCol - 1 + datacol]); // field name
      newRow.push(values[i][firstDataCol - 1 + datacol]); // field value
      newRows.push(newRow);
    }
  }
  var newSheet = SpreadsheetApp.getActiveSpreadsheet().insertSheet("NormalizedResult");
  var r = newSheet.getRange(1,1,newRows.length, header.length);
  r.setValues(newRows);
};

function getDataColumns() {  
  var sheet = SpreadsheetApp.getActiveSheet(); 
  var rows = sheet.getDataRange();
  var values = rows.getValues();
  var firstDataCol = SpreadsheetApp.getActiveRange().getColumn();
  var dataCols = values[0].slice(firstDataCol-1);
  return dataCols;

}

Full write-up with instructions on how to install.

Zerk answered 6/1, 2015 at 22:25 Comment(0)
M
3

Newer versions of Tableau (9.0 and above) allow reshaping of data on import. It's as simple as selecting the columns you want to stack. Here's a useful tutorial.

In 2018, Tableau released Tableau Prep Builder for reshaping and wrangling Data for analysis.

Some other commercial tools useful for reshaping and cleaning data worth investigating are:

Trifacta was created by some of the people who worked on the previous academic DataWrangler project. I've been told that Alteryx is good for some prep tasks for geospatial related data, and have met some enthusiastic Paxata users.

I have too little hands on experience with them to give more than a reference to their web sites, tending to use Python scripts instead to date.

If you want to "fold" columns that you want to merge into a single column. Tableau wrote a helpful tutorial here. Tableau 9 introduced some useful reshaping features for unpivoting tables and splitting columns.

Menefee answered 6/2, 2015 at 22:35 Comment(2)
Huh, thanks. I googled long and hard and never found that. Too bad the software is no longer developed.Zerk
FYI, Steve's comment refers to the DataWrangler project, which has been replaced by the COTS package Trifecta and its competitors aboveCacophonous
C
1

If you are comfortable with command lines and the Unix style of combining small tools using pipes, check out the open source csvkit toolsuite.

You can combine these utilities in many ways to get different effects, so the exact sequence of steps depends on your datasets (that's the point of the tools after all).

But for reshaping tasks, you could use csvcut to pull out columns of interest, csvgrep to pull out rows of interest and cvsstack to combine multiple csv files into one longer one, along with the -g and -n options to add a grouping field.

There are several other useful commands, and if you are familiar with Unix or linux, you can figure out what they do quickly just from the command names.

Cacophonous answered 27/3, 2015 at 16:57 Comment(5)
Good to know about, but I don't think "check out this tool" really qualifies as an answer. Does CSVKit specifically solve this problem? If so, how? Which sequence of tool steps?Zerk
I edited the answer to at least partially point out how you can use csvkit for reshaping tasks. The documentation is pretty clear and succinct as well.Cacophonous
I'm still not seeing how this combination of commands solves the problem. csvstack combines multiple files (which I don't have), and the -g flag just adds an extra column with a value per file. Are you suggesting splitting up the file with csvcut/csvgrep into one file per field/value combination, and then rejoining them all with csvstack?Zerk
That's one way to attack the problem if you have just a few groups to work with. Or build a script that does that and hides the temp files. Or instead, use csvpy3 to load it into python3 and interactively reshape as desired. Not saying it's the answer to every reshaping problem, but the Unix pipe/small tool approach gives you the flexibility to incrementally attack a wide variety of problems with very little code.Cacophonous
Yeah, the csvkit tools are definitely nice - even just the ability to select a few columns from a big file is very handy at times.Zerk

© 2022 - 2024 — McMap. All rights reserved.