Split multi valued cells in more than one column into rows (Open Refine)
Asked Answered
I

2

6

I have been cleaning a table on Open Refine. I now have it like this:

REF                 Handle      Size        Price
2002, 2003          t-shirt1    M, L        23
3001, 3002, 3003    t-shirt2    S, M, L     24

I need to split those multivalued cells in REF and Size so that I get:

REF                 Handle      Size        Price
2002                t-shirt1    M           23
2003                t-shirt1    L           23  
3001                t-shirt2    S           24  
3002                t-shirt2    M           24
3003                t-shirt2    L           24

Is it possible to do this in Open Refine? The "Split multi-valued cells..." command only takes care of one column. Thank you, Ana Rita

Ikhnaton answered 28/9, 2015 at 20:49 Comment(0)
H
4

Yes, it's possible :

  • Split the 1st column using ", " as separator.
  • Move column 2 at position one
  • display your project as record (not row)
  • Split column 3 using ", " as separator
  • Fill down columns 4 and 2
  • reorder columns

Here's my recipe in GREL :

[
  {
    "op": "core/row-removal",
    "description": "Remove rows",
    "engineConfig": {
      "facets": [
        {
          "invert": false,
          "expression": "row.starred",
          "selectError": false,
          "omitError": false,
          "selectBlank": false,
          "name": "Starred Rows",
          "omitBlank": false,
          "columnName": "",
          "type": "list",
          "selection": [
            {
              "v": {
                "v": true,
                "l": "true"
              }
            }
          ]
        }
      ],
      "mode": "row-based"
    }
  },
  {
    "op": "core/multivalued-cell-split",
    "description": "Split multi-valued cells in column Column 1",
    "columnName": "Column 1",
    "keyColumnName": "Column 1",
    "separator": ", ",
    "mode": "plain"
  },
  {
    "op": "core/column-move",
    "description": "Move column Column 2 to position 0",
    "columnName": "Column 2",
    "index": 0
  },
  {
    "op": "core/multivalued-cell-split",
    "description": "Split multi-valued cells in column Column 3",
    "columnName": "Column 3",
    "keyColumnName": "Column 2",
    "separator": ", ",
    "mode": "plain"
  },
  {
    "op": "core/fill-down",
    "description": "Fill down cells in column Column 4",
    "engineConfig": {
      "facets": [],
      "mode": "record-based"
    },
    "columnName": "Column 4"
  },
  {
    "op": "core/fill-down",
    "description": "Fill down cells in column Column 2",
    "engineConfig": {
      "facets": [],
      "mode": "record-based"
    },
    "columnName": "Column 2"
  },
  {
    "op": "core/column-reorder",
    "description": "Reorder columns",
    "columnNames": [
      "Column 1",
      "Column 2",
      "Column 3",
      "Column 4"
    ]
  }
]

Hervé

Hazzard answered 28/9, 2015 at 22:8 Comment(2)
You're a genius! :) Thank you so much.Ikhnaton
can you add ` around your GREL recipe to turn it into code formatting?Margrettmarguerie
W
0

Just found a nice, free OpenRefine plugin that offers "Unpaired pivot": VIB-Bits plugin

From their documentation:

3.2.1 Unpaired pivot... Unpaired pivot is the transformation of data that is organized in rows to a representation of that data in separate columns. A simple example would be transforming

Category Value
a 1
a 2
b 3
c 2

into

Value a Value b Value c
1 3 2
2
Wheelwright answered 5/4, 2021 at 21:14 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.