Combine duplicate items and sum up values – within a single string/cell
Asked Answered
S

4

12

I have this string in one single cell (A1):

Apple – 1
Banana – 6
Cherry – 10
Peach – 100
Apple – 1000
Kiwi – 10999
Peach – 44
Fig – 3/100
Fig – 3/100

I need to find a formula that can combine duplicate entries and sum up their values. So the desired output is this (also in a single cell, lets say B1):

Apple – 1001
Banana – 6
Cherry – 10
Peach – 144
Kiwi – 10999
Fig – 6/100

I tried this for hours but cant figure out how to do it. What I tried is to first find unique values. So far, no problem:

=UNIQUE(
    ARRAYFORMULA(
        REGEXEXTRACT(
            TRANSPOSE(SPLIT(A1, CHAR(10))),
            "(.*) – [0-9.]+/?[0-9]?"
        )
    )
)

returns the array of unique values

{"Apple", "Banana", "Cherry", "Peach", "Kiwi", "Fig"}

The best result I could come up with is this formula, to work with Apple and Banana manually:

=ARRAYFORMULA(
    {
        "Apple" & " – " & SUM(
            ARRAYFORMULA(
                IF(
                    REGEXMATCH(
                        TRANSPOSE(SPLIT(A1, CHAR(10))),
                        "Apple" & " – [0-9.]+/?[0-9]?"
                    ),
                    VALUE(
                        REGEXEXTRACT(
                            TRANSPOSE(SPLIT(A1, CHAR(10))),
                            "Apple – ([0-9.]+)/?[0-9]?"
                        )
                    ),
                    ""
                )
            )
        );
        "Banana" & " – " & SUM(
            ARRAYFORMULA(
                IF(
                    REGEXMATCH(
                        TRANSPOSE(SPLIT(A1, CHAR(10))),
                        "Banana" & " – [0-9.]+/?[0-9]?"
                    ),
                    VALUE(
                        REGEXEXTRACT(
                            TRANSPOSE(SPLIT(A1, CHAR(10))),
                            "Banana – ([0-9.]+)/?[0-9]?"
                        )
                    ),
                    ""
                )
            )
        )
    }
)

this correctly outputs the array

{"Apple – 1001", "Banana – 6"}

But no matter what I try, I fail to get it to work dynamically for all the unique elements. I feel like I figured out both pieces to the puzzle but I just can't put it together

Sensitize answered 28/2, 2024 at 0:2 Comment(4)
So the desired output is this: Is this also in a single cell?Tonguelash
preferably yes. it should be fairly simple to convert it from array to a single cell though (something like TEXTJOIN(CHAR(10), TRUE, [array here])), so if you can figure out how to make the arrayformula, that would probably be good enough tooSensitize
Sorry for any confusion. I wasn't offering a solution, I was just trying to clarify the question. It would be helpful if you were to edit the question to make it clear that the proposed output is to contained in a single cell.Tonguelash
aye aye, will do!Sensitize
T
8

Here's a possible solution:

=ARRAYFORMULA(
  LET(data,WRAPROWS(SUBSTITUTE(SPLIT(SUBSTITUTE(A1,"/","ζ")," –"&CHAR(10)),"ζ","/"),2),
      items,INDEX(data,,1),
      qty,MAP(INDEX(data,,2),LAMBDA(n,SORTN(QUERY(,"select "&n)))),
      JOIN(CHAR(10),MAP(UNIQUE(items),LAMBDA(item,item&" – "&SUM((item=items)*qty))))))

enter image description here

The data array formats the data in a way we can work with:

WRAPROWS(SUBSTITUTE(SPLIT(SUBSTITUTE(A1,"/","ζ")," –"&CHAR(10)),"ζ","/"),2)

The qty array converts the numbers to their non-fractional representation:

MAP(INDEX(data,,2),LAMBDA(n,SORTN(QUERY(,"select "&n))))

Then we iterate over each unique fruit using the MAP function:

MAP(UNIQUE(items),LAMBDA(item,

And for each fruit we perform a conditional sum where the criteria range is the array of all the fruits, the criterion is the current fruit and the sum range is the array of quantities associated to each fruit:

SUM((item=items)*qty)

Finally, we join everything together:

JOIN(CHAR(10),...)

Alternative representations of the fractional quantities

Reduced fractions

=ARRAYFORMULA(
  LET(data,WRAPROWS(SUBSTITUTE(SPLIT(SUBSTITUTE(A1,"/","ζ")," –"&CHAR(10)),"ζ","/"),2),
      items,INDEX(data,,1),
      qty,INDEX(data,,2),
      JOIN(CHAR(10),
        MAP(UNIQUE(items),LAMBDA(item,    
          LET(qty_,FILTER(qty,items=item),
              s,SPLIT(TOCOL(REGEXREPLACE(qty_,"^[^/]+$","$0/1")),"/"),
              nums,INDEX(s,,1),dens,INDEX(s,,2),l,LCM(dens),nnum,SUM(l/dens*nums),g,GCD(nnum,l),
              item&" – "&IF(COUNTIF(qty_,"*/*")=0,SUM(--qty_),nnum/g&"/"&l/g)))))))

enter image description here

Unreduced fractions

=ARRAYFORMULA(
  LET(data,WRAPROWS(SUBSTITUTE(SPLIT(SUBSTITUTE(A1,"/","ζ")," –"&CHAR(10)),"ζ","/"),2),
      items,INDEX(data,,1),
      qty,INDEX(data,,2),
      JOIN(CHAR(10),
        MAP(UNIQUE(items),LAMBDA(item,    
          LET(qty_,FILTER(qty,items=item),
              s,SPLIT(TOCOL(REGEXREPLACE(qty_,"^[^/]+$","$0/1")),"/"),
              nums,INDEX(s,,1),dens,INDEX(s,,2),l,LCM(dens),
              item&" – "&IF(COUNTIF(qty_,"*/*")=0,SUM(--qty_),SUM(l/dens*nums)&"/"&l)))))))

enter image description here

Twentytwo answered 28/2, 2024 at 4:15 Comment(3)
Does not work when there are both fractions and non-fractions for a fruit. For example, make the first entry "Apple - 1/4" and the formula won't produce the correct result. (I do appreciate the answer. While I can't upvote right now due to not having verified a correct answer, I would love to learn from your approach and should be back to upvote it later.)Turnbull
Allow me frame: the issue is not mixing fraction and non-fraction. The issue is "fraction looking" numbers can be interpreted as (Google Sheet) date (serial)Turnbull
Amazing! There seems to be an issue though when we add items to the menu that contain spaces, like if we add Apple Pie – 8 to the menu. I was able to resolve this by removing the space from the SPLIT parameter and adding a TRIM instead: WRAPROWS(TRIM(SUBSTITUTE(SPLIT(SUBSTITUTE(B2,"/","ζ"),"–"&CHAR(10)),"ζ","/")),2)Sensitize
T
5

This solution is intended to mimic "looping" over the input array (once).

To accomplish the above, we use reduce to store an aggregate 1-D array of the format {name1,val1,name2,val2,...} as we traverse the input array. Of course, we need to first use tocol(split(... to convert a single cell input into array. We will also need intermediate steps to identify matching names and to process values so they can sum readily.

At the end, we can process the final aggregate into anything we want. In the example below, I used makearray to reshape the single-row array into an n-by-1 array with concat as if reshape into n-by-2 array first and then concat column-wise. Here, makearray represents something general that you can always make work. Any particular transformation may have its own simpler or more performant approach.

The concept is simple. Though the code will be long because there are a lot of hstack, index(...,i,j) and choosecols(...,1,sequence(... calls for accessing and storing array elements. They make for a lot of clutter. Hopefully the spacing below helps make clear the flow of things.


First, make a named function to process potential fractions as follows

let(sepByVal,split(x,"/"),if(columns(sepByVal)>1,index(sepByVal,1,1)/index(sepByVal,1,2),sepByVal))

You can update this function if the formatting of your input changes.

Or if you are feeling fancy and wanting to keep integer fraction (as text), named function is your go-to place to make it happen.

enter image description here


Then, in B1, the following formula will produce the desired output.

=let(sep,"–"
    ,input,tocol(split(A1,char(10)))
    ,arr,reduce(hstack(index(regexextract(index(input,1,1),"[A-Za-z]+"),1,1),0)
               ,input
               ,lambda(a,c
                      ,let(sepByVal,regexextract(c,"([A-Za-z]+)[ "&sep&"]*([0-9]+[./0-9]*)")
                          ,ind,iferror(match(index(sepByVal,1,1),a,0),0)
                          ,if(ind
                             ,hstack(choosecols(a,sequence(1,ind))
                                    ,index(a,1,ind+1)
                                     +FRACTION2NUM(index(sepByVal,1,2))
                                    ,if(ind<columns(a)-2
                                     ,choosecols(a,sequence(1,columns(a)-ind-1,ind+2))
                                       ,tocol(,3)))
                             ,hstack(a,index(sepByVal,1,1)
                                 ,FRACTION2NUM(index(sepByVal,1,2)))))))
    ,join(char(10),makearray(columns(arr)/2,1,lambda(i,j,index(arr,1,2*i-1)&" "&sep&" "&index(arr,1,2*i)))))

Be sure to copy the symbol straight from your input.

Replacing the last line with

,out,wraprows(arr,2)
,join(char(10),arrayformula(index(out,,1)&" "&sep&" "&index(out,,2))))

might make the formula more familiar.

The formula above was intentionally made to be compatible with array input. You can replace the definition of input appropriately if you have an array input. (As shown in image below.)

But if you do have an array input, be sure to check blank inside lambda, basically if blank, then pass along a, else follow the rest of the logic (or to use filter outside reduce). This way you can work with infinite ranges.

Finally, a tip for how to compose this kind of formula yourself: when using reduce (on a column), use horizontal array as your aggregate. If you encounter any error, change the input range to A$1:A1 and drag it down. You will then visually see the intermediate aggregates. Any error will be easy to deduce as a result. This is also why it's better to convert single cell into array first and make formulas that work on array input.

enter image description here


Some philosophical thoughts... Why isn't makepivottable a built-in function...? Why isn't reshape (as a view if output is used inline) a built-in function...? Why isn't set(val,arr,i,j) a built-in function...?

Turnbull answered 28/2, 2024 at 5:41 Comment(0)
S
3

First split the data, into a proper 2D array and then use query to group the data:

=ARRAYFORMULA(QUERY(
  SPLIT(TOCOL(SPLIT(A1,CHAR(10))),"– "),
  "Select Col1,sum(Col2) group by Col1",0)
)

From here,

  • Fractions aren't converted to numbers and therefore not summed up by query. Evaluate isn't available in Google sheets, but you will be able to use my query trick to evaluate fractions into actual numbers

  • Data isn't joined up in a single cell. Using BYROW/TEXTJOIN should be trivial.

  • Decimals can be converted back to fractions using TEXT

=ARRAYFORMULA(
   LET(splitData, SPLIT(TOCOL(SPLIT(A1,CHAR(10))),"– "),
       numData,MAP(splitData,LAMBDA(a,IFERROR(--query(,"Select "&a&" label "&a&" ''",0),a))),
       groupedData, QUERY(numData,
          "Select Col1,sum(Col2) group by Col1 label sum(Col2) ''",0
         ),
       rowJointData, BYROW(groupedData, LAMBDA(r, JOIN("-", TEXT(r,"# #/100")))),
       TEXTJOIN(CHAR(10),,rowJointData)
      )
)
Sulk answered 28/2, 2024 at 4:22 Comment(0)
G
3

enter image description here

Formula in B1:

=INDEX(JOIN(CHAR(10),BYROW(LET(x,SPLIT(TOCOL(SPLIT(REGEXREPLACE(A1,"/\d+",),CHAR(10)))," – ",0),u,UNIQUE(INDEX(x,,1)),HSTACK(u,MAP(u,LAMBDA(y,SUM(FILTER(INDEX(x,,2),INDEX(x,,1)=y))&IFERROR(REGEXEXTRACT(A1,y&" – \d+(/\d+)"),))))),LAMBDA(r,JOIN(" – ",r)))))

This assumes that the input is consistant in a way that the amounts in your data are of the same type; thus either actual integers or portions of the same order!

However, if your goal was to sum all kind of values (thus including different types of arithmetic operators), try the following:

=INDEX(JOIN(CHAR(10),BYROW(LET(x,SPLIT(TOCOL(SPLIT(A1,CHAR(10)))," – ",0),QUERY({INDEX(x,,1),MAP(INDEX(x,,2),LAMBDA(y,QUERY(,"select "&y&" label "&y&" ''")))},"Select Col1, Sum(Col2) Group By Col1 Label Sum(Col2) ''")),LAMBDA(r,JOIN(" – ",r)))))

Guyer answered 28/2, 2024 at 12:20 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.