Sort associative array with AWK
Asked Answered
R

6

18

Here's my array (gawk script) :

myArray["peter"] = 32
myArray["bob"] = 5
myArray["john"] = 463
myArray["jack"] = 11

After sort, I need the following result :

bob    5
jack   11
peter  32
john   463

When i use "asort", indices are lost. How to sort by array value without losing indices ? (I need ordered indices based on their values)

(I need to obtain this result with awk/gawk only, not shell script, perl, etc)

If my post isn't clear enough, here is an other post explaining the same issue : http://www.experts-exchange.com/Programming/Languages/Scripting/Shell/Q_26626841.html )

Thanks in advance

Update :

Thanks to you both, but i need to sort by values, not indices (i want ordered indices according to their values).

In other terms, i need this result :

bob    5
jack   11
peter  32
john   463

not :

bob 5
jack 11
john 463
peter 32

(I agree, my example is confusing, the chosen values are pretty bad)

From the code of Catcall, I wrote a quick implementation that works, but it's rather ugly (I concatenate keys & values before sort and split during comparison). Here's what it looks like :

function qsort(A, left, right,   i, last) {
  if (left >= right)
    return
  swap(A, left, left+int((right-left+1)*rand()))
  last = left
  for (i = left+1; i <= right; i++)
    if (getPart(A[i], "value") < getPart(A[left], "value"))
      swap(A, ++last, i)
  swap(A, left, last)
  qsort(A, left, last-1)
  qsort(A, last+1, right)
}

function swap(A, i, j,   t) {
  t = A[i]; A[i] = A[j]; A[j] = t
}

function getPart(str, part) {
  if (part == "key")
    return substr(str, 1, index(str, "#")-1)
  if (part == "value")
    return substr(str, index(str, "#")+1, length(str))+0
  return
}

BEGIN {  }
      {  }
END {

  myArray["peter"] = 32
  myArray["bob"] = 5
  myArray["john"] = 463
  myArray["jack"] = 11

  for (key in myArray)
    sortvalues[j++] = key "#" myArray[key]

  qsort(sortvalues, 0, length(myArray));

  for (i = 1; i <= length(myArray); i++)
    print getPart(sortvalues[i], "key"), getPart(sortvalues[i], "value")
}

Of course I'm interested if you have something more clean...

Thanks for your time

Ritualist answered 17/3, 2011 at 17:25 Comment(0)
P
31

Edit:

Sort by values

Oh! To sort the values, it's a bit of a kludge, but you can create a temporary array using a concatenation of the values and the indices of the original array as indices in the new array. Then you can asorti() the temporary array and split the concatenated values back into indices and values. If you can't follow that convoluted description, the code is much easier to understand. It's also very short.

# right justify the integers into space-padded strings and cat the index
# to create the new index
for (i in myArray) tmpidx[sprintf("%12s", myArray[i]),i] = i
num = asorti(tmpidx)
j = 0
for (i=1; i<=num; i++) {
    split(tmpidx[i], tmp, SUBSEP)
    indices[++j] = tmp[2]  # tmp[2] is the name
}
for (i=1; i<=num; i++) print indices[i], myArray[indices[i]]

Edit 2:

If you have GAWK 4, you can traverse the array by order of values without performing an explicit sort:

#!/usr/bin/awk -f
BEGIN {
    myArray["peter"] = 32
    myArray["bob"] = 5
    myArray["john"] = 463
    myArray["jack"] = 11

    PROCINFO["sorted_in"] = "@val_num_asc"

    for (i in myArray) {
        {print i, myArray[i]}}
    }

 }

There are settings for traversing by index or value, ascending or descending and other options. You can also specify a custom function.

Previous answer:

Sort by indices

If you have an AWK, such as gawk 3.1.2 or greater, which supports asorti():

#!/usr/bin/awk -f
BEGIN {
    myArray["peter"] = 32
    myArray["bob"] = 5
    myArray["john"] = 463
    myArray["jack"] = 11

    num = asorti(myArray, indices)
    for (i=1; i<=num; i++) print indices[i], myArray[indices[i]]
}

If you don't have asorti():

#!/usr/bin/awk -f
BEGIN {
    myArray["peter"] = 32
    myArray["bob"] = 5
    myArray["john"] = 463
    myArray["jack"] = 11

    for (i in myArray) indices[++j] = i
    num = asort(indices)
    for (i=1; i<=num; i++) print i, indices[i], myArray[indices[i]]
}
Paget answered 17/3, 2011 at 20:46 Comment(7)
@Phil: Please see my edited answer. I'm sorry I didn't read your question carefully.Paget
for your updated answer, I think the convention is to use awk's pseudo multidimensional arrays a[myArray[i], i] and then split() on SUBSEP as seen in one of my answers here --> #4444083 Still a kludge to be sure but a little less so IMHOUniformitarian
@SiegeX: In this case, since it's a numeric sort, I needed to use sprintf() anyway, but SUBSEP would be a better character to use than space. I will update my answer.Paget
excellent, I had not thought to use right aligned sprintf to sort numbers directly with asorti! this saves me using a custom sort method. Thank you allRitualist
Is the numeric sort behavior for right-justified numeric strings documented somewhere?Grazynagreabe
@haridsv: No, not as such. Sorting in GAWK is lexical, not numeric, and by right justifying the numbers it forces the sort to behave as if it were numeric. Note that GAWK 4 now has a facility to control array traversal order PROCINFO["sorted_in"] ="@ind_num_asc" for example, without needing to explicitly sort the array.Paget
@Grazynagreabe : if the "numeric" indices are already right aligned, then it's probably slightly faster for gawk if u use PROCINFO["sorted_in"] ="@ind_str_asc" instead, since it skips the step of having to first convert it back to numeric before comparingAccrete
S
6

Use the Unix sort command with the pipe, keeps Awk code simple and follow Unix philosophy
Create a input file with values seperated by comma
peter,32
jack,11
john,463
bob,5

Create a sort.awk file with the code

BEGIN { FS=","; }
{
    myArray[$1]=$2;
}
END {
    for (name in myArray)
        printf ("%s,%d\n", name, myArray[name]) | "sort -t, -k2 -n"
}

Run the program, should give you the output
$ awk -f sort.awk data
bob,5
jack,11
peter,32
john,463

Shier answered 7/1, 2013 at 20:34 Comment(1)
@Sai Kadam : since you're sorting externally, why use an array at all ? just do the printf( ) at each rowAccrete
D
5
PROCINFO["sorted_in"] = "@val_num_desc";

Before iterating an array, use the above statement. But, it works in awk version 4.0.1. It does not work in awk version 3.1.7.

I am not sure in which intermediate version, it got introduced.

Depart answered 12/3, 2015 at 8:46 Comment(2)
This could be described a bit better, but GAWK lets you tell it how you want arrays to be sorted when you say for (a in arr) { ... }. gnu.org/software/gawk/manual/html_node/…Ungovernable
Just remember to call gawk and not just awk. This is the biggest for me to use gawk; I can afford not to care about non-gawk in our environment.Ungovernable
B
1

And the simple answer...

function sort_by_myArray(i1, v1, i2, v2) {
    return myArray[i2] < myArray[i1];
}

BEGIN {
    myArray["peter"] = 32;
    myArray["bob"] = 5;
    myArray["john"] = 463;
    myArray["jack"] = 11;
    len = length(myArray);

    asorti(myArray, k, "sort_by_myArray");

    # Print result.
    for(n = 1; n <= len; ++n) {
            print k[n], myArray[k[n]]
    }
}
Bronnie answered 2/10, 2016 at 20:42 Comment(3)
Can't you do return v2<v1 instead of return myArray[i2] < myArray[i1]? The function is called with the index and value pairs of the array.Sorci
And I think this should be return (v1-v2) for arithmetic ascending order; return (v2-v1) for arithmetic descending orderSorci
@Sorci Did you test the result of that? My version works :pBronnie
D
0

Use asorti:

#!/usr/bin/env -S gawk -f
{
    score[$1] = $0;
    array[sprintf("%3s",$2) $1] = $1;
}

END {
    asorti(array, b)
    for(i in b)
    {
        name = array[b[i]]
        print score[name]
    }
}
Detonation answered 12/12, 2020 at 9:15 Comment(0)
T
0

The following function works in Gawk 3.1.7 and doesn't resort to any of the workarounds described above (no external sort command, no array subscripts, etc.) It's just a basic implementation of the insertion sort algorithm adapted for associative arrays.

You pass it an associative array to sort on the values and an empty array to populate with the corresponding keys.

myArray["peter"] = 32;
myArray["bob"] = 5;
myArray["john"] = 463;
myArray["jack"] = 11;

len = resort( myArray, result );

for( i = 1; i <= len; i++ ) {
    key = result[ i ];
    print i ": " key " = " myArray[ key ];
}

Here is the implementation:

function resort( data, list,
        key, len, i, j, v )
{
        len = 0;
        for( key in data ) {
                list[ ++len ] = key;
        }

        # insertion sort algorithm adapted for
        # one-based associative arrays in gawk
        for( i = 2; i <= len; i++ )
        {
                v = list[ i ];
                for( j = i - 1; j >= 1; j-- )
                {
                        if( data[ list[ j ] ] <= data[ v ] )
                                break;
                        list[ j + 1 ] = list[ j ];
                }
                list[ j + 1 ] = v;
        }

        return len;
}

You can "reverse sort" by simply iterating the resulting array in descending order.

Torrens answered 26/12, 2022 at 20:42 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.