sorting 5d array in c
Asked Answered
I

2

12

I am trying to figure out how to sort a multidimensional data (5 dimensions) in C. I know that using a 5d array is a solution that, from reading others posts on SO about this topic many folks find, if not altogether unethical, so aesthetically repugnant so as to provoke unceasing projectile vomiting...so I apologize in advance.

Essentially I have an incoming set of data to which I must apply a series of discrete algorithms. Each algorithm has a set of variables, and I need to calculate a ranking of the efficiency of each algorithm with every permutation of the variables possible. Ultimately, I need a list sorted by the best-to-worst performing algorithm. The whole calculation is dynamic, so what works best on one incoming piece of data is unlikely to be the best performer on another...so I can't eliminate any of the variables because they are poor performers.

Here is how the data looks:

dataValue[ algo ][ lengthVar ][ durationVar ][ plasticityVar ] [ fungibilityVar]

There are:

  • 35 algorithms
  • 10 length variables
  • 230 duration vars
  • 27 plasticity vars
  • 400 fungibility vars

In addition to sorting by algorithm, I would like to have the flexibility to sort on any of the 5 dimensions.

This will be run on a 12 physical/ 24 logical core machine with 192 gig (not meg) of RAM, using VS 2010 C (not C++).

I am assuming that qsort would be the most efficient sorting option. I have searched Google and SO extensively for how to do this to no avail. There are answers for 1d arrays, multidimensional arrays in PHP or C#, etc, but not for C...or at least I can't find one.

Ideational answered 30/3, 2012 at 3:52 Comment(4)
+1 "so aesthetically repugnant so as to provoke unceasing projectile vomiting"...Execution
qsort reduces the problem of sorting 5d arrays to that of comparing two 4d arrays. As long as you know how to decide with of the two algorithms is "better" based on their corresponding 4d sub-arrays, you can sort your data using qsort. The doc I linked has a small example at the bottom, you should be able to adapt it to your needs.Negativism
Do you declare dataValue as dataValue[35][10][230][27][400] or are you saying that there are 35 possible values for algorithm, 10 for length, 230 for duration, etc?Preference
There are 10*230*27*400 possible values for each algo.Ideational
T
4

qsort in cstdlib would work. The array is Datatype ***data.

So first off, lets say that you want to sort the first index of the array. You'd have to write a comparator function to compare two Datatype****s. The comparator should return a value less than zero if ab.

int myComparator(void *a, void *b){
    Datatype ****c=(Datatype****)a; Datatype ****d=(Datatype****)b
    return algorithmRatingFunction(b)-algorithmRatingFunction(a);
}

This is pretty obviously inefficient because you have to reevaluate the algorithm for each dataset each comparison, but lets get to that in a sec. After you have the comparator, you can sort the array:

qsort(data,35,sizeOf(Datatype),myComparator);

That's it!

Then there's the issue of inefficiency... If algorithmRatingFunction takes a long time to complete (which I'm guessing it does), then you'd want to calculate all 35 algorithms once and only once. What you could do is calculate the scores beforehand:

int scores[35];
for(int n=0;n<35;n++)
    scores[n]=algorithmRatingFunction(data[n]);

Then create another ordered integer array:

int ordering[35];
for(int n=0;n<35;n++)
    ordering[n]=n;

So the state of "ordering" corresponds to the order of your data set. Then, you can create a new comparator:

int myFasterComparator(void *a, void *b){
    int c=*(int*)a; int d=*(int*)b
    return scores[c]-scores[d];
}

And call it on ordering:

qsort(ordering,35,sizeOf(int),myFasterComparator);

Then reconstruct the array using ordering. like so:

Datatype ****ordereddata[35];
for(int n=0;n<35;n++)
    ordereddata[n]=data[ordering[n]];

The same holds true for all other levels. Like dasblinkenlight posted, qsort reduces the problem of sorting 5d arrays to that of comparing two 4d arrays. So instead of sorting each 4d array you just have to compare two 3d arrays, etc.

Tappet answered 30/3, 2012 at 4:29 Comment(4)
Each of the 35 algos would have ~25 million scores, because each combination of vars would produce a unique result. Eg, algo[0] uses firstVar[0], 2ndVar[0], 3rdVar[0], and 4thVar[0]; then algo[1] uses firstVar[1], and [0] for 2nd,3rd and 4th vars, etc. Does your answer still hold?Ideational
Your idea about just doing the calcs only once made me think of the following: do the calcs for each of the 25 million scores, and sprintf each score/var permutation into a 0-padded string: 00000-000nn-000nn-000nn-000nn. Then I could just do 1 call to qsort with a strcmp comparator function. And I could sort on any of the "fields" by using different strcmp offsets (eg, strcmp(&stringA[6], stringB[6]), etc)Ideational
I think this is very wrong. qsort can only sort a one dimensional array of objects. The line qsort(data,35,sizeOf(Datatype),myComparator); will only sort the first 35 Datatype objects starting at data. Then, you'd be swapping Datatype objects.Preference
Ahhh I see what you mean now. I was thinking that each algorithm worked on the dataset of the 4d array. In that case, 0x69's answer sounds right to me, except I'd have an extra "score" variable in the struct which would be used in the comparator. If sorting 25 million values turns out to not be practical, you could just keep a top ten list and insert new values where they belong using a binsearch, then remove the last element.Tappet
P
2

I think you really need to refuse vomiting because of 5D effect. Make a struct instead:

typedef struct {
    int algorithm;
    int length;
    int duration;
    int plasticity;
    int fungibility;
    int dataValue;
} AlgorithmTestData;

And then define your test data 1D array:

AlgorithmTestData algoTestCases[NUMBER_OF_TEST_CASES];

or you can allocate it dynamically if you don't know the size of test cases with malloc.

Then you will qsort algoTestCases 1D array according to your comparision requirements.

Phenix answered 30/3, 2012 at 8:1 Comment(1)
This looks much better to me (i.e. records in a table) so long as the values are not contiguous (i.e. 0-34 for algo, 0-9 for length, etc) as you would then have a lot of redundancy. Sorting would certainly be easier.Preference

© 2022 - 2024 — McMap. All rights reserved.