Can I use memcmp along with qsort?
Asked Answered
L

3

6

I am making C dynamic array library, kind of. Note that I'm doing it for fun in my free time, so please do not recommend million of existing libraries.

I started implementing sorting. The array is of arbitrary element size, defined as struct:

typedef struct {
  //[PRIVATE] Pointer to array data
  void *array;
  //[READONLY] How many elements are in array
  size_t length;
  //[PRIVATE] How many elements can further fit in array (allocated memory)
  size_t size;
  //[PRIVATE] Bytes per element
  size_t elm_size;
} Array;

I originally prepared this to start with the sort function:

/** sorts the array using provided comparator method
 * if metod not provided, memcmp is used
 * Comparator signature
 *  int my_comparator ( const void * ptr1, const void * ptr2, size_t type_size );
**/
void array_sort(Array* a, int(*comparator)(const void*, const void*, size_t)) {
    if(comparator == NULL)
        comparator = &memcmp;
    // Sorting algorithm should follow
}

However I learned about qsort:

void qsort (void* base, size_t num, size_t size, int (*compar)(const void*,const void*));

Apparently, I could just pass my internal array to qsort. I could just call that:

qsort (a->array, a->length, a->elm_size, comparator_callback);

But there's a catch - qsort's comparator signature reads as:

int (*compar)(const void*,const void*)

While memcmp's signature is:

int memcmp ( const void * ptr1, const void * ptr2, size_t type_size );

The element size is missing in qsort's callback, meaning I can no longer have a generic comparator function when NULL is passed as callback. I could manually generate comparators up to X bytes of element size, but that sounds ugly.

Can I use qsort (or other sorting built-in) along with memcpy? Or do I have to chose between built-in comparator and built-in sorting function?

Lifesaving answered 28/11, 2016 at 23:36 Comment(12)
"so please do not recommend million of existing libraries." I laughedHelium
The pointers passed to the compare function are going to be Array pointers. You can cast them to Array then use the length member of that structure to determine how many bytes to compare.Quandary
Isn't the element size from qsort meant to be your array's elm_size?Rootstock
"But there's a catch - qsort's comparator signature reads as:..." that is because you pass the comparison function to the last parameter, not an array.Romanist
@RadLexus Yeah sure... Maybe it was unclear what I ask. My problem is that I cannot pass that size to memcpy. The default comparator function needs to know what is an element size - exactly because it receives two array pointers and doesn't know their size.Radford
The comparator it is not responsible for swapping anything. It knows their type because you wrote the comparison function for the specific use. The swapping is done by qsort. It is qsort that does not know their type (only the size), hence the void* arguments. There is no general implementation of the comparator (which is why it is an argument) - you might need to sort on a variety of struct fields, if the first choices are equal.Romanist
@WeatherVane The comparator needs to know where to stop reading bytes from pointers it received. That's why memcmp requires third argument - size. qsort also requires size of element but doesn't pass it to comparator. That's my problem here.Radford
It knows their size as well as their type because you wrote the function for the specific use - that is the point of qsort. Or, you could pass an array of struct to qsort, one of whose members gives the size of another of its members (say a dynamic array).Romanist
memcmp as a comparator only makes sense when you have an array of fixed-size character strings. Not enough motivation to shoehorn it into qairt.Misdate
Does sorting with memcmp even make sense? I can't think of a time I've wanted to sort an array of structs by their binary interpretation. Wouldn't it make much more sense to have the comparator passed into sort by the consumer of the function? They consumer could write the comparator with the correct knowledge of how to sort, what the element sizes are, etc.Mortimer
@AlexanderMomchliov I must admit that I didn't realize how useless is memcpy for anything but chars and fixed char lists.Radford
@TomášZato Yeeeeep. I think taking the comparator as a parameter to sort would be the best bet. Perhaps offer some default versions for sorting common datatypes like int, char, strings, etc.Mortimer
I
1

A non-thread-safe way is use private global variable to pass the size.

static size_t compareSize = 0;

int defaultComparator(const void *p1, const void *p2) {
  return memcmp(p1, p2, compareSize);
}

void array_sort(Array* a, int(*comparator)(const void*, const void*, size_t)) {
    if(comparator == NULL) {
      compareSize = a->elm_size;
      comparator = &defaultComparator;
    }
    // Sorting algorithm should follow
}

You can make it thread-safe by make compareSize thread-local variable (__thread)

Ibnrushd answered 28/11, 2016 at 23:53 Comment(4)
I thought of that, but that's a no-no for me, even though I don't think I'll use this with any multithreading. If that was the only option, I'd rather copy paste and edit qsort from GNU C library.Radford
Why? Global variable on C is sometimes unavoidable. By making it internal linkage, I don't see any issue with it.Ibnrushd
I'm not so experienced with C, I didn't know that. Could you add some detail on how to use the __thread thing? Is it portable?Radford
__thread is an GCC extension and thread_local is in C11. You can read more info here.Ibnrushd
G
4

C11 provides you with an (admittedly optional) qsort_s function, which is intended to deal with this specific situation. It allows you to pass-through a user-provided void * value - a context pointer - from the calling code to the comparator function. The comparator callback in this case has the following signature

int (*compar)(const void *x, const void *y, void *context)

In the simplest case you can pass a pointer to the size value as context

#define __STDC_WANT_LIB_EXT1__ 1
#include <stdlib.h>
...

int comparator_callback(const void *x, const void *y, void *context)
{
  size_t elm_size = *(const size_t *) context;
  return memcmp(x, y, elm_size);
}

...
qsort_s(a->array, a->length, a->elm_size, comparator_callback, &a->elm_size);

Or it might make sense to pass a pointer to your entire array object as context.

Some *nix-based implementations have been providing a similar qsort_r function for a while, although it is non-standard.

Galleass answered 28/11, 2016 at 23:59 Comment(1)
I'm using the library in homework assignments which require C99, but I still upvoted this because it's useful in general.Radford
I
1

A non-thread-safe way is use private global variable to pass the size.

static size_t compareSize = 0;

int defaultComparator(const void *p1, const void *p2) {
  return memcmp(p1, p2, compareSize);
}

void array_sort(Array* a, int(*comparator)(const void*, const void*, size_t)) {
    if(comparator == NULL) {
      compareSize = a->elm_size;
      comparator = &defaultComparator;
    }
    // Sorting algorithm should follow
}

You can make it thread-safe by make compareSize thread-local variable (__thread)

Ibnrushd answered 28/11, 2016 at 23:53 Comment(4)
I thought of that, but that's a no-no for me, even though I don't think I'll use this with any multithreading. If that was the only option, I'd rather copy paste and edit qsort from GNU C library.Radford
Why? Global variable on C is sometimes unavoidable. By making it internal linkage, I don't see any issue with it.Ibnrushd
I'm not so experienced with C, I didn't know that. Could you add some detail on how to use the __thread thing? Is it portable?Radford
__thread is an GCC extension and thread_local is in C11. You can read more info here.Ibnrushd
C
1

The qsort() API is a legacy of simpler times. There should be an extra "environment" pointer passed unaltered from the qsort() call to each comparison. That would allow you to pass the object size and any other necessary context in a thread safe manner.

But it's not there. @BryanChen's method is the only reasonable one.

The main reason I'm writing this answer is to point out that there are very few cases where memcmp will do something useful. There are not many kinds of objects where comparison by lexicographic order of constituent unsigned chars makes any sense.

Certainly comparing structs that way is dangerous because padding byte values are unspecified. Even the equality part of the comparison can fail. In other words,

struct foo { int i; };

void bar(void) { 
  struct foo a, b;
  a.i = b.i = 0;
  if (memcmp(&a, &b, sizeof a) == 0) printf("equal!");
}

may - by the C standard - print nothing!

Another example: for something as simple as unsigned ints, you'll get different sort orders for big- vs. little-endian storage order.

unsigned a = 0x0102;
unsigned b = 0x0201;
printf("%s", memcmp(&a, &b, sizeof a) < 0 ? "Less!" : "More!");

will print Less or More depending on the machine where it's running.

Indeed the only object type I can imagine that makes sense to compare with memcmp is equal-sized blocks of unsigned bytes. This isn't a very common use case for sorting.

In all, a library that offers memcmp as a comparison function is doomed to be error prone. Someone will try to use it as a substitute for a specialized comparison that's really the only way to obtain the desired result.

Cf answered 29/11, 2016 at 0:20 Comment(2)
While it is true that specific ordering might be platform-dependent, it still does not mean that the specific ordering matters in OP's context. Maybe the OP only cares about grouping of equivalent elements, not the relative positions of elements. In essentially the same way element ordering inside of C++ std::unordered_set might (and will) differ from one platform/implementation to another, which stuill does not limit std::unordered_set's usability for its intended purpose.Galleass
@AnT He said he's building a dynamic array library. I think it's reasonable to assume the types it contains are arbitrary. Comparing arbitrary types byte-wise lexicographically almost never makes sense.Cf

© 2022 - 2024 — McMap. All rights reserved.