How to check that two format strings are compatible?

Examples:

"Something %d"        and "Something else %d"       // Compatible
"Something %d"        and "Something else %f"       // Not Compatible
"Something %d"        and "Something %d else %d"    // Not Compatible
"Something %d and %f" and "Something %2$f and %1$d" // Compatible

I figured there should be some C function for this, but I'm not getting any relevant search results. I mean the compiler is checking that the format string and the arguments match, so the code for checking this is already written. The only question is how I can call it.

I'm using Objective-C, so if there is an Objective-C specific solution that's fine too.

Triphammer answered 9/3, 2015 at 16:51 Comment(19)
You might be able to bend NS_FORMAT_FUNCTION to your will. Check this SO answer, as well as the Clang docs for __format__.Leatherback
Or look here for GNU: gnu.org/software/libc/manual/html_node/…Gussiegussman
parse_printf_format looks cool. How do I import it?Triphammer
It's in glibc. Looks like you only need to include "printf.h". It's not easy to find man pages for it..Gussiegussman
Nope, printf.h doesn't have it, I'm afraid.Triphammer
This code is working: ideone.com/f6AcGMGussiegussman
Found it here, and fixed minor stuff: freiburg.linux.de/projekte/manpages/man/parse_printf_format/…Gussiegussman
I think the issue is that the libc provided by Apple isn't glibc, but maybe they have something similar...Triphammer
Are the format strings "%x %lo %f" and "%d %lx %e" regarded as similar? Since each takes values in the sequence int, long, double, I think they probably are. And presumably "%8.3f" and "%+12.6f" are similar? That is, I'm guessing that your intent is to ensure that using either format string will consume the same list of other arguments. I'll also observe that there isn't a standard function that'll do the job, so any answer inevitably involves (quite a lot of) code — more than fits comfortably into an SO answer.Clubby
@ErikB: if you want working code, contact me (see my profile). I have 240-odd lines of C specifically related to comparing format strings (plus several hundred lines of test code/data), and then 500 lines of (pre-existing) format-parsing code, plus 300 more lines of test code/data, plus the test support harness (also pre-existing). As I said, it all adds up to way more code than fits sanely in an answer.Clubby
@Jonathan Leffler: can you open source this little package on github?Metaphrast
@chqrlie: Eventually, when I get my act together. But not at the moment.Clubby
@JonathanLeffler I was on vacation and didn't see your comments until now. To answer your question, what I'm interested in is that they take the same arguments in the same order, like in your example. To answer your other question, yes, I would be interested in working code. So I'll send you an email. Thanks.Triphammer
The $ in "Something %d and %f" and "Something %2$f and %1$d" is not part of the C standard. This should result in a 3rd answer: "not comparable".Djambi
@ErikB I am somewhat confused. What do you mean by "compatible"? Would it be possible for you to describe your problem in more detail? Is there a pattern you're trying to check between the two strings? If so you can use regular expressions to compare the two strings.Plainlaid
Check this link #5128297Unanswerable
You could try using regex, it's very simple with it.Weissberg
"I mean the compiler is checking that the format string and the arguments match" Just wanted to clear up this misconception about *printf. The compiler checks nothing here, the function does at runtime. It's why you can actually mess up here if you're not careful.Acrogen
@chux It may not be part of the C standard, but the implementation I rely on supports it and I would like the solution to support it as well.Triphammer

Checking if 2 printf() format strings are compatible is an exercise in format parsing.

C, at least, has no standard run-time compare function such as:

int format_cmp(const char *f1, const char *f2); // Does not exist

Formats like "%d %f" and "%i %e" are obviously compatible in that both expect an int and a float/double. Note: float arguments are promoted to double, just as short and signed char arguments are promoted to int.

Formats "%*.*f" and "%i %d %e" are compatible, though not obviously so: both expect an int, an int and a float/double.

Formats "%hhd" and "%d" both expect an int, even though the first will have its value converted to signed char before printing.
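A small sketch of the promotions these three cases rely on. The names first_as_double and first_as_int are made up for illustration; they only show that a variadic callee receives a double or an int no matter whether the caller passed a float, a short or a signed char.

```c
#include <stdarg.h>

/* Hypothetical helper: read the first variadic argument as a double.
 * A float passed by the caller arrives here already promoted to double. */
double first_as_double(int count, ...) {
  va_list ap;
  va_start(ap, count);
  double d = va_arg(ap, double);
  va_end(ap);
  return d;
}

/* Hypothetical helper: read the first variadic argument as an int.
 * short and signed char arguments arrive here already promoted to int. */
int first_as_int(int count, ...) {
  va_list ap;
  va_start(ap, count);
  int i = va_arg(ap, int);
  va_end(ap);
  return i;
}
```

For example, first_as_double(1, 1.5f) reads the float back as a double, and first_as_int(1, (signed char)-3) reads the signed char back as an int. This is why "%hhd" and "%d" consume the same kind of argument.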

Formats "%d" and "%u" are not compatible, even though many systems will behave as hoped. Note: Typically char will promote to int.

Formats "%d" and "%ld" are not strictly compatible. On many 32-bit systems, where int and long have the same size, they are equivalent, but not in general. Of course code can be altered to accommodate this. OTOH "%lf" and "%f" are compatible due to the usual argument promotions of float to double.

Formats "%lu" and "%zu" may be compatible, but that depends on the implementation of unsigned long and size_t. Additions to code could allow this or related equivalences.
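One possible way to let the code decide such implementation-dependent cases, sketched here with C11 _Generic. The macro and function names are assumptions for illustration and are not used by the code in this answer.

```c
#include <stddef.h>

/* 1 when size_t is the very same type as unsigned long on this
 * implementation, 0 otherwise (requires C11 for _Generic). */
#define SIZE_T_IS_ULONG _Generic((size_t)0, unsigned long: 1, default: 0)

/* 1 when int and long have the same size. This is weaker than being the
 * same type, so it is only a hint that "%d" and "%ld" may behave alike. */
#define INT_LONG_SAME_SIZE (sizeof(int) == sizeof(long))

/* Hypothetical wrappers so the results can be used at run time. */
int lu_zu_compatible(void) { return SIZE_T_IS_ULONG; }
int d_ld_same_size(void) { return INT_LONG_SAME_SIZE; }
```

A comparison routine could consult lu_zu_compatible() before treating "%lu" and "%zu" as equal. The results differ between platforms, so they should be computed where the code is compiled and not hard-coded.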

Some combinations of modifiers and specifiers, such as "%zp", are not defined. The following code does not disallow such esoteric combinations, but it does compare them.

Modifiers like "$" are extensions to standard C and are not implemented in the following.
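Since the question asks about "$", here is a rough sketch of how POSIX-style positional arguments could be folded into a comparison. It is separate from the main code in this answer and far less complete: collect_classes, positional_cmp and MAX_ARGS are hypothetical names, only bare %d %i %f %e %g %s conversions are understood (no flags, width, precision or length modifiers), and anything else is reported as not comparable.

```c
#include <ctype.h>
#include <string.h>

#define MAX_ARGS 16

/* Record one class letter per argument slot: 'i' = int, 'f' = double,
 * 's' = char *. Returns the number of slots used, or -1 for anything
 * this sketch does not understand. */
int collect_classes(const char *fmt, char classes[MAX_ARGS]) {
  int next = 0; /* next implicit (non-positional) slot */
  int used = 0; /* highest slot filled so far, plus one */
  memset(classes, 0, MAX_ARGS);
  while (*fmt != '\0') {
    if (*fmt++ != '%')
      continue;
    if (*fmt == '%') { /* literal percent sign */
      fmt++;
      continue;
    }
    /* Optional "n$" prefix: digits followed by '$'. */
    int slot = next;
    const char *p = fmt;
    int n = 0;
    while (isdigit((unsigned char)*p) && n < MAX_ARGS)
      n = n * 10 + (*p++ - '0');
    if (p != fmt && *p == '$') {
      if (n < 1 || n > MAX_ARGS)
        return -1;
      slot = n - 1;
      fmt = p + 1;
    } else {
      next++;
    }
    if (slot >= MAX_ARGS)
      return -1;
    char c;
    switch (*fmt) {
      case 'd': case 'i': c = 'i'; break;
      case 'f': case 'e': case 'g': c = 'f'; break;
      case 's': c = 's'; break;
      default: return -1; /* unsupported conversion or truncated format */
    }
    fmt++;
    if (classes[slot] != 0 && classes[slot] != c)
      return -1; /* same slot used with two different types */
    classes[slot] = c;
    if (slot + 1 > used)
      used = slot + 1;
  }
  return used;
}

/* 0 = compatible, 1 = not compatible, 2 = not comparable */
int positional_cmp(const char *f1, const char *f2) {
  char c1[MAX_ARGS], c2[MAX_ARGS];
  int n1 = collect_classes(f1, c1);
  int n2 = collect_classes(f2, c2);
  if (n1 < 0 || n2 < 0)
    return 2;
  if (n1 != n2)
    return 1;
  return memcmp(c1, c2, (size_t)n1) == 0 ? 0 : 1;
}
```

With this, positional_cmp("Something %d and %f", "Something %2$f and %1$d") returns 0, because both formats fill slot 1 with an int and slot 2 with a double. The idea is to compare the argument list each format implies, not the order in which the conversions appear in the text.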

The compatibility test for printf() differs from scanf().

#include <ctype.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

typedef enum {
  type_none,
  type_int,
  type_unsigned,
  type_float,
  type_charpointer,
  type_voidpointer,
  type_intpointer,
  type_unknown,
  type_type_N = 0xFFFFFF
} type_type;

typedef struct {
  const char *format;
  int int_queue;
  type_type type;
} format_T;

static void format_init(format_T *state, const char *format);
static type_type format_get(format_T *state);
static void format_next(format_T *state);

void format_init(format_T *state, const char *format) {
  state->format = format;
  state->int_queue = 0;
  state->type = type_none;
  format_next(state);
}

type_type format_get(format_T *state) {
  if (state->int_queue > 0) {
    return type_int;
  }
  return state->type;
}

const char *seek_flag(const char *format) {
  // strchr() also matches the terminating '\0', so test for end of string first
  while (*format != '\0' && strchr("-+ #0", *format) != NULL)
    format++;
  return format;
}

const char *seek_width(const char *format, int *int_queue) {
  *int_queue = 0;
  if (*format == '*') {
    format++;
    (*int_queue)++;
  } else {
    while (isdigit((unsigned char ) *format))
      format++;
  }
  if (*format == '.') {
    format++; // Step past the '.' so the precision itself is examined
    if (*format == '*') {
      format++;
      (*int_queue)++;
    } else {
      while (isdigit((unsigned char ) *format))
        format++;
    }
  }
  return format;
}

const char *seek_mod(const char *format, int *mod) {
  *mod = 0;
  if (format[0] == 'h' && format[1] == 'h') {
    format += 2;
  } else if (format[0] == 'l' && format[1] == 'l') {
    *mod = ('l' << CHAR_BIT) + 'l';
    format += 2;
  } else if (*format != '\0' && strchr("ljztL", *format)) {
    *mod = *format;
    format++;
  } else if (*format == 'h') {
    format++;
  }
  return format;
}

const char *seek_specifier(const char *format, int mod, type_type *type) {
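  if (*format == '\0') {
    // Format ended inside a specification, e.g. a trailing "%"
    *type = type_unknown;
    return format;
  } else if (strchr("di", *format)) {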
    *type = type_int;
    format++;
  } else if (strchr("ouxX", *format)) {
    *type = type_unsigned;
    format++;
  } else if (strchr("fFeEgGaA", *format)) {
    if (mod == 'l') mod = 0;
    *type = type_float;
    format++;
  } else if (strchr("c", *format)) {
    *type = type_int;
    format++;
  } else if (strchr("s", *format)) {
    *type = type_charpointer;
    format++;
  } else if (strchr("p", *format)) {
    *type = type_voidpointer;
    format++;
  } else if (strchr("n", *format)) {
    *type = type_intpointer;
    format++;
  } else {
    // Return instead of calling exit() so that format_cmp() can report 2
    *type = type_unknown;
    return format;
  }
  *type |= mod << CHAR_BIT; // Bring in modifier
  return format;
}

void format_next(format_T *state) {
  if (state->int_queue > 0) {
    state->int_queue--;
    return;
  }
  while (*state->format) {
    if (state->format[0] == '%') {
      state->format++;
      if (state->format[0] == '%') {
        state->format++;
        continue;
      }
      state->format = seek_flag(state->format);
      state->format = seek_width(state->format, &state->int_queue);
      int mod;
      state->format = seek_mod(state->format, &mod);
      state->format = seek_specifier(state->format, mod, &state->type);
      return;
    } else {
      state->format++;
    }
  }
  state->type = type_none;
}

// 0 Compatible
// 1 Not Compatible
// 2 Not Comparable
int format_cmp(const char *f1, const char *f2) {
  format_T state1;
  format_init(&state1, f1);
  format_T state2;
  format_init(&state2, f2);
  while (format_get(&state1) == format_get(&state2)) {
    if (format_get(&state1) == type_none)
      return 0;
    if (format_get(&state1) == type_unknown)
      return 2;
    format_next(&state1);
    format_next(&state2);
  }
  if (format_get(&state1) == type_unknown)
    return 2;
  if (format_get(&state2) == type_unknown)
    return 2;
  return 1;
}

Note: only minimal testing done. Lots of additional considerations could be added.

Known shortcomings: hh,h,l,ll,j,z,t modifiers with n. l with s,c.

[Edit]

OP comments about security concerns. This changes the nature of the post and the comparison from an equality one to a security one. I'd imagine that one of the patterns (A) would be a reference pattern and the next (B) would be the test. The test would be "is B at least as secure as A?". Example: A = "%.20s" and B1 = "%.19s", B2 = "%.20s", B3 = "%.21s". B1 and B2 both pass the security test as they do not extract more than 20 char. B3 is a problem as it goes past the reference limit of 20 char. Further, any %s, %[ or %c that is not width-qualified is a security problem, whether in the reference or the test pattern. This answer's code does not address this issue.
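A minimal sketch of that "at least as secure" test, limited to the first %s of a printf() format and its literal precision. first_s_limit and at_least_as_secure are hypothetical names, and the rule that an unbounded %s always fails is my reading of the paragraph above, not something tested in practice.

```c
#include <ctype.h>
#include <string.h>

/* Explicit precision of the first %s in fmt. Returns -1 when there is no
 * %s or when its length is not bounded by a literal number (no precision,
 * or ".*"). */
long first_s_limit(const char *fmt) {
  while (*fmt != '\0') {
    if (*fmt++ != '%')
      continue;
    if (*fmt == '%') { /* literal percent sign */
      fmt++;
      continue;
    }
    while (*fmt != '\0' && strchr("-+ #0", *fmt) != NULL)
      fmt++; /* flags */
    if (*fmt == '*') {
      fmt++; /* width taken from an argument */
    } else {
      while (isdigit((unsigned char)*fmt))
        fmt++; /* literal width */
    }
    long limit = -1;
    if (*fmt == '.') {
      fmt++;
      if (*fmt == '*') {
        fmt++; /* precision taken from an argument: unknown here */
      } else {
        limit = 0; /* a bare "." means precision 0 */
        while (isdigit((unsigned char)*fmt)) {
          if (limit < 1000000) /* cap to avoid overflow */
            limit = limit * 10 + (*fmt - '0');
          fmt++;
        }
      }
    }
    if (*fmt == 's')
      return limit;
  }
  return -1;
}

/* 1 when both formats bound their first %s and the test format does not
 * allow more characters than the reference; 0 otherwise. */
int at_least_as_secure(const char *ref, const char *test) {
  long a = first_s_limit(ref);
  long b = first_s_limit(test);
  return a >= 0 && b >= 0 && b <= a;
}
```

With A = "%.20s" as the reference, B1 = "%.19s" and B2 = "%.20s" pass while B3 = "%.21s" and a plain "%s" fail. A real check would have to walk every conversion in both formats and would also need a policy for "%n".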

As mentioned, code does not yet handle modifiers with "%n".

[2018 edit]

Concerning "Formats "%d" and "%u" are not compatible.": This is for values to be printed in general. For values in the [0..INT_MAX] range, either format may work per C11dr §6.5.2.2 6.

Djambi answered 12/5, 2015 at 20:44 Comment(3)
I really like this answer. If op's ok with it, you deserved that bounty.Anastasius
I haven't had time to read, run or test the code, but it seems like you have understood the question. My main concern is security. I found that we were getting format strings from a configuration file that could be downloaded from the internet. Meaning that an attacker could inject a format string. I fixed that with an implementation specific to our case, but I figured there should be a general implementation for checking format string compatibility. While your implementation might work fine I don't really feel comfortable using it as I want something well tested and battle proven.Triphammer
@Erik B The security concerns, IMO, importantly change the question's focus. I've added some to my answer to address that issue, but security is really a new question. Maybe add a new post with details of your security concerns, especially in the areas of strings and "%n". IAC, I do not think you will find a ready-made implementation.Djambi

My understanding of what you want is that you basically want a method that can look at two strings and detect whether they both have the same types of values in them, or something along those lines. If so, then try this (or something along the lines of this):

-(int)checkCompatible:(NSString *)string_1 :(NSString *)string_2 {

    // Separate the string into single elements.
    NSArray *stringArray_1 = [string_1 componentsSeparatedByString:@" "];
    NSArray *stringArray_2 = [string_2 componentsSeparatedByString:@" "];

    // Store only the numbers for comparison in a new array.
    NSMutableArray *numbers_1 = [[NSMutableArray alloc] init];
    NSMutableArray *numbers_2 = [[NSMutableArray alloc] init];

    // Make sure the for loop below, runs for the appropriate
    // number of cycles depending on which array is bigger.
    int loopMax = 0;

    if ([stringArray_1 count] > [stringArray_2 count]) {
        loopMax = (int)[stringArray_1 count];
    } 

    else {
        loopMax = (int)[stringArray_2 count];
    }

    // Now go through the string arrays and store only the
    // numbers in the mutable arrays. This will be used
    // during the comparison stage.
    for (int loop = 0; loop < loopMax; loop++) {

        NSCharacterSet *notDigits = [[NSCharacterSet decimalDigitCharacterSet] invertedSet];

        if (loop < [stringArray_1 count]) {

            if ([[stringArray_1 objectAtIndex:loop] rangeOfCharacterFromSet:notDigits].location == NSNotFound) {
                // String consists only of the digits 0 through 9.
                [numbers_1 addObject:[stringArray_1 objectAtIndex:loop]];
            }
        }

        if (loop < [stringArray_2 count]) {

            if ([[stringArray_2 objectAtIndex:loop] rangeOfCharacterFromSet:notDigits].location == NSNotFound) {
                // String consists only of the digits 0 through 9.
                [numbers_2 addObject:[stringArray_2 objectAtIndex:loop]];
            }
        }
    }

    // Now look through the mutable arrays
    // and perform the type comparison.

    if ([numbers_1 count] != [numbers_2 count]) {

        // One of the two strings has more numbers 
        // than the other, so they are NOT compatible.
        return 1;
    }

    else {

        // Both strings have the same number of
        // numbers, so let's go through them to make
        // sure the numbers are of the same type.
        for (int loop = 0; loop < [numbers_1 count]; loop++) {

            // Check to see if the number in the current array index
            // is a float or an integer. All the numbers in the array have
            // to be the SAME type, in order for the strings to be compatible.
            BOOL check_float_1 = [[NSScanner scannerWithString:[numbers_1 objectAtIndex:loop]] scanFloat:nil];
            BOOL check_int_1 = [[NSScanner scannerWithString:[numbers_1 objectAtIndex:loop]] scanInt:nil];
            BOOL check_float_2 = [[NSScanner scannerWithString:[numbers_2 objectAtIndex:loop]] scanFloat:nil];
            BOOL check_int_2 = [[NSScanner scannerWithString:[numbers_2 objectAtIndex:loop]] scanInt:nil];

            if (check_float_1 == YES) {

                if (check_float_2 == NO) {
                    return 1;
                }
            }

            else if (check_int_1 == YES) {

                if (check_int_2 == NO) {
                    return 1;
                }
            }

            else {
                // Error of some sort......
                return 1;
            }
        }

        // All the numbers in the strings are of the same
        // type (otherwise we would NOT have reached
        // this point). Therefore the strings are compatible.
        return 0;
      }
}
Pronominal answered 7/5, 2015 at 9:24 Comment(5)
The first problem is splitting the strings by spaces. Why?Cuculiform
@Cuculiform I am not splitting them by spaces. What I am doing there, is getting a copy of all the individual strings in that string and storing them in separate array elements. So lets say the string was "hello 123", then after componentsSeparatedByString, it would be stored in the array as "hello" [0] and "123" [1].Pronominal
How about "Something%d" and "Something %d"? I guess they are considered as "compatible" based on the question.Chuppah
This does not answer my question at all."hello 123" is not a format string. I don't think you understood the question.Triphammer
@ErikB Oh sorry. Maybe you could do a better job of explaining your question for us simpletons.Pronominal
