Which are safe methods and practices for string formatting with user input in Python 3?

My Understanding

From various sources, I have come to the understanding that there are four main techniques of string formatting/interpolation in Python 3 (3.6+ for f-strings):

  1. Formatting with %, which is similar to C's printf
  2. The str.format() method
  3. Formatted string literals/f-strings
  4. Template strings from the standard library string module

My knowledge of usage mainly comes from Python String Formatting Best Practices (source A):

  • str.format() was created as a better alternative to the %-style, so the latter is now generally treated as legacy (although it has not been deprecated or removed)
  • f-strings allow str.format()-like behavior only for string literals but are shorter to write and are actually somewhat-optimized syntactic sugar for concatenation
  • Template strings are safer than str.format() (demonstrated in the first source) and the other two methods (implied in the first source) when dealing with user input

I understand that the aforementioned vulnerability in str.format() comes from the method being usable on any normal string, where the delimiting braces are part of the string data itself. If a user controls the format string, malicious input containing brace-delimited replacement fields can read attributes and items of the objects passed to format() and, through them, reach data such as module globals. I believe this is unlike the other ways of formatting, where the programmer is the only one who can supply variables to the pre-formatted string. For example, f-strings have similar syntax to str.format() but, because f-strings are literals and the inserted values are evaluated separately through concatenation-like behavior, they are not vulnerable to the same attack (source B). Both %-formatting and Template strings also seem to be supplied variables for substitution only by the programmer; the main difference pointed out is Template's more limited functionality.
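To make the difference concrete, here is a minimal sketch of the attack as I understand it. The Account class, its fields, and render_unsafe are hypothetical names invented for this illustration. The same attribute traversal is what lets the published examples reach __init__.__globals__.

```python
class Account:
    """Hypothetical object that the programmer passes to the formatter."""

    def __init__(self, username, password_hash):
        self.username = username
        self.password_hash = password_hash  # never meant to be displayed


account = Account("alice", "5f4dcc3b")


def render_unsafe(user_template, account):
    # Unsafe pattern: the format string itself comes from the user,
    # so the user decides which attributes of `account` get read.
    return user_template.format(account=account)


# What the programmer expected users to write.
intended = render_unsafe("Hello, {account.username}!", account)

# What a malicious user can write instead.
leaked = render_unsafe("{account.password_hash}", account)

# With an f-string the user input is only a value; it is inserted
# as-is and never parsed again for replacement fields.
user_input = "{account.password_hash}"
echoed = f"You typed: {user_input}"

print(intended)
print(leaked)
print(echoed)
```

In this sketch the problem is not str.format() as such but the fact that untrusted text is used as the format string; the f-string line is safe only because the literal is written by the programmer.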

My Confusion

I have seen a lot of emphasis on the vulnerability of str.format() which leaves me with questions of what I should be wary of when using the other techniques. Source A describes Template strings as the safest of the above methods "due to their reduced complexity":

The more complex formatting mini-languages of the other string formatting techniques might introduce security vulnerabilities to your programs.

  1. Yes, it seems like f-strings are not vulnerable in the same way str.format() is, but are there known concerns about f-string security as is implied by source A? Is the concern more like risk mitigation for unknown exploits and unintended interactions?

I am not familiar with C and I don't plan on using the clunkier %/printf-style formatting, but I have heard that C's printf had its own potential vulnerabilities. In addition, both sources A and B seem to imply a lack of security with this method. The top answer in Source B says,

String formatting may be dangerous when a format string depends on untrusted data. So, when using str.format() or %-formatting, it's important to use static format strings, or to sanitize untrusted parts before applying the formatter function.

  2. Do %-style strings have known security concerns?
  3. Lastly, which methods should be used, and how can user input-based attacks be prevented (e.g. filtering input with regex)?
    • More specifically, are Template strings really the safer option? And can f-strings be used just as easily and safely while granting more functionality?
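For reference, this is a small sketch of how I understand Template to behave when the template text is user-supplied. The placeholder names and the values mapping are made up for the example. Placeholders can only name keys in the mapping the programmer passes, and there is no attribute or index syntax.

```python
from string import Template

# Hypothetical user-supplied template: one known key, one unknown key,
# and one attempt at attribute access (not valid Template syntax).
user_template = Template("Hi $name, your token is $token and ${user.__class__}")

# The only data the programmer chooses to expose.
values = {"name": "alice"}

# safe_substitute() fills in what it can and leaves everything else untouched.
partial = user_template.safe_substitute(values)

# substitute() is strict: missing keys or malformed placeholders raise.
try:
    user_template.substitute(values)
    strict_failed = False
except (KeyError, ValueError):
    strict_failed = True

print(partial)
print(strict_failed)
```

Under these assumptions the worst a user can do is reference a key that is already in the mapping, so the exposure is limited to whatever the programmer puts there.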
Pare answered 18/1, 2022 at 6:38 Comment(4)
Let's assume this question is limited to composing the string, not what use gets made of it later, for example as CSS or HTML in a web page, in an SQL query, etc. Nice question, though I wonder if it's been asked before. - Womanhood
Here's one, more limited than this question and quite old. - Womanhood
This is a really good question, but the only place I would know to find your answers is the official Python docs; maybe you should look there. - Waxwork
What I'd like to consider is: in what situation will you give your users the opportunity to supply a string including Python formatting syntax? "Please type in your profile description here, and oh yeah, you can use Python formatting language in it." That seems like a questionable choice to begin with. - Lapidify

It doesn't matter which format you choose; any format and library can have its own downsides and vulnerabilities. The bigger questions to ask yourself are what the risk factor is, what scenario you are facing, and what you are going to do about it.

First ask yourself: will there be a scenario where a user or an external entity of some kind (for example, an external system) sends you a format string? If the answer is no, there is no risk. If the answer is yes, check whether this is actually needed. If not, remove it to eliminate the risk. If you need it, you can perform whitelist-based input validation and exclude all format-specific special characters from the list of permitted characters. For example, no format string can pass the generic regular expression ^[a-zA-Z0-9\s]*$, because it permits none of the characters (%, {, } and $) that the four techniques use to mark substitutions.
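As a rough sketch of that whitelist check (the function name is_plain_text is made up for this example, and the allowed character set is simply the one from the regular expression above):

```python
import re

# Whitelist: ASCII letters, digits and whitespace only.
# fullmatch() requires the whole string to match, so no ^ or $ is needed.
ALLOWED = re.compile(r"[a-zA-Z0-9\s]*")


def is_plain_text(value: str) -> bool:
    """Return True if value contains only whitelisted characters."""
    return ALLOWED.fullmatch(value) is not None


# Ordinary text passes; anything carrying format syntax is rejected.
print(is_plain_text("hello world 42"))         # True
print(is_plain_text("{0.__class__}"))          # False: str.format() field
print(is_plain_text("%(name)s"))               # False: %-style specifier
print(is_plain_text("$name"))                  # False: Template placeholder
```

A whitelist this strict also rejects punctuation and non-ASCII text, so in practice you would widen it to what your application needs while still leaving out %, {, } and $.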

So the bottom line is: it doesn't matter which type of format string you use. What really matters is what you do with it and how you reduce or eliminate the risk of it being tampered with.

Acquiescence answered 18/1, 2022 at 12:53 Comment(1)
Thank you for the guidance. I see that I was maybe looking too closely at the details of each format. The general mindset you provided makes the choices seem less overwhelming. I guess I did not specify that I was also somewhat curious about how each format works underneath, and how those differences make certain security concerns applicable. The sources had me worried that I had missed something. The lack of relevant results in a search engine tells me there are likely no major concerns other than those of str.format(). As TERMINATOR commented, I should probably check the docs for more specific info. - Pare
