My Understanding
From various sources, I have come to the understanding that there are four main techniques of string formatting/interpolation in Python 3 (3.6+ for f-strings):
- Formatting with
%
, which is similar to C'sprintf
- The
str.format()
method - Formatted string literals/f-strings
- Template strings from the standard library
string
module
My knowledge of usage mainly comes from Python String Formatting Best Practices (source A):
str.format()
was created as a better alternative to the%
-style, so the latter is now obsolete- However,
str.format()
is vulnerable to attacks if user-given format strings are not properly handled
- However,
- f-strings allow
str.format()
-like behavior only for string literals but are shorter to write and are actually somewhat-optimized syntactic sugar for concatenation - Template strings are safer than
str.format()
(demonstrated in the first source) and the other two methods (implied in the first source) when dealing with user input
I understand that the aforementioned vulnerability in str.format()
comes from the method being usable on any normal strings where the delimiting braces are part of the string data itself. Malicious user input containing brace-delimited replacement fields can be supplied to the method to access environment attributes. I believe this is unlike the other ways of formatting where the programmer is the only one that can supply variables to the pre-formatted string. For example, f-strings have similar syntax to str.format()
but, because f-strings are literals and the inserted values are evaluated separately through concatenation-like behavior, they are not vulnerable to the same attack (source B). Both %
-formatting and Template strings also seem to only be supplied variables for substitution by the programmer; the main difference pointed out is Template's more limited functionality.
My Confusion
I have seen a lot of emphasis on the vulnerability of str.format()
which leaves me with questions of what I should be wary of when using the other techniques. Source A describes Template strings as the safest of the above methods "due to their reduced complexity":
The more complex formatting mini-languages of the other string formatting techniques might introduce security vulnerabilities to your programs.
- Yes, it seems like f-strings are not vulnerable in the same way
str.format()
is, but are there known concerns about f-string security as is implied by source A? Is the concern more like risk mitigation for unknown exploits and unintended interactions?
I am not familiar with C and I don't plan on using the clunkier %
/printf
-style formatting, but I have heard that C's printf
had its own potential vulnerabilities. In addition, both sources A and B seem to imply a lack of security with this method. The top answer in Source B says,
String formatting may be dangerous when a format string depends on untrusted data. So, when using str.format() or %-formatting, it's important to use static format strings, or to sanitize untrusted parts before applying the formatter function.
- Do
%
-style strings have known security concerns? - Lastly, which methods should be used and how can user input-based attacks be prevented (e.g. filtering input with regex)?
- More specifically, are Template strings really the safer option? and Can f-strings be used just as easily and safely while granting more functionality?