Writing style to prevent string concatenation in a list of strings
Asked Answered
A

4

7

Suppose I have a list/tuple of strings,

COLOURS = [
    "White",
    "Black",
    "Red"
    "Green",
    "Blue"
]

for c in COLOURS:
    # rest of the code

Sometimes I forget placing a comma after each entry in the list ("Red" in the above snippet). This results in one "RedGreen" instead of two separate "Red" and "Green" list items.

Since this is valid Python, no IDE/text editor shows a warning/error. The incorrect value comes to the limelight only during testing.

What writing style or code structure should I use to prevent this?

Airspace answered 1/9, 2022 at 16:49 Comment(17)
Looks like there isn't much of a solution #71310295Firebrick
Parenthesizing each string would turn any missing comma into an error - but that just moves the problem from remembering commas to remembering parentheses.Uremia
You could go about it another way; might seem like overkill but it could have its benefits - you could have a separate file with the list of items separated by newlines, then read in that file as a list. Or, define all the items in your code in a multiline string, then split by lines.Whinstone
I am not sure how to check for it, but this seems like the kind of thing that could be detected by a static linter (search for two string literals with no operators between them?). I wonder if you could propose this to PEP 8 to include this as a warning?Firebrick
@Firebrick "Two or more string literals (i.e. the ones enclosed between quotes) next to each other are automatically concatenated." - Python docs. This can't be considered a warning since it's a language feature.Airspace
@Airspace I could be completely wrong, but because PEP8 is a “style guide” I believe it will flag many things that you and I can agree are language features as warnings. This would be a useful warning for those running a static linter because it is (I assume) a very rarely used language feature and more often likely an unintended bug.Firebrick
@Firebrick Pylint will find this problem, but only if the strings are on the same line. The rule is "implicit-str-concat (W1404)". Edit: Oh, actually, it has a flag check-str-concat-over-line-jumps that will find it even if the strings aren't on the same line.Noranorah
I found this old PEP feature request to disallow this, which failed. I wonder if we could ask them to reconsider with the stipulation that it should raise a warning but still be allowed? peps.python.org/pep-3126Firebrick
@Firebrick Good find! Implicit string concatenation goes against "explicit is better than implicit" philosophy.Airspace
I have also confirmed that PEP8 still does not flag this as an error, so there was no approved proposal after the one mentioned above.Firebrick
I personally have gotten in the habit of always putting a trailing commas in multi-line containers and argument lists. I know it's a bit of a non-answer, but this is easily solved by just never forgetting, lol.Intercessory
If you are concerned about implicit string joining, don't use multiple string literals: use a single string that must be split. E.g., COLORS = "White Black Red Green Blue".split(). For any list whose elements you were explicitly enumerating, the overhead should be minimal. (That is, I'm assuming you weren't defining a list of hundreds of elements or more.)Rhizogenic
Also, if you have tests that catch this, good! It means this is just an inconvenience, not a "real" problem.Rhizogenic
This is an opinion-based question, so it's off-topic. But my recommendation is to write code more slowly or with more caffeine.Darter
This question is not a duplicate of the linked question. I am aware, as evident from my question, of "why trailing commas are allowed..". Also, I don't think this is opinion-based. After a period of time certain best practices/coding styles are developed by the community that helps the developer write less error-prone code. I was looking for such a code style and apparently as per most of the comments there's no such "style" atm. @wjandrea's comment about the two Pylint flags is probably the closest answer to this question.Airspace
@Airspace I agree it's not opinion-based, but mostly because of the false assumption "no IDE/text editor shows a warning/error", as I addressed in my answer. The wording, "What writing style or code structure should I use to prevent this?" is inherently opinion-based because of "should". If you rephrased it as "Is there a writing style or code structure I could use to prevent this?", that'd be better. But maybe this is semantics, IDK.Noranorah
@Noranorah Thanks for editing the question and making my intention clear.Airspace
N
4

You're incorrect that "no IDE/text editor shows a warning/error". Pylint can identify this problem using rule implicit-str-concat (W1404) with flag check-str-concat-over-line-jumps. (And for that matter, there are lots of things that are valid Python that a linter will warn you about, like bare except: for example.)

Personally, I'm using VSCode, so I enabled Pylint via the Python extension (python.linting.pylintEnabled) and set up a pylintrc like this:

[tool.pylint]
check-str-concat-over-line-jumps = yes

Now VSCode gives this warning for your list:

Implicit string concatenation found in list  pylint(implicit-str-concat)  [Ln 4, Col 1]


Lastly, there are probably other linters that can find the same problem, but Pylint is the first one I found.

Noranorah answered 2/9, 2022 at 21:28 Comment(0)
F
1

Using what @wjandrea found, here is a possible solution. It is not pretty, but it can correctly detect that this code has implicit string concatenation. I had it just print out the warning instead of crashing, but if you want it could be modified to throw an error.

You could also just run pylint on your file, but I assumed you were looking for a "programmatic" way to detect when this, and only this, occurs.

import pylint.lint
import sys

def check_for_implicit_str_concat():
    pylint.lint.reporters.json_reporter.JSONReporter.display_messages = lambda self, layout: None

    options = [
        sys.argv[0], 
        '--output-format=pylint.reporters.json_reporter.JSONReporter',
        '--check-str-concat-over-line-jumps=y'
    ]
    results = pylint.lint.Run(options, do_exit=False)
    for message in results.linter.reporter.messages:
        if message['message-id'] == 'W1403':
            print(message)

if __name__ == "__main__":
    check_for_implicit_str_concat()
    COLOURS = [
        "White",
        "Black",
        "Red"
        "Green",
        "Blue"
    ]

Output:

{
  "type": "warning",
  "module": "test",
  "obj": "",
  "line": 22,
  "column": 0,
  "path": "test.py",
  "symbol": "implicit-str-concat-in-sequence",
  "message": "Implicit string concatenation found in list",
  "message-id": "W1403"
}
Firebrick answered 1/9, 2022 at 18:10 Comment(1)
I was looking for a coding style and not a "programmatic" way. Even though I was asking for a coding style, enabling the two Pylint flags in my IDE should do what I want. Thanks.Airspace
L
1

I'd go with an approach like this, which may even increase the maintainability and extensibility of the colours:

  • Use enums (or dictionaries) for colours, that would allow you to use them in the following way: Colours.RED, Colours.BLUE. Probably the best solution out there, but it really depends on the context of your application.

  • Create a very simple Colour class that would contain a string field of a colour and a constructor that initialises it. This way, the following code looks like this:

    COLOURS = [Colour("white"), Colour("green")]
    

    The downside of it are probably over-architecting your code to a certain extent and having to treat colour objects now slightly different (perhaps with c.get_colour() or c.color. Maybe it's not a downside at all, depending on the scale of your application.

  • Another "solution" would be to go in a functional style, creating a function that is very redundant, taking the colour as a string and just returning it back. I'm not a fan of this solution as it may not give you any benefit later.

Most of the time in production code, if you have bunch of constants like this in a list (except the cases where you're defining them specifically for enumeration of some sort), not having them enumerated or anything similar - you're most likely doing something wrong, so the solution would be to avoid that.

Larcenous answered 1/9, 2022 at 19:2 Comment(0)
W
1

Use Black formatter and Pylint together and you're good to go.

If you have your list like: (without ending comma)

COLOURS = [
    "White",
    "Black",
    "Red"
    "Green",
    "Blue"
]

Then after formatting it'll become:

COLOURS = ["White", "Black", "Red" "Green", "Blue"]

They're now in the same line...So Pylint complains about it: Implicit string concatenation found in list.

If you have your same list but with ending comma:

COLOURS = [
    "White",
    "Black",
    "Red"
    "Green",
    "Blue",
]

It'll become:

COLOURS = [
    "White",
    "Black",
    "Red" "Green",
    "Blue",
]

Again as you see Black forces these to be in one line in either case. So you will get Pylint's warning.

Worrisome answered 2/9, 2022 at 21:22 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.