How to find in my program a "const char* + int" expression
S

9

13

I'm in the middle of a source code migration, and the converter program did not convert concatenations of string literals with integers. Now I have lots of code with expressions like this:

f("some text" + i);

Since C/C++ will interpret this as pointer arithmetic on the string literal, f will receive "some text", or "ome text", or "me text"...

In my source language, concatenating a string with an int yields a string. Now I need to go line by line through the source code and change, by hand, the previous expression to:

f("some text" + std::to_string(i));

The conversion program managed to convert local "String" variables to "std::string", resulting in expressions:

std::string some_str = ...;
int i = ...;

f(some_str + i);

Those were easy to fix because with such expressions the C++ compiler outputs an error.

Is there any tool that can find such expressions in source code automatically?

Schlessel answered 12/7, 2014 at 13:10 Comment(5)
The way you have posed the question, you need a tool that can check the types of expression fed to "f". Others have suggested regexps, which can at best recognize tokens that hint at the types. If the regexp solutions are not good enough, then I have a possible solution.Alcheringa
@IraBaxter That sounds good! Maybe parsing the AST output of clang to find the operator + node with some const char [] children would be nice.Schlessel
I wonder if something along the lines of overloading global operator+ (const char*, int) and inducing a compiler error or warning inside the body of the overload would produce the desired result ?Thelen
Define the overload in a header file that is included everywhere, have it generate a compile time warning each time the overload is expanded and then just filter the output of the compiler ?Thelen
@Thelen the compiler won't allow you to overload a global operator with two native types. I've just tried it. Anyway I've found a solution and posted an answer. Thanks!Schlessel
S
2

I've found a very simple way to detect this issue. Neither a regular expression nor a lint tool will match more complex expressions like the following:

f("Hello " + g(i));

What I need is some form of type inference, so I'm letting the compiler do it. Using a std::string instead of a string literal raises an error, so I wrote a simple source code converter that wraps every string literal in std::string, like this:

f(std::string("Hello ") + g(i));

Then, after recompiling the project, I'd see all the errors. The source code is on GitHub, in 48 lines of Python code:

https://gist.github.com/alejolp/3a700e1730e0328c68de

Schlessel answered 16/7, 2014 at 13:39 Comment(4)
Pragmatic and original! And it works, provided nobody overloaded operator+ to give meaning to the addition of a string with an integer.Incautious
@IwillnotexistIdonotexist Yes, since this is a migration from a non-OOP source code, the structure of the code is very basic. We don't have classes with virtual methods, nor any operator overloads.Schlessel
Good idea, but you have the same problem as nicky_zs: parsing strings right is difficult. Your code breaks with "\\", or with multiline strings, or with C++11 raw strings. I'd let the compiler worry about it, and go either for Alper's or my solution.Introduce
@Introduce It also breaks with implicit concatenation of string literals: "aa" "bb", but I don't have any of those. I'm just using this script to detect the errors, then fixing them in the original source code. Also, a lint won't detect complex expressions like the first example, since a lint can't do type inference.Schlessel
I
8

Easy! Just replace all the + with -&:

find . -name '*.cpp' -print0 | xargs -0 sed -i '' 's/+/-\&/g'


When you try to compile your project you will see, among other errors, something like this:

foo.cpp:9:16: error: 'const char *' and 'int *' are not pointers to compatible types
    return f(s -& i);
             ~ ^~~~

(I'm using clang, but other compilers should issue similar errors)


So you just have to filter the compiler output to keep only those errors:

clang++ foo.cpp 2>&1 | grep -F "error: 'const char *' and 'int *' are not pointers to compatible types"

And you get:

foo.cpp:9:16: error: 'const char *' and 'int *' are not pointers to compatible types
foo.cpp:18:10: error: 'const char *' and 'int *' are not pointers to compatible types
Introduce answered 15/7, 2014 at 23:34 Comment(2)
Good idea, but you will not be able to recover your code back, so better do it on a backup copy of the code base (if the size of the code base permits)Thelen
True! But I think 75k loc is quite manageable. In any case, if you wanted to do it in place, you could replace instead by -&*&, and then replace back the + signs.Introduce
S
7

You can try flint, an open-source lint program for C++ developed and used at Facebook. It has a blacklisted token sequences feature (checkBlacklistedSequences). You can add your token sequence to the checkBlacklistedSequences function and flint will report every occurrence.

In the checkBlacklistedSequences function, I added the sequence string_literal + number:

BlacklistEntry([tk!"string_literal", tk!"+", tk!"number"],
               "string_literal + number problem!\n",
                true),

Then compile and test:

$ cat -n test.cpp
 1  #include <iostream>
 2  #include <string>
 3  
 4  using namespace std;
 5  
 6  void f(string str)
 7  {
 8      cout << str << endl;
 9  }
10  
11  int main(int argc, char *argv[])
12  {
13      f("Hello World" + 2);
14  
15      f("Hello World" + std::to_string(2));
16  
17      f("Hello World" + 2);
18  
19      return 0;
20  }

$ ./flint test.cpp 
test.cpp(13): Warning: string_literal + number problem!
test.cpp(17): Warning: string_literal + number problem!

flint has two versions (an older one written in C++ and a newer one written in D); I made my changes in the D version.

Scenarist answered 14/7, 2014 at 13:12 Comment(3)
What about a loop variable? Like for (x...) f("Hello world" + x);Schlessel
Add one more sequence with tk!"identifier" instead of tk!"number" and it works. However, type of the identifier can be any type. There is no specific check for int.Scenarist
This looks very promising. However, I was going for the more general case of detecting every expression of operator + with const char* and int, so what I want is type inference. I am not using a lint in my project, just cppcheck.Schlessel
F
3

I'm not familiar with many tools that can do that, but I think grep can help to some extent.

In the root directory of your source code, try:

grep -rn '".\+"\s*+\s*' .

This finds all the files that contain a line like "xxxxx" +, which should help you find all the lines you need.

If all the integers are constants, you can alter the grep expression to:

grep -rn '".\+"\s*+\s*[0-9]*' .

And you can also include the ( before the string constant:

grep -rn '(".\+"\s*+\s*[0-9]*' .

This may not be the "correct" answer, but I hope it helps.

Fatidic answered 12/7, 2014 at 13:22 Comment(9)
Running a grep on this project raises a lot of false positives. Thanks!Schlessel
@vz0, of course, the best way to do this is to perform static syntax analysis using something like lint. However, using grep is just a way to find the point of problems most quickly.Fatidic
Check my accepted answer. A lint can't do type inference.Schlessel
@vz0, I see your answer but I don't understand that if your answer works, why would grep raise a lot of false positives, since grep works just like your find-and-replace and strings quoted by double-quotes are very easy to grep out?Fatidic
Your solution does not work in the general case of using operator + with a const char* and an int, or vice versa, an int and a const char*. The case of «"text" + i» is just one example of those general expressions. By "false positive" I mean all the valid cases where a string is concatenated with another string variable, for example: "Hello " + n, where n is a std::string.Schlessel
@vz0, well the last grep statement in my answer can avoid the false positive "Hello" + "World". And also, I believe that your answer is more or less equivalent to my answer theoretically because replacing "Hello" with std::string("hello") is equivalent to finding all "Hello"s using grep.Fatidic
No, they are not, because with your solution I have to check manually, one by one, whether the expression "str" + something is a valid expression, while with my solution I am letting the compiler decide and complain if I'm adding an integer to a string. You don't have to trust me, go and write a simple program with both approaches.Schlessel
@vz0, of course your solution makes the compiler check whether the expression is valid, and of course the compiler won't be wrong. I just mean that, if all the integers in "hello" + xx expressions are constants, then using grep is equivalent to the compiler check, and replacing string literals with std::string is equivalent to grepping for a regular expression. Using grep is just the simplest way to obtain as much information as possible.Fatidic
@vz0, also, grep works without type inference, if the integers are not all constants, eg. "hello" + i, where i is an integer variable, then grep will be of no use.Fatidic
J
2

You may not need an external tool. Instead, you can take advantage of the C++ rule that at most one user-defined conversion is applied implicitly. Basically, you need to change the parameter of your f function from const char*/std::string to a type that is implicitly convertible only from a string literal (const char[size]) or a std::string instance (what you get when you add std::to_string to the expression).

#include <cstddef>  // size_t
#include <string>
#include <iostream>
#include <utility>  // std::move

struct string_proxy
{
    std::string value;

    string_proxy(const std::string& value) : value(value) {}

    string_proxy(std::string&& value) : value(std::move(value)) {}

    template <size_t size>
    string_proxy(const char (&str)[size]) : value(str) {}
};

void f(string_proxy proxy)
{
    std::cout << proxy.value << std::endl;
}

int main()
{
    f("this works"); // const char[size]
    f("this works too: " + std::to_string(10)); //  std::string
    f("compile error!" + 10); // const char*
    return 0;
}

Note that this is not going to work on MSVC, at least not in the 2012 version; it's likely a bug, since no warning is emitted either. It works perfectly fine in g++ and clang (you can quickly check it here).

Jairia answered 14/7, 2014 at 16:33 Comment(5)
This sounds promising. However, I have 75k lines of code with thousands of functions...Schlessel
In all honesty, 75k lines is not that many. An alternative solution is to derive from std::string, make construction from const char* explicit and add an array reference constructor. Both solutions have the obvious benefit of being enforced at compile time.Jairia
It's for my pet project, don't have much free time.Schlessel
You will have to dedicate some time to either set up and maintain a lint tool (an ongoing process) or to fail-proof the code, as above.Jairia
You can at least partially automate the conversion. Just compile and pipe the errors to a file. Use the line numbers in the file to auto-convert your code using the scripting language of your choice (sed, awk, perl, php, python, etc.).Intercom
S
0

If your case is exactly as

"some text in quotations" + a_numeric_variable_or_constant

then Powergrep or similar programs will let you scan all files for

("[^"]+")\s*\+\s*(\w+)

and replace with

\1 + std::to_string(\2)

This will find the possible matches, but I strongly recommend previewing what you are replacing first, because this will also match string variables.

Regular expressions cannot understand the semantics of your code, so they cannot tell whether the operands are integers. For that you need a program with a parser, like CDT, or a static code analyzer. Unfortunately, I do not know of any that can do that. So, to sum up, I hope the regex helps :)

PS: In the worst case, if a variable is not numeric, the compiler will give you an error, because std::to_string doesn't accept anything other than numeric values. You can then fix those cases manually; I can only hope there won't be many.

PS 2: Some may think that Powergrep is expensive. You can use the trial for 15 days with full functionality.

Sheeree answered 12/7, 2014 at 13:49 Comment(1)
Running a grep on this project raises a lot of false positives. Thanks!Schlessel
C
0

You can try the Map-Reduce Clang plugin. The tool was developed at Google to do just this kind of refactoring, mixing strong type-checking and regexps.

(see video presentation here ).

Coupler answered 15/7, 2014 at 15:33 Comment(0)
L
0

You can use a C++ conversion operator and create a new class that overloads operator + as needed. Replace int with a new class "Integer" and implement the required overload. This requires no changes or word replacements at the call sites.

class Integer{
    long  i;
    std::string formatted;
public:
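     Integer(int i) : i(i) {} // initialize the member i from the parameter
     operator char*(){
        return (char*)formatted.c_str();}
     // const char* so that string literals such as "test" can bind
     friend Integer operator +(const char* input, Integer t);
};
Integer operator +(const char* input, Integer integer) {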
    integer.formatted = input + std::to_string(integer.i);
    return integer;
}
Integer i = ....
f("test" + i); //executes the overloaded operator
Luedtke answered 16/7, 2014 at 11:41 Comment(1)
I assumed that the "i" is defined in only one place?Luedtke
G
0

I'm assuming that, for the call f(some_str + i);, your function definition looks like this:

 void f(std::string value)
 {
    // do something.
 }

If you declare another class, such as AdvString, that implements operator + for integers, and declare your function as in the code below, then the call f(some_str + i); will work:

 void f(AdvString value)
 {
   // do something.
 }

A sample implementation is here: https://github.com/prasaathviki/advstring

Griceldagrid answered 21/7, 2014 at 5:25 Comment(1)
Thank you. As I stated earlier, the f function is just an example of one use case. In general I have many functions and many expressions in many contexts, so this solution would only work for one single case. If you use an auxiliary variable std::string s = "hello" + i; your modification won't detect the issue.Schlessel
