Automatically detect identical consecutive std::string::find() calls
Asked Answered
A

1

12

During a code review, i found source code like this:

void f_odd(std::string &className, std::string &testName)
{
   if (className.find("::") != std::string::npos)
   {
     testName = className.substr(className.find("::") + 2);
     (void)className.erase(className.find("::"), std::string::npos);
   }
}

Within this function, std::string::find() is called three times with the same pattern( here "::").

This code can of course be refactored to

void f(std::string &className, std::string &testName)
{
   const size_t endOfClassNamePos = className.find("::");
   if (endOfClassNamePos != std::string::npos)
   {
     testName = className.substr(endOfClassNamePos + 2);
     (void)className.erase(endOfClassNamePos, std::string::npos);
   }
}

where find is called only once.

Question

Does anybody know a strategy to detecting such a pattern like this? I am having a huge code base, where i intend to spot this pattern. I plan to use a Windows or a Linux environment.

Potential Strategies

  1. Use/adapt a static code analysis tool, like cppcheck to detect these kind of oddities.
  2. Search within the code base with regular expression.
  3. Use/adapt clang-tidy for detection of this pattern.
  4. Write a custom checker in some language (e.g. Python) that detects these issues. In this case, the checking should be performed on pre-processed code.

No Go's

  • Manual review

Update 1

I decided to start with potential strategy 1). I plan to adapt cppcheck to catch this issue.

Cppcheck offers a possibility to write customized rules, based on PCRE regular expressions. For this, cppcheck has to be compiled with enabled PCRE support. Since the current test environment is Linux-based, the following commands can be used to download the latest version of cppcheck:

git clone https://github.com/danmar/cppcheck.git && cd cppcheck

After that, compile and install the tool as follows:

sudo make install HAVE_RULES=yes

Now the basic tool setup is done. In order to develop a cppcheck-rule, i prepared a simple test case (file: test.cpp), similar to the sample code in the first section of this article. This file contains three functions and the cppcheck-rule shall emit a warning on f_odd and f_odd1 about consecutive identical std::string::find calls.

test.cpp:

#include <string>
void f(std::string &className, std::string &testName)
{
   const size_t endOfClassNamePos = className.find("::");
   if (endOfClassNamePos != std::string::npos)
   {
     testName = className.substr(endOfClassNamePos + 2);
     (void)className.erase(endOfClassNamePos, std::string::npos);
   }
 }

 void f_odd(std::string &className, std::string &testName)
 {
    if (className.find("::") != std::string::npos)
    {
       testName = className.substr(className.find("::") + 2);
       (void)className.erase(className.find("::"), std::string::npos);
    }
 }

 #define A "::"
 #define B "::"
 #define C "::"
 void f_odd1(std::string &className, std::string &testName)
 {
   if (className.find(A) != std::string::npos)
   {
      testName = className.substr(className.find(B) + 2);
      (void)className.erase(className.find(C), std::string::npos);
   }
 }

So far so good. Now cppcheck has to be tweaked to catch consecutive identical std::string::find calls. For this i have created a cppcheck_rule-file that contains a regular expression that matches consecutive identical std::string::find calls:

<?xml version="1.0"?>
<rule>
<tokenlist>normal</tokenlist>
<pattern><![CDATA[([a-zA-Z][a-zA-Z0-9]*)(\s*\.\s*find)(\s*\(\s*\"[ -~]*\"\s*\))[ -\{\n]*(\1\2\3)+[ -z\n]]]></pattern>
<message>
    <severity>style</severity>
    <summary>Found identical consecutive std::string::find calls.</summary>
</message>

This file can be used to extend cppcheck about a new check. Lets try:

cppcheck --rule-file=rules/rule.xml test/test.cpp

and the output is

Checking test/test.cpp...
[test/test.cpp:14]: (style) Found identical consecutive std::string::find calls.
[test/test.cpp:26]: (style) Found identical consecutive std::string::find calls.

Now, identical consecutive std::string::find calls can be detected in C/C++ codes. Does anybody know a better/more efficient or more clever solution?

References:


Acquire answered 27/4, 2016 at 12:10 Comment(4)
You could write your own clang-tidy check to detect this.Rosenarosenbaum
@Rosenarosenbaum Thank you clang-tidy is another potentia tool that could do the job. I'll update my potential solutions section.Acquire
Can you re-phrase your question such that it doesn't appear like a request for tool recommendations? These are explicitly off-topic on this site.Denticulation
I haven't checked out how cppcheck applies your rule, but you might want to create some test cases to make sure you don't get false positives: where the string is mutated between find calls (so the previous result can't be reused), where the calls happen in different functions, where different scopes have same-named strings being searched....Sacrilege
H
1

The main problem with such a tool is that a lexical analysis can only check if there is textual repetition. Taking your example, calling className.find("::") twice is a potential issue if the variable refers to the same string twice. But let me add one small change to your code: className = className.substr(className.find("::") + 2);. Suddenly the meaning of the next className.find has changed dramatically.

Can you find such changes? You need a full-blown compiler for that, and even then you have to be pessimistic. Sticking to your example, could className be changed via an iterator? It's not just direct manipulation you need to be aware of.

Is there no positive news? Well: existing compilers do have a similar mechanism. It's called Common Subexpression Elimination, and it works conceptually as you would want it to work in the example above. But that is also bad news in one way: If the situation is detectable, it's unimportant because it's already optimized out by the compiler!

Haldane answered 28/4, 2016 at 21:36 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.