Static code analysis is one of the most important components of secure software development. It detects errors and potential vulnerabilities early in the development process, when they're cheaper and easier to fix. It also enables developers to detect security issues and flaws that they aren't aware of.
This article provides examples to show why static analysis is important, rather than simply speculates on the idea of using analyzers for safety. So, let's take a practical look at how static analysis can make your code safer, more reliable, and neater.
We'll continue examining the TDengine project, which we've covered in three small notes on code refactoring:
TDengine is a database designed for IoT systems, where reliability and security are especially critical, which makes this project stand-out more. Checking the code using PVS-Studio static analyzer was especially interesting since it detects not only typos but also potential vulnerabilities.
TDengine is an open source, high-performance, cloud native time-series database optimized for Internet of Things (IoT), Connected Cars, and Industrial IoT. It enables efficient, real-time data ingestion, processing, and monitoring of TB and even PB scale data per day, generated by billions of sensors and data collectors.
The fourth part of the series is released much long after the project check. So, some code snippets may look different now, and some errors may already be fixed. However, it's not a big deal—this article is intended to highlight the value of static analysis as a practice, rather than find and fix as many errors as possible. Such one-time checks show the capabilities of PVS-Studio analyzer but they don't contribute meaningfully to long-term quality and reliability of the project. Static analysis should be used regularly. Introduce Static Analysis in the Process, Don't Just Search for Bugs with It.
Usually we only check the project code, discarding third-party libraries that it uses. However, this time I deliberately checked the entire codebase with external libraries. I'd like you to think over the following take:
From the user's point of view, it doesn't matter whether a bug or vulnerability originates in our own code or in the code of a library we use. If we use third-party code, we take responsibility for it. Errors and issues in third-party code become ours.
It is useful, and sometimes necessary, to perform static analysis of the third-party components that we use:
Dereferencing a null pointer is a very frequent error. In this regard, the TDengine project code is no exception.
The taosArrayGetLast
function can return NULL
:
void* taosArrayGetLast(const SArray* pArray) {
if (pArray->size == 0) {
terrno = TSDB_CODE_INVALID_PARA;
return NULL;
}
return TARRAY_GET_ELEM(pArray, pArray->size - 1);
}
The author of the following code attempted to handle this scenario, but failed.
static int32_t walInitWriteFile(SWal *pWal) {
int64_t fileFirstVer = -1;
....
SWalFileInfo *pRet = taosArrayGetLast(pWal->fileInfoSet);
if (pRet == NULL) {
fileFirstVer = pWal->vers.lastVer + 1;
}
fileFirstVer = pRet->firstVer;
....
}
Regardless of the pRet
pointer value, the pointer is still dereferenced. In addition, the value of the fileFirstVer
variable is always overwritten. So, the analyzer issues two warnings at once:
Perhaps the developer should have added else
to remedy the situation:
SWalFileInfo *pRet = taosArrayGetLast(pWal->fileInfoSet);
if (pRet == NULL) {
fileFirstVer = pWal->vers.lastVer + 1;
} else {
fileFirstVer = pRet->firstVer;
}
Bugs often turn up in error handlers, and null pointer dereferences are no exception. It's hardly surprising since almost no one ever tests these parts of code. Neither does anyone write unit tests for them—it's just too tedious.
int32_t ctgGetFetchName(SArray* pNames, SCtgFetch* pFetch, SName** ppName) {
STablesReq* pReq = (STablesReq*)taosArrayGet(pNames, pFetch->dbIdx);
if (NULL == pReq) {
qError("fail to get the %dth tb in pTables, tbNum:%d",
pFetch->tbIdx, (int32_t)taosArrayGetSize(pReq->pTables));
return TSDB_CODE_CTG_INTERNAL_ERROR;
}
....
}
The PVS-Studio warning: V522 Dereferencing of the null pointer 'pReq' might take place. ctgUtil.c 1769
If the pReq
pointer is null, it's dereferenced to print the number of elements in the table:
STablesReq* pReq = ;
if (NULL == pReq) {
....(pReq->pTables));
A dubious idea :)
On the one hand, the error doesn't seem crucial: it's unlikely to come up, otherwise it would have been noticed and fixed.
On the other hand, it's critical:
Here are similar defects in error handlers:
template <typename It>
static void
linkResultDirectedEdges(It first, It last)
// throw(TopologyException);
{
for(; first != last; ++first) {
Node* node = *first;
assert(node);
EdgeEndStar* ees = node->getEdges();
assert(ees);
DirectedEdgeStar* des = dynamic_cast<DirectedEdgeStar*>(ees);
assert(des);
// this might throw an exception
des->linkResultDirectedEdges();
}
}
The PVS-Studio warning: V522 There might be dereferencing of a potential null pointer 'des'. PlanarGraph.h 98
The dynamic_cast
operator may return a null pointer, and therefore must be checked. However, using assert
is clearly incorrect in this case. Such a check is of little use. If the pointer is null when running the debug version, the error will be noticed even without the assert
operator. In the case of a release build, the assert
macro will turn into nothing, and the use of a null pointer, leading to undefined behavior in the future.
Assertions (assert
) are intended to ensure that the data is within the expected ranges while testing the application. But in this case, by using dynamic_cast
, the programmer implies that the object casting may fail and the pointer will be null. In other words, a null pointer is an expected option. If a developer wants type casting to always perform successfully, they should have used static_cast
. Don't hesitate to read the article on a related topic: "Why it is bad idea to check result of malloc call with assert".
It's better to replace assert
with the if
operator and write the code that handles the case when the pointer is null.
These are other similar errors:
There is no check at all after dynamic_cast
is executed. A null pointer can be dereferenced while evaluating a condition when nextedge->getEdgeDirection()
is called.
LineMergeDirectedEdge*
LineMergeDirectedEdge::getNext(bool checkDirection)
{
....
if(getToNode()->getOutEdges()->getEdges()[0] == getSym()) {
auto nextedge = dynamic_cast<LineMergeDirectedEdge*>(
getToNode()->getOutEdges()->getEdges()[1]);
return (!checkDirection || nextedge->getEdgeDirection()) ?
nextedge : nullptr;
}
....
}
The PVS-Studio warning: V522 There might be dereferencing of a potential null pointer 'nextedge'. LineMergeDirectedEdge.cpp 57
The result of dynamic_cast
must be checked. If the type casting is expected to succeed, the faster static_cast
operator should be used instead. This will add clarity to those who maintain the code.
In this case, it seems logical to me to refine the condition:
auto nextedge = dynamic_cast<LineMergeDirectedEdge*>(
getToNode()->getOutEdges()->getEdges()[1]);
return (!checkDirection ||
(nextedge && nextedge->getEdgeDirection())) ?
nextedge : nullptr;
However, this code looks complicated, so let's make it more readable:
auto nextedge = dynamic_cast<LineMergeDirectedEdge*>(
getToNode()->getOutEdges()->getEdges()[1]);
if (!checkDirection ||
(nextedge && nextedge->getEdgeDirection()))
{
return nextedge;
}
return nullptr;
Other warnings:
Do you remember the article about a stack-consuming macro? It said that macros can harbor unpleasant surprises that are hard to notice on a code review. Here comes another example of a "macro mess".
int32_t qWorkerInit(....) {
....
if (NULL == mgmt->schHash) {
taosMemoryFreeClear(mgmt);
qError("init %d scheduler hash failed", mgmt->cfg.maxSchedulerNum);
QW_ERR_JRET(terrno);
}
....
}
Did you spot the error? I guess not. Reviewing this code, one may find it hard to see what's wrong here.
The issue is that taosMemoryFreeClear
is not a function call to free memory, but a macro that also nullifies the pointer.
#define taosMemoryFreeClear(ptr) \
do { \
if (ptr) { \
taosMemoryFree((void *)ptr); \
(ptr) = NULL; \
} \
} while (0)
So, an attempt to print a message will result in dereferencing a null pointer.
The PVS-Studio warning: V522 Dereferencing of the null pointer 'mgmt' might take place. qworker.c 1442
To fix the code, the author should place the qError
function call before the macro.
if (NULL == mgmt->schHash) {
qError("init %d scheduler hash failed", mgmt->cfg.maxSchedulerNum);
taosMemoryFreeClear(mgmt);
QW_ERR_JRET(terrno);
}
Oh, those macros...
The TDengine project devs should keep an eye on memory allocation checks after malloc functions (or similar ones). Sometimes the checks are there:
void* buf = taosMemoryMalloc(tlen);
if (NULL == buf) {
taosArrayDestroy(reqNew.pArray);
tDeleteSVCreateTbBatchReq(&req);
goto end;
}
But often, there are none. This is quite sad and bad given it's a library for IoT devices:
Learn more details in the article: "Four reasons to check what the malloc function returned". If someone in your team doesn't check pointers after malloc
, I suggest you gently make them read this article—repeatedly—until they are fully aware and enlightened.
What does the lack of checks look like in TDengine? Diverse, yet boring to delve into. I'm going to give you a couple examples.
taos_linked_list_t *taos_linked_list_new(void) {
taos_linked_list_t *self =
(taos_linked_list_t *)taos_malloc(sizeof(taos_linked_list_t));
self->head = NULL;
self->tail = NULL;
self->free_fn = NULL;
self->compare_fn = NULL;
self->size = 0;
return self;
}
The PVS-Studio warning: V522 There might be dereferencing of a potential null pointer 'self'. Check lines: 28, 27. taos_linked_list.c 28
unsigned char*
SZ_skip_compress_double(double* data, size_t dataLength, size_t* outSize)
{
*outSize = dataLength*sizeof(double);
unsigned char* out = (unsigned char*)malloc(dataLength*sizeof(double));
memcpy(out, data, dataLength*sizeof(double));
return out;
}
The PVS-Studio warning: V575 The potential null pointer is passed into 'memcpy' function. Inspect the first argument. Check lines: 28, 27. sz_double.c 28
Note. Some may point out that some errors stem from libraries rather than TDengine itself. But as I've already mentioned, when it comes to secure development, it doesn't matter where exactly the bug is found. Let me repeat once again: there is no difference for a user whether the application crashes because of an error in TDengine or in another library that was needed to build TDengine. Developers are responsible not only for the quality of their own code, but also for the quality of the third-party code they use.
It is a common error when a pointer is used before it's checked. The simplest error of this type is as follows:
bool
RectangleIntersection::clip_linestring_parts(
const geom::LineString* gi, ....)
{
auto n = gi->getNumPoints();
if(gi == nullptr || n < 1) {
return false;
}
....
}
The PVS-Studio warning: V595 The 'gi' pointer was utilized before it was verified against nullptr. Check lines: 137, 139. RectangleIntersection.cpp 137
The gi
pointer check should have been written earlier. I think there is no point in elaborating on this case.
Here is a similar case:
int32_t tsortOpen(SSortHandle* pHandle) {
int32_t code = 0;
if (pHandle->opened) {
return code;
}
if (pHandle == NULL || pHandle->fetchfp == NULL ||
pHandle->comparFn == NULL) {
return TSDB_CODE_INVALID_PARA;
}
....
}
V595 The 'pHandle' pointer was utilized before it was verified against nullptr. Check lines: 2883, 2887. tsort.c 2883
It's the same here. There are some variations, but hopefully the point is clear. If the logic behind the V595 diagnostic rule doesn't seem quite obvious to you, you may learn more about it from this post: "Explanation on Diagnostic V595". It's been 10 years since this note was written. Since then, its data flow analysis has grown more advanced, and it now evaluates possible pointer values with greater precision. However, this does not make this diagnostic rule any less useful, and it still effectively finds bugs caused by incorrect ordering of pointer dereference and null checks.
Manual memory management is fraught with errors. The TDengine project is mostly written in C, so it's not surprising that we may encounter errors of this type.
taos_map_t *taos_map_new() {
int r = 0;
taos_map_t *self = (taos_map_t *)taos_malloc(sizeof(taos_map_t));
self->size = 0;
self->max_size = TAOS_MAP_INITIAL_SIZE;
self->keys = taos_linked_list_new();
if (self->keys == NULL) return NULL;
....
}
The PVS-Studio warning: V773 The function was exited without releasing the 'self' pointer. A memory leak is possible. taos_map.c 82
If a new key can't be created, the taos_map_new
function terminates prematurely. This does not free the memory buffer pointed to by self
.
INLINE void addDIA_Data(DynamicIntArray *dia, int value)
{
if(dia->size==dia->capacity)
{
dia->capacity = dia->capacity << 1;
dia->array = (unsigned char *)
realloc(dia->array, dia->capacity*sizeof(unsigned char));
}
dia->array[dia->size] = (unsigned char)value;
dia->size ++;
}
The PVS-Studio warning: V701 realloc() possible leak: when realloc() fails in allocating memory, original pointer 'dia->array' is lost. Consider assigning realloc() to a temporary pointer. DynamicIntArray.c 54
If the realloc
function fails to allocate a new buffer, it will return NULL
. The old value of the dia->array
pointer will be lost and it will be impossible to free the buffer whose address was previously stored in it.
Given the lack of a pointer check further, the memory leak is minor. However, I'd like to highlight this improper handling of the realloc
function.
Note. In this fragment, the memory allocation error will lead to interesting consequences. Data will be written...
dia->array[dia->size] = (unsigned char)value;
...not to a null pointer, but to some remote memory area, the address of which depends on the previous array size. Consequences are unpredictable. Access violation is possible. Perhaps, some data in memory will be corrupted but the program will continue to work, at least for a while. As they say, happy debugging :)
Other similar errors:
Status SstFileWriter::Open(const std::string& file_path) {
....
for (size_t i = 0; i < user_collector_factories.size(); i++) {
int_tbl_prop_collector_factories.emplace_back(
new UserKeyTablePropertiesCollectorFactory(
user_collector_factories[i]));
}
....
}
The PVS-Studio warning: V1023 A pointer without owner is added to the 'int_tbl_prop_collector_factories' container by the 'emplace_back' method. A memory leak will occur in case of an exception. sst_file_writer.cc 298
If the container is full, memory is reallocated. This operation may fail, resulting in a std::bad_alloc
exception. In this case, the pointer will be lost and the created object will never be deleted.
A secure construction that protects against potential memory leaks:
int_tbl_prop_collector_factories.emplace_back(
std::make_unique<UserKeyTablePropertiesCollectorFactory>(
user_collector_factories[i]));
The high performance of the C and C++ languages comes at the cost of many automatic checks being absent. There are also no built-in checks for allocated buffer overflows. Compilers partially compensate for the lack of checks by performing static analysis and issuing warnings for obvious cases. However, it is also useful to use specialized tools such as PVS-Studio that can detect even more errors.
const char* rocksdb_iter_value(const rocksdb_iterator_t* iter, size_t* vlen) {
Slice s = iter->rep->value();
*vlen = s.size();
return s.data();
}
int32_t streamDefaultIterGet_rocksdb(....) {
....
while (rocksdb_iter_valid(pIter)) {
const char* key = rocksdb_iter_key(pIter, &klen);
int32_t vlen = 0;
const char* vval = rocksdb_iter_value(pIter, (size_t*)&vlen);
....
}
The PVS-Studio warning: V512 A call of the 'rocksdb_iter_value' function will lead to overflow of the buffer '& vlen'. streamBackendRocksdb.c 4390
This code will work on the 32-bit app version but fail on a 64-bit build.
The address of the vlen
32-bit variable is interpreted as a pointer to the type size_t
:
int32_t vlen = 0;
const char* vval = rocksdb_iter_value(pIter, (size_t*)&vlen);
In the rocksdb_iter_value
function, a value of type size_t
will be written at this address.
In a 32-bit program—if we do not consider exotic architectures—the size of the size_t
variable is 4 bytes and coincides with the size of the int32_t
type. The code will work correctly.
In a 64-bit program, the size of size_t
type equals 8 bytes. So, writing a 64-bit value by the address of the vlen
variable causes some data to be written outside this variable. As a result, 4 bytes will be written to the stack after this variable, resulting in undefined program behavior.
First, let's keep in mind that MAX_QUERY_VALUE_LEN
is 1024:
#define MAX_QUERY_VALUE_LEN 1024
Next, note that the third dimension of the array char data[100][100][100][1024]
is also 1024:
typedef struct _script_t {
....
char cols[12];
char data[100][100][1024];
char system_exit_code[12];
....
} SScript;
Finally, here is the code containing an error:
bool simExecuteNativeSqlCommand(SScript *script, char *rest, bool isSlow) {
....
char *value = NULL;
if (i < MAX_QUERY_COL_NUM) {
value = script->data[numOfRows][i];
}
if (value == NULL) {
continue;
}
....
int32_t *length = taos_fetch_lengths(pSql);
....
if (length[i] < 0 || length[i] > 1 << 20) {
fprintf(stderr, "Invalid length(%d) of BINARY or NCHAR\n", length[i]);
exit(-1);
}
memset(value, 0, MAX_QUERY_VALUE_LEN);
memcpy(value, row[i], length[i]);
value[length[i]] = 0;
....
}
The PVS-Studio warning: V512 A call of the 'memcpy' function will lead to overflow of the buffer 'value'. simExec.c 786
There is some value in an element of the length[i]
array. It's pre-checked:
if (length[i] < 0 || length[i] > 1 << 20) {
fprintf(stderr, "Invalid length(%d) of BINARY or NCHAR\n", length[i]);
exit(-1);
}
This value is then used as the size of the data to be copied:
memcpy(value, row[i], length[i]);
The problem is that 1 << 20
is not 1024, but 1048576. So, the check doesn't save from a potential buffer overflow. I think the correct thing to do is to use a named constant MAX_QUERY_VALUE_LEN
in the condition, rather than a magic number. This constant should also be used when declaring the array.
char data[100][100][MAX_QUERY_VALUE_LEN];
....
if (length[i] < 0 || length[i] > MAX_QUERY_VALUE_LEN) {
fprintf(stderr, "Invalid length(%d) of BINARY or NCHAR\n", length[i]);
exit(-1);
}
memset(value, 0, MAX_QUERY_VALUE_LEN);
memcpy(value, row[i], length[i]);
Oops, I just realized there's another error in the code:
value[length[i]] = 0;
This line is unnecessary and even harmful. First, the array is already pre-filled with zeros after the memset
function call and the additional null terminator is not needed. Second, if length[i] == MAX_QUERY_VALUE_LEN
, the array overflows. Basically, the author should delete this line.
Let's dwell on this error a bit more. Since we assume the null terminator, we cannot copy MAX_QUERY_VALUE_LEN
bytes. Then there will be no room for the null terminator. Hence, we need to modify the check once again and use the >=
operator instead of >
.
if (length[i] < 0 || length[i] >= MAX_QUERY_VALUE_LEN)
Let's take a moment to recap. Can you imagine we've already gotten to 160 bugs?!
And do you realize that we've already found 160 errors in the database? That's so unfortunate.
I prescribe the immediate introduction of static code analyzers into the TDengine development process :)
dictionary * iniparser_load(const char * ininame)
{
....
char line [ASCIILINESZ+1] ;
....
memset(line, 0, ASCIILINESZ);
....
last=0 ;
while (fgets(line+last, ASCIILINESZ-last, in)!=NULL) {
lineno++ ;
len = (int)strlen(line)-1;
if (len==0)
continue;
/* Safety check against buffer overflows */
if (line[len]!='\n') {
fprintf(stderr,
"iniparser: input line too long in %s (%d)\n",
ininame,
lineno);
dictionary_del(dict);
fclose(in);
return NULL ;
}
....
}
The PVS-Studio warning: V557 Array underrun is possible. The value of 'len' index could reach -1. iniparser.c 695
If the input for fgets
is an empty string, the len
variable will be -1, which leads to an array overflow. We've gone through it in the article: "Shoot yourself in the foot when handling input data". This note considers identical reproduced errors.
Given the above, this comment in the code looks even more ironic:
/* Safety check against buffer overflows */
Various programming languages have different protection measures against null pointers/references, array overflows, and division by zero. Some, like C and C++, rely entirely on a developer. Others throw exceptions. However, no language is immune to typos. It's hard to determine what a typo is, so there are no standard, language-level methods for dealing with them.
But that doesn't mean that nothing can be done. PVS-Studio analyzer detects a large number of common types of typos. While it doesn't implement a universal approach, it handles a wide range of typical cases—and that makes it highly effective in practice.
static int32_t getRowsBlockWithinMergeLimit(....) {
....
if (keepRows == 0) {
*pSkipBlock = true;
*pRes = pOrigBlk;
}
*pSkipBlock = false;
....
}
The PVS-Studio warning: V519 The '* pSkipBlock' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 2198, 2202. tsort.c 2202
The line *pSkipBlock = true
makes no sense, as the value will change to false
anyway. Most likely, the author should have added else
:
if (keepRows == 0) {
*pSkipBlock = true;
*pRes = pOrigBlk;
}
else {
*pSkipBlock = false;
}
static void processSimpleMeta(SMqMetaRsp* pMetaRsp, cJSON** meta) {
...
} else if (pMetaRsp->resMsgType == TDMT_VND_ALTER_TABLE) {
processAlterTable(pMetaRsp, meta);
} else if (pMetaRsp->resMsgType == TDMT_VND_DROP_TABLE) {
processDropTable(pMetaRsp, meta);
} else if (pMetaRsp->resMsgType == TDMT_VND_DROP_TABLE) {
processDropTable(pMetaRsp, meta);
} else if (pMetaRsp->resMsgType == TDMT_VND_DELETE) {
....
}
The PVS-Studio warning: V517 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 2316, 2318. clientRawBlockWrite.c 2316
The above blocks of the same-type test might have been written using the copy-paste method. At some point, the developer probably got distracted, and repeated this snippet:
} else if (pMetaRsp->resMsgType == TDMT_VND_DROP_TABLE) {
processDropTable(pMetaRsp, meta);
This is a fairly common type of error. We have plenty of them in our collection. If one of the blocks is redundant, the author can safely remove it. In this case, the typo doesn't affect the program operation. In the worst case, the developer implied another check or another action here.
OffsetSegmentGenerator::OffsetSegmentGenerator(....) : ....
{
....
// compute intersections in full precision, to provide accuracy
// the points are rounded as they are inserted into the curve line
filletAngleQuantum = MATH_PI / 2.0 / bufParams.getQuadrantSegments();
int quadSegs = bufParams.getQuadrantSegments();
if (quadSegs < 1) quadSegs = 1;
filletAngleQuantum = MATH_PI / 2.0 / quadSegs;
....
}
The PVS-Studio warning: V519 The 'filletAngleQuantum' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 82, 86. OffsetSegmentGenerator.cpp 86
The author wrote the following code:
filletAngleQuantum = MATH_PI / 2.0 / bufParams.getQuadrantSegments();
The getQuadrantSegments
function can return 0, so the code was rewritten to protect against division by zero:
int quadSegs = bufParams.getQuadrantSegments();
if (quadSegs < 1) quadSegs = 1;
filletAngleQuantum = MATH_PI / 2.0 / quadSegs;
Except, they forgot to delete the previous line. As a result, we can end up with the division by zero and, as a result, undefined behavior.
typedef struct {
int64_t firstVer;
int64_t lastVer;
int64_t createTs;
int64_t closeTs;
int64_t fileSize;
int64_t syncedOffset;
} SWalFileInfo;
static inline int64_t walGetCurFileFirstVer(SWal* pWal) {
if (pWal->writeCur == -1) return -1;
SWalFileInfo* pInfo =
(SWalFileInfo*)taosArrayGet(pWal->fileInfoSet, pWal->writeCur);
return pInfo->firstVer;
}
static inline int64_t walGetCurFileLastVer(SWal* pWal) {
if (pWal->writeCur == -1) return -1;
SWalFileInfo* pInfo =
(SWalFileInfo*)taosArrayGet(pWal->fileInfoSet, pWal->writeCur);
return pInfo->firstVer;
}
The PVS-Studio warning: V524 It is odd that the body of 'walGetCurFileLastVer' function is fully equivalent to the body of 'walGetCurFileFirstVer' function. walInt.h 97
There are two data members in the SWalFileInfo
structure:
There are two functions to obtain values from these data members:
But the function bodies are identical and both return the value of the firstVer
data member.
int32_t dmInit() {
dInfo("start to init dnode env");
int32_t code = 0;
....
if ((code = dmCheckDiskSpace()) != 0) return code;
if ((code = dmCheckRepeatInit(dmInstance())) != 0) return code;
if ((code = dmInitSystem()) != 0) return code;
if ((code = dmInitMonitor()) != 0) return code;
if ((code = dmInitAudit()) != 0) return code;
if ((code = dmInitDnode(dmInstance())) != 0) return code;
if ((code = InitRegexCache() != 0)) return code;
....
}
The PVS-Studio warning: V593 Consider reviewing the expression of the 'A = B != C' kind. The expression is calculated as following: 'A = (B != C)'. dmEnv.c 182
Who spotted the error? ;)
A nice typo, I think. The parenthesis is misplaced in the last check. The result of the InitRegexCache
function call is compared to 0, and only after 0 or 1, it is written to the code
variable.
Here are a few other similar typos:
void
CoordinateSequence::add(const CoordinateSequence& cs,
std::size_t from, std::size_t to)
{
if (cs.stride() == stride() && cs.hasM() == cs.hasM()) {
m_vect.insert(m_vect.end(),
std::next(cs.m_vect.cbegin(),
static_cast<std::ptrdiff_t>(from * stride())),
std::next(cs.m_vect.cbegin(),
static_cast<std::ptrdiff_t>((to + 1u)*stride())));
} else {
....
}
The PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '==' operator: cs.hasM() == cs.hasM() CoordinateSequence.cpp 154
The developer's hand trembled and accidentally added an extra cs.
. As a result, some part of the condition will always be true:
cs.hasM() == cs.hasM()
bool startsWith(const std::string & s, char prefix) {
if (s.empty() == 0) {
return false;
}
return s[0] == prefix;
}
The PVS-Studio warning: V557 Array overrun is possible. The '0' index is pointing beyond array bound. string.cpp 53
The analyzer's warning isn't quite correct here. In fact, there is no array overflow. Even an empty string (the null-terminated string) contains at least one character. However, the analyzer points out that if the string is empty, there is no point to access a null element. It's weird and it's probably some kind of a typo.
Note N1. Perhaps, we should refine this diagnostic rule to show another message for such a case. I'll open a task for my colleagues.
Note N2. Before C++11, such code was considered erroneous as it causes undefined behavior.
The error is that the result of the empty
function call is compared to 0. The comparison is obviously unnecessary and the correct code should look like this:
bool startsWith(const std::string & s, char prefix) {
if (s.empty()) {
return false;
}
return s[0] == prefix;
}
int32_t streamTaskUpdateTaskCheckpointInfo(....) {
....
bool valid = (pInfo->checkpointId <= pReq->checkpointId &&
pInfo->checkpointVer <= pReq->checkpointVer &&
pInfo->processedVer <= pReq->checkpointVer);
....
}
The PVS-Studio warning: V1013 Suspicious subexpression in a sequence of similar comparisons. streamCheckpoint.c 654
The variable names look similar, so the typo is not surprising.
The pInfo->processedVer
variable should be compared to pReq->processedVer
instead of comparing with pReq->checkpointVer
.
I've already broken down two other typos in previous articles on refactoring sloppy code:
The project will benefit greatly from fixing the errors we've reviewed in this article. However, this piece isn't meant to serve as a guide to the errors. I've already mentioned that in the introduction, but let me say it again:
As a conclusion, let's look at a few more various errors not mentioned in the above sections.
uint64_t unpackUint64(uint8_t* ch, uint8_t sz) {
uint64_t n = 0;
for (uint8_t i = 0; i < sz; i++) {
n = n | (ch[i] << (8 * i));
}
return n;
}
The PVS-Studio warning: V629 Consider inspecting the 'ch[i] << (8 * i)' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. indexFstUtil.c 55
A 64-bit value must be created from a byte array. However, a failure will occur when attempting to set the upper 32-bits in a 64-bit variable. On shift, the left operand—an 8-bit unsigned character—will be implicitly converted to a 32-bit int
. However, it's not enough: if the int
value is shifted by more than 31 bits, an overflow will occur. The developer must explicitly cast the operand to a 64-bit type beforehand:
n = n | ((uint64_t)(ch[i]) << (8 * i));
Here is a similar error:
static int hashset_add(hashset_t set, void *item) {
int ret = hashset_add_member(set, item);
size_t old_capacity = set->capacity;
if (set->nitems >= (double)old_capacity * set->load_factor) {
size_t *old_items = set->items;
++set->nbits;
set->capacity = (size_t)(1 << set->nbits);
....
}
But the PVS-Studio warning is different: V1028 Possible overflow. Consider casting operands of the '1 << set->nbits' operator to the 'size_t' type, not the result. tdbPager.c 88
How about a little attention test? Try to find the bug in this function:
static int32_t getBlkFromSessionCache(struct SOperatorInfo* pOperator,
int64_t sessionId, SGcSessionCtx* pSession, SSDataBlock** ppRes)
{
int32_t code = TSDB_CODE_SUCCESS;
SGroupCacheOperatorInfo* pGCache = pOperator->info;
bool locked = false;
SGcDownstreamCtx* pCtx = &pGCache->pDownstreams[pSession->downstreamIdx];
while (true) {
bool got = false;
code = getBlkFromSessionCacheImpl(pOperator, sessionId,
pSession, ppRes, &got);
if (TSDB_CODE_SUCCESS != code || got) {
goto _return;
}
if ((atomic_load_64(&pCtx->fetchSessionId) == sessionId)
|| (-1 == atomic_val_compare_exchange_64(
&pCtx->fetchSessionId, -1, sessionId))) {
if (locked) {
(void)taosThreadMutexUnlock(&pSession->pGroupData->mutex);
locked = false;
}
code = getCacheBlkFromDownstreamOperator(pOperator, pCtx,
sessionId, pSession, ppRes);
goto _return;
} else {
// FOR NOW, SHOULD NOT REACH HERE
qError("Invalid fetchSessionId:%" PRId64 ",
currentSessionId:%" PRId64, pCtx->fetchSessionId, sessionId);
return TSDB_CODE_QRY_EXECUTOR_INTERNAL_ERROR;
}
if (locked) {
code = groupCacheSessionWait(pOperator, pCtx, sessionId,
pSession, ppRes);
locked = false;
if (TSDB_CODE_SUCCESS != code) {
goto _return;
}
break;
}
(void)taosThreadMutexLock(&pSession->pGroupData->mutex);
locked = true;
};
_return:
if (locked) {
(void)taosThreadMutexUnlock(&pSession->pGroupData->mutex);
}
return code;
}
The PVS-Studio warning: V779 Unreachable code detected. It is possible that an error is present. groupcacheoperator.c 1227
Any luck? If not, the error is hiding here:
if (....) // << (A)
{
....
goto _return; // << (B)
} else {
....
return TSDB_CODE_QRY_EXECUTOR_INTERNAL_ERROR; // << (C)
}
....
if (locked) { // << (D)
....
_return: // << (E)
Regardless of the condition (A), the code (D) will never get control. We will either move (B) to the label (E) or exit the function (C).
typedef struct SFilePage {
int32_t num;
....
} SFilePage;
typedef struct tMemBucket {
....
int32_t bytes;
....
} tMemBucket;
static int32_t loadDataFromFilePage(tMemBucket *pMemBucket, ....) {
....
SFilePage *pg = getBufPage(pMemBucket->pBuffer, *pageId);
....
(void)memcpy((*buffer)->data + offset, pg->data,
(size_t)(pg->num * pMemBucket->bytes));
....
}
The PVS-Studio warning: V1028 Possible overflow. Consider casting operands of the 'pg->num * pMemBucket->bytes' operator to the 'size_t' type, not the result. tpercentile.c 64
Explicit type casting doesn't help avoid an overflow when multiplying 32-bit variables.
(size_t)(pg->num * pMemBucket->bytes)
The author should perform type casting before multiplication, not after:
(size_t)(pg->num) * pMemBucket->bytes
Look at similar warnings:
namespace std {
inline void swap(ROCKSDB_NAMESPACE::port::WindowsThread& th1,
ROCKSDB_NAMESPACE::port::WindowsThread& th2) {
th1.swap(th2);
}
} // namespace std
The PVS-Studio warning: V1061 Extending the 'std' namespace may result in undefined behavior. win_thread.h 110
You may learn more about this defect in the documentation if you'd like to. Honestly, I'm getting tired of going over all these bugs :) I'm going to hit the 190 mark and then I'm done. Or shall I reach 200 anyway? No, down with perfectionism! I need a break.
typedef struct STreeNode {
int32_t index;
void *pData; // TODO remove it?
} STreeNode;
int32_t tMergeTreeAdjust(SMultiwayMergeTreeInfo* pTree, int32_t idx) {
....
STreeNode kLeaf = pTree->pNode[idx];
....
if (memcmp(&kLeaf, &pTree->pNode[1], sizeof(kLeaf)) != 0) {
....
}
The PVS-Studio warning: V1103 The values of padding bytes are unspecified. Comparing objects with padding using 'memcmp' may lead to unexpected result. tlosertree.c 127
In a 64-bit program, there are 4 additional bytes between the index
variable and the pointer intended to align the 64-bit pointer to the 8-byte boundary. It's a bad idea to compare such structures using the memcmp
function. The additional bytes for alignment contain random values. The memcmp
function may consider the structures as different even though the values of all fields are the same.
uint32_t taosSafeRand(void) {
....
if (!CryptGenRandom(hCryptProv, 4, &seed)) return seed;
....
}
The PVS-Studio warning: V1109 The 'CryptGenRandom' function is deprecated. Consider switching to an equivalent newer function. osRand.c 56
Other similar warnings:
Static analysis enables users to meet various technical needs listed below.
PVS-Studio can be used for all these tasks. It supports code analysis for C, C++, C#, and Java. It runs on Windows, Linux, and macOS. PVS-Studio is a SAST solution to enhance quality, reliability, and security of your projects.
0