Why does QVector::size() return int?

std::vector::size() returns a size_type, which is unsigned and usually the same as size_t, e.g. 8 bytes on 64-bit platforms.

In contrast, QVector::size() returns an int, which is usually 4 bytes even on 64-bit platforms, and signed at that, which means it can only reach half of 2^32.

Why is that? It seems quite illogical and also technically limiting, and while it is not very likely that you will ever need more than 2^32 elements, the use of a signed int cuts that range in half for no apparent good reason. Perhaps to avoid compiler warnings for people too lazy to declare i as a uint rather than an int, who decided that making all containers return a size type that makes no sense is the better solution? The reason could not possibly be that dumb?
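
To see the difference concretely, here is a minimal sketch (the exact sizes are platform-dependent; the comments assume a typical 64-bit LP64 system):

#include <cstddef>
#include <cstdio>
#include <limits>
#include <vector>

int main() {
    std::printf("%zu\n", sizeof(std::vector<int>::size_type)); // usually 8 on 64-bit
    std::printf("%zu\n", sizeof(int));                         // usually 4 even on 64-bit
    // Maximum element counts the two types can express:
    std::printf("%zu\n", std::numeric_limits<std::size_t>::max()); // 2^64 - 1
    std::printf("%d\n",  std::numeric_limits<int>::max());         // 2^31 - 1
}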

Bounty answered 28/10, 2014 at 22:47 Comment(9)
It actually could possibly be that dumb. A lot of (widely used and respected) library code contains brain-dead stylistic errors like this. You'd be surprised how many otherwise talented programmers aren't even aware of the existence of things like size_t...Hydrosol
That would make working with a QByteArray larger than 2GB interesting, which is actually totally achievable in my field...Bounty
@userXXX Totally agreed. Now, search for a couple of "portable" open source libraries/applications and read the source thereof. Be horrified as you perceive that they are full of utter piles of undefined behavior, non-portable constructs, easy-to-break assumptions, etc. That's because, in my experience, most programmers are pragmatists (a euphemism for "lazy") and don't program as per the language standard but as per whatever their intermediately incompetent teachers (!) and/or colleagues recommend them to do. And that's sad. Very, very sad.Hydrosol
Too bad, and also ironic: I've bothered to develop the good practice of using unsigned integers for iterating over containers, and now I get compiler warnings as punishment for doing it right.Bounty
@userXXX I know that feeling... In unrelated news: it would be lovely if C++ had sensible behavior when dealing with integers of different signedness...Hydrosol
Perhaps it's not the reason, but still lol: qt-project.org/doc/qt-5/qvector.html#fillKovar
@Kovar - it is pretty silly. You can always resize and then fill. Plus you might want to fill half of the vector without losing the other half, which you kind of cannot do with this "convenience method" LOLBounty
It might be for consistency: indexOf() returns an int, with the special value of -1 if it's not in the QVector.Padding
I think a lot of people agree that using unsigned for size was a mistake we're stuck with (including Bjarne). This doesn't explain the use of 32 bit on 64 bit platforms, but for a lot of purposes, it is enough.Tweeddale

This has been discussed several times since at least Qt 3, and the QtCore maintainer stated a while ago that no change would happen before Qt 7, if it ever happens at all.

When the discussion was going on back then, I thought that someone would bring it up on Stack Overflow sooner or later... and probably on several other forums and Q/A, too. Let us try to demystify the situation.

In general, you need to understand that there is no better or worse here, as QVector is not a replacement for std::vector. The latter does not do any copy-on-write (COW), and that comes at a price. QVector is meant for a different use case: it is mostly used inside Qt applications and the framework itself, originally for QWidgets in the early days.

After all, size_t has its own issues, too, as I will show below.

Rather than interpreting the maintainer for you, I will quote Thiago directly to convey the official stance:

For two reasons:

1) it's signed because we need negative values in several places in the API: indexOf() returns -1 to indicate a value not found; many of the "from" parameters can take negative values to indicate counting from the end. So even if we used 64-bit integers, we'd need the signed version of it. That's the POSIX ssize_t or the Qt qintptr.

This also avoids sign-change warnings when you implicitly convert unsigneds to signed:

-1 + size_t_variable    => warning
size_t_variable - 1     => no warning

2) it's simply "int" to avoid conversion warnings or ugly code related to the use of integers larger than int.

io/qfilesystemiterator_unix.cpp

size_t maxPathName = ::pathconf(nativePath.constData(), _PC_NAME_MAX);
if (maxPathName == size_t(-1))

io/qfsfileengine.cpp

if (len < 0 || len != qint64(size_t(len))) {

io/qiodevice.cpp

qint64 QIODevice::bytesToWrite() const
{
    return qint64(0);
}

return readSoFar ? readSoFar : qint64(-1);
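
To make point 1 concrete, here is a minimal sketch (the function names and the value 42 are purely illustrative; compile with -Wsign-conversion on GCC or Clang to see the warning asymmetry):

#include <QVector>
#include <cstddef>

void example(const QVector<int>& v) {
    int pos = v.indexOf(42);               // -1 is the "not found" sentinel,
    if (pos != -1) { /* found at pos */ }  // so a signed type is required
}

std::size_t asymmetry(std::size_t size_t_variable) {
    std::size_t a = -1 + size_t_variable;  // -Wsign-conversion warns: -1 becomes a huge unsigned value
    std::size_t b = size_t_variable - 1;   // no warning: the positive literal converts cleanly
    return a + b;
}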

That was one email from Thiago, and there is another where you can find a more detailed answer:

Even today, software whose core memory footprint exceeds 4 GB (or even 2 GB) is the exception rather than the rule. Be careful when looking at the memory sizes reported by process tools, since they do not represent actual memory usage.

In any case, we're talking here about having one single container addressing more than 2 GB of memory. Because of the implicitly shared, copy-on-write nature of the Qt containers, that will probably be highly inefficient. You need to be very careful when writing such code to avoid triggering a COW and thus doubling your memory usage, or worse. Also, the Qt containers do not handle OOM situations, so if you're anywhere close to your memory limit, Qt containers are the wrong tool to use.

The largest process I have on my system is qtcreator and it's also the only one that crosses the 4 GB mark in VSZ (4791 MB). You could argue that it is an indication that 64-bit containers are required, but you'd be wrong:

  • Qt Creator does not have any container requiring 64-bit sizes; it simply needs 64-bit pointers

  • It is not using 4 GB of memory. That's just VSZ (mapped memory). The total RAM currently accessible to Creator is merely 348.7 MB.

  • And it is using more than 4 GB of virtual space because it is a 64-bit application. The cause-and-effect relationship is the opposite of what you'd expect. As proof of this, I checked how much virtual space is consumed by padding: 800 MB. A 32-bit application would never do that; that's 19.5% of the addressable space on 4 GB.

(padding is virtual space allocated but not backed by anything; it's only there so that something else doesn't get mapped to those pages)

Going even further into this topic via Thiago's responses, see this:

Personally, I'm VERY happy that Qt collection sizes are signed. It seems nuts to me that an integer value potentially used in expressions involving subtraction would be unsigned (e.g. size_t).

An integer being unsigned doesn't guarantee that an expression involving that integer will never be negative. It only guarantees that the result will be an absolute disaster.

On the other hand, the C and C++ standards define the behaviour of unsigned overflows and underflows.

Signed integers do not overflow or underflow. I mean, they do because the types and CPU registers have a limited number of bits, but the standards say they don't. That means the compiler will always optimise assuming you don't over- or underflow them.

Example:

for (int i = 1; i >= 1; ++i)

This is optimised into an infinite loop, because signed integers do not overflow. If you change i to unsigned, the compiler knows that it might wrap around back to zero.

Some people didn't like that: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475
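
Here is a compilable sketch of both variants (whether the optimiser really emits an infinite loop depends on the compiler and optimisation level):

void signed_loop() {
    // Signed overflow is undefined behaviour, so the optimiser may assume
    // i >= 1 always holds and generate an unconditional infinite loop.
    for (int i = 1; i >= 1; ++i) { }
}

void unsigned_loop() {
    // Unsigned arithmetic wraps mod 2^n, so the compiler must allow for i
    // coming back to 0, which makes the condition false and ends the loop.
    for (unsigned i = 1; i >= 1; ++i) { }
}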

Dubbing answered 29/10, 2014 at 1:50 Comment(1)
Also, there is a nice but outdated article about Qt containers.Longdrawnout

Unsigned numbers are values mod 2^n for some n.

Signed numbers are bounded integers.

Using unsigned values as approximations of 'positive integers' runs into the problem that common values sit near the edge of the domain, where unsigned values behave differently from plain integers.

The advantage is that the unsigned approximation reaches higher positive integers, and under-/overflow are well defined (if effectively random when viewed as a model of Z).

But really, ptrdiff_t would be better than int.
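
A minimal illustration of the two behaviours, assuming 32-bit fixed-width types:

#include <cassert>
#include <cstdint>

int main() {
    std::uint32_t u = 0;
    --u;                      // well defined: wraps mod 2^32
    assert(u == 4294967295u); // 2^32 - 1, nothing like the integer -1

    std::int32_t s = 0;
    --s;
    assert(s == -1);          // the bounded integer behaves like Z near zero
}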

Adherence answered 28/10, 2014 at 22:57 Comment(3)
Another reason I thought of: if you add an unsigned and a signed type, the resulting type is unsigned. This can result in unexpected and hard-to-trace behavior. What if you're adding size() to some other value, and the mathematical result would be negative? Unless it's cast back to a signed integer somehow, you could run into trouble.Outgoing
@TaylorBrandstetter Yes, that's possible; generally, it's more convenient to work with signed values (flippantly...). But as very well pointed out in this answer, there would still be ptrdiff_t...Hydrosol
@Taylor you get a large positive number. Which isn't a good approximation of the behavior of "real integers", yet is a common case. The most common source of signed integer overflow in my experience is actually unsigned to signed conversion.Adherence

Unsigned integers may cause difficult-to-find bugs when arithmetic wraps around below zero (which is much more likely to happen than having a multi-gigabyte vector that cannot be indexed by a 4-byte integer):

  for (std::size_t x = 0; x < myvector.size(); x++) {
    if (x - 1 < 1000) {
      // Surprise: this block is SKIPPED for x == 0, because
      // 0 - 1 wraps around to SIZE_MAX, which is not < 1000.
    }
  }

This generates no warning but does not work as expected. Subtracting 1 from 0 in an unsigned type (std::size_t here, matching what size() returns; auto x = 0 would deduce a signed int and miss the point) yields 2^32 - 1 (4294967295) even on a 32-bit system, way more than 1000. On a 64-bit system it is likely 2^64 - 1.

One may argue that using signed integers here is safer. However, converting the returned value from unsigned to signed requires a rather awkward cast, or ignoring the compiler warning.
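
One way to avoid both the wrap-around and the warning, sketched against the loop above (the helper scan() is just an illustration): do the arithmetic in a signed type that is wide enough, such as std::ptrdiff_t:

#include <cstddef>
#include <vector>

void scan(const std::vector<int>& myvector) {
    for (std::ptrdiff_t x = 0; x < static_cast<std::ptrdiff_t>(myvector.size()); x++) {
        if (x - 1 < 1000) {
            // now also runs for x == 0, as one would naively expect
        }
    }
}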

Wild answered 13/12, 2023 at 17:16 Comment(0)
