Does large use of signals and slots affect application performance?

2

37

This question is purely for educational purposes:

Does using 30-50 or more signal-slot pairs between two objects (for example, two threads) affect the application's performance, runtime or response times?

Nonanonage answered 31/5, 2012 at 17:0 Comment(0)
70

First of all, you should probably not put any slots in QThreads. QThread isn't really meant to be subclassed, other than to reimplement the run method and add private methods (not signals!).

A QThread is conceptually a thread controller, not a thread itself. In most cases you should deal with QObjects. Start a thread, then move the object instance to that thread. That's the only way you'll get slots working correctly in the thread. Moving the thread instance (it is QObject-derived!) to the thread is a hack and bad style. Don't do that, in spite of uninformed forum posts saying otherwise.

As to the rest of your question: a signal-slot call does not have to locate anything or validate much. The "location" and "validation" are done when the connection is established. The main steps done at the time of the call are:

  1. Locking a signal-slot mutex from a pool.

  2. Iterating through the connection list.

  3. Performing the calls using either direct or queued calls.
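The three steps can be sketched in plain C++ without Qt. This is a toy model only: ToySignal, emitSignal and processPending are invented names, the slot signature is fixed to a single int, and copying the connection list under the lock is a simplification of mine rather than what Qt does.

```cpp
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Toy model of a signal with a connection list. Not Qt code.
class ToySignal {
public:
    using Slot = std::function<void(int)>;

    void connect(Slot slot, bool queued) {
        std::lock_guard<std::mutex> lock(mutex_);
        connections_.push_back({std::move(slot), queued});
    }

    void emitSignal(int value) {
        std::vector<Connection> snapshot;
        {
            std::lock_guard<std::mutex> lock(mutex_);   // step 1: lock
            snapshot = connections_;
        }
        for (const Connection& c : snapshot) {          // step 2: iterate
            if (c.queued) {
                // step 3, queued: keep a copy of the argument for later delivery
                std::lock_guard<std::mutex> lock(mutex_);
                pending_.push_back([slot = c.slot, value] { slot(value); });
            } else {
                c.slot(value);                          // step 3, direct: call now
            }
        }
    }

    // Stand-in for the receiver's event loop draining its queue.
    void processPending() {
        std::vector<std::function<void()>> calls;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            calls.swap(pending_);
        }
        for (auto& call : calls) call();
    }

private:
    struct Connection { Slot slot; bool queued; };
    std::mutex mutex_;
    std::vector<Connection> connections_;
    std::vector<std::function<void()>> pending_;
};
```

With one direct and one queued connection, emitSignal(5) runs the direct slot immediately, while the queued slot only runs once processPending() is called.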

Common Cost

Any signal-slot call always starts as a direct call in the signal's implementation generated by moc. An array of pointers-to-arguments of the signal is constructed on the stack. The arguments are not copied.
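To make the "pointers, not copies" point concrete, here is a rough sketch in plain C++ of the shape such a signal body takes. The names textChanged, toyActivate and g_seen are made up for the illustration, and std::string stands in for QString; the real generated code hands the array to QMetaObject::activate.

```cpp
#include <string>

// Stand-in for the receiving end: it only records the address it was handed.
static const std::string* g_seen = nullptr;   // illustration only

static void toyActivate(void** args) {
    g_seen = static_cast<const std::string*>(args[1]);
}

// Roughly the shape of a generated body for a signal declared as
//   void textChanged(const std::string& text);
void textChanged(const std::string& text) {
    // Entry 0 is reserved for a return value; the remaining entries point
    // straight at the caller's arguments. Nothing is copied here.
    void* args[] = { nullptr, const_cast<std::string*>(&text) };
    toyActivate(args);
}
```

After calling textChanged(s), g_seen holds the address of s itself, which is what "the arguments are not copied" means in practice.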

The signal then calls QMetaObject::activate, where the connection list mutex is acquired, and the list of connected slots is iterated, placing the call for each slot.

Direct Connections

Not much is done there. The slot is called either by directly calling QObject::qt_static_metacall, obtained at the time the connection was established, or by calling QObject::qt_metacall if QMetaObject::connect was used to set up the connection. The latter allows dynamic creation of signals and slots.
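A minimal sketch of the first path, in plain C++: the connection keeps a function pointer and a method index resolved at connect time, so the call itself is just an indirect call plus a switch. ToyReceiver, ToyConnection and invoke are hypothetical names, and the layout is only modeled on the idea of a static metacall, not copied from Qt.

```cpp
#include <cassert>
#include <string>

// Hypothetical receiver class with two "slots".
struct ToyReceiver {
    int sum = 0;
    std::string text;

    void add(int a, int b) { sum = a + b; }
    void setText(const std::string& s) { text = s; }

    // One static function maps a method index to a typed call,
    // unpacking the type-erased argument array (entry 0 is the return slot).
    static void staticMetacall(ToyReceiver* self, int id, void** a) {
        switch (id) {
        case 0:
            self->add(*static_cast<int*>(a[1]), *static_cast<int*>(a[2]));
            break;
        case 1:
            self->setText(*static_cast<std::string*>(a[1]));
            break;
        default:
            break;
        }
    }
};

// What a connection could store once it has been resolved at connect time.
struct ToyConnection {
    ToyReceiver* receiver;
    void (*callFunction)(ToyReceiver*, int, void**);
    int methodIndex;
};

// At emission time there is no name lookup left to do.
inline void invoke(const ToyConnection& c, void** args) {
    assert(c.callFunction != nullptr);
    c.callFunction(c.receiver, c.methodIndex, args);
}
```

Calling invoke on a connection with methodIndex 0 and an array pointing at the ints 2 and 3 leaves receiver.sum equal to 5.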

Queued Connections

The arguments have to be marshalled and copied, since the call has to be stored in an event queue and the signal must return. This is done by allocating an array of pointers to copies, and copy-constructing each argument on the heap. The code to do that is really no-frills plain old C.

The queuing of the call is done within queued_activate. This is where the copy-construction is done.
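As a rough illustration of that marshalling step, the sketch below copies each live argument onto the heap and stores the pointers in an event object. ToyCallEvent and marshal are invented names, and the arguments are restricted to std::string for brevity, whereas Qt copies arguments of any registered type through its meta-type system.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a queued call event: it owns heap copies of the arguments.
struct ToyCallEvent {
    std::vector<std::unique_ptr<std::string>> args;  // pointers to the copies
};

// Copy-construct every argument on the heap so the emitter can return at once
// and the originals may change or go out of scope before the slot runs.
std::unique_ptr<ToyCallEvent> marshal(const std::vector<const std::string*>& live) {
    auto ev = std::make_unique<ToyCallEvent>();
    ev->args.reserve(live.size());
    for (const std::string* p : live) {
        assert(p != nullptr);
        ev->args.push_back(std::make_unique<std::string>(*p));
    }
    return ev;
}
```

If the sender modifies its string right after marshalling, the event still holds the old value, which is the behavior a queued slot relies on.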

The overhead of a queued call is always at least one heap allocation, for the QMetaCallEvent. If the call has any arguments, then a pointers-to-arguments array is allocated, and an extra allocation is done for each argument. For a call with n arguments, the cost given as a C expression is (n ? 2+n : 1) allocations. A return value for blocking calls is counted as an argument. Arguably, this aspect of Qt could be optimized down to one allocation for everything, but in real life it'd only matter if you're calling trivial methods.
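As a worked example of that expression, here it is as a small C++ function, applied to the three-argument signals and the one million iterations used by the benchmark in this answer. The function names are mine.

```cpp
// Allocation count for one queued call with n arguments, as given in the text:
// one event, plus (if there are arguments) one pointer array and one copy each.
constexpr int queuedCallAllocations(int n) { return n ? 2 + n : 1; }

// No arguments: just the event object.
static_assert(queuedCallAllocations(0) == 1, "event only");
// Three arguments: event + pointer array + three copies.
static_assert(queuedCallAllocations(3) == 5, "event, array, three copies");

// Total allocations for a number of emissions of an n-argument signal.
constexpr long long totalAllocations(long long calls, int n) {
    return calls * queuedCallAllocations(n);
}
// One million emissions of a three-argument signal.
static_assert(totalAllocations(1000000, 3) == 5000000, "five million");
```

So each queued benchmark run performs on the order of five million heap allocations, which goes a long way towards explaining why the queued figures dwarf the direct ones.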

Benchmark Results

Even a direct (non-queued) signal-slot call has a measurable overhead, but you have to choose your battles: ease of architecting the code vs. performance. You do measure the performance of your final application and identify bottlenecks, don't you? If you do, you're likely to see that in real-life applications, signal-slot overheads play no role.

The only time the signal-slot mechanism has significant overhead is when you're calling trivial functions, such as the trivial slot in the code below. It's a complete, stand-alone benchmark, so feel free to run it and see for yourself. The results on my machine were:

Warming up the caches...
trivial direct call took 3ms
nonTrivial direct call took 376ms
trivial direct signal-slot call took 158ms, 5166% longer than direct call.
nonTrivial direct signal-slot call took 548ms, 45% longer than direct call.
trivial queued signal-slot call took 2474ms, 1465% longer than direct signal-slot and 82366% longer than direct call.
nonTrivial queued signal-slot call took 2474ms, 416% longer than direct signal-slot and 653% longer than direct call.
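For reference, the percentages are "how much longer a is than b", truncated to an integer. The sketch below repeats that arithmetic in plain C++ and checks it against the first four percentage figures in the output; it mirrors the benchmark's pct helper but is written as a constexpr function for the sake of the compile-time checks.

```cpp
// Percentage by which a exceeds b, truncated toward zero.
constexpr int pct(int a, int b) {
    return static_cast<int>(100.0 * a / b - 100.0);
}

// trivial direct signal-slot (158ms) vs. trivial direct call (3ms)
static_assert(pct(158, 3) == 5166, "trivial direct signal-slot");
// nonTrivial direct signal-slot (548ms) vs. nonTrivial direct call (376ms)
static_assert(pct(548, 376) == 45, "nonTrivial direct signal-slot");
// trivial queued (2474ms) vs. direct signal-slot (158ms) and direct call (3ms)
static_assert(pct(2474, 158) == 1465, "trivial queued vs. direct signal-slot");
static_assert(pct(2474, 3) == 82366, "trivial queued vs. direct call");
```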

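What should be noted, perhaps, is that concatenating strings is quite fast :) Also note that the 2474ms on the last output line repeats the trivial queued time, apparently a leftover from a version of the benchmark that printed ts there instead of nts; the two percentages on that line work out to roughly 2830ms.

Note that I'm doing the calls via a function pointer to prevent the compiler from optimizing out the direct calls to the addition function.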

//main.cpp
#include <cstdio>
#include <QCoreApplication>
#include <QObject>
#include <QTimer>
#include <QElapsedTimer>
#include <QTextStream>

static const int n = 1000000;

class Test : public QObject
{
    Q_OBJECT
public slots:
    void trivial(int*, int, int);
    void nonTrivial(QString*, const QString&, const QString&);
signals:
    void trivialSignalD(int*, int, int);
    void nonTrivialSignalD(QString*, const QString&, const QString &);
    void trivialSignalQ(int*, int, int);
    void nonTrivialSignalQ(QString*, const QString&, const QString &);
private slots:
    void run();
private:
    void benchmark(bool timed);
    void testTrivial(void (Test::*)(int*,int,int));
    void testNonTrivial(void (Test::*)(QString*,const QString&, const QString&));
public:
    Test();
};

Test::Test()
{
    connect(this, SIGNAL(trivialSignalD(int*,int,int)),
            SLOT(trivial(int*,int,int)), Qt::DirectConnection);
    connect(this, SIGNAL(nonTrivialSignalD(QString*,QString,QString)),
            SLOT(nonTrivial(QString*,QString,QString)), Qt::DirectConnection);
    connect(this, SIGNAL(trivialSignalQ(int*,int,int)),
            SLOT(trivial(int*,int,int)), Qt::QueuedConnection);
    connect(this, SIGNAL(nonTrivialSignalQ(QString*,QString,QString)),
            SLOT(nonTrivial(QString*,QString,QString)), Qt::QueuedConnection);
    QTimer::singleShot(100, this, SLOT(run()));
}

void Test::run()
{
    // warm up the caches
    benchmark(false);
    // do the benchmark
    benchmark(true);
}

void Test::trivial(int * c, int a, int b)
{
    *c = a + b;
}

void Test::nonTrivial(QString * c, const QString & a, const QString & b)
{
    *c = a + b;
}

void Test::testTrivial(void (Test::* method)(int*,int,int))
{
    static int c;
    int a = 1, b = 2;
    for (int i = 0; i < n; ++i) {
        (this->*method)(&c, a, b);
    }
}

void Test::testNonTrivial(void (Test::* method)(QString*, const QString&, const QString&))
{
    static QString c;
    QString a(500, 'a');
    QString b(500, 'b');
    for (int i = 0; i < n; ++i) {
        (this->*method)(&c, a, b);
    }
}

static int pct(int a, int b)
{
    return (100.0*a/b) - 100.0;
}

void Test::benchmark(bool timed)
{
    const QEventLoop::ProcessEventsFlags evFlags =
            QEventLoop::ExcludeUserInputEvents | QEventLoop::ExcludeSocketNotifiers;
    QTextStream out(stdout);
    QElapsedTimer timer;
    quint64 t, nt, td, ntd, ts, nts;

    if (!timed) out << "Warming up the caches..." << endl;

    timer.start();
    testTrivial(&Test::trivial);
    t = timer.elapsed();
    if (timed) out << "trivial direct call took " << t << "ms" << endl;

    timer.start();
    testNonTrivial(&Test::nonTrivial);
    nt = timer.elapsed();
    if (timed) out << "nonTrivial direct call took " << nt << "ms" << endl;

    QCoreApplication::processEvents(evFlags);

    timer.start();
    testTrivial(&Test::trivialSignalD);
    QCoreApplication::processEvents(evFlags);
    td = timer.elapsed();
    if (timed) {
        out << "trivial direct signal-slot call took " << td << "ms, "
               << pct(td, t) << "% longer than direct call." << endl;
    }

    timer.start();
    testNonTrivial(&Test::nonTrivialSignalD);
    QCoreApplication::processEvents(evFlags);
    ntd = timer.elapsed();
    if (timed) {
        out << "nonTrivial direct signal-slot call took " << ntd << "ms, "
               << pct(ntd, nt) << "% longer than direct call." << endl;
    }

    timer.start();
    testTrivial(&Test::trivialSignalQ);
    QCoreApplication::processEvents(evFlags);
    ts = timer.elapsed();
    if (timed) {
        out << "trivial queued signal-slot call took " << ts << "ms, "
               << pct(ts, td) << "% longer than direct signal-slot and "
               << pct(ts, t) << "% longer than direct call." << endl;
    }

    timer.start();
    testNonTrivial(&Test::nonTrivialSignalQ);
    QCoreApplication::processEvents(evFlags);
    nts = timer.elapsed();
    if (timed) {
        out << "nonTrivial queued signal-slot call took " << nts << "ms, "
               << pct(nts, ntd) << "% longer than direct signal-slot and "
               << pct(nts, nt) << "% longer than direct call." << endl;
    }
}

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);
    Test t;
    return a.exec();
}

#include "main.moc"
Subordinary answered 31/5, 2012 at 17:0 Comment(2)
I would suggest looking at this very clear tutorial on how to really use QThread: mayaposch.wordpress.com/2011/11/01/… - Vola
There is a bug in the code above: the last test prints the elapsed time (ts) of the test above it. - Additament
4

Of course they affect application performance, mainly due to the time spent locating the connection object, validating the slot object's state, and so on. But the simplicity and flexibility of the signals and slots mechanism is well worth the overhead. Plus, one of the major advantages of the signal-slot mechanism is that it is type-safe, allowing communication between objects irrespective of their type, unlike callbacks.

Compared to callbacks, signals and slots are slightly slower because of the increased flexibility they provide, although the difference for real applications is insignificant. In general, emitting a signal that is connected to some slots, is approximately ten times slower than calling the receivers directly, with non-virtual function calls. This is the overhead required to locate the connection object, to safely iterate over all connections (i.e. checking that subsequent receivers have not been destroyed during the emission), and to marshall any parameters in a generic fashion. While ten non-virtual function calls may sound like a lot, it's much less overhead than any new or delete operation, for example. As soon as you perform a string, vector or list operation that behind the scene requires new or delete, the signals and slots overhead is only responsible for a very small proportion of the complete function call costs.

Source: Signals and Slots

Lampoon answered 31/5, 2012 at 17:29 Comment(3)
Are you sure that the overhead factor is that low? Signals call QMetaObject::activate, which has about one hundred lines of code. I'd guess it's about 100 times slower than a direct non-virtual call of the slot. But I agree with you: in the vast majority of cases, this overhead is insignificant. - Assign
The overhead of a new or delete operation on a modern memory allocator in a single-threaded application is small. Very small. So small, in fact, that concatenating two 1000-character QStrings into a new QString takes about as much time as a direct signal-slot connection overhead! - Glenglencoe
Would you say signals and slots should be frowned upon for fast real-time applications? E.g. in an embedded system, reading data from the serial port into a C++ data structure at ~5ms intervals? - Incredulity
