std::sort vs intel ipp sort performance. what am I doing wrong?
Asked Answered
L

3

11

I am trying to compare the performance of std::sort (using std::vector of structs) vs intel ipp sort.

I am running this test on an Intel Xeon processor model name : Intel(R) Xeon(R) CPU X5670 @ 2.93GHz

I am sorting a vector of length 20000 elements and sorting 200 times. I have tried 2 diferent ipp sort routines viz. ippsSortDescend_64f_I and ippsSortRadixDescend_64f_I. In all cases, ipp sort was at least 5 to 10 times slower than std::sort. I was expecting the ipp sort maybe slower for smaller arrays but otherwise it should generally be faster than std::sort. Am I missing something here? What am I doing wrong?

std::sort is consistently faster in all my test cases.

Here is my program

#include <array>
#include <iostream>
#include <algorithm>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <sys/timeb.h>

#include <vector>
#include <chrono>

#include "ipp.h"

using namespace std;

const int SIZE = 2000000;
const int ITERS = 200;

//Chrono typedefs
typedef std::chrono::high_resolution_clock Clock;
typedef std::chrono::microseconds microseconds;

//////////////////////////////////// std ///////////////////////////////////

typedef vector<double> myList;

void initialize(myList & l, Ipp64f* ptr)
{
    double randomNum;
    for (int i = 0; i < SIZE; i++)
    {
        randomNum =  1.0 * rand() / (RAND_MAX / 2) - 1;
        l.push_back(randomNum);
        ptr[i] = randomNum;
    }
}


void test_sort()
{
        array<myList, ITERS> list;
        array<Ipp64f*, ITERS> ippList;

        // allocate
        for(int i=0; i<ITERS;i++)
        {
                list[i].reserve(SIZE);
                ippList[i] = ippsMalloc_64f(SIZE);
        }

        // initialize
        for(int i=0;i<ITERS;i++)
        {
                initialize(list[i], ippList[i]);
        }

        cout << "\n\nTest Case 1: std::sort\n";
        cout << "========================\n";

        // sort vector
        Clock::time_point t0 = Clock::now();
        for(int i=0; i<ITERS;i++)
        {
            std::sort(list[i].begin(), list[i].end());
        }
        Clock::time_point t1 = Clock::now();
        microseconds ms = std::chrono::duration_cast<microseconds>(t1 - t0);
        std::cout << ms.count() << " micros" << std::endl;

        ////////////////////////////////// IPP ////////////////////////////////////////

        cout << "\n\nTest Case 2: ipp::sort\n";
        cout << "========================\n";

        // sort ipp 
        Clock::time_point t2 = Clock::now();
        for(int i=0; i<ITERS;i++)
        {
                ippsSortAscend_64f_I(ippList[i], SIZE);
        }
        Clock::time_point t3 = Clock::now();
        microseconds ms1 = std::chrono::duration_cast<microseconds>(t3 - t2);
        std::cout << ms1.count() << " micros" << std::endl;

        for(int i=0; i<ITERS;i++)
        {
          ippsFree( ippList[i] );
        }
}


///////////////////////////////////////////////////////////////////////////////////////

int main()
{
    srand (time(NULL));

    cout << "Test for sorting an array of structures.\n" << endl;
    cout << "Test case: \nSort an array of structs ("<<ITERS<<" iterations) with double of length "<<SIZE<<". \n";
        IppStatus status=ippInit();
        test_sort();
    return 0;
}

/////////////////////////////////////////////////////////////////////////////

compilation command is:

/share/intel/bin/icc -O2 -I$(IPPROOT)/include  sorting.cpp -lrt -L$(IPPROOT)/lib/intel64 -lippi -lipps -lippvm -lippcore -std=c++0x    

Program output:

Test for sorting an array of structures.

Test case:
Sort an array of structs (200 iterations) with double of length 2000000.


Test Case 1: std::sort
========================
38117024 micros


Test Case 2: ipp::sort
========================
48917686 micros
Laevogyrate answered 26/8, 2015 at 8:38 Comment(6)
Note that you don't only benchmark the sorting, but the randomizing of the vectors as well. I recommend you measure only the actual sorting, many times while adding the times together to get a total time, then show the total time as well as the average time.Indic
will try that now, but just for test i tried by removing randomize functions so only the first loop has random vector where as other iterations has sorted vectors. In this case also std::sort was much faster than ipp. anyways, checking by adding time now.Laevogyrate
@usr: The comparison is not supposed to return a negative number, or any number at all. It is supposed to return a bool. True if the first argument should precede the second, and false otherwise.Donalt
@JoachimPileborg I have made the changes in the code as per your suggestion. I dont see any improvement in the output or performance.Laevogyrate
@Laevogyrate - you are still comparing performance of the sorting algorithm performed on pre-sorted data. Is it really that useful?Henricks
@VladFeinstein I am sorting 200 vectors stored in an array and they are all distinct. Your understanding that i am sorting on pre-sorted data is not correct.Laevogyrate
P
5

I have run your code on my computer (Core i7 860).

 std::sort 32,763,268 (~33s)

 ippsSortAscend_64f_I 34,217,517 (~34s)

 ippsSortRadixAscend_64f_I 15,319,053 (~15s)

These are the expected results. std::sort is inline and highly optimized, while ippsSort_* has function call overhead and a lot of inner checks performed by all ipp functions. This should explain the little slowdown for ippsSortAscend function. Radix sort is still twice faster as expected, since it is not a comparison based sorting.

Packthread answered 14/9, 2015 at 7:29 Comment(0)
J
1

for more accurate result you need to

  • compare sorting of exactly the same distributions of random numbers;
  • remove randomize from timing;
  • use ippsSort*32f functions, to sort 'float' (not 'double') in IPP case.
Joappa answered 26/8, 2015 at 12:36 Comment(1)
I have made the changes as you suggested. i am using ippsSortDescend_64f_I and changed all floats to double. I still see that ipp sort is much slower compared to std::sort. I am starting to believe that std::sort is more efficient than ipp.Laevogyrate
S
-1

I guess you've forgotten to call ippInit() before the measuremen

Slover answered 27/8, 2015 at 13:12 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.