MITE (legacy pipeline) used instead of DSB (uops cache) when jump is not quite aligned on 32 bytes

This question used to be part of another (now updated) question, but it seems better to ask it separately, since it didn't help to get an answer to the other one.


My starting point is a loop doing 3 independent additions:

for (unsigned long i = 0; i < 2000000000; i++) {
    asm volatile("" : "+r" (a), "+r" (b), "+r" (c), "+r" (d)); // prevents C compiler from optimizing out adds
    a = a + d;
    b = b + d;
    c = c + d;
}

When this loop is not unrolled, it executes in 1 cycle per iteration, which is to be expected: it contains 4 instructions (the 3 additions and the macro-fused increment/jump), all of which can execute in the same cycle on ports 0, 1, 5 and 6. When the loop is unrolled, performance is surprising: it tends to be 25% slower than the non-unrolled version, which is probably due to uop scheduling, as suggested in the comments on the previous question.

In this question, I'm not asking about performance, but rather about why uops come from the MITE (legacy pipeline) in some cases and from the DSB (uop cache) in others. (Note that I'm using a Skylake with the LSD (Loop Stream Detector) disabled.)

Experimentally, when the jump is not quite aligned on 32 bytes, uops are issued from the MITE rather than the DSB. ("Not quite aligned on 32 bytes" really means anywhere from 2 bytes before to 3 bytes after a 32-byte boundary. Put another way, starting from a 32-byte aligned jump, it means adding 1 to 3 bytes of padding, or removing 1 or 2 bytes of padding.)

Compiling the C code above with Clang and (manually) unrolling it once produces the following assembly code:

    movl    $2000000000, %esi
    .p2align    4, 0x90
.LBB0_1:
    addl    %edi, %edx  # 1
    addl    %edi, %ecx
    addl    %edi, %eax
    addl    %edi, %edx  # 2
    addl    %edi, %ecx
    addl    %edi, %eax
    addq    $-2, %rsi
    jne .LBB0_1

This code executes in 2 cycles/iteration, as expected, and most uops are delivered by the DSB. Adding one byte of padding before the loop causes the loop to execute in 3 cycles/iteration, and all the uops are now delivered by the MITE.

In an effort to understand what is happening, I changed the align directive to .p2align 7 (thus aligning the loop on 128 bytes), and added some padding before the loop, thus changing the loop alignment. The results are as follows (long snippet ahead; explanations below):

| Padding | Jump offset |       Cycles      |   MITE uops   |   DSB uops    |    DSB miss   | DSB miss penalty |
| ------- | ----------- | ----------------- | ------------- | ------------- | ------------- | ---------------- |
|       0 |          16 |   2 453 942 151   |     1 589 440 | 7 000 531 761 |        73 681 |           33 419 |
|       1 |          17 |   2 454 623 799   |     2 002 088 | 7 000 493 234 |       107 433 |           28 686 |
|       2 |          18 |   2 454 010 264   |     1 611 181 | 7 000 580 070 |        72 372 |           34 963 |
|       3 |          19 |   2 455 016 743   |     1 531 428 | 7 001 271 720 |        76 240 |           42 493 |
|       4 |          20 |   2 454 056 088   |     1 592 150 | 7 000 571 537 |        71 691 |           29 677 |
|       5 |          21 |   2 455 111 497   |     1 701 204 | 7 001 068 440 |        85 117 |           41 744 |
|       6 |          22 |   2 454 558 860   |     2 081 244 | 7 000 362 980 |       105 388 |           29 829 |
|       7 |          23 |   2 454 351 179   |     1 765 720 | 7 000 472 785 |        81 903 |           39 022 |
|       8 |          24 |   2 454 470 296   |     2 045 062 | 7 000 337 694 |       107 763 |           30 750 |
|       9 |          25 |   2 454 395 853   |     1 748 525 | 7 000 560 730 |        82 773 |           37 030 |
|      10 |          26 |   2 453 920 970   |     1 500 801 | 7 000 562 016 |        70 144 |           36 559 |
|      11 |          27 |   2 453 748 551   |     1 485 784 | 7 000 530 064 |        66 535 |           32 019 |
|      12 |          28 |   2 453 973 841   |     1 601 708 | 7 000 562 754 |        72 601 |           31 970 |
|      13 |          29 |   2 454 749 106   |     2 085 092 | 7 000 539 751 |       109 862 |           30 977 |
|      14 |          30 | **3 003 289 033** | 7 001 845 873 |       358 240 | 1 000 075 874 |           37 506 |
|      15 |          31 | **4 003 748 994** | 7 002 171 254 |       372 672 | 1 000 086 939 |           39 679 |
|      16 |          32 | **3 003 810 021** | 7 002 294 170 |       295 736 | 1 000 114 704 |           28 974 |
|      17 |          33 | **3 002 912 972** | 7 001 752 747 |       350 755 | 1 000 071 698 |           32 249 |
|      18 |          34 | **3 003 392 542** | 7 001 941 076 |       360 439 | 1 000 076 887 |           45 663 |
|      19 |          35 | **3 003 040 266** | 7 001 759 091 |       343 693 | 1 000 072 685 |           38 703 |
|      20 |          36 |   2 453 764 603   |     1 511 899 | 7 000 546 442 |        66 912 |           32 996 |
|      21 |          37 |   2 454 889 754   |     1 946 579 | 7 000 713 787 |       102 922 |           31 852 |
|      22 |          38 |   2 454 700 423   |     1 961 612 | 7 000 581 288 |       100 281 |           30 364 |
|      23 |          39 |   2 454 398 236   |     1 974 415 | 7 000 350 258 |       103 015 |           30 855 |
|      24 |          40 |   2 452 285 702   |     1 562 028 | 7 000 416 473 |        67 622 |           38 783 |
|      25 |          41 |   2 454 500 700   |     2 013 917 | 7 000 384 154 |       102 906 |           31 165 |
|      26 |          42 |   2 454 666 446   |     1 928 032 | 7 000 572 245 |        99 613 |           35 813 |
|      27 |          43 |   2 453 929 241   |     1 565 110 | 7 000 588 419 |        70 027 |           31 336 |
|      28 |          44 |   2 453 852 431   |     1 595 897 | 7 000 633 247 |        71 735 |           35 984 |
|      29 |          45 |   2 454 664 111   |     2 039 338 | 7 000 534 894 |       105 225 |           30 043 |
|      30 |          46 |   2 454 523 184   |     1 876 338 | 7 000 592 928 |        88 020 |           48 456 |
|      31 |          47 |   2 454 091 130   |     1 560 821 | 7 000 631 532 |        70 150 |           37 773 |
|      32 |          48 |   2 453 813 400   |     1 535 557 | 7 000 556 686 |        70 196 |           33 268 |
|      33 |          49 |   2 453 772 578   |     1 501 716 | 7 000 526 938 |        67 747 |           33 492 |
|      34 |          50 |   2 455 308 730   |     1 643 047 | 7 001 287 728 |        80 148 |           43 035 |
|      35 |          51 |   2 453 790 620   |     1 506 869 | 7 000 529 450 |        66 903 |           35 315 |
|      36 |          52 |   2 453 509 109   |     1 534 817 | 7 000 405 227 |        67 344 |           30 526 |
|      37 |          53 |   2 453 516 412   |     1 469 184 | 7 000 430 367 |        65 040 |           30 686 |
|      38 |          54 |   2 453 851 033   |     1 556 722 | 7 000 581 363 |        69 098 |           36 605 |
|      39 |          55 |   2 454 916 648   |     2 089 549 | 7 000 572 462 |       111 448 |           30 435 |
|      40 |          56 |   2 455 089 502   |     1 991 232 | 7 000 799 155 |       104 559 |           30 724 |
|      41 |          57 |   2 454 744 425   |     2 002 307 | 7 000 532 096 |       105 221 |           32 393 |
|      42 |          58 |   2 454 543 686   |     1 960 042 | 7 000 500 103 |       101 409 |           27 943 |
|      43 |          59 |   2 453 893 848   |     1 561 182 | 7 000 607 528 |        73 192 |           33 645 |
|      44 |          60 |   2 453 989 634   |     1 629 949 | 7 000 556 378 |        74 704 |           34 821 |
|      45 |          61 |   2 453 879 092   |     1 551 181 | 7 000 561 022 |        70 233 |           36 191 |
|      46 |          62 | **3 003 015 120** | 7 001 772 138 |       348 243 | 1 000 073 404 |           35 333 |
|      47 |          63 | **4 004 092 512** | 7 002 359 576 |       380 452 | 2 000 097 711 |           50 376 |
|      48 |          64 | **2 234 898 441** |   109 006 411 | 7 893 398 716 |       109 108 |           35 075 |
|      49 |          65 | **3 003 182 414** | 7 001 843 757 |       357 954 | 2 000 075 494 |           36 281 |
|      50 |          66 | **3 003 280 054** | 7 001 876 384 |       358 097 | 2 000 075 630 |           39 301 |
|      51 |          67 | **3 004 086 641** | 7 002 384 321 |       307 480 | 2 000 114 067 |           32 242 |
|      52 |          68 |   2 461 587 458   |    15 841 141 | 6 986 174 099 |        70 725 |           29 985 |
|      53 |          69 |   2 454 704 936   |     2 019 734 | 7 000 530 774 |       123 110 |           32 717 |
|      54 |          70 | **2 629 777 063** |   639 698 105 | 6 362 945 524 |       121 313 |           29 648 |
|      55 |          71 |   2 452 517 518   |    21 196 356 | 6 980 899 385 |     5 689 504 |           27 618 |
|      56 |          72 |   2 457 056 675   |    79 539 769 | 6 922 550 909 |    23 953 203 |           32 238 |
|      57 |          73 |   2 453 966 239   |     1 486 894 | 7 000 608 597 |        72 506 |           36 799 |
|      58 |          74 |   2 461 391 665   |    53 426 497 | 6 948 932 999 |    13 034 546 |           37 883 |
|      59 |          75 |   2 454 091 521   |     1 537 438 | 7 000 613 720 |        73 256 |           38 003 |
|      60 |          76 |   2 550 237 671   |   312 611 365 | 6 689 536 750 |    62 278 250 |           41 078 |
|      61 |          77 |   2 454 371 129   |     1 915 411 | 7 000 545 114 |       107 086 |           30 133 |
|      62 |          78 |   2 462 015 450   |    32 874 270 | 6 969 244 698 |     5 296 338 |           37 506 |
|      63 |          79 |   2 453 810 530   |     1 588 073 | 7 000 489 720 |        70 291 |           36 915 |
|      64 |          80 |   2 453 510 981   |     1 521 322 | 7 000 384 678 |        67 219 |           30 114 |
|      65 |          81 |   2 454 659 220   |     1 531 897 | 7 001 004 411 |        74 567 |           41 201 |
|      66 |          82 |   2 453 984 834   |     1 570 182 | 7 000 624 664 |        72 914 |           39 483 |
|      67 |          83 |   2 454 127 882   |     1 638 057 | 7 000 590 289 |        75 623 |           33 755 |
|      68 |          84 |   2 453 781 071   |     1 575 812 | 7 000 535 270 |        74 337 |           34 094 |
|      69 |          85 |   2 453 947 163   |     1 595 272 | 7 000 545 139 |        71 584 |           38 966 |
|      70 |          86 |   2 453 948 945   |     1 594 376 | 7 000 552 806 |        71 096 |           34 265 |
|      71 |          87 |   2 453 888 591   |     1 540 673 | 7 000 536 024 |        71 123 |           33 350 |
|      72 |          88 |   2 453 838 422   |     1 539 740 | 7 000 540 957 |        71 776 |           33 191 |
|      73 |          89 |   2 454 013 271   |     1 532 577 | 7 000 534 226 |        69 794 |           32 287 |
|      74 |          90 |   2 453 959 044   |     1 549 283 | 7 000 562 495 |        71 483 |           35 739 |
|      75 |          91 |   2 454 357 932   |     2 062 771 | 7 000 290 377 |       111 481 |           28 864 |
|      76 |          92 |   2 454 258 445   |     1 937 218 | 7 000 338 810 |       101 760 |           27 475 |
|      77 |          93 |   2 454 156 149   |     1 738 764 | 7 000 400 563 |        82 207 |           38 130 |
|      78 |          94 | **3 003 245 905** | 7 001 947 715 |       356 496 | 1 000 078 668 |           38 983 |
|      79 |          95 | **4 003 498 969** | 7 002 106 621 |       361 236 | 1 000 087 167 |           41 197 |
|      80 |          96 | **3 003 440 683** | 7 001 915 914 |       340 975 | 1 000 081 844 |           36 174 |
|      81 |          97 | **3 003 192 020** | 7 001 848 864 |       354 371 | 1 000 076 474 |           37 465 |
|      82 |          98 | **3 004 231 542** | 7 002 423 726 |       327 973 | 1 000 119 668 |           34 498 |
|      83 |          99 | **3 003 204 122** | 7 001 869 410 |       341 860 | 1 000 075 913 |           34 005 |
|      84 |         100 |   2 453 903 936   |     1 509 757 | 7 000 577 662 |        70 586 |           38 383 |
|      85 |         101 |   2 454 444 592   |     1 649 275 | 7 000 764 725 |        76 185 |           37 481 |
|      86 |         102 |   2 455 551 786   |     2 094 483 | 7 000 919 108 |       115 683 |           33 599 |
|      87 |         103 |   2 454 090 830   |     1 644 299 | 7 000 554 367 |        76 131 |           37 986 |
|      88 |         104 |   2 452 263 286   |     1 982 058 | 7 000 594 326 |       105 011 |           32 747 |
|      89 |         105 |   2 453 938 066   |     1 552 994 | 7 000 560 184 |        71 781 |           38 307 |
|      90 |         106 |   2 453 839 657   |     1 591 329 | 7 000 534 174 |        71 493 |           32 464 |
|      91 |         107 |   2 456 284 290   |     1 721 752 | 7 001 608 059 |        87 228 |           62 810 |
|      92 |         108 |   2 453 706 579   |     1 577 941 | 7 000 431 429 |        70 517 |           33 684 |
|      93 |         109 |   2 453 714 638   |     1 484 598 | 7 000 514 337 |        66 443 |           34 239 |
|      94 |         110 |   2 453 814 023   |     1 619 443 | 7 000 418 813 |        74 924 |           34 831 |
|      95 |         111 |   2 453 734 759   |     1 502 260 | 7 000 447 611 |        66 790 |           36 660 |
|      96 |         112 |   2 456 304 117   |     1 636 949 | 7 001 903 454 |        87 894 |           45 984 |
|      97 |         113 |   2 454 764 375   |     2 032 245 | 7 000 503 166 |       111 873 |           36 308 |
|      98 |         114 |   2 453 930 372   |     1 641 970 | 7 000 527 807 |        75 164 |           36 817 |
|      99 |         115 |   2 453 596 195   |     1 577 533 | 7 000 528 820 |        74 424 |           35 428 |
|     100 |         116 |   2 453 774 301   |     1 490 781 | 7 000 546 047 |        71 040 |           31 462 |
|     101 |         117 |   2 453 808 290   |     1 472 783 | 7 000 563 094 |        68 497 |           30 214 |
|     102 |         118 |   2 453 927 668   |     1 578 700 | 7 000 547 988 |        72 499 |           36 894 |
|     103 |         119 |   2 453 881 334   |     1 538 221 | 7 000 556 688 |        73 651 |           38 630 |
|     104 |         120 |   2 454 620 311   |     2 049 316 | 7 000 459 876 |       110 210 |           30 452 |
|     105 |         121 |   2 453 793 013   |     1 553 815 | 7 000 448 812 |        70 690 |           35 146 |
|     106 |         122 |   2 453 516 549   |     1 477 303 | 7 000 369 210 |        66 462 |           32 381 |
|     107 |         123 |   2 453 679 941   |     1 558 433 | 7 000 399 585 |        71 027 |           37 700 |
|     108 |         124 |   2 453 984 832   |     1 591 183 | 7 000 558 547 |        74 810 |           32 532 |
|     109 |         125 |   2 453 972 231   |     1 585 644 | 7 000 573 173 |        73 159 |           39 583 |
|     110 |         126 | **3 003 167 043** | 7 001 793 152 |       341 345 | 1 000 076 047 |           41 811 |
|     111 |         127 | **4 004 031 670** | 7 002 344 014 |       394 950 | 2 000 094 647 |           42 345 |
|     112 |         128 | **2 017 184 284** |     2 397 032 | 7 999 676 604 |        97 555 |           23 614 |
|     113 |         129 | **3 003 231 942** | 7 001 876 887 |       355 548 | 2 000 078 108 |           35 462 |
|     114 |         130 | **3 003 073 797** | 7 001 763 748 |       343 879 | 2 000 073 914 |           36 604 |
|     115 |         131 | **3 003 066 183** | 7 001 799 239 |       334 265 | 2 000 076 089 |           37 578 |
|     116 |         132 |   2 459 437 822   |    11 831 880 | 6 990 241 198 |        69 673 |           31 901 |
|     117 |         133 |   2 453 833 994   |     1 520 407 | 7 000 579 352 |        72 385 |           39 387 |
|     118 |         134 |   2 453 582 104   |     1 508 309 | 7 000 462 005 |        70 623 |           30 954 |
|     119 |         135 |   2 453 607 456   |     1 520 805 | 7 000 426 804 |        69 833 |           35 969 |
|     120 |         136 |   2 453 516 773   |   218 632 117 | 6 783 760 256 |    64 474 484 |           29 161 |
|     121 |         137 |   2 454 656 532   |     2 135 434 | 7 000 368 481 |       121 168 |           29 070 |
|     122 |         138 |   2 464 943 252   |    76 396 888 | 6 926 141 929 |    18 701 369 |           29 401 |
|     123 |         139 |   2 454 713 076   |     1 945 881 | 7 000 526 215 |       113 113 |           32 864 |
|     124 |         140 |   2 459 197 278   |    17 602 061 | 6 984 668 329 |     3 270 690 |           39 930 |
|     125 |         141 |   2 453 811 452   |     1 546 333 | 7 000 539 142 |        71 850 |           32 204 |
|     126 |         142 |   2 453 943 973   |     1 557 203 | 7 000 570 909 |        74 167 |           34 542 |
|     127 |         143 |   2 453 989 607   |     1 490 927 | 7 000 599 022 |        67 774 |           32 994 |
|     128 |         144 |   2 455 332 089   |     1 619 032 | 7 001 303 644 |        83 418 |           43 983 |

Padding is the number of bytes of padding added before the loop. Jump offset is the alignment of the jump: the jump occurs 16 bytes after the start of the loop, so its value is always padding+16 (having a separate column for it helps with visualizing). Cycles is the number of cycles taken to execute the program. MITE uops is the number of uops delivered by the MITE. DSB uops is the number of uops delivered by the DSB. DSB miss is the number of DSB misses. DSB miss penalty is the number of penalty cycles due to DSB-to-MITE switches. These numbers were obtained using perf stat -e idq.dsb_uops,idq.mite_uops,frontend_retired.dsb_miss,dsb2mite_switches.penalty_cycles,cycles.
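As a rough aid for reading the table, the raw counters can be turned into per-iteration figures. The C sketch below is only an illustration: the struct and function names are made up, and it assumes 1 000 000 000 iterations of the unrolled loop (the counter starts at 2000000000 and decreases by 2 per iteration).

```c
#include <assert.h>

/* Hypothetical container for the raw counters of one table row. */
struct counters {
    double cycles;
    double mite_uops;
    double dsb_uops;
};

/* Cycles spent per loop iteration; the iteration count is an
   assumption supplied by the caller (1e9 for the loop above). */
static double cycles_per_iteration(const struct counters *c, double iterations)
{
    assert(iterations > 0);
    return c->cycles / iterations;
}

/* Fraction of all delivered uops that came from the DSB (0.0 to 1.0). */
static double dsb_share(const struct counters *c)
{
    double total = c->mite_uops + c->dsb_uops;
    assert(total > 0);
    return c->dsb_uops / total;
}
```

For the padding-0 row (cycles 2 453 942 151, MITE 1 589 440, DSB 7 000 531 761) this gives roughly 2.45 cycles per iteration with more than 99.9% of uops coming from the DSB. For the padding-14 row it gives roughly 3.00 cycles per iteration with almost nothing coming from the DSB.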

In the case of the loop unrolled once, performance varies quite a lot depending on whether uops are delivered by the MITE or the DSB. However, for the same loop unrolled 4 times, the exact same MITE/DSB pattern can be observed, but it barely affects performance:

| Padding | Jump offset |       Cycles      |   MITE uops   |   DSB uops    |    DSB miss   | DSB miss penalty |
| ------- | ----------- | ----------------- | ------------- | ------------- | ------------- | ---------------- |
|       0 |          34 |   2 443 059 894   | 6 404 874 796 |       324 866 |       557 007 |           58 270 |
|       1 |          35 |   2 469 823 874   | 6 402 845 671 |       359 397 |       242 913 |           44 004 |
|       2 |          36 |   2 509 831 578   |     2 428 288 | 6 400 917 619 |       126 454 |           35 718 |
|       3 |          37 |   2 516 899 098   |     2 183 357 | 6 401 715 038 |       115 461 |           42 722 |
|       4 |          38 |   2 535 785 420   |     3 596 045 | 6 405 592 898 |       193 088 |          145 459 |
|       5 |          39 |   2 536 888 998   |     4 544 195 | 6 407 929 337 |       270 307 |          141 847 |
|       6 |          40 |   2 514 898 947   |     3 500 301 | 6 404 310 391 |       168 683 |          103 966 |
|       7 |          41 |   2 497 731 601   |     2 860 409 | 6 402 485 570 |       136 201 |           70 007 |
|       8 |          42 |   2 519 396 945   |     3 373 375 | 6 405 438 970 |       180 768 |           96 499 |
|       9 |          43 |   2 519 959 317   |     3 038 180 | 6 401 766 682 |       163 982 |           57 217 |
|      10 |          44 |   2 518 862 677   |     2 556 957 | 6 400 557 326 |       127 913 |           33 141 |
|      11 |          45 |   2 505 211 679   |     1 982 925 | 6 400 617 993 |        95 689 |           33 755 |
|      12 |          46 |   2 520 256 213   |     1 764 948 | 6 401 331 329 |        79 917 |           49 950 |
|      13 |          47 |   2 528 859 616   |     2 865 395 | 6 402 516 447 |       156 550 |           51 970 |
|      14 |          48 |   2 526 844 155   |     2 334 728 | 6 402 255 285 |       122 589 |           49 508 |
|      15 |          49 |   2 526 623 614   |     2 617 350 | 6 401 419 706 |       141 028 |           39 374 |
|      16 |          50 |   2 508 159 432   |     2 293 737 | 6 400 708 049 |       110 325 |           38 407 |
|      17 |          51 |   2 505 715 666   |     2 646 431 | 6 401 083 574 |       137 684 |           41 563 |
|      18 |          52 |   2 499 124 059   |     2 407 547 | 6 400 350 409 |       127 750 |           33 880 |
|      19 |          53 |   2 519 671 512   |     2 875 080 | 6 401 825 044 |       151 559 |           45 711 |
|      20 |          54 |   2 519 382 271   |     2 178 986 | 6 400 787 103 |        94 733 |           44 873 |
|      21 |          55 |   2 494 177 992   |     1 953 404 | 6 400 469 971 |        94 724 |           32 348 |
|      22 |          56 |   2 488 166 104   |     1 865 899 | 6 400 788 908 |        89 963 |           32 295 |
|      23 |          57 |   2 473 667 778   |     1 883 684 | 6 400 516 105 |        88 080 |           31 822 |
|      24 |          58 |   2 491 983 809   |     1 964 243 | 6 401 141 418 |        95 559 |           38 009 |
|      25 |          59 |   2 523 682 312   |     2 179 584 | 6 402 528 236 |       115 550 |           51 286 |
|      26 |          60 |   2 468 826 280   |     1 568 693 | 6 400 555 529 |        69 083 |           39 205 |
|      27 |          61 |   2 468 128 275   |     2 474 660 | 6 400 400 765 |       128 799 |           32 787 |
|      28 |          62 |   2 461 792 136   | 6 401 675 319 |       325 130 |        91 908 |           31 537 |
|      29 |          63 |   2 413 473 869   | 6 401 891 263 |       308 719 |   474 886 616 |           30 068 |
|      30 |          64 |   2 442 178 183   |     2 412 150 | 6 800 327 022 |       137 335 |           33 005 |
|      31 |          65 |   2 512 670 489   | 6 402 475 993 |       321 507 |    82 884 937 |           30 439 |
|      32 |          66 |   2 438 295 147   | 6 402 583 033 |       320 775 |       193 935 |           32 813 |
|      33 |          67 |   2 465 431 142   | 6 402 487 498 |       300 367 |       192 554 |           29 581 |
|      34 |          68 |   2 510 544 922   |     1 664 395 | 6 400 550 345 |        79 102 |           35 757 |
|      35 |          69 |   2 492 243 510   |     2 598 101 | 6 400 252 944 |       137 725 |           30 489 |
|      36 |          70 |   2 477 042 696   |     2 701 036 | 6 400 305 241 |       157 174 |           29 164 |
|      37 |          71 |   2 514 818 722   |     1 666 562 | 6 400 550 483 |        79 761 |           42 464 |
|      38 |          72 |   2 458 949 815   |     2 697 410 | 6 400 122 020 |       148 539 |           30 023 |
|      39 |          73 |   2 473 858 051   |     1 653 601 | 6 400 523 949 |        76 190 |           40 743 |
|      40 |          74 |   2 437 856 049   |     2 644 658 | 6 400 220 386 |       146 309 |           27 825 |
|      41 |          75 |   2 502 432 002   |     1 700 199 | 6 400 535 604 |        79 871 |           43 243 |
|      42 |          76 |   2 493 675 148   |     2 622 476 | 6 400 171 037 |       153 333 |           31 309 |
|      43 |          77 |   2 484 286 254   |     1 700 755 | 6 400 512 732 |        80 362 |           50 028 |
|      44 |          78 |   2 494 745 100   |     2 713 187 | 6 400 363 559 |       159 990 |           31 604 |
|      45 |          79 |   2 525 806 102   |     3 195 503 | 6 401 041 048 |       193 130 |           66 443 |
|      46 |          80 |   2 525 084 219   |     2 901 188 | 6 400 857 107 |       171 471 |           48 662 |
|      47 |          81 |   2 525 023 891   |     2 503 546 | 6 400 362 906 |       151 389 |           31 424 |
|      48 |          82 |   2 516 945 604   |     1 818 682 | 6 400 778 875 |        83 134 |           41 091 |
|      49 |          83 |   2 503 330 074   |     2 295 094 | 6 400 936 466 |       127 778 |           37 184 |
|      50 |          84 |   2 515 257 599   |     1 998 408 | 6 401 086 057 |       103 812 |           36 661 |
|      51 |          85 |   2 515 704 687   |     2 203 920 | 6 400 816 810 |       103 168 |           48 042 |
|      52 |          86 |   2 521 414 196   |     2 112 029 | 6 401 272 207 |       101 158 |           52 608 |
|      53 |          87 |   2 516 900 368   |     1 597 896 | 6 400 570 586 |        73 608 |           40 296 |
|      54 |          88 |   2 471 915 311   |     1 991 994 | 6 400 413 877 |        92 759 |           35 733 |
|      55 |          89 |   2 478 161 240   |     2 757 067 | 6 400 671 792 |       141 983 |           42 998 |
|      56 |          90 |   2 468 575 551   |     1 893 460 | 6 400 361 170 |        91 596 |           32 235 |
|      57 |          91 |   2 516 481 566   |     1 936 691 | 6 400 335 059 |        97 668 |           25 221 |
|      58 |          92 |   2 482 788 158   |     2 873 305 | 6 400 470 197 |       157 875 |           35 177 |
|      59 |          93 |   2 472 664 516   |     3 482 867 | 6 401 550 404 |       199 835 |           49 199 |
|      60 |          94 |   2 522 537 958   | 5 604 672 405 |   800 614 280 |    35 268 930 |       12 965 365 |
|      61 |          95 |   2 521 875 392   | 5 604 350 958 |   800 642 890 |    34 500 749 |       12 985 188 |
|      62 |          96 |   2 475 386 582   | 6 006 074 137 |   400 581 950 |    27 625 952 |        8 251 826 |
|      63 |          97 |   2 480 407 320   | 6 007 748 529 |   400 687 290 |    21 488 812 |        8 386 755 |
|      64 |          98 |   2 451 562 172   | 6 406 359 632 |       369 366 |       687 803 |           59 309 |
|      65 |          99 |   2 469 472 059   | 6 407 104 495 |       365 022 |       821 782 |           63 981 |
|      66 |         100 |   2 525 647 143   |     2 627 685 | 6 404 609 372 |       148 635 |           53 376 |
|      67 |         101 |   2 533 208 849   |     4 294 575 | 6 405 154 176 |       224 516 |          174 959 |
|      68 |         102 |   2 522 792 300   |     2 297 167 | 6 404 309 702 |       128 216 |           62 867 |
|      69 |         103 |   2 528 134 912   |     3 877 072 | 6 405 083 855 |       204 813 |          147 178 |
|      70 |         104 |   2 480 455 890   |     2 144 317 | 6 401 555 634 |       102 192 |           34 375 |
|      71 |         105 |   2 457 138 962   |     2 871 586 | 6 400 323 955 |       138 739 |           46 120 |
|      72 |         106 |   2 476 839 093   |     2 554 822 | 6 400 518 957 |       127 515 |           32 344 |
|      73 |         107 |   2 522 202 654   |     2 698 007 | 6 401 714 270 |       136 845 |           39 610 |
|      74 |         108 |   2 529 648 028   |     2 591 016 | 6 402 573 048 |       124 463 |           77 588 |
|      75 |         109 |   2 504 833 699   |     2 099 386 | 6 400 941 244 |       102 431 |           33 246 |
|      76 |         110 |   2 509 193 033   |     2 244 590 | 6 402 859 463 |       118 633 |           44 816 |
|      77 |         111 |   2 526 808 490   |     3 075 036 | 6 401 267 531 |       168 516 |           50 367 |
|      78 |         112 |   2 525 662 170   |     2 076 530 | 6 401 870 313 |       109 810 |           44 704 |
|      79 |         113 |   2 523 356 566   |     1 647 814 | 6 400 602 452 |        74 710 |           39 700 |
|      80 |         114 |   2 490 947 127   |     2 618 819 | 6 400 769 586 |       139 588 |           38 773 |
|      81 |         115 |   2 525 323 899   |     2 433 800 | 6 401 805 576 |       113 498 |           77 057 |
|      82 |         116 |   2 528 753 531   |     3 317 116 | 6 402 358 198 |       151 306 |          132 752 |
|      83 |         117 |   2 517 309 668   |     1 923 449 | 6 401 356 394 |        89 733 |           79 670 |
|      84 |         118 |   2 519 588 707   |     1 620 560 | 6 400 866 891 |        74 881 |           53 689 |
|      85 |         119 |   2 487 765 769   |     2 620 064 | 6 400 321 476 |       134 480 |           33 623 |
...

For both loops (the one unrolled once, and the one unrolled 4 times), note the exception when the jump is aligned exactly on 64 bytes: in such cases, macro-fusion does not happen (documented in the Intel Optimization manual, Section 2.5.2.1 SandyBridge Legacy Pipeline), and for some reason, the uops are then delivered by the DSB rather than the MITE, unlike in the neighbouring rows (see the rows with jump offset 64 and 128 in the first table).

Question: What causes uops to be delivered by the MITE rather than the DSB when the alignment of the jump instruction is close to 32 bytes?

Housewares answered 27/1, 2020 at 17:59 Comment(1)
Basically a duplicate of "Code alignment dramatically affects performance". See also "Intel JCC Erratum - what is the effect of prefixes used for mitigation?" and, regarding the name, "Intel JCC Erratum - should JCC really be treated separately?", even though it affects jmp / call / ret as well as jcc. - Dextrin

I think you're seeing the effects of the microcode update that fixed an erratum on Skylake and other chips:

The [microcode update] prevents jump instructions from being cached in the Decoded ICache when the jump instructions cross a 32-byte boundary or when they end on a 32-byte boundary. In this context, Jump Instructions include all jump types: conditional jump (Jcc), macro-fused op-Jcc (where op is one of cmp, test, add, sub, and, inc, or dec), direct unconditional jump, indirect jump, direct/indirect call, and return.

The instructions are encoded like this:

  48 83 c6 fe   add $0xfffffffffffffffe,%rsi
  75 ee         jne 0x10

This is why the problem starts when the jump is 2 bytes before a 32-byte boundary: the jump then ends exactly at the boundary. Shifting further, the jump crosses the boundary, and then the fused add crosses it. Only when both are past the boundary, i.e. when the jump is at offset +4, is the add also completely after the boundary, and the microcode update no longer prevents caching in the DSB.

Regarding "note the exception when the jump is aligned exactly on 64 bytes: in such cases, macro-fusion does not happen":

Then we probably also don't run into the mitigation, because without fusion only the 2-byte jne counts, and it starts exactly at the boundary rather than crossing it or ending on it.
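The reasoning above can be written down as a small predicate. This is only a sketch of the explanation given in this answer, not a description of what the microcode actually does: the function names are invented, offsets are relative to the 128-byte aligned base used in the question, and the "no macro-fusion when the jne starts a 64-byte line" rule is taken from the question's observation.

```c
#include <assert.h>
#include <stdbool.h>

/* Byte lengths from the encoding shown above:
   48 83 c6 fe = add (4 bytes), 75 ee = jne (2 bytes). */
enum { ADD_LEN = 4, JNE_LEN = 2, BLOCK = 32, CACHE_LINE = 64 };

/* Hypothetical helper: true if the bytes [start, start + len)
   cross a 32-byte boundary or end exactly on one. */
static bool touches_32b_boundary(unsigned start, unsigned len)
{
    assert(len > 0);
    return start / BLOCK != (start + len) / BLOCK;
}

/* Hypothetical model: jne_offset is the offset of the jne itself.
   Returns true if, under the assumptions above, the jump would be
   kept out of the DSB. */
static bool jump_kept_out_of_dsb(unsigned jne_offset)
{
    assert(jne_offset >= ADD_LEN);
    if (jne_offset % CACHE_LINE == 0) {
        /* Assumed: no macro-fusion here, so only the jne counts. */
        return touches_32b_boundary(jne_offset, JNE_LEN);
    }
    /* Fused add+jne is treated as one 6-byte unit starting at the add. */
    return touches_32b_boundary(jne_offset - ADD_LEN, ADD_LEN + JNE_LEN);
}
```

Under these assumptions the predicate is true for jump offsets 30 to 35, 62, 63 and 65 to 67, and false for 29, 36, 64 and 68, which matches the slow rows of the first table in the question.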

Bulky answered 2/2 at 21:52 Comment(1)
Yup, looks like it. "How can I mitigate the impact of the Intel jcc erratum on gcc?" shows how to get compilers (or assemblers like GAS) to work around it by adding padding to instructions to make them longer without NOPs, as Intel recommends. - Dextrin
