microbenchmark - 2

3

Why is iterating an std::array much faster than iterating an std::vector?

Editor's note: Follow-up question with optimization enabled that times only the loop: Why is iterating through `std::vector` faster than iterating through `std::array`? where we can see the effect of ...

c++ linux performance microbenchmark

Beating asked 20/7, 2019 at 12:10

3

shade for parameter resource: Cannot find 'resource' in class org.apache.maven.plugins.shade.resource.ManifestResourceTransformer

I'm working on a Maven project. I'm trying to integrate JMH benchmarking into my project. The pom.xml of my Maven project... <parent> <groupId>platform</groupId> <artifactId...

maven benchmarking microbenchmark jmh

Veedis asked 26/4, 2017 at 6:42

3

what does STREAM memory bandwidth benchmark really measure?

I have a few questions on the STREAM benchmark (http://www.cs.virginia.edu/stream/ref.html#runrules). Below is the comment from stream.c. What is the rationale for the requirement that arrays shoul...

benchmarking cpu-architecture microbenchmark memory-bandwidth

Kropp asked 11/5, 2019 at 3:44

2

Solved

perf enable demangling of callgraph

How do I enable C++ demangling for the perf callgraph? It seems to demangle symbols when I go into annotate mode, but not in the main callgraph. Sample code (using Google Benchmark): #include <...

c++ linux ubuntu perf microbenchmark

Maelstrom asked 10/10, 2015 at 18:44

0

How to make microbenchmark with console.time, to measure small differences in compiler optimization?

This code is an adaptation of this other one... The code is ugly, but the question is about "how to do a benchmark". Does the new console.time function measure the "real execution time" or is it not...

javascript microbenchmark

Rosellaroselle asked 29/3, 2019 at 21:30

2

"Escape" and "Clobber" equivalent in MSVC

In Chandler Carruth's CppCon 2015 talk he introduces two magical functions for defeating the optimizer without any extra performance penalties. For reference, here are the functions (using GNU-sty...

visual-c++ benchmarking microbenchmark

Konrad asked 28/11, 2015 at 19:25
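
For readers unfamiliar with the two functions that the "Escape" and "Clobber" question refers to, here is a minimal sketch. It uses GNU-style inline asm, so it assumes GCC or Clang and will not compile with MSVC, which is the gap the question is about. `push_one_element` is a hypothetical benchmark body added only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tells the compiler that the memory behind p may be read or written
// by code it cannot see, so stores to it cannot be discarded.
inline void escape(void* p) {
    asm volatile("" : : "g"(p) : "memory");
}

// Tells the compiler that any memory may have been read or written,
// forcing pending stores to be materialized before this point.
inline void clobber() {
    asm volatile("" : : : "memory");
}

// Hypothetical benchmark body: the buffer escapes before the write and
// memory is clobbered after it, so the push_back must really happen.
inline std::size_t push_one_element() {
    std::vector<int> v;
    v.reserve(1);
    escape(v.data());
    v.push_back(42);
    clobber();
    assert(v.size() == 1);
    return v.size();
}
```

Both asm statements are empty, so they emit no instructions; they only constrain what the optimizer is allowed to assume.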

1

Solved

How to use Java 12's Microbenchmark Suite?

According to JEP 230: Microbenchmark Suite, there exists a microbenchmark suite built into Java 12. The JEP explains that it's basically JMH, but without needing to explicitly depend on it using M...

java benchmarking microbenchmark jmh java-12

Gagne asked 19/3, 2019 at 14:24

3

Solved

Why is String.strip() 5 times faster than String.trim() for blank string In Java 11

I've encountered an interesting scenario. For some reason, strip() on a blank string (containing only whitespace) is significantly faster than trim() in Java 11. Benchmark public class Test { pub...

java string performance microbenchmark java-11

Intenerate asked 5/12, 2018 at 20:25

1

Solved

Weird performance effects from nearby dependent stores in a pointer-chasing loop on IvyBridge. Adding an extra load speeds it up?

First, I have the setup below on an IvyBridge; I will insert the measuring payload code in the commented location. The first 8 bytes of buf store the address of buf itself, and I use this to create loop-car...

assembly x86 micro-optimization microbenchmark micro-architecture

Duple asked 8/1, 2019 at 3:53

2

Solved

Why jnz requires 2 cycles to complete in an inner loop

I'm on an IvyBridge. I found the performance behavior of jnz inconsistent in inner loop and outer loop. The following simple program has an inner loop with fixed size 16: global _start _start: m...

x86 micro-optimization microbenchmark micro-architecture

Asserted asked 12/1, 2019 at 3:17

1

Solved

Is mov r64, m64 one cycle or two cycle latency?

I'm on IvyBridge, I wrote the following simple program to measure the latency of mov: section .bss align 64 buf: resb 64 section .text global _start _start: mov rcx, 1000000000 xor rax, rax loo...

assembly x86 cpu-cache microbenchmark micro-architecture

Topflight asked 7/1, 2019 at 10:44

2

Solved

Why is CPUID + RDTSC unreliable?

I am trying to profile some code for execution time on an x86-64 processor. I am referring to this Intel white paper and have also gone through other SO threads discussing the topic of using RDTSCP vs CPUI...

x86 intel microbenchmark cpuid rdtsc

Neal asked 24/12, 2018 at 0:46

1

Solved

Speed difference of loop Inside vs Outside Function

This SO post led to a discussion about benchmarking the various solutions. Consider the following code # global environment is empty - new session just started # set up set.seed(20181231) ...

r microbenchmark

Holusbolus asked 30/12, 2018 at 0:42

4

Solved

Why is summing an array of value types slower than summing an array of reference types?

I'm trying to understand better how memory works in .NET, so I'm playing with BenchmarkDotNet and diagnosers. I've created a benchmark comparing class and struct performance by summing array items....

c# performance memory-management microbenchmark

Baseless asked 9/12, 2018 at 19:38

1

Solved

I don't understand the definition of DoNotOptimizeAway

I am looking in the Celero git repository for the meaning of DoNotOptimizeAway, but I still don't get it. Could you please help me understand it in layman's terms, as much as you can? The celero...

c++ benchmarking microbenchmark

Jolandajolanta asked 6/9, 2018 at 12:4

0

What's up with the "half fence" behavior of rdtscp?

For many years x86 CPUs supported the rdtsc instruction, which reads the "time stamp counter" of the current CPU. The exact definition of this counter has changed over time, but on recent CPUs it i...

performance assembly x86 microbenchmark rdtsc

Moncrief asked 4/9, 2018 at 3:53

1

Solved

Make a register depend on another one without changing its value

Consider the following x86 assembly: ; something that sets rax mov rcx, [rdi] xor rax, rcx xor rax, rcx At the end of the sequence, rax has the same value as it had on entry, but from the point ...

performance assembly x86 micro-optimization microbenchmark

Adnopoz asked 2/8, 2018 at 6:9

1

Solved

Difference between benchmark and time macro in Julia

I've recently discovered a huge difference between two macros: @benchmark and @time in terms of memory allocation information and time. For example: @benchmark quadgk(x -> x, 0., 1.) BenchmarkT...

macros julia microbenchmark

Mcleroy asked 29/6, 2018 at 12:51

5

Simple for() loop benchmark takes the same time with any loop bound

I want to write code that makes my CPU execute some operations and see how much time it takes to complete them. I wanted to make a loop going from i=0 to i<5000 and then multiplying i by...

c++ performance benchmarking microbenchmark

Agog asked 19/6, 2018 at 9:27

2

Why are two separate loops faster than one?

I want to understand what kind of optimizations Java does to consecutive for loops. More precisely, I'm trying to check if loop fusion is performed. Theoretically, I was expecting that this optimiz...

java performance optimization benchmarking microbenchmark

Doronicum asked 23/2, 2018 at 16:40

4

Solved

Getting an accurate execution time in C++ (micro seconds)

I want to get an accurate execution time in microseconds for my program implemented in C++. I have tried to get the execution time with clock_t but it's not accurate. (Note that micro-benchmarkin...

c++ performance benchmarking timing microbenchmark

Voight asked 18/2, 2014 at 13:57
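
As a hedged illustration of the timing problem in the question above, here is a sketch that measures one call with `std::chrono::steady_clock` rather than `clock_t`. This is a swapped-in technique, not something taken from the question or its answers, and `sum_to` and `time_microseconds` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical workload used only for illustration: sums 0 .. n-1.
inline std::uint64_t sum_to(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 0; i < n; ++i) total += i;
    return total;
}

// Runs work() once and returns the elapsed wall time in microseconds
// as a floating-point value, so sub-microsecond durations are not
// truncated to zero.
template <typename F>
double time_microseconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    assert(stop >= start);  // steady_clock never goes backwards
    return std::chrono::duration<double, std::micro>(stop - start).count();
}
```

A caller would typically store the workload's result somewhere the optimizer cannot ignore, for example `volatile std::uint64_t sink; time_microseconds([&] { sink = sum_to(1000); });`, and repeat the measurement many times, since a single run is noisy.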

1

Solved

What does allocation rate mean in JMH

I'm trying to measure the memory consumed when running the benchmark. I found out on the internet that I can use GC profiler to measure that. I tried but I don't understand the answer as well as se...

java garbage-collection microbenchmark jmh

Tempestuous asked 28/2, 2018 at 19:26

2

Solved

JMH - weird benchmarking results

I'm using JMH to benchmark a DOM parser. I got really weird results, as the first iteration actually ran faster than later iterations. Can anyone explain why this happens? Also, what do percentil...

java benchmarking microbenchmark jmh

Soursop asked 22/2, 2018 at 20:56

4

Solved

Fastest Linux system call

On an x86-64 Intel system that supports syscall and sysret, what's the "fastest" system call from 64-bit user code on a vanilla kernel? In particular, it must be a system call that exercises the sy...

linux performance x86-64 microbenchmark

Enjoy asked 21/2, 2018 at 18:34

3

Bring code into the L1 instruction cache without executing it

Let's say I have a function that I plan to execute as part of a benchmark. I want to bring this code into the L1 instruction cache prior to executing since I don't want to measure the cost of I$ mi...

performance x86 benchmarking prefetch microbenchmark

Mize asked 1/2, 2018 at 20:32
