Further reading:
I would like to present a few of my articles on general synchronization primitives. They dig into the behavior, properties, and costs of Monitor and the C# lock statement across distinct scenarios and thread counts. They focus specifically on CPU wastage and throughput periods, to show how much work can be pushed through in each scenario:
https://www.codeproject.com/Articles/1236238/Unified-Concurrency-I-Introduction
https://www.codeproject.com/Articles/1237518/Unified-Concurrency-II-benchmarking-methodologies
https://www.codeproject.com/Articles/1242156/Unified-Concurrency-III-cross-benchmarking
Original answer:
Oh dear!
It seems that the answer flagged here as THE ANSWER is inherently incorrect! I would respectfully ask the author of that answer to read the linked article to the end.
The author of the 2003 article measured on a dual-core machine only. In the first test case he measured locking with a single thread, and the result was about 50 ns per lock access.
It says nothing about a lock in the concurrent environment.
So we have to keep reading. In the second half of the article, the author measured locking with two and three threads, which gets closer to the concurrency levels of today's processors.
There the author says that with two threads on a dual-core machine a lock costs 120 ns, and with three threads it goes up to 180 ns. So the cost clearly depends on the number of threads accessing the lock concurrently.
So it is simple: it is not 50 ns unless there is only a single thread, and in that case the lock is useless.
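To see this dependence on your own machine, here is a minimal sketch of such a measurement. It is not the article's benchmark; the class name `LockCostSketch`, the method `AverageNanosPerLock`, and the iteration counts are my own assumptions for illustration. It divides the total wall-clock time by the total number of lock acquisitions, so it reports an average only.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper, not taken from the linked article.
public static class LockCostSketch
{
    private static readonly object Gate = new object();
    private static long _counter;

    // Average wall-clock nanoseconds per lock acquisition when
    // threadCount threads each take the lock iterationsPerThread times.
    public static double AverageNanosPerLock(int threadCount, int iterationsPerThread)
    {
        _counter = 0;
        var threads = new Thread[threadCount];

        using (var start = new ManualResetEventSlim(false))
        {
            for (int t = 0; t < threadCount; t++)
            {
                threads[t] = new Thread(() =>
                {
                    start.Wait(); // release all threads at the same moment
                    for (int i = 0; i < iterationsPerThread; i++)
                    {
                        lock (Gate) { _counter++; }
                    }
                });
                threads[t].Start();
            }

            var sw = Stopwatch.StartNew();
            start.Set();
            foreach (var th in threads) th.Join();
            sw.Stop();

            long total = (long)threadCount * iterationsPerThread;
            if (_counter != total)
                throw new InvalidOperationException("Lost updates: the lock did not protect the counter.");

            return sw.Elapsed.TotalMilliseconds * 1000000.0 / total;
        }
    }

    public static void Main()
    {
        foreach (int n in new[] { 1, 2, 3 })
        {
            double ns = AverageNanosPerLock(n, 1000000);
            Console.WriteLine($"{n} thread(s): {ns:F1} ns per lock on average");
        }
    }
}
```

The absolute numbers will differ from the article's, since they depend on the hardware, the runtime, and the build configuration; what matters is how the average changes as threads are added.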
Another issue to consider is that this is measured as an average time!
If the time of each individual iteration were measured, some of them would take anywhere from 1 ms to 20 ms. The average looks good because the majority of accesses are fast, but a few threads end up waiting for processor time and incur delays that are milliseconds long.
This is bad news for any kind of application that requires high throughput and low latency.
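The difference between the average and the worst case can be observed by timing every lock acquisition separately. The following is only a sketch under my own assumptions: `LockLatencySketch`, `Measure`, and the thread and iteration counts are hypothetical names and values, and the timing calls themselves add some overhead to each sample.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper for illustration only.
public static class LockLatencySketch
{
    private static readonly object Gate = new object();
    private static long _shared;

    // Times each acquire/release pair and returns the average and the
    // single slowest sample, both in nanoseconds.
    public static (double AverageNanos, double MaxNanos) Measure(int threadCount, int iterationsPerThread)
    {
        long totalTicks = 0, maxTicks = 0;
        object mergeGate = new object();
        var threads = new Thread[threadCount];

        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                long localSum = 0, localMax = 0;
                for (int i = 0; i < iterationsPerThread; i++)
                {
                    long begin = Stopwatch.GetTimestamp();
                    lock (Gate) { _shared++; }
                    long elapsed = Stopwatch.GetTimestamp() - begin;

                    localSum += elapsed;
                    if (elapsed > localMax) localMax = elapsed;
                }

                // Merge per-thread results once, outside the measured loop.
                lock (mergeGate)
                {
                    totalTicks += localSum;
                    if (localMax > maxTicks) maxTicks = localMax;
                }
            });
            threads[t].Start();
        }
        foreach (var th in threads) th.Join();

        double nanosPerTick = 1000000000.0 / Stopwatch.Frequency;
        double samples = (double)threadCount * iterationsPerThread;
        return (totalTicks * nanosPerTick / samples, maxTicks * nanosPerTick);
    }

    public static void Main()
    {
        var result = Measure(4, 1000000);
        Console.WriteLine($"average: {result.AverageNanos:F1} ns, slowest: {result.MaxNanos:F1} ns");
    }
}
```

How large the gap between the two values gets depends on the machine and on how many threads compete for the available cores.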
The last issue to consider is that there can be slower operations inside the lock, and very often that is the case.
The longer the block of code executed inside the lock, the higher the contention, and the delays rise sky high.
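A rough way to see the effect of the work done inside the lock is to vary it while keeping everything else fixed. In this sketch, `LockHoldTimeSketch`, `ElapsedMs`, and the spin counts are assumptions of mine, and `Thread.SpinWait` merely stands in for whatever real work the protected block would do.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper for illustration only.
public static class LockHoldTimeSketch
{
    private static readonly object Gate = new object();

    // Wall-clock milliseconds for threadCount threads to each enter the lock
    // iterationsPerThread times, spinning spinIterations while holding it.
    public static double ElapsedMs(int threadCount, int iterationsPerThread, int spinIterations)
    {
        var threads = new Thread[threadCount];
        var sw = Stopwatch.StartNew();

        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                for (int i = 0; i < iterationsPerThread; i++)
                {
                    lock (Gate)
                    {
                        Thread.SpinWait(spinIterations); // simulated work under the lock
                    }
                }
            });
            threads[t].Start();
        }
        foreach (var th in threads) th.Join();

        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        foreach (int spin in new[] { 0, 100, 1000 })
        {
            double ms = ElapsedMs(4, 100000, spin);
            Console.WriteLine($"spin {spin}: {ms:F1} ms total");
        }
    }
}
```

Because all the simulated work happens while the lock is held, the threads are effectively serialized, so adding threads does not spread that work across cores.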
Please consider that more than a decade has passed since 2003. That is a few generations of processors designed specifically to run fully concurrently, and locking considerably harms their performance.