What is the meaning of the term "thread-safe"?

Does it mean that two threads can't change the underlying data simultaneously? Or does it mean that the given code segment will run with predictable results when multiple threads are executing that code segment?

Cum answered 4/11, 2008 at 12:14 Comment(2)
Just saw an interesting discussion here about this matter: blogs.msdn.com/ericlippert/archive/2009/10/19/…Galvez
This is the new link: learn.microsoft.com/en-us/archive/blogs/ericlippert/… for the article shared by SebastianHep

Thread-safe code is code that will work even if many threads are executing it simultaneously.

http://mindprod.com/jgloss/threadsafe.html

Ugric answered 4/11, 2008 at 12:17 Comment(11)
Indeed, in the same process :)Ugric
"To write code that will run stably for weeks takes extreme paranoia." That's a quote that I like :)Furthermore
duh! this answer just restates the question! --- And why only within the same process ??? If the code fails when multiple threads execute it from different processes, then, arguably, (the "shared memory" might be in a disk file), it is NOT thread safe !!Superordinate
Just note that here @CharlesBretana is using a more conceptual (and intuitive?) definition of 'thread' in order to cover potentially-multiprocessing scenarios where no actual threading is involved. (In Python, there are whole frameworks for doing this without threads or shared memory/disk but rather by passing pickled objects as messages.)Illegality
@Jon, .. I would posit that anytime a block of code creates an invariant in a resource that is accessible (and modifiable) by another thread, whether that other thread is being executed in the same process or in another process, we are "multi-threading". Unless explicitly coded not to, all modern OSes can (and do) interrupt running code to implement multi-processing.Superordinate
@CharlesBretana If the code (segment) fails when many threads are executing it simultaneously in different processes, then it will also fail if one thread per process executes it and those processes run simultaneously, so that code (segment) does not lack thread safety; it is simply unsafe.Cheiro
@mg30rg, If it "fails when one thread per process executes it and those processes run simultaneously," then it is NOT "thread safe". Your statement would be true if it failed when one thread executes it in only one process; THEN it is simply unsafe. If it fails when multiple threads execute it concurrently (not "simultaneously"), but could not experience the same failure when executed in only one thread, then it is THREAD unsafe.Superordinate
@mg30rg. Perhaps the confusion is the result of somehow thinking that when a block of code is being executed by multiple processes, but only by one thread per process, that that, somehow is still a "Single-threaded" scenario, not a multiple-threaded scenario. This idea is not even wrong. It is just mis-definition. Clearly, multiple processes do not generally execute on the same thread in a synchronized manner, (except in rare scenarios where processes by design coordinate with one another and the OS shares threads among processes.)Superordinate
@CharlesBretana "Clearly, multiple processes do not generally execute on the same thread in a synchronized manner" I think this statement is only true in the Windows (or more generally the multithreaded) environments, but fails to be true in the Linux (or more generally in the multiprocess) environments, because the later tends to distribute batched problems between processes instead of threads. Discalimer: Both multiprocessing and multithreading are available on both mentioned OSs, only they perform better in the manner I addressed them. #noOSflameplsCheiro
@mg30rg, If Linux tends to " ... distribute batched problems between processes instead of threads. ..." then they are on separate threads. How can two distinct processes run on only one thread? They can't, unless they are somehow synchronized (like when running a multi-process OS on a machine with only one CPU). But in that case, the single "thread" that everything is running on is a logical abstraction at a much higher level than the threads we are discussing.Superordinate
Single-sentence answer with a reference link.. is this a recipe for a great answer :)Ferrel

A more informative question is what makes code not thread-safe, and the answer is that four conditions must all be true. Imagine the following code (and its machine-language translation); a sketch of the resulting race, and one way to close it, follows the list of conditions below.

totalRequests = totalRequests + 1
MOV EAX, [totalRequests]   // load totalRequests from memory into a register
INC EAX                    // update register
MOV [totalRequests], EAX   // store updated value back to memory
  1. The first condition is that there are memory locations that are accessible from more than one thread. Typically, these locations are global/static variables or heap memory reachable from global/static variables. Each thread gets its own stack frame for function/method-scoped local variables, so these local variables (which live on the stack), on the other hand, are accessible only from the one thread that owns that stack.
  2. The second condition is that there is a property (often called an invariant), which is associated with these shared memory locations, that must be true, or valid, for the program to function correctly. In the above example, the property is that “totalRequests must accurately represent the total number of times any thread has executed any part of the increment statement”. Typically, this invariant property needs to hold true (in this case, totalRequests must hold an accurate count) before an update occurs for the update to be correct.
  3. The third condition is that the invariant property does NOT hold during some part of the actual update. (It is transiently invalid or false during some portion of the processing). In this particular case, from the time totalRequests is fetched until the time the updated value is stored, totalRequests does not satisfy the invariant.
  4. The fourth and final condition that must occur for a race to happen (and for the code to therefore NOT be "thread-safe") is that another thread must be able to access the shared memory while the invariant is broken, thereby causing inconsistent or incorrect behavior.
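
Here is a minimal sketch of the same race in Java (the class and field names are made up for illustration, not part of the original example): two threads increment a shared counter with a plain read-modify-write, so updates can be lost while the invariant is broken, whereas an atomic increment closes that window.

import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only; names are hypothetical.
class TotalRequestsDemo {
    static int unsafeTotal = 0;                              // shared memory: condition 1
    static final AtomicInteger safeTotal = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                unsafeTotal = unsafeTotal + 1;               // invariant broken between the load and the store (conditions 2-4)
                safeTotal.incrementAndGet();                 // atomic read-modify-write: no window for another thread
            }
        };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start();
        a.join();  b.join();
        System.out.println("unsafe: " + unsafeTotal);        // usually less than 200000: updates were lost
        System.out.println("safe:   " + safeTotal.get());    // always 200000
    }
}

Wrapping the increment in a synchronized block would work equally well; the point is simply that no other thread may observe the shared location while the invariant is false.
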
Superordinate answered 4/11, 2008 at 16:59 Comment(6)
This covers only what is known as data races, and is of course important. Yet there are other ways in which code can fail to be thread-safe - for example, bad locking that may lead to deadlocks. Even something simple like calling System.exit() somewhere in a Java thread makes that code not thread-safe.Murderous
I guess to some degree this is semantics, but I would argue that bad locking code that can cause a deadlock does not make code unsafe. First, there is no need to lock the code in the first place unless a race condition, as described above, is possible. Then, if you write the locking code in such a way as to cause a deadlock, that's not thread-unsafe, it's just bad code.Superordinate
But note that the deadlock won't occur when running single-threaded, so for most of us this would surely fall under the intuitive meaning of (not) "thread-safe".Illegality
Well, deadlocks cannot occur unless you are running multi-threaded of course, But that's like saying network problems cannot happen if you are running on one machine. Other problems can happen single-threaded as well, if the programmer writes the code so that it breaks out of the critical lines of code before it completes the update, and modifies the variable in some other subroutine.Superordinate
Please use language-independent pseudocode to explain the concepts, as there is no mention of assembly language in the question.Bagwig
Comments added to the right of assembly do provide that content/meaning.Superordinate

An easier way to understand it is to look at what makes code not thread-safe. There are two main issues that will cause a threaded application to have unwanted behavior.

  • Accessing a shared variable without locking
    This variable could be modified by another thread while your function is executing. You want to prevent that with a locking mechanism to be sure of the behavior of your function. A general rule of thumb is to hold the lock for the shortest time possible.

  • Deadlock caused by mutual dependency on shared variables
    Say you have two shared variables, A and B. In one function you lock A first and later lock B. In another function you lock B first and, after a while, lock A. This is a potential deadlock: the first function waits for B to be unlocked while the second function waits for A to be unlocked. This issue will probably not show up in your development environment, and when it does, it will only be from time to time. To avoid it, all locks must always be acquired in the same order (see the sketch just below).
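
A minimal Java sketch of that lock-ordering problem (the class and lock names are mine, purely for illustration):

// Illustrative sketch only; names are hypothetical.
class LockOrderingDemo {
    private final Object lockA = new Object();
    private final Object lockB = new Object();

    void firstFunction() {
        synchronized (lockA) {
            synchronized (lockB) {        // acquires A, then B
                // ... work on both shared variables ...
            }
        }
    }

    // Deadlock-prone: acquires the locks in the opposite order (B, then A).
    void secondFunctionBad() {
        synchronized (lockB) {
            synchronized (lockA) {
                // ...
            }
        }
    }

    // Fixed: every method acquires the locks in the same order (A, then B).
    void secondFunctionFixed() {
        synchronized (lockA) {
            synchronized (lockB) {
                // ...
            }
        }
    }
}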

Gabble answered 4/11, 2008 at 13:46 Comment(1)
Good one; the problem should always be explained prior to explaining the solution.Perfidy

I like the definition from Brian Goetz's Java Concurrency in Practice for its comprehensiveness:

"A class is thread-safe if it behaves correctly when accessed from multiple threads, regardless of the scheduling or interleaving of the execution of those threads by the runtime environment, and with no additional synchronization or other coordination on the part of the calling code."

Lipstick answered 4/11, 2008 at 14:32 Comment(4)
This definition is incomplete and not specific, and definitely not comprehensive. How many times must it run safely, Just once? ten times? every time? 80% of the time? and it does not specify what makes it "Unsafe". If it fails to run safely, but failure was because there's a divide by zero error, does that make it thread-"Unsafe"?Superordinate
Be more civil next time and maybe we can discuss. This isn't Reddit and I'm not in the mood to talk to rude people.Lipstick
Your interpreting comments about someone else's definition as insults to yourself is telling. You need to read and understand substance before reacting emotionally. Nothing uncivil about my comment. I was making a point about the meaning of the definition. Sorry if the examples I used to illustrate the point made you uncomfortable.Superordinate
If I may, I think this is a classic misunderstanding. The commenter failed to acknowledge the robustness of this fine answer, which didn't address their own reason for clicking on this question. I'm with @Lipstick here, the commenter should show more courtesy.Plutocracy

As others have pointed out, thread safety means that a piece of code will work without errors if it's used by more than one thread at once.

It's worth being aware that this sometimes comes at a cost, of computer time and more complex coding, so it isn't always desirable. If a class can be safely used on only one thread, it may be better to do so.

For example, Java has two classes that are almost equivalent, StringBuffer and StringBuilder. The difference is that StringBuffer is thread-safe, so a single instance of a StringBuffer may be used by multiple threads at once. StringBuilder is not thread-safe, and is designed as a higher-performance replacement for those cases (the vast majority) when the String is built by only one thread.
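
A small sketch of the difference (the demo class and thread setup are mine): StringBuffer's methods are synchronized, so concurrent appends from several threads don't corrupt it, while a StringBuilder shared the same way carries no such guarantee.

// Illustrative sketch only; the class name is made up.
class StringBuildingDemo {
    public static void main(String[] args) throws InterruptedException {
        StringBuffer shared = new StringBuffer();            // thread-safe: its methods are synchronized
        Runnable appender = () -> {
            for (int i = 0; i < 1000; i++) shared.append('x');
        };
        Thread t1 = new Thread(appender), t2 = new Thread(appender);
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println(shared.length());                 // always 2000

        StringBuilder local = new StringBuilder();           // not thread-safe, but faster for single-threaded use
        for (int i = 0; i < 2000; i++) local.append('x');
        System.out.println(local.length());                  // 2000, with no locking overhead
    }
}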

Archiepiscopal answered 4/11, 2008 at 12:30 Comment(0)

At least in C++, I think of thread-safe as a bit of a misnomer in that it leaves a lot out of the name. To be thread-safe, code typically has to be proactive about it. It's not generally a passive quality.

For a class to be thread-safe, it has to have "extra" features that add overhead. These features are part of the implementation of the class and generally speaking, hidden from the interface. That is, different threads can access any of the class' members without ever having to worry about conflicting with a concurrent access by a different thread AND can do so in a very lazy manner, using some plain old regular human coding style, without having to do all that crazy synchronization stuff that is already rolled into the guts of the code being called.

And this is why some people prefer to use the term internally synchronized.

Terminology Sets

There are three main sets of terminology for these ideas I have encountered. The first and historically most popular (but worst) is:

  1. thread safe
  2. not thread safe

The second (and better) is:

  1. thread proof
  2. thread compatible
  3. thread hostile

A third (even better) one is:

  1. internally synchronized
  2. externally synchronized
  3. unsynchronizable

Analogies

thread safe ~ thread proof ~ internally synchronized

An example of an internally synchronized (aka. thread-safe or thread proof) system is a restaurant where a host greets you at the door, and disallows you from queueing yourself. The host is part of the mechanism of the restaurant for dealing with multiple customers, and can use some rather tricky tricks for optimizing the seating of waiting customers, like taking the size of their party into account, or how much time they look like they have, or even taking reservations over the phone. The restaurant is internally synchronized because all of this is included "behind the scenes" when you interact with it. You, the customer, don't do any of it. The host does all of it for you.

not thread-safe (but nice) ~ thread compatible ~ externally synchronized ~ free-threaded

Suppose that you go to the bank. There is a line, i.e. contention for the bank tellers. Because you're not a savage, you recognize that the best thing to do in the midst of contention for a resource is to queue like a civilized being. No one technically makes you do this. We hope you have the necessary social programming to do it on your own. In this sense, the bank lobby is externally synchronized.

Should we say that it's thread-unsafe? That's the implication if you go with the thread-safe / thread-unsafe bipolar terminology set. It's not a very good set of terms. The better terminology is externally synchronized. The bank lobby is not hostile to being accessed by multiple customers, but it doesn't do the work of synchronizing them either. The customers do that themselves.

This is also called "free threaded," where "free" is as in "free from lice"--or in this case, locks. Well, more accurately, synchronization primitives. That doesn't mean the code can run on multiple threads without those primitives. It just means it doesn't come with them already installed and it's up to you, the user of the code, to install them yourself however you see fit. Installing your own synchronization primitives can be difficult and requires thinking hard about the code, but also can lead to the fastest possible program by allowing you to customize how the program executes on today's hyperthreaded CPUs.
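
To make the contrast concrete, here is a minimal Java sketch (counter and lock names are mine): an internally synchronized counter ships with its own lock, while a free-threaded one is correct under concurrency only if the calling code supplies the synchronization.

// Illustrative sketch only; names are hypothetical.

// Internally synchronized ("thread-safe" / "thread-proof"):
// the lock is built in, so callers need no extra coordination.
class InternallySyncedCounter {
    private int count = 0;
    public synchronized void increment() { count++; }
    public synchronized int get() { return count; }
}

// Externally synchronized ("thread-compatible" / "free-threaded"):
// correct under concurrency only if callers supply a common lock.
class FreeThreadedCounter {
    private int count = 0;
    public void increment() { count++; }    // callers must hold the lock while calling this
    public int get() { return count; }
}

class CallingCode {
    private static final Object LOCK = new Object();
    private static final FreeThreadedCounter counter = new FreeThreadedCounter();

    static void incrementSafely() {
        synchronized (LOCK) {               // the calling code installs the synchronization
            counter.increment();
        }
    }
}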

not thread safe (and bad) ~ thread hostile ~ unsynchronizable

An everyday analogy for a thread-hostile system is some jerk with a sports car refusing to use their blinkers and changing lanes willy-nilly. Their driving style is thread hostile or unsynchronizable because you have no way to coordinate with them, and this can lead to contention for the same lane, without resolution, and thus an accident as two cars attempt to occupy the same space without any protocol to prevent this. This pattern can also be thought of more broadly as anti-social, though that's less specific to threads and more generally applicable to many areas of programming.

Why thread safe / not thread-safe are a bad terminology set

The first and oldest terminology set fails to make the finer distinction between thread hostility and thread compatibility. Thread compatibility is more passive than so-called thread safety, but that doesn't mean the called code is unsafe for concurrent thread use. It just means it's passive about the synchronization that would allow this, putting it off to the calling code instead of providing it as part of its internal implementation. Thread-compatible is how code should probably be written by default in most cases, but this is also sadly often erroneously thought of as thread-unsafe, as if it were inherently anti-safety, which is a major point of confusion for programmers.

NOTE: Many software manuals actually use the term "thread-safe" to refer to "thread-compatible," adding even more confusion to what was already a mess! I avoid the term "thread-safe" and "thread-unsafe" at all costs for this very reason, as some sources will call something "thread-safe" while others will call it "thread-unsafe" because they can't agree on whether you have to meet some extra standards for safety (pre-installed synchronization primitives), or just NOT be hostile to be considered "safe". So avoid those terms and use the smarter terms instead, to avoid dangerous miscommunications with other engineers.

Reminder of our goals

Essentially, our goal is to subvert chaos.

We do that by creating semi-deterministic systems we can rely on. Determinism is expensive, mostly due to the opportunity costs of losing parallelism, pipelining, and reordering. We try to minimize the amount of determinism we need to keep our costs low, while also avoiding making decisions that will further erode what little determinism we can afford. Thus, the semi- prefix. We just want certain little bits of our code's state to be deterministic, while the computational machinery underneath doesn't have to be completely so. Synchronization of threads is about increasing the order and decreasing the chaos in a multi-threaded system because having multiple threads leads to a greater amount of non-determinism naturally which must be subdued somehow.

In summary, there are three major degrees of effort some body of code can put in to 'juggle knives'--i.e. to work correctly in the context of multiple threads.

The highest degree (thread-proof, etc.) means that a system behaves in a predictable manner even if you call it from multiple threads sloppily. It does the work necessary to achieve this itself so you don't have to. It presents a nice interface to you, the programmer writing the calling code, so that you can pretend to live in a world without synchronization primitives, because it has already included them internally. It's also expensive and slow, and somewhat unpredictable in how long tasks take to complete due to the synchronization it's doing, which must always be more than the amount your specific program actually needs, because it doesn't know what your code will do. Great for casual coders who code in various scripting languages to do science or something, but aren't themselves writing highly efficient close-to-the-metal code. They don't need to juggle knives.

The second degree (thread-compatible, etc.) means that the system behaves well enough that calling code can reliably detect unpredictability just in time to handle it correctly at runtime using its own installed synchronization primitives. D-I-Y synchronization. BYOSP = Bring Your Own Synchronization primitives. At least you know the code you're calling will play nice with them. This is for professional programmers working closer to the metal.

The third degree (thread-hostile, etc.) means that the system doesn't behave well enough to play with anyone else and can only EVER be run single-threaded without incurring chaos. This is classic early 90s and earlier code, essentially. It was programmed with a lack of awareness about how it might be called or used from multiple threads to such a high degree that even if you try to add those synchronization primitives yourself, it just won't work because it makes old fashioned assumptions that these days seem anti-social and unprofessional.

However, some code only really makes sense called single-threaded and so is still written to be called that way intentionally. This is true especially for software that already has an efficient pipeline and memory access sequence, and doesn't benefit from the main purpose of multi-threading: hiding memory access latencies. Accessing non-cache memory is ridiculously slower than most other instructions. So whenever an application is waiting for some bit of memory access, it should switch to another task thread in the meantime to keep the processor working. Of course, these days, that could mean switching to another coroutine/fiber/etc. within the same thread, when available, because these are much more efficient than a thread context switch. But once even those are exhausted for the time being, it's time to switch threads executing on our core.

But sometimes, you have all your memory accesses nicely packed and sequenced and the last thing you want is to switch to another thread because you've already pipelined your code to handle this as efficiently as possible. Then threads hurt, not help. That's one example, but there are others.

In general, though, I think it makes sense to aim for thread-compatible whenever possible when writing code that is meant to be called by others, particularly if there's no real reason not to; it mostly just requires your awareness while coding the thing.

Garek answered 26/10, 2019 at 19:28 Comment(0)

Thread-safe code works as specified, even when entered simultaneously by different threads. This often means that internal data structures, or operations that should run uninterrupted, are protected against concurrent modification.

Krisha answered 4/11, 2008 at 12:17 Comment(0)

Let's answer this by example:

class NonThreadSafe {

    private int count = 0;

    public boolean countTo10() {
        count = count + 1;      // read-modify-write: not atomic
        return (count == 10);
    }
}

The countTo10 method adds one to the counter and then returns true if the count has reached 10. It should only return true once.

This will work as long as only one thread is running the code. If two threads run the code at the same time various problems can occur.

For example, if count starts as 9, one thread could add 1 to count (making 10) but then a second thread could enter the method and add 1 again (making 11) before the first thread has a chance to execute the comparison with 10. Then both threads do the comparison and find that count is 11 and neither returns true.

So this code is not thread safe.

In essence, all multi-threading problems are caused by some variation of this kind of problem.

The solution is to ensure that the addition and the comparison cannot be separated (for example by surrounding the two statements by some kind of synchronization code) or by devising a solution that does not require two operations. Such code would be thread-safe.
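
For example, here is a sketch of two possible fixes in the same style as the class above (the synchronized keyword and AtomicInteger are just two of several options):

import java.util.concurrent.atomic.AtomicInteger;

// Option 1: make the increment and the comparison one critical section.
class ThreadSafeCounter {

    private int count = 0;

    public synchronized boolean countTo10() {
        count = count + 1;
        return (count == 10);
    }
}

// Option 2: use a single atomic operation, so there is nothing to interleave.
class ThreadSafeCounterAtomic {

    private final AtomicInteger count = new AtomicInteger(0);

    public boolean countTo10() {
        return count.incrementAndGet() == 10;
    }
}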

Bennett answered 10/5, 2019 at 9:7 Comment(0)

Simply - code will run fine if many threads are executing this code at the same time.

Disoperation answered 4/11, 2008 at 12:35 Comment(0)

Don't confuse thread safety with determinism. Thread-safe code can also be non-deterministic. Given the difficulty of debugging problems with threaded code, this is probably the normal case. :-)

Thread safety simply ensures that when a thread is modifying or reading shared data, no other thread can access it in a way that changes the data. If your code depends on a certain order of execution for correctness, then you need other synchronization mechanisms beyond those required for thread safety to ensure this.
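
For instance (a minimal sketch of my own, with hypothetical names), a java.util.concurrent.CountDownLatch can impose an ordering on top of whatever already makes the shared data itself safe to access:

import java.util.concurrent.CountDownLatch;

// Illustrative sketch only; names are made up.
class OrderingDemo {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch ready = new CountDownLatch(1);
        int[] result = new int[1];                   // shared data

        Thread producer = new Thread(() -> {
            result[0] = 42;                          // produce the value...
            ready.countDown();                       // ...then signal that it is ready
        });
        Thread consumer = new Thread(() -> {
            try {
                ready.await();                       // enforces the ordering: wait until produced
                System.out.println(result[0]);       // guaranteed to print 42
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}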

Eulogy answered 4/11, 2008 at 12:31 Comment(0)

Yes and no.

Thread safety is a little bit more than just making sure your shared data is accessed by only one thread at a time. You have to ensure sequential access to shared data, while at the same time avoiding race conditions, deadlocks, livelocks, and resource starvation.

Unpredictable results when multiple threads are running is not a required condition of thread-safe code, but it is often a by-product. For example, you could have a producer-consumer scheme set up with a shared queue, one producer thread, and a few consumer threads, and the data flow might be perfectly predictable. If you start to introduce more consumers you'll see more random-looking results.
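
A minimal sketch of such a producer-consumer setup (queue capacity and item counts are arbitrary choices of mine), using a BlockingQueue from java.util.concurrent so the hand-off itself is thread-safe:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch only; the class name and sizes are made up.
class ProducerConsumerDemo {
    public static void main(String[] args) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(10);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++) queue.put(i);      // blocks while the queue is full
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Runnable consumer = () -> {
            try {
                for (int i = 0; i < 50; i++) {                   // two consumers drain the 100 items between them
                    Integer item = queue.take();                 // blocks while the queue is empty
                    System.out.println(Thread.currentThread().getName() + " got " + item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        producer.start();
        new Thread(consumer, "consumer-1").start();
        new Thread(consumer, "consumer-2").start();
        // With one consumer the flow looks orderly; more consumers make the interleaving
        // look random, yet every hand-off through the queue remains thread-safe.
    }
}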

Frissell answered 4/11, 2008 at 12:59 Comment(0)

In essence, many things can go wrong in a multi-threaded environment (instruction reordering, partially constructed objects, the same variable having different values in different threads because of caching at the CPU level, etc.).

I like the definition given by Java Concurrency in Practice:

A [portion of code] is thread-safe if it behaves correctly when accessed from multiple threads, regardless of the scheduling or interleaving of the execution of those threads by the runtime environment, and with no additional synchronization or other coordination on the part of the calling code.

By correctly they mean that the program behaves in compliance with its specifications.

Contrived example

Imagine that you implement a counter. You could say that it behaves correctly if:

  • counter.next() never returns a value that has already been returned before (we assume no overflow etc. for simplicity)
  • all values from 0 to the current value have been returned at some stage (no value is skipped)

A thread-safe counter would behave according to those rules regardless of how many threads access it concurrently (which would typically not be the case for a naive implementation).
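
A minimal sketch of a counter that meets those two rules (my own illustration, not from the book), next to a naive version that does not:

import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only; the class names are made up.
class ThreadSafeSequence {
    private final AtomicInteger value = new AtomicInteger(0);

    // Each call returns a distinct value, and every value from 0 upward is
    // eventually returned, no matter how many threads call this concurrently.
    public int next() {
        return value.getAndIncrement();
    }
}

class NaiveSequence {
    private int value = 0;

    // Not thread-safe: two threads can read the same value and both return it,
    // violating the "never return a value twice" rule.
    public int next() { return value++; }    // read, add, write: not atomic
}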

Note: cross-post on Programmers

Audition answered 7/2, 2013 at 10:57 Comment(0)

To complete other answers:

Synchronization is only a worry when the code in your method does one of two things:

  1. works with some outside resource that isn't thread-safe, or
  2. reads or changes a persistent object or class field.

This means that variables defined WITHIN your method are always threadsafe. Every call to a method has its own version of these variables. If the method is called by another thread, or by the same thread, or even if the method calls itself (recursion), the values of these variables are not shared.
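
A tiny sketch of that distinction (class and field names are hypothetical): the local variable inside the method is confined to each calling thread's stack, while the field is shared state that would need synchronization.

// Illustrative sketch only; names are made up.
class StackConfinementDemo {
    private int sharedField = 0;              // one copy, visible to every thread: needs synchronization

    public int compute(int input) {
        int local = input * 2;                // each call, on each thread, gets its own 'local'
        local = local + 1;                    // no other thread can ever see or modify it
        return local;                         // thread-safe without any locking
    }

    public void touchShared() {
        sharedField++;                        // NOT thread-safe without synchronization
    }
}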

Thread scheduling is not guaranteed to be round-robin. A task may totally hog the CPU at the expense of threads of the same priority. You can use Thread.yield() to have a conscience. You can use (in Java) Thread.setPriority(Thread.NORM_PRIORITY-1) to lower a thread's priority.

Plus beware of:

  • the large runtime cost (already mentioned by others) on applications that iterate over these "thread-safe" structures.
  • Thread.sleep(5000) is supposed to sleep for 5 seconds. However, if somebody changes the system time, you may sleep for a very long time or no time at all. The OS records the wake-up time in absolute form, not relative.
Hoey answered 4/11, 2008 at 12:41 Comment(0)

Yes and yes. It implies that data is not modified by more than one thread simultaneously. However, your program might work as expected, and appear thread-safe, even if it is fundamentally not.

Note that the unpredictability of results is a consequence of 'race-conditions' that probably result in data being modified in an order other than the expected one.

Reopen answered 4/11, 2008 at 12:36 Comment(0)

Instead of thinking of code or classes as thread-safe or not, I think it is more helpful to think of actions as being thread-safe. Two actions are thread-safe if they will behave as specified when run from arbitrary threading contexts. In many cases, classes will support some combinations of actions in thread-safe fashion and others not.

For example, many collections like array-lists and hash sets will guarantee that if they are initially accessed exclusively with one thread, and they are never modified after a reference becomes visible to any other threads, they may be read in arbitrary fashion by any combination of threads without interference.
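
A sketch of that read-only pattern in Java (my own example, with made-up names): a map that is fully populated before being published through a static final field can afterwards be read freely from any number of threads.

import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; the class and map contents are made up.
class ReadOnlyLookup {
    // Built by one thread (the class initializer), never modified afterwards,
    // and safely published via a static final field: concurrent reads are fine.
    private static final Map<String, Integer> HTTP_CODES = new HashMap<>();
    static {
        HTTP_CODES.put("OK", 200);
        HTTP_CODES.put("NOT_FOUND", 404);
        HTTP_CODES.put("SERVER_ERROR", 500);
    }

    public static Integer codeFor(String name) {
        return HTTP_CODES.get(name);          // any thread may call this concurrently
    }
}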

More interestingly, some hash-set collections such as the original non-generic one in .NET, may offer a guarantee that as long as no item is ever removed, and provided that only one thread ever writes to them, any thread that tries to read the collection will behave as though accessing a collection where updates might be delayed and occur in arbitrary order, but which will otherwise behave normally. If thread #1 adds X and then Y, and thread #2 looks for and sees Y and then X, it would be possible for thread #2 to see that Y exists but X doesn't; whether or not such behavior is "thread-safe" would depend upon whether thread #2 is prepared to deal with that possibility.

As a final note, some classes--especially blocking communications libraries--may have a "close" or "Dispose" method which is thread-safe with respect to all other methods, even though no other methods are thread-safe with respect to each other. If a thread performs a blocking read request and a user of the program clicks "cancel", there would be no way for a close request to be issued by the thread that's attempting to perform the read. The close/dispose request, however, may asynchronously set a flag which will cause the read request to be canceled as soon as possible. Once close is performed on any thread, the object would become useless, and all attempts at future actions would fail immediately, but being able to asynchronously terminate any attempted I/O operations is better than requiring that the close request be synchronized with the read (since if the read blocks forever, the synchronization request would likewise be blocked).

Collotype answered 22/5, 2020 at 15:19 Comment(0)

In the simplest words :P If it is safe to execute multiple threads on a block of code, it is thread-safe*

*conditions apply

The conditions are mentioned in other answers, like: 1. The result should be the same whether you execute one thread or multiple threads over it, etc.

Morice answered 15/12, 2013 at 9:19 Comment(0)
