TL;DR: Implementing an asynchronous callback basically means letting the control flow proceed without blocking on the callback. Until the callback function is finally called, the control flow is free to execute anything that does not depend on the callback's result: for example, the caller can proceed as if the callback function had already returned, or it may yield control to other functions.
Since the question is about implementation in general rather than in a specific language, my answer tries to stay general enough to cover what the implementations have in common.
Different languages implement asynchronous callbacks differently, but the principles are the same. The key is to decouple the control flow from the code being executed, that is, the execution context (such as a thread of control with its runtime stack) from the executed task. Traditionally, the execution context and the executed task are associated 1:1. With asynchronous callbacks, they are decoupled.
1. The principles
To decouple the control flow from the code, it is helpful to think of every asynchronous callback as a conditional task. When the code registers an asynchronous callback, it effectively installs the task's condition in the system. The callback function is then invoked when the condition is satisfied. To support this, a condition monitoring mechanism and a task scheduler are needed, so that:
The programmer does not need to track the callback's condition;
Before the condition is satisfied, the program may proceed to execute other code that does not depend on the callback's result, without blocking on the condition;
Once the condition is satisfied, the callback is guaranteed to execute. The programmer does not need to schedule its execution;
After the callback is executed, its result is accessible to the caller.
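To make the "conditional task" idea concrete, here is a minimal sketch in Python. The `MiniScheduler` class and its method names are hypothetical, invented for illustration only; a real runtime would monitor conditions through the operating system rather than by polling, so treat this as a toy model of the four points above, not as how any particular language does it.

```python
from typing import Any, Callable, List, Tuple

Condition = Callable[[], bool]
Callback = Callable[[], Any]


class MiniScheduler:
    """Hypothetical toy scheduler: polls conditions, runs callbacks once they hold."""

    def __init__(self) -> None:
        self._pending: List[Tuple[Condition, Callback]] = []
        self.results: List[Any] = []  # callback results, visible to the caller

    def register(self, condition: Condition, callback: Callback) -> None:
        # Registering never blocks; it only records the conditional task.
        self._pending.append((condition, callback))

    def run_once(self) -> None:
        # Swap the list first so callbacks may safely register new callbacks.
        pending, self._pending = self._pending, []
        for condition, callback in pending:
            if condition():
                self.results.append(callback())
            else:
                self._pending.append((condition, callback))


def demo() -> List[Any]:
    state = {"ready": False}
    sched = MiniScheduler()
    sched.register(lambda: state["ready"], lambda: "data processed")
    sched.run_once()          # condition not satisfied yet: nothing runs
    assert sched.results == []
    state["ready"] = True     # the condition becomes true "elsewhere"
    sched.run_once()          # now the scheduler invokes the callback
    return sched.results
```

The caller only registers the callback and later reads `results`; it neither checks the condition itself nor decides when the callback runs.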
2. Implementation for Portability
For example, if your code needs to process the data from a network connection, you do not need to write the code that checks the connection state. You only register a callback that will be invoked once the data is available for processing. The dirty work of connection checking is left to the language implementation, and it is known to be tricky, especially when it comes to scalability and portability.
The language implementation may employ asynchronous I/O, non-blocking I/O, a thread pool, or whatever other technique to check the network state for you, and once the data is ready, the callback function is scheduled to execute. Here the control flow of your code looks like it goes directly from the callback registration to the callback execution, because the language hides the intermediate steps. This is the portability story.
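As one illustration of "register a callback and let the runtime check the connection", here is a sketch using Python's standard `selectors` module, which picks a readiness-notification mechanism suited to the platform. The helper names `run_until_idle` and `demo`, and the use of a local `socket.socketpair()` in place of a real network connection, are my own assumptions for the example.

```python
import selectors
import socket


def run_until_idle(sel: selectors.BaseSelector, timeout: float = 1.0) -> None:
    """Dispatch callbacks for ready sockets until none remain registered."""
    while len(sel.get_map()) > 0:
        events = sel.select(timeout=timeout)
        if not events:
            break  # nothing became ready in time; give up in this sketch
        for key, _mask in events:
            callback = key.data   # the callback was stored at registration
            callback(key.fileobj)


def demo() -> list:
    received = []
    sel = selectors.DefaultSelector()
    # A connected local socket pair stands in for a network connection.
    reader, writer = socket.socketpair()
    reader.setblocking(False)

    def on_readable(conn: socket.socket) -> None:
        received.append(conn.recv(1024))
        sel.unregister(conn)

    # Registration only: no code here checks whether data has arrived.
    sel.register(reader, selectors.EVENT_READ, data=on_readable)
    writer.sendall(b"hello")
    run_until_idle(sel)

    reader.close()
    writer.close()
    sel.close()
    return received
```

The application code contains no platform-specific readiness check; that part is delegated to `DefaultSelector`.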
3. Implementation for Scalability
Hiding the dirty work is only part of the story. The other part is that your code itself does not need to block waiting for the task condition. It makes no sense to wait for one connection's data when you have many network connections open simultaneously and some of them may already have data ready. Your code can simply register the callback and then move on to other tasks (e.g., the callbacks whose conditions have already been satisfied), knowing that the registered callbacks will be executed once their data is available.
If satisfying the callback's condition does not involve much CPU work (e.g., waiting for a timer or for data from the network), and the callback function itself is lightweight, then a single CPU (or a single thread) can process many callbacks concurrently, as in the processing of incoming network requests. Here the control flow may look like it jumps from one callback to another. This is the scalability story.
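The single-thread case can be sketched with Python's `asyncio` event loop and timer callbacks. The function name `run_timers` and the sample delays are assumptions made for the example; the point is only that one thread registers all callbacks up front and the loop then jumps from one callback to the next as each timer expires.

```python
import asyncio
import time
from typing import Dict, List, Tuple


async def run_timers(delays: Dict[str, float]) -> Tuple[List[str], float]:
    """Register one timer callback per entry and wait until all have fired."""
    if not delays:
        return [], 0.0

    loop = asyncio.get_running_loop()
    all_fired = asyncio.Event()
    fired: List[str] = []

    def on_timer(name: str) -> None:
        # Lightweight callback, run by the single event-loop thread.
        fired.append(name)
        if len(fired) == len(delays):
            all_fired.set()

    start = time.monotonic()
    for name, delay in delays.items():
        loop.call_later(delay, on_timer, name)  # returns immediately
    await all_fired.wait()
    return fired, time.monotonic() - start


if __name__ == "__main__":
    order, elapsed = asyncio.run(run_timers({"a": 0.3, "b": 0.1, "c": 0.2}))
    print(order, f"{elapsed:.2f}s")
```

The callbacks fire in order of their deadlines rather than in registration order, and the total wall time should be close to the longest delay rather than the sum of all delays, because the waits overlap.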
4. Implementation for Parallelism
Sometimes the callbacks are not waiting on a non-blocking I/O condition but on a blocking operation such as a page fault, or they do not depend on any condition at all and are pure computation. In this case, an asynchronous callback does not save CPU waiting time (because there is no idle waiting). But since an asynchronous callback implies that the callback function can run in parallel with the caller or with other callbacks (subject to data-sharing and synchronization constraints), the language implementation can dispatch the callback tasks to different threads and gain the benefits of parallelism, provided the platform has more than one hardware thread context. This still improves scalability.
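Dispatching tasks to a pool of threads can be sketched with Python's `concurrent.futures`. The names `blocking_task` and `run_in_pool` are hypothetical, and `time.sleep` is only a stand-in for a blocking operation. One caveat I should state loudly: in CPython, the global interpreter lock limits how much pure-Python computation threads can run in parallel, so for CPU-bound work a `ProcessPoolExecutor` (which offers the same `submit` / `add_done_callback` interface) would be the closer match to the parallelism described above.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def blocking_task(n: int) -> int:
    time.sleep(0.05)  # stand-in for a blocking operation
    return n * n


def run_in_pool(inputs: Iterable[int], workers: int = 4) -> Dict[int, int]:
    results: Dict[int, int] = {}
    lock = threading.Lock()  # callbacks may run on different worker threads

    def make_callback(n: int) -> Callable[[Future], None]:
        def on_done(future: Future) -> None:
            with lock:
                results[n] = future.result()
        return on_done

    # Leaving the "with" block waits for all submitted tasks to finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for n in inputs:
            pool.submit(blocking_task, n).add_done_callback(make_callback(n))
    return results
```

The lock is there because the completion callbacks touch shared data, which is the "data sharing and synchronization constraints" part in practice.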
5. Implementation for Productivity
Asynchronous callbacks may hurt productivity when the code needs to deal with chained callbacks, i.e., when callbacks register other callbacks in a nested way, a situation known as callback hell. There are ways out.
The semantics of an asynchronous callback can be exploited to replace hopelessly nested callbacks with other language constructs. Basically, there are two different views of callbacks:
From the data flow point of view: asynchronous callback = event + task. Registering a callback essentially creates an event that will be emitted when the task condition is satisfied. In this view, chained callbacks are just events whose processing triggers the emission of other events. This maps naturally onto event-driven programming, where task execution is driven by events. Promise and Observable may also be regarded as event-driven concepts. When multiple events are ready at the same time, their associated tasks can be executed concurrently as well.
From the control flow point of view: registering a callback yields control to other code, and executing the callback simply resumes the control flow once its condition is satisfied. In this view, chained asynchronous callbacks are just resumable functions. Multiple callbacks can be written one after another in the traditional "synchronous" style, with a yield (or await) operation in between. The result is effectively a coroutine.
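The two views can be compared on the same two-step chain. The sketch below uses Python's `asyncio`; `fetch` is a hypothetical stand-in for any asynchronous operation, and the "user" and "orders" values are made up for the example.

```python
import asyncio


async def fetch(value: str, delay: float = 0.01) -> str:
    """Hypothetical asynchronous operation that eventually yields `value`."""
    await asyncio.sleep(delay)
    return value


async def chained_with_events() -> str:
    """Data flow view: each completion is an event that triggers the next task."""
    loop = asyncio.get_running_loop()
    final: asyncio.Future = loop.create_future()
    tasks = []  # keep references so the tasks stay alive while pending

    def on_orders(task: asyncio.Future) -> None:
        final.set_result(task.result())

    def on_user(task: asyncio.Future) -> None:
        step2 = asyncio.ensure_future(fetch(f"orders-of-{task.result()}"))
        step2.add_done_callback(on_orders)
        tasks.append(step2)

    step1 = asyncio.ensure_future(fetch("alice"))
    step1.add_done_callback(on_user)
    tasks.append(step1)
    return await final


async def chained_with_await() -> str:
    """Control flow view: the same chain as one resumable function."""
    user = await fetch("alice")                # suspend here, resume with result
    return await fetch(f"orders-of-{user}")    # suspend again, then finish


if __name__ == "__main__":
    print(asyncio.run(chained_with_events()))
    print(asyncio.run(chained_with_await()))
```

Both functions produce the same result; the second simply reads top to bottom, with each `await` marking a point where control is yielded and later resumed.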
I haven't discussed how data is passed between the asynchronous callback and its caller, but that is usually not difficult with shared memory, where the caller and the callback can share data. Golang's channels can also be seen as being in line with yield/await, but with a focus on data passing.
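As a rough analogue of channel-style data passing, here is a sketch using Python's `asyncio.Queue`. It is not a Go channel and I am not claiming the semantics match exactly; the names `producer`, `consumer` and `pipeline`, and the use of `None` as an end-of-stream marker, are assumptions for the example.

```python
import asyncio
from typing import Any, List


async def producer(queue: asyncio.Queue, items: List[Any]) -> None:
    for item in items:
        await queue.put(item)   # suspends while the queue is full
    await queue.put(None)       # sentinel: no more data (assumed convention)


async def consumer(queue: asyncio.Queue) -> List[Any]:
    received: List[Any] = []
    while True:
        item = await queue.get()  # suspends until data is available
        if item is None:
            break
        received.append(item)
    return received


async def pipeline(items: List[Any]) -> List[Any]:
    # maxsize=1 forces producer and consumer to alternate.
    queue: asyncio.Queue = asyncio.Queue(maxsize=1)
    _, received = await asyncio.gather(producer(queue, items), consumer(queue))
    return received
```

Here the `await` points do double duty: they yield control and they hand data from one side to the other.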