Every answer here is at least four years old, so I thought I would share some perspective and experience from someone working in the HFT / algorithmic trading field in 2018.
(This is not to say that any of these answers are poor, as they most definitely are not; however, I believe it is worth providing more up-to-date insight on the topic.)
To directly answer the first question: we are talking about approximately 300 billionths of a second (300 nanoseconds). Recall that this is the latency introduced by the program itself.
There will always be some variance from firm to firm in system latency; however, the numbers I am going to provide are typical values for internal HFT engine latency.
- On average, one third of this 300-nanosecond figure is attributable to latency introduced by the program, as you stated in your question.
- The remainder is latency arising from co-location and other factors relating to the exchange, the matching engines, fibre optics, etc.
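To make "latency introduced by the program itself" more concrete, here is a minimal sketch of how one might time a single code path in C++ and summarise the results. It is not how any particular firm does it: `handle_tick` is a hypothetical stand-in for real trading logic, and `percentile_ns` and `measure_internal_latency` are illustrative helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the trading logic being timed. A real engine would
// parse a market-data message and decide whether to send an order here.
inline std::uint64_t handle_tick(std::uint64_t price) {
    return price * 3 + 1;  // placeholder work only
}

// Returns the sample at quantile q (0.0 to 1.0), taking the lower index
// without interpolation. The vector is copied so the caller's data is untouched.
std::int64_t percentile_ns(std::vector<std::int64_t> samples, double q) {
    assert(!samples.empty() && q >= 0.0 && q <= 1.0);
    std::sort(samples.begin(), samples.end());
    const auto idx = static_cast<std::size_t>(q * static_cast<double>(samples.size() - 1));
    return samples[idx];
}

// Times each call to handle_tick with a monotonic clock and returns the
// per-call latencies in nanoseconds.
std::vector<std::int64_t> measure_internal_latency(std::size_t iterations) {
    using steady = std::chrono::steady_clock;
    std::vector<std::int64_t> samples;
    samples.reserve(iterations);
    volatile std::uint64_t sink = 0;  // stops the compiler from discarding the work
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto start = steady::now();
        sink = handle_tick(i);
        const auto end = steady::now();
        samples.push_back(static_cast<std::int64_t>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count()));
    }
    (void)sink;
    return samples;
}
```

Reporting both the median (`percentile_ns(samples, 0.5)`) and a tail value such as `percentile_ns(samples, 0.99)` matters because occasional slow events are often what costs a trade. At sub-microsecond scales the cost of calling the clock itself can be a meaningful fraction of what is being measured, which is one reason lower-overhead timestamp sources such as the CPU's timestamp counter are attractive.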
The question is about how fast high-frequency trading systems are and what their hardware infrastructure looks like. The technology has advanced since 2014; however, contrary to much of what the literature in the field suggests, FPGAs are not necessarily the go-to choice for the big players in the HFT space. Large companies such as Intel and Nvidia cater to these firms with specialized hardware to ensure they get everything they need from their trading systems. With Intel, the system will naturally be built around CPUs and the kinds of computations CPUs perform best; with Nvidia, it will be more GPU-oriented.
For systems built on field-programmable gate arrays (FPGAs), hardware description languages such as Verilog and VHDL are commonly used. However, even on FPGA systems not everything is written in assembly: most of it is highly optimized C++ with embedded inline assembly, and this is often where the speed comes from. Note that this holds for firms using all sorts of hardware (FPGAs, specialized Intel systems, etc.).
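For readers who have not seen "C++ with embedded inline assembly", here is a small illustration, not something taken from a real trading system: reading the x86 timestamp counter with GCC/Clang extended inline assembly, a common way to timestamp events cheaply. The names `read_ticks` and `ticks_between` are hypothetical, and the assembly path assumes an x86-64 target compiled with GCC or Clang; other targets fall back to `std::chrono`.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Reads a fine-grained tick counter. On x86-64 with GCC or Clang this uses
// extended inline assembly to execute RDTSC, which returns the CPU timestamp
// counter split across EDX (high 32 bits) and EAX (low 32 bits). On other
// targets it falls back to std::chrono so the sketch still compiles.
inline std::uint64_t read_ticks() {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    std::uint32_t lo = 0;
    std::uint32_t hi = 0;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Elapsed ticks between two readings. TSC ticks are not nanoseconds; converting
// them requires knowing (or calibrating) the counter frequency.
inline std::uint64_t ticks_between(std::uint64_t start, std::uint64_t end) {
    assert(end >= start);
    return end - start;
}
```

Calling `read_ticks()` before and after a code path and passing both values to `ticks_between` gives the elapsed counter ticks. RDTSC is not a serializing instruction, so careful benchmarks often pair it with fences or use RDTSCP instead; this sketch omits that for brevity.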
Unfortunately, however, the top answer here states something completely false:
> 10 nanoseconds and 0.1 nanoseconds are exactly the same thing, because the time it takes for the order to reach the trading server is so much more than that.
This is false, because co-location has become completely standard in high-frequency trading. Every competitor is just as close to the matching engine as you are, so the internal latency of the system is of great importance.
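To see why an internal difference survives co-location, consider a toy model (an assumption for illustration, not any exchange's actual topology): each firm's order reaches the matching engine after its internal processing latency plus propagation over a co-location cable of equal length. The constant of about 4.9 ns per metre follows from light travelling at roughly c/1.47 in glass fibre; `arrival_ns`, `head_start_ns` and the 30 m cable length are hypothetical.

```cpp
#include <cassert>

// Approximate one-way propagation delay in optical fibre, assuming a refractive
// index of about 1.47 (light travels at roughly c / 1.47 in glass).
constexpr double kFibreNsPerMetre = 4.9;

// Toy model: time from a market event to the order reaching the matching
// engine = internal processing latency + propagation over the cable.
inline double arrival_ns(double internal_ns, double cable_metres) {
    assert(internal_ns >= 0.0 && cable_metres >= 0.0);
    return internal_ns + cable_metres * kFibreNsPerMetre;
}

// How much earlier firm B's order arrives than firm A's when both use the
// same cable length. A positive result means B wins the race.
inline double head_start_ns(double a_internal_ns, double b_internal_ns, double cable_metres) {
    return arrival_ns(a_internal_ns, cable_metres) - arrival_ns(b_internal_ns, cable_metres);
}
```

With a 30 m cable both firms pay the same ~147 ns of propagation, so `head_start_ns(10.0, 0.1, 30.0)` is about 9.9 ns: the shared cable delay cancels out, and only the internal latency decides who arrives first.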