This might look similar to "ARM and NEON can work in parallel?", but it is not; I have a different issue (possibly a problem with my understanding).
In the protocol stack, the checksum is currently computed on the GPP (the ARM core). I am now handing that task over to NEON as part of a function.
Here is the checksum function that I have written using NEON, posted on Stack Overflow: Checksum code implementation for Neon in Intrinsics
Now, suppose this function is called from Linux:
ip_csum(){
    …
    …
    csum = do_csum(); // function call from ARM
    …
    …
}
do_csum(){
    …
    …
    // NEON-optimised code
    …
    …
    // returns the final checksum to ip_csum/Linux/ARM
}
In this case, what happens to the ARM core while NEON is doing the calculations? Does ARM sit idle, or does it move on with other operations?
As you can see, do_csum is called and we wait on its result (or at least that is what it looks like).
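To make the call pattern concrete, here is a minimal sketch in C of what I mean. It is not my actual do_csum (that one is in the linked post): the signatures, the input as an array of 16-bit words, and the scalar fallback for builds without NEON are all assumptions for illustration. It also assumes the buffer is at most 64 KB, so the 32-bit accumulators cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical stand-in for do_csum(): one's-complement sum of n 16-bit
 * words, folded to 16 bits and inverted. Assumes n <= 32768 words. */
uint16_t do_csum(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;
    size_t i = 0;

    assert(words != NULL || n == 0);

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* NEON part: 8 words per iteration, accumulated into 4 x 32-bit lanes. */
    uint32x4_t acc = vdupq_n_u32(0);
    for (; i + 8 <= n; i += 8) {
        uint16x8_t v = vld1q_u16(words + i);
        acc = vpadalq_u16(acc, v); /* pairwise add, widen, accumulate */
    }
    /* The partial sums leave the NEON registers here and go back to ARM. */
    uint32x2_t half = vadd_u32(vget_low_u32(acc), vget_high_u32(acc));
    sum = vget_lane_u32(half, 0) + vget_lane_u32(half, 1);
#endif

    /* Scalar tail (and the whole loop when NEON is not available). */
    for (; i < n; i++)
        sum += words[i];

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}

/* Hypothetical caller standing in for ip_csum() in the stack. */
uint16_t ip_csum(const uint16_t *hdr, size_t n)
{
    /* Plain function call from ARM code; the result is needed
     * before ip_csum can return. */
    uint16_t csum = do_csum(hdr, n);
    return csum;
}
```

From the C side this is an ordinary blocking call: ip_csum cannot return until do_csum hands back its value, which is why it looks to me as if the ARM side just waits.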
NOTE:
- I am speaking in terms of the Cortex-A8.
- do_csum, as you can see from the link, is coded with intrinsics.
- Compilation uses the GNU toolchain.
- It would be good if you could also cover multi-threading or any other concept that comes into play when these interoperations happen.
Questions:
- Does ARM sit idle while NEON is doing its operations (in this particular case)?
- Or does it shelve the current ip_csum-related code and take up another process/thread until NEON is done? (I am almost clueless as to what happens here.)
- If it is sitting idle, how can we make ARM work on something else until NEON is done?