I have read through the Fortran 95 book by Metcalf, Reid and Cohen, and Numerical Recipes in Fortran 90. They recommend using WHERE, FORALL and SPREAD amongst other things to avoid unnecessary serialisation of your program.
However, I stumbled upon this answer which claims that FORALL is good in theory, but pointless in practice - you might as well write loops as they parallelise just as well and you can explicitly parallelise them using OpenMP (or automatic features of some compilers such as Intel).
Can anyone verify from experience whether they have generally found these constructs to offer any advantages over explicit loops and if statements in terms of parallel performance?
And are there any other parallel features of the language which are good in principal but not worth it in practice?
I appreciate that the answers to these questions are somewhat implementation dependant, so I'm most interested in gfortran, Intel CPUs and SMP parallelism.