I've been trying to solve problem 1330 from acm.timus.ru in Haskell. Basically, it boils down to this: 1) read from stdin an array A of length N (N < 10^4) and M pairs of integers (M < 10^5); 2) for each (from, to) pair, print the sum of subarray A[from..to] to stdout.
Since SO won't let me post more than 2 URLs as part of this question, I will refer to files in my Github repository below.
I came up with two solutions, which share most of the code. The first one (1330_slow.hs) uses Prelude functions (getLine/read/words) and is somewhat slow:
$ ./bench.sh slow_hs
slow_hs
Time inside the program: 2.18
MD5 (output.slow_hs.txt) = 89bcf8fd69a7fce953595d329c8f033a
The other solution (1330.hs) ditches these functions, replacing them with their Data.ByteString.Char8 equivalents (B.getLine/B.readInt/B.words), and performs decently well:
$ ./bench.sh hs
hs
Time inside the program: 0.27
MD5 (output.hs.txt) = 89bcf8fd69a7fce953595d329c8f033a
The time limit on this problem is 500 ms, so while 270 ms is fast enough (and comparable to my solutions in other languages, such as C++ and Go), 2180 ms doesn't cut it. So why is my first solution so ridiculously slow? Even by following the profiling tips from Real World Haskell I still can't make sense of this (all I could figure out was that the majority of time was spent in readIntPair function, which didn't help much).
If you want to do some testing of your own, I have a Python input generator (gen_test.py), and a pre-generated input file (input.txt) in case you don't have Python installed. And a diff (slow_fast_diff.txt) between the two solutions.