How to optimise Haskell code to pass HackerRank's timed-out test cases (not for any ongoing competition, just me practicing)
I've been learning Haskell for around 4 months now and I have to say, the learning curve is definitely hard (scary also :p).

After solving about 15 easy questions, today I moved to my first medium-difficulty problem on HackerRank: https://www.hackerrank.com/challenges/climbing-the-leaderboard/problem.

There are 10 test cases, and I am able to pass 6 of them; the rest fail with a timeout. Now, the interesting part is that I can already see a few parts with potential for a performance increase (for example, I am using nub to remove duplicates from a [Int]), but I am still not able to build a mental model for algorithmic performance. The main cause of that is being unsure how the Haskell compiler will transform my code and how laziness plays a role here.

import Data.List (nub)

getInputs :: [String] -> [String]
getInputs (_:r:_:p:[]) = [r, p]

findRating :: Int -> Int -> [Int] -> Int
findRating step _ [] = step
findRating step point (x:xs) = if point >= x then step else findRating (step + 1) point xs

solution :: [[Int]] -> [Int]
solution [rankings, points] = map (\p -> findRating 1 p duplicateRemovedRatings) points
                            where duplicateRemovedRatings = nub rankings


main :: IO ()
main = interact (unlines . map show . solution . map (map read . words) . getInputs . lines)

Test Case in GHCI

:l solution
let i = "7\n100 100 50 40 40 20 10\n4\n5 25 50 120"
(unlines . map show . solution . map (map read . words) . getInputs . lines) i  -- "6\n4\n2\n1\n"

Specific questions I have:

  • Will the duplicateRemovedRatings variable be calculated once, or on each iteration of the map function call?
  • In imperative programming languages I could verify the above using some sort of printing mechanism; is there an equivalent way of doing the same in Haskell?
  • According to my current understanding, the complexity of this algorithm would be:
    • nub is O(n^2)
    • findRating is O(n)
    • getInputs is O(1)
    • solution is O(n^2)

How can I reason about this and build a mental model for performance?

If this violates community guidelines, please comment and I will delete this. Thank you for the help :)

Melodymeloid answered 6/10, 2021 at 12:5 Comment(4)
To answer the easy questions: once, and Debug.Trace.trace. – Forworn
nub is horrible; only use it on lists that you know in advance are very small. – Abfarad
Use Data.IntSet and Data.Map.Strict from the containers package (included with GHC), and your time complexity goes down to min(n * log n, m), where n is the ranked length and m is the player length. – Tol
@Tol did you mean min(n * log n, m * log n)? We get O(n+m) with lists, which is probably better; I describe it in a comment below. (Unless I'm missing something.) – Humblebee

First, to answer your questions:

  1. Yes, duplicateRemovedRatings is computed only once; there is no repeated computation.
  2. To debug-trace, you can use trace and its friends (see the docs for examples and explanation). Yes, it can be used even in pure, non-IO code. But obviously, don't use it for "normal" output.
  3. Yes, your understanding of complexity is correct.
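Points 1 and 2 can be checked together: wrap the shared binding in trace and watch the message appear only once, however many points are mapped over it. This is a minimal sketch on the question's sample data; the `rank` helper here is a simplified stand-in for the question's findRating, not code from the question.

```haskell
import Data.List (nub)
import Debug.Trace (trace)

-- The shared binding is traced; "nub runs now" is printed (to stderr) only
-- once, on first demand, even though `rank` is mapped over four points.
demo :: [Int]
demo = map rank [5, 25, 50, 120]
  where
    deduped = trace "nub runs now" (nub [100, 100, 50, 40, 40, 20, 10])
    rank p  = length (takeWhile (> p) deduped) + 1

main :: IO ()
main = print demo  -- prints [6,4,2,1]
```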

Now, how to pass HackerRank's tricky tests.

First, yes, you're right that nub is O(N^2). However, in this particular case you don't have to settle for that. You can use the fact that the rankings come pre-sorted to get a linear version of nub. All you have to do is skip elements while they're equal to the next one:

betterNub :: Eq a => [a] -> [a]
betterNub (x:y:rest)
  | x == y    = betterNub (y:rest)
  | otherwise = x : betterNub (y:rest)
betterNub xs = xs

This gives you O(N) for betterNub itself, but it's still not good enough for HackerRank, because the overall solution is still O(N*M) - for each game you are iterating over all rankings. No bueno.

But here you can get another improvement by observing that the rankings are sorted, and searching in a sorted list doesn't have to be linear. You can use a binary search instead!

To do this, you'll have to get yourself constant-time indexing, which can be achieved by using Array instead of list.

Here's my implementation (please don't judge harshly; I realize I probably got edge cases overengineered, but hey, it works!):

import Data.Array (listArray, bounds, (!))

findIndex arr p
  | arr!end' > p = end' + 1
  | otherwise = go start' end'
  where
    (start', end') = bounds arr

    go start end =
      let mid = (start + end) `div` 2
          midValue = arr ! mid
      in
        if midValue == p then mid
        else if mid == start then (if midValue < p then start else end)
        else if midValue < p then go start mid
        else go mid end

solution :: [[Int]] -> [Int]
solution [rankings, points] = map (\p -> findIndex duplicateRemovedRatings p + 1) points
                            where duplicateRemovedRatings = toArr $ betterNub rankings

toArr l = listArray (0, (length l - 1)) l

With this, you get O(log N) for the search itself, making the overall solution O(M * log N). And this seems to be good enough for HackerRank.

(Note that I'm adding 1 to the result of findIndex; this is because the exercise requires a 1-based index.)
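For reference, here are the pieces above assembled into one runnable file, with type signatures added; the main is just a smoke test against the question's sample input, not part of the original answer.

```haskell
import Data.Array (Array, listArray, bounds, (!))

-- Linear dedup, relying on the input being sorted.
betterNub :: Eq a => [a] -> [a]
betterNub (x:y:rest)
  | x == y    = betterNub (y:rest)
  | otherwise = x : betterNub (y:rest)
betterNub xs = xs

-- Binary search over a descending array of unique rankings.
findIndex :: Array Int Int -> Int -> Int
findIndex arr p
  | arr ! end' > p = end' + 1
  | otherwise      = go start' end'
  where
    (start', end') = bounds arr
    go start end =
      let mid      = (start + end) `div` 2
          midValue = arr ! mid
      in if midValue == p then mid
         else if mid == start then (if midValue < p then start else end)
         else if midValue < p then go start mid
         else go mid end

toArr :: [Int] -> Array Int Int
toArr l = listArray (0, length l - 1) l

solution :: [[Int]] -> [Int]
solution [rankings, points] = map (\p -> findIndex deduped p + 1) points
  where deduped = toArr (betterNub rankings)

main :: IO ()
main = print (solution [[100, 100, 50, 40, 40, 20, 10], [5, 25, 50, 120]])
-- prints [6,4,2,1]
```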

Trilateral answered 6/10, 2021 at 13:20 Comment(4)
Why all this? The description states "The player's scores, player, are in ascending order." (emphasis mine). We just need to wind forth over the leaderboard rankings list ranked, getting it reversed while also converting it to RLE format, until we get to the first player's score; then, as we play more, we wind it back by popping head elements off while possibly updating the current head entry. reverse&rle is O(n) and so is toArray; the popping has O(m+n) total cost. Am I missing something? – Humblebee
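The linear-merge idea in this comment can be sketched as follows. The `climb` helper is hypothetical (not from the thread), and it assumes, as the problem statement guarantees, that `ranked` is descending and the player scores are ascending.

```haskell
import Data.List (group)

-- O(n + m) ranking: each leaderboard entry is dropped at most once across
-- all player scores, because the scores only ever increase.
climb :: [Int] -> [Int] -> [Int]
climb ranked players = go asc (length asc) players
  where
    asc = reverse (map head (group ranked))  -- dedupe, then ascending

    go _    _ []     = []
    go rest n (p:ps) =
      let (beaten, rest') = span (<= p) rest  -- entries p now ties or beats
          n'              = n - length beaten -- entries still strictly above p
      in (n' + 1) : go rest' n' ps

-- climb [100,100,50,40,40,20,10] [5,25,50,120]  ==  [6,4,2,1]
```

Because every dropped entry stays dropped for all later (higher) scores, the span calls cost O(n) in total across the whole traversal, which is where the O(n+m) bound comes from.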
Oh, I totally missed the fact that the players are also sorted. Oh well... Still, even O(M * log N) passes all the tests 🤷 – Trilateral
betterNub = map head . group – Abfarad
Yeah, but I wanted to show the idea and how it would work. – Trilateral

I believe Fyodor's answer is excellent for your first two and a half questions. For the final half, "How can I build a mental model for performance?", I can say that SPJ is an absolute master of writing highly technical papers in a way accessible to the smart but ignorant reader. The implementation book Implementing lazy functional languages on stock hardware is excellent and can serve as the basis of a mental execution model. There is also Okasaki's thesis, Purely functional data structures, which discusses a complementary and significantly higher-level approach to doing asymptotic complexity analyses. (Actually, I read his book, which apparently includes some extra content, so bear that in mind when deciding for yourself about this recommendation.)

Please don't be daunted by their length. I personally found them actively fun to read, and the topic they cover is a big one, not compressible to a short/quick answer.

Khachaturian answered 6/10, 2021 at 16:58 Comment(1)
Upvote just for calling my answer excellent. I am super vain :-) – Trilateral

© 2022 - 2024 — McMap. All rights reserved.