Why is this OCaml program faster than my C program?
Asked Answered
S

5

12

I wrote a basic Hippity Hop program in C, Python, and OCaml. Granted, this is probably not a very good benchmark of these three languages. But the results I got were something like this:

  • Python: .350 seconds
  • C: .050 seconds
  • interpreted OCaml: .040 seconds
  • compiled OCaml: .010

The python performance doesn't really surprise me, but I'm rather shocked at how fast the OCaml is (especially the interpreted version). For comparison, I'll post the C version and the OCaml version.

C

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

long get_count(char *name);

int main(int argc, char *argv[])
{
  if (argc != 2){
    printf("Filename must be specified as a positional argument.\n");
    exit(EXIT_FAILURE);
  }

  long count_no = get_count(argv[1]);

  int i;
  for (i = 1; i <= count_no; i++){
    if (((i % 3) == 0) && ((i % 5) == 0)){
      printf("Hop\n");
      continue;
    }
    if ((i % 3) == 0){
      printf("Hoppity\n");
    }
    if ((i % 5) == 0){
      printf("Hophop\n");
    }
  }
  return 0;
}

long get_count(char *name){
  FILE *fileptr = fopen(name, "r");
  if (!fileptr){
    printf("Unable to open file %s.\n", name);
    exit(EXIT_FAILURE);
  }
  size_t text_len = 20;
  char *file_text = calloc(text_len, sizeof(char));
  while (!feof(fileptr)){
    fread(file_text, sizeof(char), text_len, fileptr);
    assert(!ferror(fileptr));
    text_len += 20;
    file_text = realloc(file_text, text_len * sizeof(char));
  }
  long file_as_int = strtol(file_text, NULL, 10);

  free(file_text);
  return file_as_int;
}

OCaml

open String;;

let trim str =
  if str = "" then "" else
  let search_pos init p next =
    let rec search i =
      if p i then raise(Failure "empty") else
      match str.[i] with
      | ' ' | '\n' | '\r' | '\t' -> search (next i)
      | _ -> i
    in
    search init
  in
  let len = String.length str in
  try
    let left = search_pos 0 (fun i -> i >= len) (succ)
    and right = search_pos (len - 1) (fun i -> i < 0) (pred)
    in
    String.sub str left (right - left + 1)
  with
  | Failure "empty" -> ""
;;

let rec iterate_over_numbers curr_num max_num =
  (
   if curr_num <= max_num then (
     if ((curr_num mod 3) == 0) && ((curr_num mod 5) == 0) then 
       print_endline "Hop"
     else if (curr_num mod 3) == 0 then 
       print_endline "Hoppity"
     else if (curr_num mod 5) == 0 then
       print_endline "Hophop";
     iterate_over_numbers (curr_num + 1) max_num
   ))
;;


let fname = Sys.argv.(1);;
let infile = open_in fname;;
let file_text = trim (input_line infile);;
close_in infile;;
let input_number = int_of_string file_text;;
iterate_over_numbers 1 input_number;;

But I'm curious to know why I'm getting these results. Am I doing something dumb in my C program, or is this just something OCaml is faster at? It seems to me a bit strange that an interpreted program is running a little faster than the C version, and the compiled program is running 5 times as fast.

Souterrain answered 19/10, 2009 at 0:17 Comment(8)
What compiler switches did you use for the C version?Malvin
@Jonathan - I just did a gcc -o hoppity main.c. I didn't think about setting an optimization level to be honest. :-)Souterrain
What was the number in the file when the test was run? And was the output redirected to a file or a pipe the same for all of the tests? And it would be interesting to know the relative performance with 'gcc' and 'gcc -O'.Miun
Also, FWIW, you leak an open file by not using 'fclose()' in the function where you use 'fopen()'. Fixing that will slow things down, of course, but what you open, you should close; what you allocate, you should free.Miun
It's (probably) not really related to your question, but while (!feof(fileptr)){ is pretty much a guaranteed bug.Dobbin
Note that this is a minor variant on the FizzBuzz problem: #937. However, it is not a duplicate - this question is asking about the timings; the other is asking about implementations.Miun
@Jonathan - I'll see how many questions I can answer. The output was just printed to stdout. I didn't redirect it. And the performance that I got when turning on optimizations was roughly comparable to the OCaml performance. And thanks for pointing out that I forgot to fclose the open file. Does that make a difference since the program was ending though?Souterrain
@Jerry - I started a separate question for that point: https://mcmap.net/q/1007930/-why-is-this-c-code-buggySouterrain
M
10

Your C code isn't the equivalent of the OCaml code - you used 'else if' in the OCaml to avoid having to recompute moduli quite so much.

There's an awful lot of code in that 'read the long integer'. Why not just use fscanf(); it skips blanks and all that automatically, and avoids you doing the malloc() etc. I don't often recommend using fscanf(), but this looks like a setup for it - a single line, possibly with spaces either side, no funny stuff.


Curiosity killed the cat - but in this case, not the Leopard.

I downloaded OCaml 3.11.1 for MacOS X Intel and copied the OCaml code from the question into xxx.ml (OCaml), and compiled that into an object file xxx (using "ocamlc -o xxx xxx.ml"); I copied the C code verbatim into yyy.c and created a variant zzz.c using fscanf() and fclose(), and compiled them using "gcc -O -o yyy yyy.c" and "gcc -O -o zzz zzz.c". I created a file 'file3' containing: "987654" plus a newline. I created a shell script runthem.sh as shown. Note that 'time' is a pesky command that thinks its output must go to stderr even when you'd rather it didn't - you have to work quite hard to get the output where you want it to go. (The range command generates numbers in the given range, inclusive - hence 11 values per program.)

Osiris JL: cat runthem.sh
for prog in "ocaml xxx.ml" ./xxx ./yyy ./zzz
do
    for iter in $(range 0 10)
    do
        r=$(sh -c "time $prog file3 >/dev/null" 2>&1)
        echo $prog: $r
    done
done
Osiris JL: 

I ran all this on a modern MacBook Pro (3 GHz Core 2 Duo etc, 4GB RAM) running Leopard (10.5.8). The timings I got are shown by:

Osiris JL: sh runthem.sh
ocaml xxx.ml: real 0m0.961s user 0m0.524s sys 0m0.432s
ocaml xxx.ml: real 0m0.953s user 0m0.516s sys 0m0.430s
ocaml xxx.ml: real 0m0.959s user 0m0.517s sys 0m0.431s
ocaml xxx.ml: real 0m0.951s user 0m0.517s sys 0m0.430s
ocaml xxx.ml: real 0m0.952s user 0m0.516s sys 0m0.431s
ocaml xxx.ml: real 0m0.952s user 0m0.514s sys 0m0.431s
ocaml xxx.ml: real 0m0.951s user 0m0.515s sys 0m0.431s
ocaml xxx.ml: real 0m0.959s user 0m0.515s sys 0m0.431s
ocaml xxx.ml: real 0m0.950s user 0m0.515s sys 0m0.431s
ocaml xxx.ml: real 0m0.956s user 0m0.516s sys 0m0.431s
ocaml xxx.ml: real 0m0.952s user 0m0.514s sys 0m0.432s
./xxx: real 0m0.928s user 0m0.494s sys 0m0.430s
./xxx: real 0m0.938s user 0m0.494s sys 0m0.430s
./xxx: real 0m0.927s user 0m0.494s sys 0m0.430s
./xxx: real 0m0.928s user 0m0.492s sys 0m0.430s
./xxx: real 0m0.928s user 0m0.493s sys 0m0.430s
./xxx: real 0m0.927s user 0m0.493s sys 0m0.430s
./xxx: real 0m0.928s user 0m0.492s sys 0m0.430s
./xxx: real 0m0.933s user 0m0.497s sys 0m0.428s
./xxx: real 0m0.926s user 0m0.494s sys 0m0.429s
./xxx: real 0m0.921s user 0m0.492s sys 0m0.428s
./xxx: real 0m0.925s user 0m0.494s sys 0m0.428s
./yyy: real 0m0.027s user 0m0.026s sys 0m0.001s
./yyy: real 0m0.031s user 0m0.026s sys 0m0.002s
./yyy: real 0m0.028s user 0m0.026s sys 0m0.001s
./yyy: real 0m0.029s user 0m0.026s sys 0m0.002s
./yyy: real 0m0.028s user 0m0.026s sys 0m0.001s
./yyy: real 0m0.029s user 0m0.026s sys 0m0.002s
./yyy: real 0m0.028s user 0m0.026s sys 0m0.001s
./yyy: real 0m0.031s user 0m0.026s sys 0m0.002s
./yyy: real 0m0.028s user 0m0.026s sys 0m0.001s
./yyy: real 0m0.030s user 0m0.026s sys 0m0.002s
./yyy: real 0m0.028s user 0m0.026s sys 0m0.001s
./zzz: real 0m0.030s user 0m0.027s sys 0m0.002s
./zzz: real 0m0.029s user 0m0.027s sys 0m0.001s
./zzz: real 0m0.030s user 0m0.027s sys 0m0.002s
./zzz: real 0m0.029s user 0m0.027s sys 0m0.001s
./zzz: real 0m0.030s user 0m0.027s sys 0m0.002s
./zzz: real 0m0.029s user 0m0.027s sys 0m0.001s
./zzz: real 0m0.030s user 0m0.027s sys 0m0.002s
./zzz: real 0m0.029s user 0m0.027s sys 0m0.001s
./zzz: real 0m0.030s user 0m0.027s sys 0m0.002s
./zzz: real 0m0.029s user 0m0.027s sys 0m0.001s
./zzz: real 0m0.030s user 0m0.027s sys 0m0.002s
Osiris JL:

I don't see the OCaml code running faster than the C code. I ran the tests with smaller numbers in the file that was read, and the results were similarly in favour of the C code:

Stop number: 345

ocaml xxx.ml: real 0m0.027s user 0m0.020s sys 0m0.005s
ocaml xxx.ml: real 0m0.021s user 0m0.016s sys 0m0.005s
ocaml xxx.ml: real 0m0.025s user 0m0.016s sys 0m0.004s
ocaml xxx.ml: real 0m0.020s user 0m0.015s sys 0m0.003s
ocaml xxx.ml: real 0m0.022s user 0m0.016s sys 0m0.004s
ocaml xxx.ml: real 0m0.019s user 0m0.015s sys 0m0.003s
ocaml xxx.ml: real 0m0.021s user 0m0.016s sys 0m0.004s
ocaml xxx.ml: real 0m0.020s user 0m0.015s sys 0m0.004s
ocaml xxx.ml: real 0m0.021s user 0m0.016s sys 0m0.004s
ocaml xxx.ml: real 0m0.020s user 0m0.015s sys 0m0.004s
ocaml xxx.ml: real 0m0.021s user 0m0.016s sys 0m0.004s
./xxx: real 0m0.003s user 0m0.001s sys 0m0.002s
./xxx: real 0m0.003s user 0m0.001s sys 0m0.001s
./xxx: real 0m0.003s user 0m0.001s sys 0m0.001s
./xxx: real 0m0.002s user 0m0.001s sys 0m0.001s
./xxx: real 0m0.003s user 0m0.001s sys 0m0.001s
./xxx: real 0m0.005s user 0m0.001s sys 0m0.002s
./xxx: real 0m0.003s user 0m0.001s sys 0m0.002s
./xxx: real 0m0.003s user 0m0.001s sys 0m0.001s
./xxx: real 0m0.003s user 0m0.001s sys 0m0.001s
./xxx: real 0m0.003s user 0m0.001s sys 0m0.001s
./xxx: real 0m0.003s user 0m0.001s sys 0m0.001s
./yyy: real 0m0.002s user 0m0.000s sys 0m0.001s
./yyy: real 0m0.003s user 0m0.000s sys 0m0.001s
./yyy: real 0m0.002s user 0m0.000s sys 0m0.001s
./yyy: real 0m0.002s user 0m0.000s sys 0m0.001s
./yyy: real 0m0.002s user 0m0.000s sys 0m0.001s
./yyy: real 0m0.002s user 0m0.000s sys 0m0.001s
./yyy: real 0m0.002s user 0m0.000s sys 0m0.001s
./yyy: real 0m0.001s user 0m0.000s sys 0m0.001s
./yyy: real 0m0.002s user 0m0.000s sys 0m0.001s
./yyy: real 0m0.002s user 0m0.000s sys 0m0.001s
./yyy: real 0m0.003s user 0m0.000s sys 0m0.002s
./zzz: real 0m0.002s user 0m0.000s sys 0m0.001s
./zzz: real 0m0.002s user 0m0.000s sys 0m0.001s
./zzz: real 0m0.002s user 0m0.000s sys 0m0.001s
./zzz: real 0m0.002s user 0m0.000s sys 0m0.001s
./zzz: real 0m0.002s user 0m0.000s sys 0m0.001s
./zzz: real 0m0.002s user 0m0.000s sys 0m0.001s
./zzz: real 0m0.001s user 0m0.000s sys 0m0.001s
./zzz: real 0m0.002s user 0m0.000s sys 0m0.001s
./zzz: real 0m0.003s user 0m0.000s sys 0m0.002s
./zzz: real 0m0.002s user 0m0.000s sys 0m0.001s
./zzz: real 0m0.002s user 0m0.000s sys 0m0.001s

Stop number: 87654

ocaml xxx.ml: real 0m0.102s user 0m0.059s sys 0m0.041s
ocaml xxx.ml: real 0m0.102s user 0m0.059s sys 0m0.040s
ocaml xxx.ml: real 0m0.101s user 0m0.060s sys 0m0.040s
ocaml xxx.ml: real 0m0.103s user 0m0.059s sys 0m0.041s
ocaml xxx.ml: real 0m0.102s user 0m0.059s sys 0m0.041s
ocaml xxx.ml: real 0m0.101s user 0m0.059s sys 0m0.041s
ocaml xxx.ml: real 0m0.102s user 0m0.059s sys 0m0.040s
ocaml xxx.ml: real 0m0.103s user 0m0.059s sys 0m0.040s
ocaml xxx.ml: real 0m0.101s user 0m0.059s sys 0m0.040s
ocaml xxx.ml: real 0m0.102s user 0m0.059s sys 0m0.040s
ocaml xxx.ml: real 0m0.105s user 0m0.059s sys 0m0.041s
./xxx: real 0m0.092s user 0m0.044s sys 0m0.038s
./xxx: real 0m0.087s user 0m0.044s sys 0m0.039s
./xxx: real 0m0.085s user 0m0.044s sys 0m0.038s
./xxx: real 0m0.084s user 0m0.044s sys 0m0.038s
./xxx: real 0m0.085s user 0m0.044s sys 0m0.039s
./xxx: real 0m0.086s user 0m0.045s sys 0m0.039s
./xxx: real 0m0.085s user 0m0.044s sys 0m0.039s
./xxx: real 0m0.085s user 0m0.044s sys 0m0.038s
./xxx: real 0m0.084s user 0m0.044s sys 0m0.038s
./xxx: real 0m0.084s user 0m0.044s sys 0m0.039s
./xxx: real 0m0.083s user 0m0.044s sys 0m0.038s
./yyy: real 0m0.004s user 0m0.003s sys 0m0.001s
./yyy: real 0m0.004s user 0m0.003s sys 0m0.001s
./yyy: real 0m0.004s user 0m0.003s sys 0m0.001s
./yyy: real 0m0.004s user 0m0.003s sys 0m0.001s
./yyy: real 0m0.005s user 0m0.003s sys 0m0.001s
./yyy: real 0m0.005s user 0m0.003s sys 0m0.001s
./yyy: real 0m0.004s user 0m0.003s sys 0m0.001s
./yyy: real 0m0.004s user 0m0.003s sys 0m0.001s
./yyy: real 0m0.004s user 0m0.003s sys 0m0.001s
./yyy: real 0m0.004s user 0m0.003s sys 0m0.001s
./yyy: real 0m0.006s user 0m0.003s sys 0m0.002s
./zzz: real 0m0.004s user 0m0.003s sys 0m0.001s
./zzz: real 0m0.004s user 0m0.003s sys 0m0.001s
./zzz: real 0m0.004s user 0m0.003s sys 0m0.001s
./zzz: real 0m0.004s user 0m0.003s sys 0m0.001s
./zzz: real 0m0.004s user 0m0.003s sys 0m0.001s
./zzz: real 0m0.005s user 0m0.003s sys 0m0.002s
./zzz: real 0m0.004s user 0m0.003s sys 0m0.001s
./zzz: real 0m0.004s user 0m0.003s sys 0m0.001s
./zzz: real 0m0.004s user 0m0.003s sys 0m0.001s
./zzz: real 0m0.004s user 0m0.003s sys 0m0.001s
./zzz: real 0m0.005s user 0m0.003s sys 0m0.001s

Obviously, YMMV - but it appears that OCaml is slower than C by a considerable margin, but that if the number in the given file is small enough, then start up and file reading dominate the process time.

The C timings, especially at the smaller numbers, are so fast that they are not all that reliable.

Miun answered 19/10, 2009 at 1:26 Comment(2)
FYI, neither one of these seems to have made a huge performance difference. Although using fscanf does make the code a whole lot more readable!Souterrain
Interesting: as I asked in the comments to the main question - how big was the number that you were given? And what happens if you iterate N times (where N is a big enough number to get the fastest program to take a time of the order of seconds)? Don't forget you'll need to close the file if you do 10,000 iterations.Miun
D
9

Time under 0.05 can be a simple noise. Repeat the main program enough times to actually get ~1s of execution time in C. (I mean repeating it in a loop in the program itself, not by running it again)

Did you compile your code with optimisations turned on? Did you try reducing the number of branches? (and comparisons)

if (i % 3 == 0) {
  if (i % 5 == 0) {
    printf("Hop\n");
    continue;
  }
  printf("Hoppity\n");
} else if (i % 5 == 0){
  printf("Hophop\n");
}

Did you try looking at the assembler output?

Also printf is pretty slow. Try puts("Hop") instead, since you don't use the formating anyways.

Dreamy answered 19/10, 2009 at 0:30 Comment(6)
That's possible, but you'd have to test it. The time you save on every 15'th try (you do only 1 check) might not balance the time you lose on the numbers that do 3 checks. Only testing can tell that :) (commented on a suggestion to check i%15, i%3, i%5 - it was deleted later on)Dreamy
Yeah, I deleted it because of a typo in the code, then realized it probably wouldn't be more efficient, so I kept it deleted.Santiagosantillan
Thus far, it seems to be a lot faster if I turn the optimization level up.Souterrain
+1 , unless using a very basic printf() sub system from a diet library, printf() really slows things down , especially with glibc creature comforts.Forklift
You don't need the 'continue' if you use an 'else' to print 'Hoppity'.Miun
@Jonathan - It's difficult to know which would be more "efficient" but I think your version is certainly more readable.Santiagosantillan
D
2

In a tiny program like this, it's often hard to guess why things work out the way they do. I think if I was doing it, I'd write the code like this (leaving out error checking for the moment):

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) { 
    static char buffer[20];   
    int limit, i;

    freopen(argv[1], "r", stdin);
 fgets(buffer, sizeof(buffer), stdin);
    limit = atoi(buffer);

    for (i=1; i<=limit; i++) {
        int div3=i%3==0;
        int div5=i%5==0;
        if (div3 && div5) 
            puts("Hop");
        else if (div3)
            puts("Hoppity");
        else if (div5)
            puts("HopHop");
    }
    return 0;
}

Using freopen avoids creating another file stream, and instead just connects standard input to the specified file. No guarantee that it's faster, but it's unlikely to be any slower anyway.

Likewise, a good compiler may note that i is constant throughout the loop body, and factor out the two remainder operations so it only does each once. Here I've done that by hand, which might not be any faster, but almost certainly won't be any slower.

Using puts instead of printf is fairly similar -- it might not be any faster, but it almost certainly won't be any slower. Using printf the way you have, it has to scan through the entire string looking for a '%', in case you're asking for a conversion, but since puts doesn't do any conversions, it doesn't have to do that.

With such a tiny program as this, there's another factor that could be considerably more significant: puts will usually be considerably smaller than printf. You haven't said how you're doing the timing, but if it includes the time to load the code, really small code may well make more difference than the execution time.

Dobbin answered 19/10, 2009 at 4:35 Comment(0)
C
1

I would be interested to see how much time is spent in get_count().

I'm not sure how much it would matter, but you're reading in a long as a string, which means the string cannot be larger than 20 bytes, or 10 bytes (2^64 = some 20 character long decimal number, or 2^32 = some 10 character long decimal number), so you don't need your while loop in get_count. Also, you could allocate file_text on the stack, rather than calling calloc - but I guess you'd still need to zero it out, or otherwise find the length and set the last byte to null.

file_length = lseek(fileptr, 0, SEEK_END);
Claustral answered 19/10, 2009 at 0:39 Comment(2)
Interesting point. I suppose that reallocating memory over and over again would be pretty inefficient. I suppose I just need to get familiar with the C profiler. :-)Souterrain
Using Jonathan's fscanf suggestion above to a local variable, it doesn't look like this made a huge difference.Souterrain
B
1

Any program which mostly involves opening a file and reading it is limited by the speed of opening a file and reading it. The C computations you are doing here will take between 1 millionth and one thousandth of the time of opening the file and reading it.

I thought this site was useful: http://norvig.com/21-days.html#answers

Bascule answered 19/10, 2009 at 0:53 Comment(3)
If this were true, there wouldn't be such a discrepancy between C and OCaml (since both are bound by disk IO). The OP has indicated that the C code got much faster when he turned the optimization level up, so I suspect that this may be the issue.Santiagosantillan
But the time it takes to access a file is completely random. If the file is still cached it will take much less time than if the program is started cold. If he runs the OCaml first and the C second, the C will probably win. If he runs the C program over and over a million times, the average time will drop very significantly.Bascule
@Kinopiko - I'm actually running these against duplicates of the same file.Souterrain

© 2022 - 2024 — McMap. All rights reserved.