How is Leetcode able to compile a C++ program without me writing a 'main()' function?

As far as I know, a C++ compiler looks for a function called main() in order to know where the program should start.

But when I use the Leetcode site, the expected solution's code has no main function. For example:

class Solution {
public:
    int foo(string s) {
        int ans = 0;
        // <work>
        return ans;
    }
};

It seems like the program starts from whatever function is first in the public section of the Solution class. How did Leetcode's compiler decide that?

Andromede answered 19/6 at 14:5 Comment(1)
This question is being discussed on Meta. Please do not discuss it here.Philbo
41

Every C++ program requires an int main() or int main(int argc, char** argv) function*. Without main() you can't produce an executable program that can be run.

What does Leetcode do then? It compiles the code you provide together with a main() function and test cases, which you don't see. It then executes that program, analyses the test results and gives you a score. Your code is only a part of the entire program that Leetcode compiles.
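
To make that concrete, here is a minimal sketch of what such a hidden harness could look like. Only the Solution class and the foo signature come from the question; the body of foo, the test cases, and the name run_hidden_tests are assumptions made up for illustration, and on the real site this role would be played by the hidden main().

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// The part you type into the editor. The body is a stand-in that counts
// lowercase vowels, chosen only so the sketch has something to test.
class Solution {
public:
    int foo(std::string s) {
        const std::string vowels = "aeiou";
        int ans = 0;
        for (char c : s) {
            if (vowels.find(c) != std::string::npos) ++ans;
        }
        return ans;
    }
};

// Hypothetical hidden part. A real judge would put this logic in main();
// it is a named function here so the sketch can be called from other code.
// Returns the number of test cases the submission passed.
int run_hidden_tests() {
    const std::vector<std::pair<std::string, int>> cases = {
        {"", 0}, {"leetcode", 4}, {"xyz", 0}};
    assert(!cases.empty() && "a judge needs at least one test case");
    int passed = 0;
    for (const auto& [input, expected] : cases) {
        if (Solution().foo(input) == expected) ++passed;
    }
    return passed;
}
```

The harness never needs to know how foo works. It only relies on the class name and the member function's signature, which is why the site fixes both for you.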


* An exception is compiling for platforms without an operating system, e.g. a tiny embedded CPU that can't even hold a proper OS. See also: Is it possible to write a program without using main() function?

Dissociate answered 19/6 at 14:17 Comment(7)
note that freestanding environment is common on embedded platforms without OS, it doesn't mean it's only supported on non-OS baremetals. As already seen in your linked question, it's entirely possible to write code without main() for any major OSes and get it run normallyPolicy
@Lundin it’s still rare for most developers to write such programsJuggle
Every microcontroller C or C++ program I'm aware of still has a main() function, including Arduino, AVR, PIC32, STM32, etc., all the way down to the 8-bit AVR chips I've got which have something like 2 KB of flash and 64 bytes of RAM.Ascanius
@GabrielStaples no matter whether you have an OS or not, there has to be a way to make the processor start at the beginning of your program. The well established convention of calling the main function is the best way to do it, no matter what the mechanics behind it.Framing
@DidierL It's one specialized branch of programming so you either write such programs all the times or you never write them. Just as a front end app developer is unlikely to ever write Linux kernel drivers. The point I'm making is: just because you don't know about it personally, that doesn't mean it doesn't exist.Out
@GabrielStaples They mostly do since it's familiar to C programmers. If you want to make a conforming C implementation, then stuff like static storage duration variable initialization has to be done before main (or equivalent name) is called. The normal microcontroller setup is that you have an ISR called reset or some such, which calls the CRT that sets up all the memory and initialization, then it calls main. Debuggers tend to fake that neither the reset ISR nor the CRT was executed and slip in a breakpoint on top of main, so that it seems that's the point where the program starts.Out
@MarkRansom See the comment above. There's traditionally two flavours of microcontrollers: those who start executing code from a fixed address always, or those who load a start vector from a fixed address and then execute from there. In either case, the function called at program startup is not main but something else. And then the CRT is executed from there. The convention of eventually calling something called main later on is just convenience/tradition. It's rarely ever the first function called even on hosted systems. Rather, it's the first function called in the application layer.Out
24

Some online coding platforms, such as Leetcode and GFG, don't expose their main functions anymore. This was mostly done because the main function becomes very repetitive for every question. Also, people used to optimize input/output, cache test cases to get better execution times in contests, etc.

Now, back to the question: Leetcode just inserts your code into their driver code. If you create another main function, the compiler will complain about a previous declaration.

[Image: screenshot of the compiler error]

As you can see, the actual main function starts at line 21 in this example.

With some Python magic you can actually see the entire code. (I am no C++ person, so I couldn't get this done using C++.)

Here's the code to fetch the current file's contents.
(Note: this is for the Two Sum problem.)

class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        return []


def print_current_file(): 
    current_file_path = __file__ 
    with open(current_file_path, 'r') as file:
        file_contents = file.read()
    
    print(file_contents) 
print_current_file() 

Output:

# coding: utf-8
from string import *
from re import *
from datetime import *
from collections import *
from heapq import *
from bisect import *
from copy import *
from math import *
from random import *
from statistics import *
from itertools import *
from functools import *
from operator import *
from io import *
from sys import *
from json import *
from builtins import *

import string
import re
import datetime
import collections
import heapq
import bisect
import copy
import math
import random
import statistics
import itertools
import functools
import operator
import io
import sys
import json

import precompiled.__settings__
from precompiled.__deserializer__ import __Deserializer__
from precompiled.__deserializer__ import DeserializeError
from precompiled.__serializer__ import __Serializer__
from precompiled.__utils__ import __Utils__
from precompiled.listnode import ListNode
from precompiled.nestedinteger import NestedInteger
from precompiled.treenode import TreeNode

from typing import *

sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

# user submitted code insert below
class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        return []


def print_current_file(): 
    current_file_path = __file__ 
    with open(current_file_path, 'r') as file:
        file_contents = file.read()
    
    print(file_contents) 
print_current_file() 

import sys
import os
import ujson as json

def _driver():

    des = __Deserializer__()
    ser = __Serializer__()
    SEPARATOR = "\x1b\x09\x1d"
    f = open("user.out", "wb", 0)
    lines = __Utils__().read_lines()

    while True:
        line = next(lines, None)
        if line == None:
            break
        param_1 = des._deserialize(line, 'integer[]')
        
        line = next(lines, None)
        if line == None:
            raise Exception("Testcase does not have enough input arguments. Expected argument 'target'")
        param_2 = des._deserialize(line, 'integer')
        
        ret = Solution().twoSum(param_1, param_2)
        try:
            out = ser._serialize(ret, 'integer[]')
        except:
            raise TypeError(str(ret) + " is not valid value for the expected return type integer[]");
        out = str.encode(out + '\n')
        f.write(out)
        sys.stdout.write(SEPARATOR)


if __name__ == '__main__':
    _driver()

As you can see, the user code gets inserted after the "# user submitted code insert below" line.
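
How Leetcode performs that insertion isn't visible from inside the sandbox, but the idea of pasting a submission at a marker line is simple to sketch. The following is an illustration only: splice_submission is a hypothetical helper, not anything taken from Leetcode, and it is written in C++ to match the language of the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: pastes `user_code` on the line after `marker` inside
// `driver_template`. Assumes the marker occurs somewhere in the template.
std::string splice_submission(const std::string& driver_template,
                              const std::string& marker,
                              const std::string& user_code) {
    const std::size_t marker_pos = driver_template.find(marker);
    assert(marker_pos != std::string::npos && "marker must exist in the template");

    // Find the end of the marker line; the submission goes right after it.
    const std::size_t line_end = driver_template.find('\n', marker_pos);
    if (line_end == std::string::npos) {
        // The marker is the last line and has no trailing newline.
        return driver_template + "\n" + user_code;
    }
    std::string joined = driver_template;
    joined.insert(line_end + 1, user_code);
    return joined;
}
```

Because the result is one ordinary source file, everything the template defines before the marker (imports, helper classes) is visible to the submission, and everything after it (the driver) can refer to Solution.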

Astoria answered 20/6 at 21:6 Comment(2)
Worth noting that this is pretty much exactly the TOS violation being discussed here, and one thing made brutally clear on a number of occasions is that the US justice system doesn't care how open to abuse the developers made the system.Dozy
@Dozy I believe this gets a pass considering the driver code has been previously discussed on Leetcodes official forum and no action has been taken against that, see - leetcode.com/discuss/general-discussion/1268235/…Astoria
10

In general, a C++ program requires a main function (at least, I believe, when running in a hosted environment). Presumably, there's just a main function that you aren't seeing that uses the code you wrote.

There are multiple ways this can be achieved.

  • The C++ model allows for compiling multiple translation units (such as one containing your submission code, and another containing a main function that uses things defined in your submission code) and then linking their object code together to produce a program. If you want to learn more about the structure of C++ programs and about compiling and linking, I suggest CppCon's "Back to Basics" videos, for example "Compiling and Linking" and "The Structure of a Program".
  • C++ has its own textual inclusion/pasting feature, namely #include, which could be used to include a file containing your submission code "into" another file.
  • Similar textual inclusion/pasting can be done by many other means without using #include.
  • Building your submission into a library, and then building a separate executable that links with the library.
  • This is probably not an exhaustive list. For example, IIRC, HackerRank just uses IDE code folding features to hide everything other than the submission code section.
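
The first option can be sketched in a single listing by marking where the file boundary would be. The driver only needs a declaration of what the submission provides; the definition comes from elsewhere and the linker connects the two. The file names and the functions count_spaces and run_driver are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <string>

// ---- driver.cpp (hypothetical): sees only a declaration ----
int count_spaces(const std::string& text);  // defined in the other file

// Stand-in for the judge's main(). A real judge would record a verdict
// rather than abort, but an assert keeps the sketch short.
void run_driver() {
    assert(count_spaces("a b c") == 2);
    assert(count_spaces("") == 0);
}

// ---- submission.cpp (hypothetical): provides the definition ----
// As a separate file it would need its own #include <string>.
int count_spaces(const std::string& text) {
    int spaces = 0;
    for (char c : text) {
        if (c == ' ') ++spaces;
    }
    return spaces;
}
```

Pasted into one file like this, it also compiles as a single translation unit, which is closer to the textual-inclusion options. Split into two files, each would be compiled to an object file (for example with g++ -c) and the object files linked into one program.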

The higher-level idea is that Leetcode gives you an interface it wants you to implement, and then it finds some way to depend on your implementation of that interface. At least in the case of C++, the names of the class and member function are parts of that interface.

Leetcode seems to take the third option. Julkar9 had a nice idea in their answer post to snoop around the filesystem of the execution environment. I just did essentially the same thing they did but for C++. The submission code is stashed to a file named "/mnt/prog.cpp", and the "full" code that the submission code is included into is stored in a file named "/mnt/prog_joined.cpp". I'm not sure whether this applies for all challenges/problems, or just for the one I did, and this may change in the future.

The code if you're interested:

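#include <array>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <vector>
namespace fs = std::filesystem;
// Unqualified `vector` below relies on the `using namespace std;` in Leetcode's wrapper.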
class Solution {
public:
  vector<int> twoSum(vector<int>& nums, int target) {
    std::cout
      << "\nclang: v" << __clang_major__ << '.' << __clang_minor__ << '.' << __clang_patchlevel__
      << "\ncurrent_file_and_line: " << __FILE__ << ':' << __LINE__
      << "\npwd: " << fs::current_path()
      ;
    // const std::array roots {"/mnt", "/leetcode"};
    // for (const auto& root : roots) {
    //   for (const auto& dir_entry : fs::recursive_directory_iterator{root, fs::directory_options::skip_permission_denied}) {
    //     std::cout << '\n' << dir_entry;
    //   }
    // }
    const std::array files {
      // "/mnt/compile.err",
      "/mnt/prog.cpp", // just submission code
      "/mnt/prog_joined.cpp", // submission code included into "wrapper" code
      // "/mnt/prog" // the built program
      // "/.dockerenv",
      // "/mnt/data.1.in",
      // "/mnt/judge.out",
      // "/leetcode/data", ??
    };
    for (const auto& file_path : files) {
      std::cout << "\n\n\ncontents of the file " << file_path;
      std::ifstream file {file_path};
      if (file.is_open()) { std::cout <<'\n'<< file.rdbuf(); }
      else { std::cout << "\nfailed to print file " << file_path; }
    }
    return {};
  }
};

This showed that the wrapper, for the problem I ran the above code on and at the time I ran it, looks like this:

#include "precompiled/headers.h"

using namespace std;

// user submitted code insert below
_Deserializer_ _des_;
_Serializer_ _ser_;

// <contents of prog.cpp>

class __DriverSolution__ {
public:
  vector<int> __helper__(
    vector<int>& param_1, int param_2) {
      vector<int> ret = Solution().twoSum(param_1, param_2); return ret;
  }
};

int main(int argc, char *argv[]) {
    char SEPARATOR[] = "\x1b" "\x09" "\x1d";
    setbuf(stdout, NULL);
    ofstream fout("user.out");
    string line;
    while (getline(cin, line)) {
        
        vector<int> param_1 = _des_.deserialize<vector<int>>(line);
        getline(cin, line);
        int param_2 = _des_.deserialize<int>(line);
        
        vector<int> ret = __DriverSolution__().__helper__(
          param_1, param_2
        );

        
        string out = _ser_.serialize(ret);
        

        fout << out << endl;
        cout << SEPARATOR;
    }
    return 0;
}
#pragma GCC optimize ("O2")
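
The _Deserializer_ type comes from the precompiled headers, so its implementation isn't shown above. As a rough idea of what a call like _des_.deserialize<vector<int>>(line) has to do, here is a minimal sketch. It assumes each input line looks like [2,7,11,15], which is an assumption about the test-case format rather than something the dump confirms, and parse_int_list is a hypothetical name.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical stand-in for `_des_.deserialize<vector<int>>(line)`.
// Assumes a flat list such as "[2,7,11,15]": optional spaces, optional minus
// signs, no nesting. Malformed input is not diagnosed.
std::vector<int> parse_int_list(const std::string& line) {
    std::vector<int> values;
    std::size_t i = 0;
    while (i < line.size()) {
        const bool negative = (line[i] == '-');
        if (negative) ++i;
        if (i < line.size() && std::isdigit(static_cast<unsigned char>(line[i]))) {
            int value = 0;
            while (i < line.size() && std::isdigit(static_cast<unsigned char>(line[i]))) {
                const int digit = line[i] - '0';
                assert(value <= (std::numeric_limits<int>::max() - digit) / 10 &&
                       "value must fit in int");
                value = value * 10 + digit;
                ++i;
            }
            values.push_back(negative ? -value : value);
        } else if (!negative) {
            ++i;  // skip '[', ']', ',' and whitespace
        }
    }
    return values;
}
```

The single-integer case (param_2 in the wrapper) is simpler still; std::stoi on the line would cover it under the same assumptions.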

Side commentary: Leetcode's use of identifiers containing __ is not great. I'm pretty sure identifiers like that are reserved for the implementation, and that programs declaring them are ill-formed, no diagnostic required (see cppreference), at least on paper.

As for how exactly this non-#include text inclusion is done, I tried to find a script in the execution environment that does it by grepping for prog_joined, but I ran into execution timeouts and didn't find anything in the places I'd expect it to be if it existed. I don't know much about Docker, but I'm going to guess that /mnt is a bind mount to a host system or something, and that the host system contains the stuff that does that, or that communicates with some other system that does that. This is speculation, but at this point there's not much of a practical nature to gain from knowing the answer to this final piece of the puzzle.

On the practical applications of what we do know: Julkar9 mentioned that Leetcode apparently doesn't let you define your own main to prevent you from optimizing it. I don't know whether that rationale is accurate, but fun fact: now that you know how this works, you can mess with it:

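class Solution {
public:
  int foo() {/*...*/}
};
// Your own main: this becomes the program's real entry point.
int main() {
  // whatever the heck you want. Ex. optimize input parsing
  return 0;
}
// The wrapper's main is pasted after the submission, so this macro renames it
// to an ordinary, never-called function instead of a second main.
#define main I_REJECT_YOUR_ENTRYPOINT_AND_SUBSTITUTE_MY_OWN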

That's something you couldn't do within the confines of Leetcode if Leetcode were building through separate compilation and linking. If you weren't limited by Leetcode's interface, there are plenty of other things you could do, probably including some sort of tinkering with linker scripts (but I don't know; I don't write linker scripts).
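
The renaming step can be checked in isolation. In this sketch the function spelled main stands in for the wrapper text that would follow a submission; renamed_wrapper_main and check_rename are hypothetical names, and the #undef exists only so that the rest of the file can use the name main normally again.

```cpp
#include <cassert>

// What a submission would end with: from here on, the preprocessor rewrites
// every occurrence of the token `main`.
#define main renamed_wrapper_main

// Stand-in for the wrapper pasted after the submission. It is spelled `main`
// in the source, but after macro expansion the compiler sees a definition of
// an ordinary function named renamed_wrapper_main.
int main(int argc, char* argv[]) {
    (void)argv;
    return argc + 40;
}

// Only needed in this sketch, so a real main can still be defined below.
#undef main

// Shows that the "wrapper main" is now just a callable function.
void check_rename() {
    assert(renamed_wrapper_main(2, nullptr) == 42);
}
```

The macro only affects text that comes later in the same translation unit, which is why the trick depends on the wrapper being pasted after the submission rather than compiled separately.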

Undrape answered 21/6 at 0:32 Comment(4)
"I'm pretty sure identifiers like that are reserved for implementations, and that programs that do that are ill-formed-NDR". AFAIK, in C++17, this is UB per 20.5.4.3 Reserved names [reserved.names] [...] 2 If a program declares or defines a name in a context where it is reserved, other than as explicitly allowed by this Clause, its behavior is undefined. It's certainly UB in other versions too, but I don't have other versions of the Standard near me right now (so the section number may change). Not sure which is the "least" worse :')Fairground
Forcing everyone to pull in the names from std namespace by using namespace std in the boilerplate code is really bad.Alcus
@WeijunZhou Agreed, but I think this is the last place, particularly the last language community, that would accuse LeetCode of promoting good practices.Dozy
I'm not really clear on what benefit there is from an argument about LeetCode's use of __ identifiers. In some sense, it is the implementation, and in any event, that's something that should be transparent to the user when they're not snooping/reverse engineering the platform.Pansophy
0

First, C++ code without a main function compiles just fine. It doesn't link as a standalone executable, because a standalone executable requires an entry point, which is the main function. However, it can link as a shared object / dynamic-link library or as a static library.

Second, they can append a main function at the end of your code, or compile a multi-file project with a generated main function. The generated main function will feed the input data to the code, will ensure the code does not exceed the allotted time and will check the output for correctness.
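
As a rough sketch of those three jobs (feed input, watch the time, check the output), the core of a generated main could look like the following. Verdict, judge_case, and the stand-in Solution::sum are hypothetical names for illustration. Note that this version only measures the time after the call returns; a real judge would presumably have to stop a runaway submission from outside the process, which this sketch does not attempt.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Stand-in for the user's submission.
class Solution {
public:
    long long sum(const std::vector<int>& nums) {
        long long total = 0;
        for (int n : nums) total += n;
        return total;
    }
};

struct Verdict {
    bool correct;
    bool within_time_limit;
};

// Hypothetical judging step: run the submission on one input, time the call
// with a steady clock, and compare the result with the expected answer.
Verdict judge_case(const std::vector<int>& input, long long expected,
                   std::chrono::milliseconds time_limit) {
    assert(time_limit.count() > 0 && "time limit must be positive");
    const auto start = std::chrono::steady_clock::now();
    const long long actual = Solution().sum(input);
    const auto elapsed = std::chrono::steady_clock::now() - start;
    return {actual == expected, elapsed <= time_limit};
}
```

A full driver would loop over all test cases, call something like judge_case for each, and report the first failure or an overall pass.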

Verena answered 23/6 at 13:25 Comment(0)
