Nobody said the pixels were square.
In fact, no common video adapter/monitor combination for IBM-style PCs rendered square pixels until VGA came along and provided 640x480 (I can't speak for the Macintosh or the Amiga).
Remember that VGA is an analog technology designed for CRTs. The resolution and refresh rate in particular were controlled by the video adapter on the motherboard, not by the monitor. The monitor electronics simply swept the electron beam left-to-right and top-to-bottom (within the range of frequencies they could tolerate) to produce whatever pixel resolution the video adapter chose to output. Pixels were addressable elements, not display elements, and how 'rectangular' they were depended on the monitor (most monitors had controls to adjust the V-size of the display, so the proportion was not even fixed).
Although there were "standard" resolutions (which basically means they were explicitly listed and supported by the IBM PC BIOS), if you knew which hardware you were dealing with you could reprogram the video adapter to render unusual resolutions. Several video games did just that; there's a sketch of the technique below.
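I can't show any particular game's source, but the best-known instance of the trick is the undocumented 320x240 256-color "Mode X": set BIOS mode 13h, then rewrite the CRT controller's vertical timing. A minimal sketch follows, assuming a DOS-era Borland compiler (int86, outportb, inportb from <dos.h>); the helper name is mine, and the register values are the widely published ones:

    #include <dos.h>

    void set_mode_x(void)   /* hypothetical helper name */
    {
        /* CRTC (index, value) pairs for 320x240, per the published tweak. */
        static const unsigned char crtc[10][2] = {
            { 0x06, 0x0D },  /* vertical total */
            { 0x07, 0x3E },  /* overflow (bit 8 of the vertical counts) */
            { 0x09, 0x41 },  /* max scan line: double-scan 240 rows to 480 lines */
            { 0x10, 0xEA },  /* vertical sync start */
            { 0x11, 0xAC },  /* vertical sync end (re-protects registers 0-7) */
            { 0x12, 0xDF },  /* vertical display end */
            { 0x14, 0x00 },  /* turn off doubleword addressing */
            { 0x15, 0xE7 },  /* vertical blank start */
            { 0x16, 0x06 },  /* vertical blank end */
            { 0x17, 0xE3 }   /* turn on byte addressing */
        };
        union REGS r;
        int i;

        r.x.ax = 0x0013;                 /* BIOS set mode 13h: 320x200, 256 colors */
        int86(0x10, &r, &r);

        outportb(0x3C4, 0x04);           /* sequencer: memory mode register */
        outportb(0x3C5, 0x06);           /* disable chain-4 -> planar memory */

        outportb(0x3C4, 0x00);           /* sequencer: reset register */
        outportb(0x3C5, 0x01);           /* hold in synchronous reset... */
        outportb(0x3C2, 0xE3);           /* misc output: 25 MHz clock, 480-line sync */
        outportb(0x3C5, 0x03);           /* ...and release the reset */

        outportb(0x3D4, 0x11);           /* CRTC: vertical sync end register */
        outportb(0x3D5, inportb(0x3D5) & 0x7F);  /* clear its write-protect bit */

        for (i = 0; i < 10; i++) {
            outportb(0x3D4, crtc[i][0]); /* select the CRTC register... */
            outportb(0x3D5, crtc[i][1]); /* ...and write the tweaked value */
        }
    }

After this the VGA's memory is plane-addressed: you draw by selecting planes through the map mask register, which is what made Mode X fast for page flipping and awkward for everything else.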
I also remember a utility that reprogrammed the Monochrome Display Adapter ("MDA") of the original IBM PC to render 26 lines in text mode instead of 25. The utility used the extra line as a 'status bar' of sorts, showing the Caps Lock, Scroll Lock and Num Lock states (keyboards at the time didn't have status lights). The funny thing is that the MDA didn't have enough RAM for a whole extra line of text: 80 columns x 26 rows x 2 bytes per cell is 4,160 bytes, just over the adapter's 4 KB. The video output circuitry rolled over at the end of the buffer, so the last third or so of the 26th line repeated the first few characters from the top-left of the screen. You lived with that. (The utility also worked on Hercules video adapters, which used the same controller chip but carried more RAM to support their graphics modes, so there was no wrap-around with repeated characters.)
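I no longer have that utility, but the flavor of the trick is easy to show: on the 6845 CRT controller the number of displayed text rows is just a register. A sketch, again assuming a Borland-style DOS compiler; the helper name is mine, and the real utility almost certainly adjusted other registers too (vertical total and sync position) to keep the monitor in sync:

    #include <dos.h>

    #define CRTC_INDEX 0x3B4   /* 6845 index port on the MDA (and Hercules) */
    #define CRTC_DATA  0x3B5   /* 6845 data port */

    void set_26_text_rows(void)  /* hypothetical helper name */
    {
        outportb(CRTC_INDEX, 0x06);  /* R6, "Vertical Displayed": rows on screen */
        outportb(CRTC_DATA, 26);     /* 25 -> 26 character rows */
    }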
You haven't heard of 720x400 because nobody cared. Programs cannot address the individual pixels in this mode, because they are generated on the fly by the character generator circuitry, so the exact pixel grid didn't matter to software. The VGA circuitry could obviously drive the monitor at that frequency, so in theory you could have had a graphics mode at that resolution, but the adapter didn't have enough RAM to support it, at least not at the full 256-color depth (other limits in the electronics may also have applied; for example, could the video RAM be scanned fast enough? I don't know).
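To put numbers on the RAM limit (the arithmetic is mine, not part of the original answer): a standard VGA carried 256 KB of video memory, and a frame's byte count is just width x height x bits-per-pixel / 8:

    #include <stdio.h>

    int main(void)
    {
        const long vga_ram = 256L * 1024L;          /* 262,144 bytes on a standard VGA */
        const struct { int w, h, bpp; } modes[3] = {
            { 640, 480, 4 },   /* BIOS mode 12h: 153,600 bytes, fits    */
            { 320, 200, 8 },   /* BIOS mode 13h:  64,000 bytes, fits    */
            { 720, 400, 8 }    /* hypothetical:  288,000 bytes, too big */
        };
        int i;

        for (i = 0; i < 3; i++) {
            long bytes = (long)modes[i].w * modes[i].h * modes[i].bpp / 8;
            printf("%dx%d at %d bpp: %ld bytes (%s)\n",
                   modes[i].w, modes[i].h, modes[i].bpp, bytes,
                   bytes <= vga_ram ? "fits" : "does not fit");
        }
        return 0;
    }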
Edit, with some clarifications:
The V-size adjustment knobs were needed because the CRT electronics could not affordably be made precise. Some wiggle room, and manual tuning of the voltage that controlled the vertical scan range, was needed to account for fluctuations in the wall voltage, aging of the electronics (particularly the capacitors), and so on. The intent was to turn the knobs until the output looked "right" (roughly 4:3). The knobs eventually disappeared when the electronics became sophisticated enough to make these adjustments automatically.
To answer your last question: you have to accept that the pixels were not square. A faithful look requires you to scale (compress or stretch) the rendered 720x400 pixel matrix to a 4:3 aspect ratio at the resolution of your choice. It won't be razor-sharp, I-can-count-the-pixels crisp; it cannot be. It's the same problem laptop makers face when rendering text mode on an LCD panel.
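To make the scaling step concrete, here is a minimal sketch (mine, nothing from the period; the 960x720 target and the one-byte-per-pixel buffer are arbitrary choices):

    #include <stdlib.h>

    #define SRC_W 720
    #define SRC_H 400
    #define DST_W 960   /* any 4:3 target will do; 960x720 is a convenient one */
    #define DST_H 720

    /* Nearest-neighbor: map each destination pixel back to its source pixel. */
    void scale_to_4x3(const unsigned char *src, unsigned char *dst)
    {
        int x, y;
        for (y = 0; y < DST_H; y++) {
            int sy = y * SRC_H / DST_H;       /* 400 source rows over 720 output rows */
            for (x = 0; x < DST_W; x++) {
                int sx = x * SRC_W / DST_W;   /* 720 source columns over 960 outputs */
                dst[y * DST_W + x] = src[sy * SRC_W + sx];
            }
        }
    }

    int main(void)
    {
        unsigned char *src = calloc(SRC_W * SRC_H, 1);  /* the captured 720x400 frame */
        unsigned char *dst = calloc(DST_W * DST_H, 1);  /* the 4:3 output */
        scale_to_4x3(src, dst);
        free(src);
        free(dst);
        return 0;
    }

Nearest-neighbor keeps the example short; a real renderer would filter (bilinear or better), which is exactly why the result can't be razor-sharp.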