This is a high-level answer to explain some terms.
Part 1 - about integer numbers and their encoding in a computer
An integer value is an integer value; in math it's a purely abstract thing. The number "5" is not what you see on the monitor: that is the digit 5 (a graphical image, or "glyph") representing the value 5 in base-10 (decimal) format for humans (and some trained animals) who can recognize that glyph pattern; the value 5 itself is purely abstract.
When you use int in C++, it's not completely abstract; it's a lot more hard-wired into the metal: a 32-bit (on most platforms) integer value.
But still, that abstract description is much closer to the truth than imagining it in its human decimal format.
int a = 12345; // decimal number
Here a contains the value 12345, not the format. It's not aware it was entered as a decimal string in the source code.
int a = 0x3039; // hexadecimal number
will compile into exactly the same machine code; for the CPU it's the same thing, still (a == 12345). And finally:
int a = 0b0011000000111001; // binary number
is again the same thing. It's still the same 12345 value, just written with different formatting.
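You can even let the compiler confirm that all three spellings are one value; a minimal sketch (C++14 for the 0b literal):

static_assert(12345 == 0x3039, "same value, different formatting");
static_assert(12345 == 0b0011000000111001, "same value, different formatting");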
The last binary form is closest to what the CPU uses to store the value. It is stored in 32 bits (low/high voltage cells/wires), so if you measured the voltage on a particular cell/wire, you would see the "0" voltage level on the top 18 bits, then 2 bits with the "1" voltage level, and then the rest as in the binary format above, with the two least significant bits being "0" and "1".
Also, most of the CPU circuitry is not aware of the particular value of a particular bit; that's again an "interpretation" of the 0/1, done by the code. Many CPU algorithms like add or sub work "from right to left" over all bits, not being aware that the currently processed bit represents, for example, the value 2^13 = 8192 in the final integer (that's the 14th least significant bit).
It's upon taking those bits and calculating a string with the decimal/hexadecimal/binary representation of those bit values that you give those "1"s their value. So then it becomes the text "12345".
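For illustration, a short C++ sketch printing the very same 32 bits as three different strings (std::bitset is used here merely as a convenient binary formatter):

#include <bitset>
#include <iostream>

int main() {
    int a = 12345;
    std::cout << a << '\n';                  // text "12345" (decimal)
    std::cout << std::hex << a << '\n';      // text "3039" (hexadecimal)
    std::cout << std::bitset<32>(a) << '\n'; // text "00000000000000000011000000111001"
}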
If you treat those 32 bits in a different way, like as a representation of ON/OFF LED lights for an LED display panel, then so it will be: once you send it from the CPU to the display, the LED display panel will turn on the corresponding LED lights, not caring that those bits also form the value 12345 when treated as an int.
Only very few CPU instructions work in a way where they need to be aware of the particular value of a particular bit.
Part 2 - about input, output and arguments of C/C++ functions
You want to "convert decimal integer (input) to binary."
So let's reason about what is input and what is output. Input is taken from std::cin, so the user will enter a string.
Yet if you do:
int inputNum;
std::cin >> inputNum;
You will end up with an already converted integer value (32 bits, see above), or with std::cin in an invalid state when the user doesn't enter a correct number (probably not your task to handle this).
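If you ever did want to detect that failed state, a minimal fragment (in the style of the snippet above):

int inputNum;
if (!(std::cin >> inputNum)) {
    // std::cin is now in a failed state; no valid integer was parsed
    std::cerr << "that was not a valid number\n";
}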
If you have the number in an int, the binary conversion was already done by the C library when it was encoding the user input string as a 32-bit integer.
Now you can create an asm function with the C prototype:
void formatToBinary(uint16_t value, char result[17]);
That means you will give it a uint16_t (unsigned 16-bit) integer value, and a pointer to 17 reserved bytes in memory, where you will write '0' and '1' ASCII characters, terminated by another 0 value (for a rough description of this one, follow my first link in the comments under your question).
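Before writing it in asm, it may help to prototype the same routine in C++; a minimal sketch of what the asm has to do (same prototype as above):

#include <cstdint>

// Writes the 16 bits of 'value' as '0'/'1' ASCII characters,
// most significant bit first, then the terminating 0 byte.
void formatToBinary(uint16_t value, char result[17]) {
    for (int i = 0; i < 16; ++i) {
        result[i] = char('0' + ((value >> (15 - i)) & 1));
    }
    result[16] = '\0';
}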
If you must take the input as a string, i.e.
char str[17];
std::cin >> str;
Then you will have in str (after "12345" input) bytes with the values: '1' (49 in decimal), '2', '3', '4', '5', 0. (Note the last one is zero, NOT the ASCII digit '0' = value 48.)
You will need first to convert these ASCII bytes into an integer value (in C++, atoi may help, or one of a few other functions for conversion/formatting). In ASM, check SO for questions about "how to enter integer".
Once you convert it to an integer value, you can proceed the same way as described a bit above (at that moment it's already encoded in 16 or 32 bits, so outputting a string representation of it should be easy).
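Putting the pieces together, a sketch of that whole pipeline (reusing the formatToBinary prototype from above; atoi is crude and simply stops at the first non-digit):

#include <cstdint>
#include <cstdlib>
#include <iostream>

int main() {
    char str[17];
    std::cin >> str;                        // e.g. bytes '1','2','3','4','5',0
    int value = std::atoi(str);             // ASCII digits -> integer value
    char bin[17];
    formatToBinary(uint16_t(value), bin);   // integer value -> "0011000000111001"
    std::cout << bin << '\n';
}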
You may still run into some tricky parts, like if you don't want to output leading zeroes, etc., but all of that should be easy if you understand how this works.
In this case your ASM function prototype may be only void convertToBinary(char*); to reuse the string pointer both as input and output.
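A minimal C++ model of that in-place idea (assuming the caller's buffer holds at least 17 bytes, so the 16-bit binary text always fits):

#include <cstdint>
#include <cstdlib>

// Parses the decimal text in 'str', then overwrites the very
// same buffer with the binary representation of that value.
void convertToBinary(char* str) {
    uint16_t value = uint16_t(std::atoi(str));
    for (int i = 0; i < 16; ++i) {
        str[i] = char('0' + ((value >> (15 - i)) & 1));
    }
    str[16] = '\0';
}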
Your int intToBin(char*); looks weird, because it means the ASM will return an int... but why? That's an integer value, not bound to any particular formatting, so it's binary/octal/decimal/hexa at the same time; it depends on how you display it. So you don't need it; you need only the string representing the value in binary form, and that's the char *. And you don't give it the number you entered (unless it's taking it from the string).
From the task description and your skill level, I think you are allowed to convert the input into an int right in C++ (i.e. std::cin >> int_variable;).
BTW, if you fully understand what is happening to values in the computer, and how CPU instructions work over them, you can often come up with many different ways to achieve some result. For example, Jose's conversion to binary is written in the simple way an Assembly newcomer would write it (he wrote it like that to make it easier for you to understand):
mov eax, num            ; ◄■■ THE NUMBER.
lea edi, bin            ; ◄■■ POINT TO VARIABLE "BIN".
mov ecx, 32             ; ◄■■ NUMBER IS 32 BITS.
conversion:
shl eax, 1              ; ◄■■ GET LEFTMOST BIT.
jc bit1                 ; ◄■■ IF EXTRACTED BIT == 1
mov byte ptr [edi], '0'
jmp skip
bit1:
mov byte ptr [edi], '1'
skip:
inc edi                 ; ◄■■ NEXT POSITION IN "BIN".
loop conversion
It's still a bit fragile: for example, he initializes "bin" in such a way that it contains 32 spaces and the 33rd value is zero (the null terminator of a C string). Then in the code he modifies exactly 32 bytes, so the 33rd zero is still there and working. If you adjusted his code to skip leading zeroes, it would "break" by displaying the remaining part of the buffer, as he doesn't set the null terminator explicitly.
This is a common way to code in Assembly for performance: being exactly aware of everything happening, and not setting values which are already set, etc. While you are learning, I would suggest you work in a "defensive" way instead, rather doing some wasteful things which will work as a safety net in case of some mistake; so I would add mov byte ptr [edi],0 after the loop to set the terminator explicitly again.
But it is actually not very fast, as it uses branching. The CPU doesn't like that: decoding new instructions is a costly task, and if it is not sure which instructions will be executed, it simply decodes ahead along one path, and in case of a wrong guess it throws that work out and decodes the correct path; but that means a pause of several cycles in execution until the first instruction of the new path is fully decoded and ready for execution.
So when coding for performance, you want to avoid hard-to-predict branches (the final loop is easy for the CPU to predict, as it always loops, up to the final exit once ecx is 0). One of many possible ways in this case can be:
mov edx, num
lea edi, bin
mov ah, '0'/2       ; for fast init of al later
                    ; ('0' is 48 (even), so '0'/2 will work: 24)
mov ecx, 32         ; countdown counter
conversion:
mov al, ah          ; al = '0'/2
shl edx, 1          ; most significant bit into CF
adc al, al          ; al = '0'/2 + '0'/2 + CF = '0' or '1'
stosb               ; store the '0' or '1' to [edi++]
dec ecx             ; manually written "loop"
jnz conversion      ; (it is faster on modern CPUs)
mov [edi], ch       ; explicit set of null terminator
                    ; (ch == 0, because here ecx == 0)
As you can see, there is now no branching except the loop; the CPU branch prediction will handle this much more smoothly, and the performance will be considerably better.
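The same branch-free idea expressed in C++, for comparison (a hypothetical 32-bit variant; '0' + bit needs no if, mirroring the adc trick):

#include <cstdint>

void formatToBinary32(uint32_t value, char result[33]) {
    for (int i = 0; i < 32; ++i) {
        // '0' + bit evaluates to '0' or '1' without a data-dependent
        // branch; only the well-predicted loop branch remains.
        result[i] = char('0' + ((value >> (31 - i)) & 1));
    }
    result[32] = '\0';
}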
A dword variant for discussion with Cody (NASM syntax, 32b target):
; .data
binNumber times 36 db 0
; .text
numberToBin:
mov edx,0x12345678
lea edi,[binNumber]
mov ecx, 32/4 ; countdown counter
n2b_conversion:
mov eax,0b00011000000110000001100000011000
; ^ = 0x18181818 = '0'/2 in each byte; each byte
; will become '0'/'1' for one of the four bits
shl edx,1
rcr eax,8
shl edx,1
rcr eax,8
shl edx,1
rcr eax,8
shl edx,1
rcr eax,8
; here was "or eax,'0000'" => no more needed.
stosd
dec ecx
jnz n2b_conversion
mov [edi],dl ; null terminator
ret
Didn't profile it, just verified it returns the correct result.