What are the technical mechanics and operation of declaring variables in 32-bit MASM?

Using 32-bit MASM assembly with the MASM version 11 SDK, I ran into an error during compiling. The error pointed to the line where I declared a variable with double-word (dd) size. The message said the variable was too small for the string I tried to assign to it. When I defined the variable as a byte (db) instead, the program compiled with no error. This seemed to imply that declaring a variable with the db directive allows more storage than declaring it with double-word size. Below is the declaration of the double-word variable that the error message pointed to:

.data
msg_run dd "Ran a function.", 0

I changed the data size of msg_run to a byte:

.data
msg_run db "Ran a function.", 0

When I tried to compile with the second line, the program compiled and ran with no problems. Why did the error imply that a variable declared to be byte-sized has more capacity than a variable declared to be double-word-sized? Does the trailing " ,0" have any effect?

Sources I reviewed:

https://www.cs.virginia.edu/~evans/cs216/guides/x86.html
https://www.shsu.edu/~csc_tjm/fall2003/cs272/intro_to_asm.html

Respecting answered 1/8, 2019 at 10:48 Comment(3)
A "string" is really just an array of characters terminated by a zero. Each character is a single byte (for narrow characters, char in C). With dd you make each element of the array a double word, i.e. each element is 32 bits, which isn't really correct. - Disincentive
MASM treats strings (things between the quotes) in a special way when you use db. db is a single character (byte) so MASM will take each character and store it in a byte. This type of processing doesn't occur the same way with types larger than a byte (dw and dd). In those situations MASM tries to stuff your string into a single DWORD (32-bit value). Look what happens if you use dd and make your string <=4 characters in length. The error should disappear but the characters are placed in memory in reverse order. - Santana
Related: When using the MOV mnemonic to load/copy a string to a memory register in MASM, are the characters stored in reverse order? / How are dw and dd different from db directives for strings? (NASM and MASM are very different.) - Pesthouse

Having a strict data definition syntax that requires the programmer to write each element separated by a comma would make declaring a string tedious:

myString db 'M', 'y', ' ', 's', 't', 'r', 'i', 'n', 'g', 0

so MASM (and all other mainstream assemblers) relaxes the syntax to

myString db "My string", 0

Note that I used single quotes ' for characters (i.e. numbers) and double quotes " for strings. I don't know the exact syntax used by MASM, and it will possibly convert a 1-char string to a char.
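To make the equivalence of the two db forms concrete, here is a small model in Python. Python only stands in for the assembler here; the names element_form and string_form are made up for this illustration, and the sketch assumes plain ASCII characters.

```python
# Model of the two db forms above. Python is only a stand-in for the
# assembler; element_form and string_form are hypothetical names.

# myString db 'M', 'y', ' ', 's', 't', 'r', 'i', 'n', 'g', 0
chars = ['M', 'y', ' ', 's', 't', 'r', 'i', 'n', 'g']
element_form = bytes(ord(c) for c in chars) + bytes([0])

# myString db "My string", 0
string_form = "My string".encode("ascii") + bytes([0])

# Both spellings describe the same ten bytes: nine characters plus the 0.
assert element_form == string_form
print(string_form)       # b'My string\x00'
print(len(string_form))  # 10
```

The trailing 0 is just one more byte-sized element in the list, which is why it ends up as the terminator after the last character.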

What you saw with the dd case looks very similar to the shorthand above but it is not a syntax to declare strings, in fact, it creates numbers.

When a string like "ABCD" is used where a number is expected (like in a dd or as an immediate) MASM converts it to 0x44434241. These are the values of the characters D, C, B, A.
The reversing is done because the syntax is mostly used for instruction immediates, like in mov eax, "ABCD" or cmp eax, "ABCD".
This way, storing eax to memory will create the string "ABCD" (in the correct order) thanks to the x86 endianness.
This also works great with checking the signatures of tables since these signatures are designed to spell correctly in memory but, of course, reversed once loaded in a register.
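The endianness arithmetic can be checked with a short Python sketch. It only shows what a little-endian store does with each of the two possible packings of "ABCD" into a 32-bit number; it does not settle which packing a given assembler picks. The helper store_dword and the two variable names are made up for this illustration.

```python
import struct

def store_dword(value: int) -> bytes:
    """Bytes a little-endian CPU such as x86 writes for a 32-bit value."""
    return struct.pack("<I", value)

# Packing 1: the first character lands in the low-order byte.
first_char_low = int.from_bytes(b"ABCD", "little")
# Packing 2: the first character lands in the high-order byte.
first_char_high = int.from_bytes(b"ABCD", "big")

print(hex(first_char_low), store_dword(first_char_low))    # 0x44434241 b'ABCD'
print(hex(first_char_high), store_dword(first_char_high))  # 0x41424344 b'DCBA'
```

So a value of 0x44434241 spells "ABCD" once stored, while 0x41424344 spells "DCBA".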

In NASM you can even piss everybody off with things like mov eax, ("ABCD" + "EFGH") / 2, reinforcing the view of these strings as numbers. This should also apply to MASM.

I don't remember a case where I've used myVar dd "ABCD" but it may be useful when a structure has a fixed string that is spelled reversed in memory.


Michael Petch recapped MASM behaviour in a comment:

MASM treats strings (things between the quotes) in a special way when you use db. db is a single character (byte) so MASM will take each character and store it in a byte. This type of processing doesn't occur the same way with types larger than a byte (dw and dd). In those situations MASM tries to stuff your string into a single DWORD (32-bit value). Look what happens if you use dd and make your string <=4 characters in length. The error should disappear but the characters are placed in memory in reverse order.
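The rule described in that comment can be sketched as a rough Python model, assuming it is exactly as stated: db takes any number of characters, while dw and dd need the whole string to fit in one element. ELEMENT_SIZE and accepts are hypothetical names for this illustration, not anything MASM provides.

```python
# Rough model of the rule quoted above; not MASM's actual implementation.
ELEMENT_SIZE = {"db": 1, "dw": 2, "dd": 4}  # bytes per element

def accepts(directive: str, text: str) -> bool:
    """Return True if a quoted string of this length fits the directive."""
    size = ELEMENT_SIZE[directive]
    if size == 1:
        # db: one byte per character, so any length is fine.
        return True
    # dw / dd: the whole string must fit into a single element.
    return len(text.encode("ascii")) <= size

print(accepts("db", "Ran a function."))  # True
print(accepts("dd", "Ran a function."))  # False: 15 characters, 4 bytes
print(accepts("dd", "abc"))              # True
```

Under this model the "too small" error is about one 4-byte element being asked to hold 15 characters, not about db having more capacity than dd.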

Magnanimous answered 1/8, 2019 at 10:48 Comment(11)
I think you mean myString db "My string", 0 instead of myString "My string", 0 - Santana
Yes, thank you @MichaelPetch. It even took me some time to see the difference in the strings in your comment :D - Magnanimous
Are these statements correct? (1) Defining dd makes a 32-bit variable and sets unused bits to 0; (2) Defining a variable using db places each letter's value in 1 byte each. MASM creates one byte for every letter. - Respecting
@MichaelPetch Does defining using msg dd "tst" give msg 3 32-bit values, i.e. double-words, with each double-word holding a character, zero-filled unused registers and 0 as a terminator? How much space can each variable hold? - Respecting
@JoachimRives: No, since you aren't using db MASM will attempt to store tst into the dword, but it will store the characters little endian (backwards). If you define msg as msg dd "abc" it should emit the bytes in reverse order, cba instead of abc. I recommend not emitting strings with anything but db as there are very few reasons to do so. - Santana
@PeterCordes: Mixing in NASM in this answer only confuses things more IMHO. It is difficult enough to understand MASM, let alone tossing other assemblers in here. The question was specifically targeting MASM and IMHO, it might be better to keep it that way. - Santana
@MichaelPetch: Agreed. Deleted my comment. I'll just say that NASM is very different in how it treats strings in dd, and in byte-order. Future readers should look it up if they're looking at NASM instead of MASM code. - Pesthouse
@MargaretBloom I saw this comment on my question: "MASM treats strings (things between the quotes) in a special way when you use db. db is a single character (byte) so MASM will take each character and store it in a byte. This type of processing doesn't occur the same way with types larger than a byte (dw and dd)." Is that correct? If so, could you add it to your answer? - Respecting
@JoachimRives Yes, it's correct. No problem, I'm adding it (with references), in its full length :) - Magnanimous
"When a string like "ABCD" is used where a number is expected (like in a dd or as an immediate) MASM converts it to 0x44434241." I think MASM actually uses the first string character for the high-order byte, i.e. the result is equal to 41424344h. RBIL lists signatures that way. - Toritorie
@Toritorie Uhm, that would be a bit odd but possible. I don't have a MASM at hand right now, feel free to edit as this is a community wiki answer. :) As soon as I find a spare hour I'll check it. I thought MASM would use the sensible choice with strings as immediates but I may have guessed wrong. - Magnanimous
