How to access a char array and change lower case letters to upper case, and vice versa
Asked Answered
T

5

2

I'm currently working on a class project for Structured Computer Organization using an x86 processor. The value that I am accessing is an 1 byte char, but I do not know how to compare it to an uppercase. They said to use an ASCII table of the hex format, but I'm not sure how to even compare the two.

void changeCase (char char_array[], int array_size ) {
    __asm {
            // BEGIN YOUR CODE HERE
 
        mov eax, char_array;        //eax is base image
        mov edi, 0;
        
    readArray:
        cmp edi, array_size;
        jge  exit;
        mov ebx, edi;           //using ebx as offset
        shl ebx, 2;
        mov cl, [eax + ebx];    //using ecx to be the storage register
    
    check:
        //working on it
        cmp cl, 0x41;       //check if cl is <= than ASCII value 65 (A)
        jl next_indx;
        cmp cl, 0x7A;       //check if cl is >= than ASCII value 122 (z)
        jg next_indx;
        cmp cl, 'a';
        jl convert_down;
        jge convert_up;
        

    convert_down:
        or cl, 0x20;        //make it lowercase
        jmp write;

    convert_up:
        and cl, 0x20;       //make it uppercase
        jmp write;

    write:
        mov byte ptr [eax + ebx], cl    //slight funky town issue here,

    next_indx:
        inc edi;

    exit:
        cmp edi, array_size;
        jl readArray;

    mov char_array, eax;
            // END YOUR CODE HERE
    }
}

Anything helps at this point. Thank you in advance for the help!

edit 1:

Thanks for all the suggestion and points of clarity, edited my code to reflect change. Some problem with access violation now.

edit 2 (+):

Thanks for the helpful eyes people. I'm still getting to translating all letters now.

Tantara answered 11/3, 2016 at 4:32 Comment(11)
As I understand it, MSVC does the push/pop for you, to save/restores any registers you use. If you look at the disassembly output, your push/pop instructions are probably redundant. Writing your function in asm directly, not inline asm inside a C function, would mean you'd have to understand the calling convention, but you'd have a better idea what's going on once you succeed.Residual
Possible duplicate of X86 NASM Assembly converting lower to upper and upper to lowercase charactersResidual
Hello Peter, thanks for the input. I'm am going to work on a caller/callee function soon. I cannot alter the code outside of the commented lines.Tantara
If you take a look at an ascii table, you should hopefully notice that the range of values for uppercase characters are contiguous and separate from the range of values for lowercase characters. This fact should be helpful.Tinker
The ASCII character set has an interesting property that the upper and lower case A-Z ranges are 0x20 from each other. Also, neither cross a 0x20 alignment boundary, so if a character is a letter to start with, you can make it lowercase by ORing it with 0x20 to set that bit, regardless of whether it was upper or lower before.Residual
Ah, that may be more useful @andars, thank you!Tantara
@PeterCordes suggestion's is even more useful, especially if you are familiar with xor.Tinker
Made some changes, received an access violation. :/Tantara
Your code does mov al, [eax + edi]; AL is the lower 8 bits of the EAX register. You effectively destroy the contents of EAX (and the original pointer) and then at the bottom you do mov [eax + edi], al which now moves the contents of AL into the memory address pointed to by eax+edi. Problem is EAX has been trashed. You might want to consider another register other than AL for doing the character comparison and conversions. Maybe use CL instead of ALProfessorship
@MichaelPetch, dang it. Small blunder, fixing it soon. Thank you!Tantara
You really should learn to step through with the debugger, it would make seeing the final problems easier. Your convert_up and convert_down code isn't correct, and I'm unsure why you trash the array with mov char_array, eax; at the very end (seems that line should just be eliminated).Professorship
H
2

For clarity's sake, I'll just use pure assembly and assume that...

  • char_array is a 32-bit pointer at [ebp+8].
  • array_size is a two's complement 32-bit number at [ebp+12].
  • For your platform (it is this way for most anyway), char's encoding is ASCII.

You should be able to deduce this yourself into inline assembly. Now, if you look at the table everyone is supposed to remember but barely anyone does, you'll notice some important details...

  • Uppercase letters A through Z map into codes 0x41 through 0x5A, respectively.
  • Lowercase letters a through z map into codes 0x61 through 0x7A, respectively.
  • Everything else is not a letter, and thus need no case conversion.
  • If you look at the binary representation of the upper and lowercase letter ranges, you'll notice that they are exactly the same, with the sole exception that uppercase letters have bit 6 cleared, and lowercase ones have it set.

As a result, the algorithm would be...

while array_size != 0
    byte = *char_array
    if byte >= 0x41 and byte <= 0x5A
        *char_array |= 0x20 // Turn it lowercase
    else if byte >= 0x61 and byte <= 0x7A
        *char_array &= 0xDF // Turn it uppercase
    array_size -= 1
    char_array += 1

Now, let's translate this into assembly...

mov eax, [ebp+8]      # char *eax = char_array
mov ecx, [ebp+12]     # int ecx = array_size

.loop:
    or ecx, ecx       # Compare ecx against itself
    jz .end_loop      # If ecx (array_size) is zero, we're done
    mov dl, [eax]     # Otherwise, store the byte at *eax (*char_array) into `char dl`
    cmp dl, 'A'       # Compare dl (*char_array) against 'A' (lower bound of uppercase letters)
    jb .continue      # If dl` (*char_array) is lesser than `A`, continue the loop
    cmp dl, 'Z'       # Compare dl (*char_array) against 'Z' (upper bound of uppercase letters)
    jbe .is_uppercase # If dl (*char_array) is lesser or equal to 'Z', then jump to .is_uppercase
    cmp dl, 'a'       # Compare dl (*char_array) against 'a' (lower bound of lowercase letters)
    jb .continue      # If dl (*char_array) is lesser than 'a', continue the loop
    cmp dl, 'z'       # Compare dl (*char_array) against 'z' (upper bound of lowercase letters)
    jbe .is_lowercase # If dl (*char_array) is lesser or equal to 'z', then jump to .is_lowercase
    jmp .continue     # All tests failed, so continue the loop

    .is_uppercase:
        or dl, 20h    # Set the 6th bit
        mov [eax], dl # Send the byte back to where it came from
        jmp .continue # Continue the loop

    .is_lowercase:
        and dl, DFh   # Clear the 6th bit
        mov [eax], dl # Send the byte back to where it came from
        jmp .continue # Continue the loop

    .continue:
        inc eax       # Increment `eax` (`char_array`), much of like a pointer increment
        dec ecx       # Decrement `ecx` (`array_size`), so as to match the previous pointer increment
        jmp .loop     # Continue

.end_loop:

Once code reaches .end_loop, you're done.

I hope this has led a light on you!

Hydroponics answered 11/3, 2016 at 5:18 Comment(9)
Thank You! This step by step procedure is just what I needed! I'll take what I've done and upload the results shortly. If only I can, uh, upvote you.Tantara
@archon263: Don't worry if you can't upvote yet ;). If you want to inline that same code, just replace [ebp+8] with [char_array] and [ebp+12] with [array_size].Hydroponics
Thanks! I'm not that comfortable with inline yet, but I will see how to use it after editing my code.Tantara
I don't memorize the ascii table. I just use constants like 'a' or '0' in source code (including asm source code), and run the ascii(1) program to print the table in my terminal if needed. You can replace your magic hex constants with 'A', 'Z', 'a', 'z'. You can also simplify your comparisons: make a copy and or with 0x20, then you only need to check for between 'a' and 'z'. And you can use the unsigned-compare trick to do that range check: subtract 'a', cmp al, 26; ja .non_digit. In the al<'a' case, the sub wraps around, producing a large (unsigned) number.Residual
This would be easier to read if the explanation lines were comments, rather than alternating code and non-code. At least I think so, maybe for total beginners it's easier to read this? It's hard to visually find branch targets the way one normally can, by looking for the rare not-indented lines. Almost all of your explanations would fit easily on a comment line.Residual
@PeterCordes: Thanks for your suggestions. I implemented the second one (I'm used to write numerical constants in assembly for everything), and was aware of the first one, but preferred not to optimize the stuff too much, for the OP's understanding's sake.Hydroponics
Now that it's readable (+1 for that): jmp .continue on the line right before .continue: is a no-op and should be removed. Also, you could reverse .is_lowercase and .is_uppercase, so the last of the four conditional branches can be jnbe .continue, otherwise fall through into .is_lowercase. Also test ecx,ecx is always better than or ecx,ecx, because it can macro-fuse with a jcc.Residual
@PeterCordes: Ha, didn't noticed that redundant jmp. Anyway, thanks for your suggestions, applied them. Please let me know if you have any others! Not sure if taking macro-op fusion into account is just too much of an optimization, though..Hydroponics
@KemyLand: or same,same is not a useful idiom. There's no reason for beginners to even consider it. test same,same instead of cmp reg,0 is also an optimization, so for maximum clarity you should write that. However, it would be better to put the loop condition at the end of the loop, like normal for asm (branch on the flags set by dec ecx). (See my answer or the OP's answer).Residual
R
6

Variations of this question get asked all the time. This version of the problem (requiring conditional behaviour beyond just if(isalpha(c)) c|=0x20;)) made the problem complex enough that it wasn't immediately obvious how to do it efficiently.

It turns out that xor wasn't hard to think of, and converting this code to unconditionally upcase or downcase only requires a simple change from xor 0x20 to and ~0x20 or or 0x20. (Simplifying a bit more is possible, too.)

Here's how I'd do it with an attempt at optimally efficient asm. I even included a version with SIMD vectors, and another version of the byte loop using the branchless idea I got from vectorizing it.

Reading this answer is probably only useful once you understand the basic principles involved in solving this with not-so-optimized code. OTOH, there are very few operations actually needed, so there's not much code to grok. And I did comment it heavily. There are many helpful links in the tag wiki, from tutorials to reference guides to performance tuning.


Converting between lower and upper case alphabetic ASCII characters only requires setting or clearing the 0x20 bit, because the ASCII character set is laid out with the ranges 32 from each other, and not crossing a mod32 boundary.

For each byte:

  • make a copy and unconditionally OR it with 0x20
  • check if it's between 'a' and 'z'
  • if so, flip the ASCII alphabetic case bit using xor and store the result back into the array.

Doing the ASCII isalpha(3) test this way is safe: The only source bytes that end up in the 'a'..'z' range from setting that bit are the upper-case alphabetic characters. It's just math that works for any two equal-sized ranges that don't cross a %32 boundary. (Or a %64 boundary if the relevant bit was 0x40, for example).

To do the compare even more efficiently, I use the unsigned-compare trick so there's only one conditional branch inside the loop (other than the loop condition itself). See the comments in the code for an explanation.

One byte at a time branching on an efficient range-check for alphabetic char detection

/******** Untested. ************/

// ASCII characters are flipped to the opposite case (upper <-> lower)
// non-ASCII characters are left unchanged
void changeCase (char char_array[], int array_size ) {

    __asm{
            // BEGIN YOUR CODE HERE

        mov   esi, char_array;      // MSVC inline asm requires these potentially-redundant copies :(
        mov   ecx, array_size;

        test  ecx,ecx;       // return if(size <= 0)
        jle  early_out;

    next_char:
        movzx eax, byte ptr [esi];     // load the current character
        mov   edx, eax;              // save a copy to maybe flip + store

        // check if the character is alphabetic or not
        // there are two equal-size ranges of characters: one with 0x20 set, and one without
        or    al, 0x20;      // set 0x20 and then just check that lowercase range

        // unsigned compare trick: 0 <= n < high  can be done with one unsigned compare instead of two signed compares
        // low < n < high  can be done by shifting the range first
        sub   al, 'a';       // if al is less than 'a', it will become a large unsigned number
        cmp   al, 'z'-'a';
        ja  non_alpha;      // conditionally skip the flip & store

        xor   dl, 0x20;      // toggle the ASCII case bit
        mov   [esi], dl;
                             // xor [esi], 0x20   // saves the mov earlier, but is otherwise slower
    non_alpha:

        inc   esi;
        dec   ecx;
        jz next_char;

    early_out:
            // END YOUR CODE HERE
    }
}

This code might be more readable if some of the "design doc" stuff was in a block outside the code. It clutters things up a lot, and makes it look like there's a lot of code, but really there are very few instructions. (They're just hard to explain with short comments. Commenting code is tricky: comments that are too obvious are just clutter and take time away from reading the code and the useful comments.)


Vectorized

Actually for x86 I'd use SSE or AVX to do 16B at a time, doing the same algorithm, but doing the comparisons with two pcmpgtb. And of course unconditionally storing the results, so an array of all non-alphabetic characters would still be dirtied in the cache, using more memory bandwidth.

There's no unsigned SSE compare, but we can still range-shift the range we're looking for down to the bottom. There are no values less than -128, so in a signed compare it works the way 0 does in an unsigned compare.

To do this, subtract 128. (or add, or xor (carryless add); there's nowhere for the carry / borrow to go). This can be done in the same operation as subtracting 'a'.

Then use the compare result as a mask to zero out bytes in a vector of 0x20, so only the alphabetic characters get XORed with 0x20. (0 is the identity element for XOR/add/sub, which is often really handy for SIMD conditionals).

See also a strtoupper version that has been tested, and code to call it in a loop, including handling of non-multiple-of-16 inputs, on implicit-length C strings (searching for the terminating 0 on the fly).

#include <immintrin.h>

// Call this function in a loop, with scalar cleanup.  (Not implemented, since it's the same as any other vector loop.)

// Flip the case of all alphabetic ASCII bytes in src
__m128i inline flipcase(__m128i src) {
    // subtract 'a'+128, so the alphabetic characters range from -128 to -128+25 (-128+'z'-'a')
    // note that adding 128 and subtracting 128 are the same thing for 8bit integers.
    // There's nowhere for the carry to go, so it's just xor (carryless add), flipping the high bit

    __m128i lcase = _mm_or_si128(src, _mm_set1_epi8(0x20));

    __m128i rangeshift= _mm_sub_epi8(lcase, _mm_set1_epi8('a'+128));
    __m128i non_alpha = _mm_cmpgt_epi8(rangeshift, _mm_set1_epi8(-128 + 25));  // 0:alphabetic   -1:non-alphabetic

    __m128i flip  = _mm_andnot_si128(non_alpha, _mm_set1_epi8(0x20));       // 0x20:alpha    0:non-alpha

    return          _mm_xor_si128(src, flip);
    // just mask the XOR-mask so non-alphabetic elements are XORed with 0 instead of 0x20
    // XOR's identity value is 0, same as for addition
}

This compiles to nice code, even without AVX, with only one extra movdqa to save a copy of a register. See the godbolt link for two earlier versions (one using two compares to keep it simple, another using pblendvb before I remembered to mask the vector of 0x20s instead of the result.)

flipcase:
        movdqa  xmm2, XMMWORD PTR .LC0[rip]    ; 0x20
        movdqa  xmm1, xmm0
        por     xmm1, xmm2
        psubb   xmm1, XMMWORD PTR .LC1[rip]    ; -31
        pcmpgtb xmm1, XMMWORD PTR .LC2[rip]    ; -103
        pandn   xmm1, xmm2
        pxor    xmm0, xmm1
        ret

section .rodata
    .LC0:   times 16 db  32
    .LC1:   times 16 db  -31
    .LC2:   times 16 db  -103

This same idea of using a branchless test would also work for the byte loop:

        mov   esi, char_array;
        mov   ecx, array_size;

        test  ecx,ecx;       // return if(size <= 0)
        jle  .early_out;

    ALIGN 16   ; really only need align 8 here, since the next 4 instructions are all 2 bytes each (because op  al, imm8  insns have a special encoding)
    .next_char:
        movzx  eax, byte ptr [esi];     // load the current character
        mov    edx, eax;

        // check if the character is alphabetic or not
        or    al, 0x20;        
        sub   al, 'a';
        cmp   al, 'z'-'a';   // unsigned compare trick: 'a' <= al <= 'z'
        setna al;            // 0:non-alpha  1:alpha  (not above)
        shl   al, 5;         // 0:non-alpha  0x20:alpha
        xor   dl, al;        // conditionally toggle the ASCII case bit
        mov   [esi], dl;     // unconditionally store

        inc   esi;
        dec   ecx;           // for AMD CPUs, or older Intel, it would be better to compare esi against an end pointer, since cmp/jz can fuse but dec can't.  This saves an add ecx, esi outside the loop
        jz .next_char;
    .early_out:

For 64bit code, just use rsi instead of esi. Everything else is the same.

Apparently MSVC inline asm doesn't allow .label local-symbol names. I changed them for the first version (with conditional branch), but not this one.

Using movzx eax, byte [esi] is better than mov al, [esi], avoiding a loop-carried false dependency on AMD, and Intel Haswell and later, and Silvermont-family. movzx isn't quite as cheap as a load on older AMD. (It is on Intel, and AMD Ryzen at least, one uop that only uses a load port, not an ALU port). Why doesn't GCC use partial registers?

Operating on al after that is still ok. There's no partial-register stall (or extra instructions to avoid it) because we aren't reading eax after setcc writes al. (There is no setcc r/m32, only r/m8, unfortunately).


I have to wonder what a professor would think if anyone handed in code like this for an assignment like that. :P I doubt even a smart compiler would use that setcc / shift trick unless you led the compiler towards it. (Maybe unsigned mask = (tmp>='a' && tmp<='z'); mask <<= 5; a[i] ^= mask; or something.) Compilers do know about the unsigned-compare trick, but gcc doesn't use it in some cases for non-compile-time-constant range checks, even when it can prove that the range is small enough.

Residual answered 11/3, 2016 at 9:48 Comment(10)
nice one :) but this solution also has the problem that the chars inbetween 'Z' and 'a' are regarded as valid characters ... oh hold, i was checking it with int, not unsigned int in C ... my fault. so yes, nice "hack"Vacate
i tried something similar in C, and got result -(200+x) for most, and 28 for ']' ... and didn't think about " > 26 " would still be TRUE for those -200 values in assembler (byte wrap around). too bad, the direction was good :)Vacate
@Tommylee2k: Yeah, it's tricky to grok. You see the sub reg, 'a', and then the cmp reg, 25, and think "cmp is also a subtract, why can't they be combined?" But the key is that the starting point matters for setting flags (carry and overflow). It's not just testing the sign bit of the result.Residual
yes, if you "drag" a range "to zero", all you need for a range check is to check for the the upper boundary... "a" <= x <= "z" is true, if (x-"a") is < 26 ... this can be especially helpful, if you have to combine several of these checks where you otherwise had to branch around (which can easily be messed up)Vacate
@Tommylee2k: I was pretty proud of myself for coming up with the "drag" a range to -128 idea, so I could use pcmpgtb to vectorize it. I didn't come up with unsigned compare on my own, but I did (re?)invent its use with pcmpgtb. Mostly it's a performance thing to reduce the amount of branches, esp. taken branches. Correctness is still non-trivial, since you have to make sure you don't have an off-by-one in the subtract (is the lower bound < or <=?) and stuff like that. Also, when writing asm, you're supposed to make it run fast, even if it takes longer to debug. Otherwise use C!Residual
:-D I'm REALLY curious, how the professor would react on an answer like "ok, due to the fact we're not time dependent, here's my C answer:" :-PVacate
@Tommylee2k: obviously you hand in the compiler output for your assignments. And then fail the exam if you didn't read and understand the compiler output. :P I forgot we were talking about a course, rather than why you'd want to ever use asm in the first place. (Although being able to read it to debug things and grok compiler output doesn't require knowing optimization tricks, being familiar with compiler tricks from small functions helps a lot in reading output for medium-size functions.)Residual
There is a BIG problem with this method. What if there is some already lowercase chars inside of the array? If you flip the bit, you will make them uppercase. And this is a BUG.Kreiner
Need to use OR instead of XOR in this instruction xor dl, 0x20; // toggle the ASCII case bit to make correct lowercase result.Kreiner
@ScienceDiscoverer: XOR is correct. Did you try it? xor dl, 0x20 will set the bit if it was unset (like or dl, 0x20 for upper to lower), and clear it if it was set (like and dl, ~0x20 for lower to upper). Remember that the point of this answer is to flip case, not force it to lower-case. That's why we use an xor separate from the or al, 0x20 that forces to lower case as part of detecting letters, otherwise we'd reuse that result. See also What is the idea behind ^= 32, that converts lowercase letters to upper and vice versa?Residual
H
2

For clarity's sake, I'll just use pure assembly and assume that...

  • char_array is a 32-bit pointer at [ebp+8].
  • array_size is a two's complement 32-bit number at [ebp+12].
  • For your platform (it is this way for most anyway), char's encoding is ASCII.

You should be able to deduce this yourself into inline assembly. Now, if you look at the table everyone is supposed to remember but barely anyone does, you'll notice some important details...

  • Uppercase letters A through Z map into codes 0x41 through 0x5A, respectively.
  • Lowercase letters a through z map into codes 0x61 through 0x7A, respectively.
  • Everything else is not a letter, and thus need no case conversion.
  • If you look at the binary representation of the upper and lowercase letter ranges, you'll notice that they are exactly the same, with the sole exception that uppercase letters have bit 6 cleared, and lowercase ones have it set.

As a result, the algorithm would be...

while array_size != 0
    byte = *char_array
    if byte >= 0x41 and byte <= 0x5A
        *char_array |= 0x20 // Turn it lowercase
    else if byte >= 0x61 and byte <= 0x7A
        *char_array &= 0xDF // Turn it uppercase
    array_size -= 1
    char_array += 1

Now, let's translate this into assembly...

mov eax, [ebp+8]      # char *eax = char_array
mov ecx, [ebp+12]     # int ecx = array_size

.loop:
    or ecx, ecx       # Compare ecx against itself
    jz .end_loop      # If ecx (array_size) is zero, we're done
    mov dl, [eax]     # Otherwise, store the byte at *eax (*char_array) into `char dl`
    cmp dl, 'A'       # Compare dl (*char_array) against 'A' (lower bound of uppercase letters)
    jb .continue      # If dl` (*char_array) is lesser than `A`, continue the loop
    cmp dl, 'Z'       # Compare dl (*char_array) against 'Z' (upper bound of uppercase letters)
    jbe .is_uppercase # If dl (*char_array) is lesser or equal to 'Z', then jump to .is_uppercase
    cmp dl, 'a'       # Compare dl (*char_array) against 'a' (lower bound of lowercase letters)
    jb .continue      # If dl (*char_array) is lesser than 'a', continue the loop
    cmp dl, 'z'       # Compare dl (*char_array) against 'z' (upper bound of lowercase letters)
    jbe .is_lowercase # If dl (*char_array) is lesser or equal to 'z', then jump to .is_lowercase
    jmp .continue     # All tests failed, so continue the loop

    .is_uppercase:
        or dl, 20h    # Set the 6th bit
        mov [eax], dl # Send the byte back to where it came from
        jmp .continue # Continue the loop

    .is_lowercase:
        and dl, DFh   # Clear the 6th bit
        mov [eax], dl # Send the byte back to where it came from
        jmp .continue # Continue the loop

    .continue:
        inc eax       # Increment `eax` (`char_array`), much of like a pointer increment
        dec ecx       # Decrement `ecx` (`array_size`), so as to match the previous pointer increment
        jmp .loop     # Continue

.end_loop:

Once code reaches .end_loop, you're done.

I hope this has led a light on you!

Hydroponics answered 11/3, 2016 at 5:18 Comment(9)
Thank You! This step by step procedure is just what I needed! I'll take what I've done and upload the results shortly. If only I can, uh, upvote you.Tantara
@archon263: Don't worry if you can't upvote yet ;). If you want to inline that same code, just replace [ebp+8] with [char_array] and [ebp+12] with [array_size].Hydroponics
Thanks! I'm not that comfortable with inline yet, but I will see how to use it after editing my code.Tantara
I don't memorize the ascii table. I just use constants like 'a' or '0' in source code (including asm source code), and run the ascii(1) program to print the table in my terminal if needed. You can replace your magic hex constants with 'A', 'Z', 'a', 'z'. You can also simplify your comparisons: make a copy and or with 0x20, then you only need to check for between 'a' and 'z'. And you can use the unsigned-compare trick to do that range check: subtract 'a', cmp al, 26; ja .non_digit. In the al<'a' case, the sub wraps around, producing a large (unsigned) number.Residual
This would be easier to read if the explanation lines were comments, rather than alternating code and non-code. At least I think so, maybe for total beginners it's easier to read this? It's hard to visually find branch targets the way one normally can, by looking for the rare not-indented lines. Almost all of your explanations would fit easily on a comment line.Residual
@PeterCordes: Thanks for your suggestions. I implemented the second one (I'm used to write numerical constants in assembly for everything), and was aware of the first one, but preferred not to optimize the stuff too much, for the OP's understanding's sake.Hydroponics
Now that it's readable (+1 for that): jmp .continue on the line right before .continue: is a no-op and should be removed. Also, you could reverse .is_lowercase and .is_uppercase, so the last of the four conditional branches can be jnbe .continue, otherwise fall through into .is_lowercase. Also test ecx,ecx is always better than or ecx,ecx, because it can macro-fuse with a jcc.Residual
@PeterCordes: Ha, didn't noticed that redundant jmp. Anyway, thanks for your suggestions, applied them. Please let me know if you have any others! Not sure if taking macro-op fusion into account is just too much of an optimization, though..Hydroponics
@KemyLand: or same,same is not a useful idiom. There's no reason for beginners to even consider it. test same,same instead of cmp reg,0 is also an optimization, so for maximum clarity you should write that. However, it would be better to put the loop condition at the end of the loop, like normal for asm (branch on the flags set by dec ecx). (See my answer or the OP's answer).Residual
V
2

in ASCII 'a'-'z' and 'A'-'Z' are equivalent except one bit, 0x20

your friend here is XOR.

if you have a char ( either 'A'-'Z' or 'a'-'z'), XORing it with 0x20 will toggle the case;

before XORing, doing a range check makes sense. (to see if the value is really a letter)
You can simplify this range check by ORing the value to check with 0xef, which will make 'a' to 'A' and 'z' to 'Z', and then do the range check only once
(if you only compare to <'a' and >'Z' you will miss the characters inbetween ('[', ']', etc...)

Vacate answered 11/3, 2016 at 7:13 Comment(1)
Nice, I also thought using or to simplify the range check. I wasn't sure how obvious or easy to understand it was, so I spent a lot longer explaining it, since I was worried that people would wonder why it's safe to do the tolower when you don't know it's an alphabetic character yet. I'm glad other people thought of it, too. I thought it would be fun to write an optimized implementation, see my answer. I used a further trick that you didn't mention (the unsigned-compare trick).Residual
I
1

In an ascii table all letters are continuous:

A=0x41=01000001
a=0x61=01100001

Z=0x5A=01011010
z=0x7A=01111010

So you can see that by toggling the 6th bit you transform form upper to lower case.

Intercom answered 11/3, 2016 at 5:3 Comment(0)
T
1

Courtesy of @KemyLand for the helpful breakdown of assembly code, I have figured out how to convert Uppercase to Lowercase and vice-versa.

void changeCase (char char_array[], int array_size ) {
     //this function is designed to change lowercase letters to uppercase, and vice-versa, from a char-array given the array and its size.
__asm{
        // BEGIN YOUR CODE HERE

    mov eax, [ebp + 8];     //move to register value parameter 1 (the array)
    mov ecx, [ebp + 12];    //likewise parameter 2 (the array size)

    START:

        or ecx, ecx;    //check if pointer is 0
        cmp ecx, 0;
        je endloop;   //go to end loop

        mov dl,byte ptr [eax];  //not sure if needed, but reassurance
        cmp dl, 0x41;   // is char an A?
        jl cont;

        cmp dl, 0x5A;   // is char a Z?
        jle convertUP;

        cmp dl, 0x61;   // is char an a?
        jl cont;

        cmp dl, 0x7A;   // is char a z?
        jle convertDOWN;

        jmp cont;


    convertUP:
        or dl, 0x20;        //Yes! Finally got it working!
        mov byte ptr [eax], dl;

        jmp cont;

    convertDOWN:
        and dl, 0xdf;       //this will work for sure.
        mov[eax], dl;

        jmp cont


    cont:
        inc eax;
        dec ecx;

        jmp START;

    endloop:
}

}

Feel free to help explain what I might have missed! Thank you all for helping me understand the x86 assembly processor better.

Tantara answered 11/3, 2016 at 6:34 Comment(5)
You can write your constants like 'a', instead of in hex. Then you don't need a comment to explain the constants. Also, is char a z? doesn't correctly describe a cmp / jle. "is a" sounds more like cmp / je. The code is right, the comment is wrong. There's a saying that "asm code has only two kinds of bugs: 1. the code doesn't match the comments. 2. the comments don't describe a correct algorithm"Residual
Use test ecx,ecx, not or ecx,ecx, because it's faster. Put the conditional branch at the bottom of the loop, like a do{}while() loop. Structure your branches to minimize jumps. e.g. you should be able to arrange things so the last branch before convertUP either falls through into convertUP or jumps to cont. You even have a jmp cont right before cont:, which ... jumps over the white-space in the source code?? :P.Residual
Stuff like mov eax, [ebp + 8]; is a major no-no in inline asm. Your function could easily be inlined into another function, or compiled without frame pointers. Fortunately, you don't have to assume anything about where on the stack your args are, you can just tell MSVC to give them to you by writing mov eax, char_array. This might turn into a redundant mov eax, esi or something; IDK, I haven't looked at MSVC output. AFAIK there's no way to just ask MSVC to put variables in registers for you, and tell it which regs your results are in (to avoid storing and them compiler reload).Residual
You can save one byte of code-size in several instructions by using al to hold your source byte: There's a special encoding for cmp al, imm8, or al, imm8 etc. Don't worry about this, though. Small code size is nice, but there are more important things to think about while learning to write code that even works in the first place :PResidual
See my answer for more significant optimizations that are less obvious. My entire loop is 11 instructions (including the loop overhead), with one conditional branch other than the loop condition. Have fun understanding it :D (I do mean that literally; I think it is understandable and well commented.) Since this is for an assignment, I think you're good to hand in what you posted in this answer, though. Remove the totally unneeded jmp, and the or ecx,ecx that does nothing because you follow it with cmp ecx,0. (test ecx,ecx instead of cmp with 0 is mostly just a code-size win).Residual

© 2022 - 2024 — McMap. All rights reserved.