This can be done with a single formula in Excel 2016 onwards.
While pnuts' solution lists explicit characters to strip, this solution lists explicitly valid characters.
Assume your dirty data is in column A.
Assume you want your clean data in column B.
Use the very last formula below, in cell B1. To enter the formula into cell B1, do the following:
- Click on cell B1
- Click into the formula bar
- Paste the formula
- Press CTRL+Shift+Enter <-- important step
Copy cell B1 and paste it down column B as far as you need.
First, here is a short example to explain what's going on:
=TEXTJOIN("",TRUE,
IFs(
MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1) = "t", "t",
MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1) = "e", "e",
MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1) = "s", "s",
MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1) = "T", "T",
MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1) = "E", "E",
MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1) = "S", "S",
MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1) = "2", "2",
MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1) = " ", " ",
true, ""
)
)
In this case I have specified the following characters as valid: t, e, s, T, E, S, 2 and the space character.
Obviously you need to extend the formula out to the full alphabet, listing each upper and lowercase character separately. Likewise, extend it out to include numbers if you want to keep those. Note that numbers are coded as strings.
How this works:
In a nutshell we are splitting the source string into an array of separate characters, then for each character, checking whether it is in our set of valid characters and including it in the result if it is, otherwise replacing it with an empty string if it is not.
The 'IFS' function goes through its arguments a pair at a time. If the first argument evaluates as true, then the second argument is returned. If not, it proceeds with the next pair of arguments - this is why you see the letters listed twice on each row. The last pair of values in the IFS function is the set 'true' and the empty string. This is saying if we get to the end of the set of valid values (ie and haven't matched a valid value) then return the empty string.
More background on why this works:
This is a variation on a solution given at ExcelJet. In that solution the TEXTJOIN function is used (to concatenate the values of an array) with the INDIRECT function (which splits the string into an array) together with a mathematical operator (the plus symbol) to force the evaluation of a calculation between every character in the string with a numerical value. Numerical characters in the string will return numerical values while other characters will return an error. That solution uses the function ISERR to check for an error to decide whether or not to include a given character in the final output. A similar article exists there to work the other way around - to exclude the numbers and keep the letters.
The problem I wanted solved is for the coder to decide which values are valid and which are not. I went through trying to incorporate VLOOKUP and INDEX functions with the INDIRECT function but they will only work on the first character in the string. The trick is that not all functions will act on the output of INDIRECT in such a way as to evaluate every element in the array (ie. every character in the string). The secret was that ExcelJet used a mathematical operator. If you check Microsoft's full function reference, IFS is categorised as a "logic" function. I suspect logic functions can be used with INDIRECT in this way.
(Note: I also attempted to use AND, and OR in various combinations. However, the INDIRECT evaluates all characters in the string. So for example, using the CODE function to gain each character's ASCII value and asserting all characters must have values between 65 - 90 (uppercase) or 97 - 122 (lowercase letters) will work only if all characters in the string are uppercase, or all are lowercase, but not if there is a mixture.)
I don't know how the performance of this solution compares with the earlier suggestion using SUBSTITUTE. If you only want to strip out a few characters, I recommend the SUBSTITUTE solution. If you want to specify explicitly the valid characters to keep (which was the original question), use this one.
Finally, here is the exact answer you need, including the conversion to uppercase which you didn't note in the question, but displayed in your example. (For anyone else not wanting the uppercase conversion, remove the instances of 'UPPER' from this example, then add the alphabet again to the list, in lowercase, and be sure to leave the 'true'/empty string pair as the last entry in the list.)
=TEXTJOIN("",TRUE,
IFs(
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "A", "A",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "B", "B",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "C", "C",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "D", "D",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "E", "E",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "F", "F",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "G", "G",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "H", "H",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "I", "I",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "J", "J",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "K", "K",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "L", "L",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "M", "M",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "N", "N",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "O", "O",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "P", "P",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "Q", "Q",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "R", "R",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "S", "S",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "T", "T",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "U", "U",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "V", "V",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "W", "W",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "X", "X",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "Y", "Y",
upper(MID(a1,ROW(INDIRECT(CONCATENATE("1:",LEN(a1)))),1)) = "Z", "Z",
true, ""
)
)
With the original question being "eliminate all but alpha characters" - this answer does the trick in a formula without the need for VBA.