Substring in PowerShell to truncate string length
B

9

19

Is it possible in PowerShell to truncate a string (using Substring()?) to a given maximum number of characters, even if the original string is already shorter than that?

For example:

foreach ($str in "hello", "good morning", "hi") { $str.subString(0, 4) }

The truncation is working for hello and good morning, but I get an error for hi.

I would like the following result:

hell
good
hi
Balsam answered 14/1, 2015 at 13:38 Comment(2)
As in, you want to add characters if it's shorter and remove them if it's longer? - Doy
If it's shorter, keep the string as it is; no need to add characters. - Pyrene
O
32

You need to check the length of the current item. If it is less than 4, pass that length to Substring instead of 4.

foreach ($str in "hello", "good morning", "hi") {
    $str.subString(0, [System.Math]::Min(4, $str.Length)) 
}
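For reuse, the same clamp can be wrapped in a small function. This is only a sketch: the name Limit-String and its parameters are made up for illustration and are not a built-in cmdlet.

```powershell
# Hypothetical helper; Limit-String, -InputString and -MaxLength are illustrative names.
function Limit-String {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [AllowEmptyString()]
        [string] $InputString,

        [ValidateRange(0, [int]::MaxValue)]
        [int] $MaxLength = 4
    )
    process {
        # Clamp the count so Substring never reads past the end of the string.
        $InputString.Substring(0, [Math]::Min($MaxLength, $InputString.Length))
    }
}

"hello", "good morning", "hi" | Limit-String -MaxLength 4
# hell
# good
# hi
```

Strings shorter than the limit come back unchanged, which matches the result asked for in the question.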
Olla answered 14/1, 2015 at 13:50 Comment(0)
C
17

Or you could just keep it simple, using PowerShell's alternative to a ternary operator:

foreach ($str in "hello", "good morning", "hi") {
  $(if ($str.length -gt 4) { $str.substring(0, 4) } else { $str })
}

While all the other answers are "correct", their efficiencies go from sub-optimal to potentially horrendous. The following is not a critique of the other answers, but it is intended as an instructive comparison of their underlying operation. After all, scripting is more about getting it running soon than getting it running fast.

In order:

  1.  

    foreach ($str in "hello", "good morning", "hi") {
        $str.subString(0, [System.Math]::Min(4, $str.Length))
    }
    

    This is basically the same as my offering, except that instead of just returning $str when it is too short, we call Substring and tell it to return the whole string. Hence, sub-optimal. It is still doing the if..then..else, just inside Min, viz.

    if (4 -lt $str.length) {4} else {$str.length}
    
  2.  

    foreach ($str in "hello", "good morning", "hi") { $str -replace '(.{4}).+','$1' }
    

    Using regular expression matching to grab the first four characters and then replace the whole string with them means that the entire (possibly very long) string must be scanned by the matching engine of unknown complexity/efficiency.

    While a person can see that the '.+' is simply there to match the entire remainder of the string, the matching engine could be building up a large list of backtracking alternatives, since the pattern is not anchored (no ^ at the beginning). The (not described) clever bit here is that if the string is shorter than five characters (four times . followed by one or more .), the whole match fails and -replace returns $str unaltered.

  3.  

    foreach ($str in "hello", "good morning", "hi") {
      try {
        $str.subString(0, 4)
      }
      catch [ArgumentOutOfRangeException] {
        $str
      }
    }
    

    Deliberately throwing exceptions instead of programmatic boundary checking is an interesting solution, but who knows what is going on as the exception bubbles up from the try block to the catch. Probably not much in this simple case, but it would not be a recommended general practice except in situations where there are many possible sources of errors (making it cumbersome to check for all of them), but only a few responses.
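The "whole match fails" behaviour described in point 2 is easy to check. A minimal sketch follows; the variable names and the extra input "good" (exactly four characters) are additions for illustration.

```powershell
$pattern = '(.{4}).+'   # four characters, then at least one more

$rows = foreach ($str in "hello", "good", "hi") {
    [pscustomobject]@{
        Input   = $str
        Matched = $str -match $pattern           # True only when the string has 5+ characters
        Result  = $str -replace $pattern, '$1'   # unchanged when nothing matches
    }
}
$rows
# hello -> Matched True,  Result hell
# good  -> Matched False, Result good
# hi    -> Matched False, Result hi
```

So -replace only rewrites strings that are actually too long; anything of four characters or fewer never matches and is passed through.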

Interestingly, an answer to a similar question elsewhere using -join and array slices (which don't cause errors on index out of range, just ignore the missing elements):

$str[0..3] -join ""   # Infix

(or more simply)

-join $str[0..3]      # Prefix

could be the most efficient (with appropriate optimisation) given the strong similarity between the storage of string and char[]. Optimisation would be required since, by default, $str[0..3] is an object[], each element being a single char, and so bears little resemblance to a string (in memory). Giving PowerShell a little hint could be useful,

-join [char[]]$str[0..3]

However, maybe just telling it what you actually want,

new-object string (,$str[0..3]) # Need $str[0..3] to be a member of an array of constructor arguments

thereby directly invoking

new String(char[])

is best.
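Applied to the strings from the question, the slice-based variants might look like the sketch below. The $max variable is an assumption for illustration, and [string]::new is shorthand for the constructor call that needs PowerShell 5 or later.

```powershell
$max = 4   # assumed to be at least 1 (see the note below)

$out = foreach ($str in "hello", "good morning", "hi") {
    # Slice the first $max characters; the cast turns the object[] slice into a char[].
    [char[]] $chars = $str[0..($max - 1)]

    $viaJoin = -join $chars             # join the characters back into a string
    $viaCtor = [string]::new($chars)    # String(char[]) constructor

    "{0} | {1}" -f $viaJoin, $viaCtor
}
$out
```

One caveat with the range: for a maximum length of 0 the slice would be 0..-1, which counts down to -1 and selects the first and last characters rather than nothing, so that case needs its own guard.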

Cavin answered 11/10, 2016 at 10:46 Comment(2)
uberkluger, the way you vomit over other contributions is quite disgusting. And basing your judgement solely on performance is just silly. Still, you did not back your claims with numbers, so I ran (Measure-Command{1..100000 | % {<snippet>}|Out-Null}).TotalSeconds for each answer and found that: 1) yours is actually slower than @Eduard's, which you dubbed "sub-optimal"; 2) your last suggestion is the slowest of all, by far; 3) mine is the fastest, because using an operator on the array is faster than iterating over its elements, and would be faster yet using @mjolinor's regex. - Persevere
$(if ($str.length -gt 4) { $str.substring(0, 4) } else { $str }) is far from simple and involves unnecessary use of $(...); without the latter, it is the fastest solution, however. As @NicolasMelay points out, several of the performance claims are incorrect. This answer provides benchmarks that juxtapose the solutions here. - Muscat
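For anyone who wants to repeat the comparison, here is a rough timing sketch built on Measure-Command. It is not the commenters' actual benchmark: the labels, the iteration count, and the script-block wrapper are assumptions, and the cost of invoking a script block is included in every measurement, so only large differences mean anything.

```powershell
$inputs = "hello", "good morning", "hi"

# Candidate implementations taken from the answers in this thread.
$candidates = [ordered]@{
    'Substring + Min' = { param($s) $s.Substring(0, [Math]::Min(4, $s.Length)) }
    'if / else'       = { param($s) if ($s.Length -gt 4) { $s.Substring(0, 4) } else { $s } }
    'regex replace'   = { param($s) $s -replace '(.{4}).+', '$1' }
    'slice + join'    = { param($s) -join $s[0..3] }
}

$timings = foreach ($name in $candidates.Keys) {
    $block = $candidates[$name]
    $elapsed = Measure-Command {
        foreach ($i in 1..10000) {
            foreach ($s in $inputs) { $null = & $block $s }
        }
    }
    [pscustomobject]@{ Approach = $name; Seconds = $elapsed.TotalSeconds }
}

$timings | Sort-Object Seconds
```

Absolute numbers will vary with the PowerShell version and the machine, so no expected output is shown.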
P
2

More regex love, using lookbehind:

PS > 'hello','good morning','hi' -replace '(?<=(.{4})).+'
hell
good
hi
Persevere answered 25/5, 2020 at 18:46 Comment(1)
Nice, though I suggest the following form: 'hello','good morning','hi' -replace '(?<=^.{4}).+', for conceptual clarity (anchoring at the start), and to avoid the unnecessary capture group. The anchoring presumably also helps avoid backtracking. - Muscat
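The anchored form from the comment can take the length from a variable. A small sketch, where $max and $pattern are names introduced here for illustration:

```powershell
$max = 4
$pattern = "(?<=^.{$max}).+"   # expands to (?<=^.{4}).+

$result = 'hello', 'good morning', 'hi' -replace $pattern
$result
# hell
# good
# hi
```

Note that . does not match a newline by default, so a string with a line break inside its first four characters would not be truncated by this pattern; a (?s) prefix would presumably be needed for multi-line input, a case this thread does not cover.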
D
1

You could trap the exception:

foreach ($str in "hello", "good morning", "hi") { 
  try { 
    $str.subString(0, 4) 
  }
  catch [ArgumentOutOfRangeException] {
    $str
  }
}
Doy answered 14/1, 2015 at 13:56 Comment(0)
B
1

I'm late to the party as always! I have used the PadRight string function to address such an issue. I cannot comment on its relative efficiency compared to the other suggestions:

foreach ($str in "hello", "good morning", "hi") { $str.PadRight(4, " ").SubString(0, 4) }
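Worth noting what this approach returns for short input: the result is always exactly four characters wide, so "hi" comes back padded rather than as-is. A quick sketch to make the padding visible (the quoting and length display are added for illustration):

```powershell
$fixedWidth = foreach ($str in "hello", "good morning", "hi") {
    $fixed = $str.PadRight(4).Substring(0, 4)     # PadRight pads with spaces by default
    "'{0}' (length {1})" -f $fixed, $fixed.Length
}
$fixedWidth
# 'hell' (length 4)
# 'good' (length 4)
# 'hi  ' (length 4)
```

That is handy for fixed-width columns, but it differs from the output requested in the question, where "hi" should stay two characters long.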
Byrne answered 21/4, 2021 at 23:57 Comment(0)
P
0

You can also use -replace

foreach ($str in "hello", "good morning", "hi") { $str -replace '(.{4}).+','$1' }

hell
good
hi
Polymerous answered 14/1, 2015 at 14:3 Comment(0)
S
0

Old thread, but I came across the same problem and ended up with the following:

$str.padright(4,"✓").substring(0,4).replace("✓","")

Replace the ✓ character with whatever rogue character you want. I used the character obtained from pressing the ALT GR and backtick key on the keyboard.

Sorbitol answered 6/5, 2021 at 7:5 Comment(0)
W
0

UGH, I feel so dirty, but here it is:

-join ("123123123".ToCharArray() | select -first 42)   # outputs the full string: 123123123

-join ("123123123".ToCharArray() | select -first 3)    # outputs the first 3 characters: 123

Even simpler, still dirty!

-join "123123123"[0..3]   # zero-based, so this gets 4 characters (1231); adjust accordingly

Wolfy answered 27/9, 2021 at 17:2 Comment(0)
J
0

Combine Padright with Substring:

e.g. for width 20 "3667edacb".PadRight(20).Substring(0,20)

"'$("3667edacb".PadRight(20).Substring(0,20))'" gives: '3667edacb           '

"'$("3667edac-6e3c-471d-b7e3-f9d16c8c6fab".PadRight(20).Substring(0,20))'" gives: '3667edac-6e3c-471d-b'

Jocularity answered 12/7 at 5:52 Comment(2)
What does this answer provide that existing answers don't? There are 2 other answers combining PadRight and Substring. - Frangible
Yeah, not much. I somehow didn't see the others... - Jocularity
