How to remove accents in PowerShell?
I have a script that creates users in Microsoft Exchange Server and Active Directory. Since names in Spain commonly contain accents or ñ, I want to strip them from the username so they don't cause incompatibilities in old systems.

So, how could I clean a string like this?

$name = "Ramón"

So that it becomes this?

$name = "Ramon"
Treadwell answered 20/10, 2011 at 13:30 Comment(0)
M
8

Well, I can help you with some of the code.

I used this recently in a C# project to strip diacritics from email addresses:

    static string RemoveDiacritics(string input)
    {
        string inputFormD = (input ?? string.Empty).Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder();

        for (var i = 0; i < inputFormD.Length; i++)
        {
            UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(inputFormD[i]);
            if (uc != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(inputFormD[i]);
            }
        }

        return (sb.ToString().Normalize(NormalizationForm.FormC));
    }

I guess I can now say 'extending into a PowerShell script/form is left to the reader'.... hope it helps....

Maitland answered 20/10, 2011 at 13:40 Comment(2)
+1 Smart snippet. I converted it to PowerShell and it works as expected, thanks. - Pithy
It works fine in PowerShell. Thanks a lot for sharing :D - Treadwell
G
23

As per ip.'s answer, here is the PowerShell version.

function Remove-Diacritics {
param ([String]$src = [String]::Empty)
  $normalized = $src.Normalize( [Text.NormalizationForm]::FormD )
  $sb = new-object Text.StringBuilder
  $normalized.ToCharArray() | % { 
    if( [Globalization.CharUnicodeInfo]::GetUnicodeCategory($_) -ne [Globalization.UnicodeCategory]::NonSpacingMark) {
      [void]$sb.Append($_)
    }
  }
  $sb.ToString()
}

# Test data
@("Rhône", "Basíl", "Åbo", "", "Gräsäntörmä") | % { Remove-Diacritics $_ }

Output:

Rhone
Basil
Abo

Grasantorma
Girth answered 20/10, 2011 at 19:8 Comment(0)
P
7

Another PowerShell translation of @ip's answer, for non-C# coders ;o)

function Remove-Diacritics 
{
  # [String[]] so that a single string or a list of strings is accepted
  param ([String[]]$sToModify = @([String]::Empty))

  foreach ($s in $sToModify) # Param may be a string or a list of strings
  {
    if ($null -eq $s) { [String]::Empty; continue }

    $sNormalized = $s.Normalize("FormD")
    $res = ""  # reset the accumulator for each input string

    foreach ($c in [Char[]]$sNormalized)
    {
      $uCategory = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($c)
      if ($uCategory -ne "NonSpacingMark") {$res += $c}
    }

    $res  # emit one result per input string instead of returning after the first
  }
}

Clear-Host
$name = "Un été de Raphaël"
Write-Host (Remove-Diacritics $name )
$test = ("äâûê", "éèà", "ùçä")
$test | % {Remove-Diacritics $_}
Remove-Diacritics $test
Pithy answered 21/10, 2011 at 7:12 Comment(0)
M
7

With the help of the examples above, I use this one-liner in a pipeline (tested only on Windows 10):

"öüóőúéáűí".Normalize("FormD") -replace '\p{M}', ''

Result:

ouooueaui
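
A small sketch of why the Normalize("FormD") step is needed before the -replace, assuming Windows PowerShell 5.1 or PowerShell 7. The variable names are illustrative only, and the character is built from its code point so the script file's encoding cannot change it.

```powershell
# Precomposed "é" (U+00E9), built from its code point.
$composed = [string][char]0x00E9
$decomposed = $composed.Normalize([Text.NormalizationForm]::FormD)

$composed.Length      # 1 code unit
$decomposed.Length    # 2 code units: 'e' followed by U+0301 (combining acute accent)

# \p{M} matches the combining mark, so only the base letter is left.
$stripped = $decomposed -replace '\p{M}', ''
$stripped.Length      # 1, the plain 'e'

# Without the FormD step there is no separate mark to match, so nothing is removed.
$unchanged = $composed -replace '\p{M}', ''
$unchanged.Length     # 1, still the accented character
```

In other words, the regular expression only removes marks that exist as separate characters, which is what the decomposition produces.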
Michalmichalak answered 20/4, 2018 at 12:20 Comment(0)
C
4
PS> [Text.Encoding]::ASCII.GetString([Text.Encoding]::GetEncoding(1251).GetBytes("Ramón"))
Ramon
PS>
Chicky answered 13/8, 2015 at 15:26 Comment(1)
Fails for some characters, e.g. Æ×Þ°±ß…. A real Old English example: it returns Fore ??re m?r?e? when applied to Fore ðære mærðe… - Glede
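
Letters such as ß, Æ, ð and þ have no canonical Unicode decomposition, so the FormD approach from the other answers leaves them unchanged, while the encoding trick above turns them into ?. A minimal sketch that replaces them from an explicit table while stripping marks; the function name ConvertTo-AsciiName and the chosen replacements are assumptions for illustration, not a standard mapping.

```powershell
function ConvertTo-AsciiName {
  param ([String]$Text = [String]::Empty)

  # Hypothetical table: code point -> replacement for letters FormD cannot decompose.
  # Code points are used so the script does not depend on the file's encoding.
  $map = @{
    0x00DF = 'ss'   # ß
    0x00C6 = 'AE'   # Æ
    0x00E6 = 'ae'   # æ
    0x00D8 = 'O'    # Ø
    0x00F8 = 'o'    # ø
    0x00D0 = 'D'    # Ð
    0x00F0 = 'd'    # ð
    0x00DE = 'Th'   # Þ
    0x00FE = 'th'   # þ
  }

  $sb = New-Object Text.StringBuilder
  foreach ($c in $Text.Normalize([Text.NormalizationForm]::FormD).ToCharArray()) {
    $code = [int]$c
    if ($map.ContainsKey($code)) {
      # Letter from the table: append its ASCII replacement.
      [void]$sb.Append($map[$code])
    }
    elseif ([Globalization.CharUnicodeInfo]::GetUnicodeCategory($c) -ne [Globalization.UnicodeCategory]::NonSpacingMark) {
      # Anything that is not a combining mark is kept as-is.
      [void]$sb.Append($c)
    }
  }
  $sb.ToString()
}
```

With this table, ConvertTo-AsciiName "Fore ðære mærðe" gives Fore daere maerde, and "Ramón" still becomes Ramon. Characters that are neither in the table nor decomposable (for example × or °) pass through unchanged, so extend the table if the target system needs strict ASCII.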
S
3

Instead of creating a StringBuilder and looping over characters, you can just use -replace on the NFD string to remove combining marks:

function Remove-Diacritics {
param ([String]$src = [String]::Empty)
  $normalized = $src.Normalize( [Text.NormalizationForm]::FormD )
  ($normalized -replace '\p{M}', '')
}
Southeaster answered 23/3, 2018 at 19:55 Comment(0)
X
2

Another solution... quickly "reuse" your C# in PowerShell (C# code credits lost somewhere on the net).

Add-Type -TypeDefinition @"
    using System.Text;
    using System.Globalization;

    public class Utils
    {
        public static string RemoveDiacritics(string stIn)
        {
            string stFormD = stIn.Normalize(NormalizationForm.FormD);
            StringBuilder sb = new StringBuilder();

            for (int ich = 0; ich < stFormD.Length; ich++)
            {
                UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
                if (uc != UnicodeCategory.NonSpacingMark)
                {
                    sb.Append(stFormD[ich]);
                }
            }
            return (sb.ToString().Normalize(NormalizationForm.FormC));
        }
    }
"@ | Out-Null

[Utils]::RemoveDiacritics("ABC-abc-ČŠŽ-čšž")
Xylophone answered 21/6, 2013 at 13:17 Comment(0)