How do I normalize a string?

In .NET you can normalize (NFC, NFD, NFKC, NFKD) strings with String.Normalize() and there is a Text.NormalizationForm enum.
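
For reference, the desktop .NET usage described here might look like the following minimal sketch. The class and method names (`NormalizeDemo`, `EqualAfterNfc`) are made up for illustration; only `String.Normalize` and `NormalizationForm` come from the framework.

```csharp
using System;
using System.Text;

static class NormalizeDemo
{
    // Hypothetical helper: true when both strings are equal after NFC normalization.
    public static bool EqualAfterNfc(string a, string b)
    {
        return string.Equals(
            a.Normalize(NormalizationForm.FormC),
            b.Normalize(NormalizationForm.FormC),
            StringComparison.Ordinal);
    }

    static void Main()
    {
        string decomposed = "e\u0301"; // 'e' followed by U+0301 COMBINING ACUTE ACCENT
        string composed = "\u00E9";    // precomposed U+00E9

        // Ordinal comparison sees two different code unit sequences.
        Console.WriteLine(decomposed == composed);              // False

        // After NFC normalization both collapse to the same single code unit.
        Console.WriteLine(EqualAfterNfc(decomposed, composed)); // True
    }
}
```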

In .NET for Windows Store Apps, neither is available. I have looked in the String class and in the System.Text and System.Globalization namespaces, but found nothing.

Have I missed something? How do I normalize strings in Windows Store Apps?

Does anyone have an idea why the Normalize method was not made available for Store Apps?

Campman answered 8/2, 2013 at 12:59 Comment(3)
String.Normalize relies on native functions in normaliz.dll. I dug a bit and found that it uses the NormalizeString function. Since it is specific to Windows, it is not available for Windows Store Apps. Unfortunately, I have no knowledge of alternatives.Geodetic
@AlexanderManekovskiy You are wrong, NormalizeString is on the approved list of Win32 and COM API functions usable in Windows Store apps.Oatcake
@Oatcake Wow, where were my eyes?! Thank you for pointing out this list.Geodetic

As you've pointed out, the Normalize method is not available on the String class in Windows Store apps.

However, this just calls the NormalizeString function in the Windows API.

Even better, this function is in the approved list of Win32 and COM API functions usable in Windows Store apps.

That said, you'd make the following declarations:

public enum NORM_FORM 
{ 
  NormalizationOther  = 0,
  NormalizationC      = 0x1,
  NormalizationD      = 0x2,
  NormalizationKC     = 0x5,
  NormalizationKD     = 0x6
};

[DllImport("Normaliz.dll", CharSet = CharSet.Unicode, ExactSpelling = true,
    SetLastError = true)]
public static extern int NormalizeString(NORM_FORM NormForm,
    string lpSrcString,
    int cwSrcLength,
    StringBuilder lpDstString,
    int cwDstLength);

You'd then call it like so:

// The form.
NORM_FORM form = ...;

// String to normalize.
string unnormalized = "...";

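// Get the buffer size required (the function returns an estimate).
int bufferSize = 
    NormalizeString(form, unnormalized, unnormalized.Length, null, 0);

// Allocate the buffer.
var buffer = new StringBuilder(bufferSize);

// Normalize. Pass buffer.Capacity, not buffer.Length: Length is 0 for a
// new StringBuilder, and a destination length of 0 only asks for the size again.
int length =
    NormalizeString(form, unnormalized, unnormalized.Length, buffer, buffer.Capacity);

// A return value <= 0 indicates failure. Check for and act on errors if you want.
int error = length <= 0 ? Marshal.GetLastWin32Error() : 0;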
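
The two steps above could also be folded into a single helper. The following is only a sketch under a few assumptions: the names `Win32Normalizer`, `NormForm`, and `Normalize` are made up for illustration, the destination is declared as `[Out] char[]` rather than `StringBuilder` so that the returned length can be used directly, and the retry on ERROR_INSUFFICIENT_BUFFER (where the negated return value is treated as the new size estimate) follows the behavior described in the NormalizeString documentation.

```csharp
using System;
using System.Runtime.InteropServices;

public static class Win32Normalizer
{
    // Values mirror NORM_FORM from the Windows SDK.
    public enum NormForm
    {
        C = 0x1,
        D = 0x2,
        KC = 0x5,
        KD = 0x6
    }

    private const int ErrorInsufficientBuffer = 122; // ERROR_INSUFFICIENT_BUFFER

    // Assumption: a char[] destination instead of StringBuilder.
    [DllImport("Normaliz.dll", CharSet = CharSet.Unicode, ExactSpelling = true,
        SetLastError = true)]
    private static extern int NormalizeString(NormForm normForm,
        string lpSrcString,
        int cwSrcLength,
        [Out] char[] lpDstString,
        int cwDstLength);

    public static string Normalize(string input, NormForm form)
    {
        if (input == null) throw new ArgumentNullException("input");
        if (input.Length == 0) return input;

        // A destination length of 0 asks for an estimated buffer size.
        int size = NormalizeString(form, input, input.Length, null, 0);
        if (size <= 0)
        {
            throw new InvalidOperationException(
                "NormalizeString failed with Win32 error " + Marshal.GetLastWin32Error() + ".");
        }

        // The estimate is normally sufficient; retry a bounded number of times if not.
        for (int attempt = 0; attempt < 10 && size > 0; attempt++)
        {
            char[] buffer = new char[size];
            int written = NormalizeString(form, input, input.Length, buffer, buffer.Length);
            if (written > 0)
            {
                // Only the first 'written' characters hold the normalized text.
                return new string(buffer, 0, written);
            }

            int error = Marshal.GetLastWin32Error();
            if (error != ErrorInsufficientBuffer)
            {
                throw new InvalidOperationException(
                    "NormalizeString failed with Win32 error " + error + ".");
            }

            // On ERROR_INSUFFICIENT_BUFFER the return value is the negated size estimate.
            size = -written;
        }

        throw new InvalidOperationException("NormalizeString could not size its output buffer.");
    }
}
```

A call would then look like `string nfc = Win32Normalizer.Normalize(unnormalized, Win32Normalizer.NormForm.C);`. Reading the last error only after a failed call avoids acting on a stale error code from an earlier API call.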
Oatcake answered 6/3, 2013 at 20:24 Comment(4)
Is StringBuffer lpDstString correct, or did you mean StringBuilder?Campman
I am trying to get it to work (using StringBuilder instead of StringBuffer, which AFAIK does not exist), but it does not work. buffer is always "empty" (contains nothing but a bunch of \0). Any idea what's causing this?Campman
@SebastianNegraszus StringBuffer was an error, it's supposed to be StringBuilder. Are you checking the return value from NormalizeString and the Marshal.GetLastWin32Error values? They will give you more insight if something is going wrong.Oatcake
For me, buffer.Length is zero because the StringBuilder has not been filled yet, which is why the result is an empty buffer. Also, when I run NormalizeString with null, 0 for the final parameters to get the expected size as per the documentation, it returns 64 for a 6-character English string, which does not look right.Marshmallow

Hello, this is my working code. I don't need to trim the string termination characters, as there are none present, but I am doing so just in case.

By passing -1 instead of an explicit length, I let the function find the string terminator itself. This is the only way I could get it to work properly on a WinRT/WinPhoneRT platform target.

        // Get the buffer size required; -1 means the source string is null-terminated.
        int bufferSize = NormalizeString(Globals.NORM_FORM.NormalizationKD, toNormalise, -1, null, 0);

        StringBuilder buffer = new StringBuilder(bufferSize);

        // Normalize.
        NormalizeString(Globals.NORM_FORM.NormalizationKD, toNormalise, -1, buffer, buffer.Capacity);

        // Check for and act on errors if you want.
        int error = Marshal.GetLastWin32Error();

        if (error != 0)
        {
            throw new Exception("A Win32 error with code " + error + " has occurred in unmanaged NormalizeString");
        }

        // Trim any trailing terminators, just in case.
        char[] trim = { '\0' };

        return buffer.ToString().TrimEnd(trim);
Marshmallow answered 7/1, 2015 at 8:16 Comment(0)
