OrderBy ignoring accented letters

I want a method like OrderBy() that always sorts while ignoring accents, treating accented letters as their non-accented equivalents. I already tried to override OrderBy(), but it seems I can't because it is a static method.

So now I want to create a custom extension method that wraps OrderBy(), like this:

public static IOrderedEnumerable<TSource> ToOrderBy<TSource, TKey>(
    this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
    if(source == null)
        return null;

    var seenKeys = new HashSet<TKey>();

    var culture = new CultureInfo("pt-PT");
    return source.OrderBy(element => seenKeys.Add(keySelector(element)), 
                          StringComparer.Create(culture, false));
} 

However, I'm getting this error:

Error 2 The type arguments for method 'System.Linq.Enumerable.OrderBy<TSource,TKey>(System.Collections.Generic.IEnumerable<TSource>, System.Func<TSource,TKey>, System.Collections.Generic.IComparer<TKey>)' cannot be inferred from the usage. Try specifying the type arguments explicitly.

It seems the compiler doesn't like the StringComparer argument. How can I solve this?

Note:

I already tried to use RemoveDiacritics() from here but I don't know how to use that method in this case. So I tried to do something like this which seems nice too.

Chaetopod answered 28/1, 2016 at 12:28 Comment(2)
Are you using Linq2Sql or LinqObjects? - Klehm
What is the HashSet for? - Burnside

Solved! I was getting that error because, to use a StringComparer, the key returned by the key selector passed to OrderBy() needs to be a string.

So I convert the key to a string with ToString() and pass it to the RemoveDiacritics() method, so that accented letters are treated as their non-accented equivalents.

public static IOrderedEnumerable<TSource> ToOrderBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
    if(!source.SafeAny())
        return null;

    return source.OrderBy(element => Utils.RemoveDiacritics(keySelector(element).ToString()));
}
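
For context on the original error: in the question's code the lambda returns the bool result of HashSet<TKey>.Add, while StringComparer implements IComparer<string>, so the compiler cannot settle on a single TKey for OrderBy. A minimal sketch of a wrapper whose key selector is constrained to string, which lets the inference succeed, might look like this. The names OrderingSketch and OrderByCulture are hypothetical and not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class OrderingSketch
{
    // Hypothetical helper: because keySelector returns string, the TKey of
    // Enumerable.OrderBy is string and matches StringComparer (IComparer<string>).
    public static IOrderedEnumerable<TSource> OrderByCulture<TSource>(
        this IEnumerable<TSource> source,
        Func<TSource, string> keySelector,
        CultureInfo culture)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
        if (culture == null) throw new ArgumentNullException(nameof(culture));

        // false = case-sensitive comparison, as in the question's code.
        return source.OrderBy(keySelector, StringComparer.Create(culture, false));
    }
}
```

This sketch throws on a null source instead of returning null, which is a design choice of the example rather than something the original code does.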

To guarantee that RemoveDiacritics() works correctly, I added an HtmlDecode() call.

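public static string RemoveDiacritics(string text)
{
    // Nothing to normalize; also avoids a NullReferenceException on text.Normalize below.
    if(text == null)
        return null;

    // Decode HTML entities (e.g. &eacute;) so they are handled like literal characters.
    text = WebUtility.HtmlDecode(text);

    // FormD splits each accented letter into a base letter plus combining marks.
    string formD = text.Normalize(NormalizationForm.FormD);
    StringBuilder sb = new StringBuilder();

    foreach (char ch in formD)
    {
        // Keep everything except the combining marks (the accents).
        UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(ch);
        if (uc != UnicodeCategory.NonSpacingMark)
        {
            sb.Append(ch);
        }
    }

    return sb.ToString().Normalize(NormalizationForm.FormC);
}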
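
As a rough illustration of how the stripped keys affect ordering, here is a self-contained sketch. The class DiacriticsDemo and the method SortIgnoringAccents are hypothetical names, and the compact RemoveDiacritics below is a stand-in for the method above without the HtmlDecode step. It assumes the runtime supports Unicode normalization for non-ASCII text.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

public static class DiacriticsDemo
{
    // Compact stand-in for RemoveDiacritics (no HtmlDecode step).
    public static string RemoveDiacritics(string text)
    {
        if (text == null) return null;

        // Decompose, drop combining marks, then recompose.
        char[] kept = text.Normalize(NormalizationForm.FormD)
            .Where(ch => CharUnicodeInfo.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
            .ToArray();
        return new string(kept).Normalize(NormalizationForm.FormC);
    }

    // Sorts by the stripped key but returns the original strings.
    // Ordinal comparison is an assumption made here to keep the result
    // independent of the machine's culture settings.
    public static List<string> SortIgnoringAccents(IEnumerable<string> names)
    {
        return names.OrderBy(s => RemoveDiacritics(s), StringComparer.Ordinal).ToList();
    }
}
```

With a plain ordinal sort, "\u00C9vora" (Évora) would land after "Zebra"; sorting by the stripped key places it among the other names starting with E.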
Chaetopod answered 28/1, 2016 at 13:50 Comment(3)
There is absolutely no need to implement your ToOrderBy. If you have a string enumeration named mystrings you can simply call mystrings.OrderBy(RemoveDiacritics, StringComparer.Create(culture, false)) - Inn
@RenéVogt I know, but I want to do it this way because my project is huge and I don't want to make that modification throughout the whole project. - Chaetopod
@RenéVogt to use ToOrderBy() in my solution I only need to do Ctrl + F and replace .OrderBy( with .ToOrderBy(, and in the future, if I need to change something in this logic, I only need to change it in one place and not in the entire project. That's my approach. For the exception, I have a SafeAny() method that checks that the source is not null and has at least one element. - Chaetopod

OrderBy takes a keySelector as first argument. This keySelector should be a Func<string,T>. So you need a method that takes a string and returns a value by which your enumeration should be sorted.

Unfortunately, I'm not sure how to determine whether a character is an "accented letter". The RemoveDiacritics method doesn't work for my é.

So let's assume you have a method called IsAccentedLetter that determines if a character is an accented letter:

public bool IsAccentedLetter(char c)
{
    // I'm afraid this does NOT really do the job
    return CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark;
}
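
The reason this check falls short for a precomposed character such as é (U+00E9) is that its Unicode category is LowercaseLetter; the combining mark only appears after canonical decomposition (FormD). A small sketch of that distinction follows. AccentInspection, IsCombiningMark and HasDiacritic are hypothetical names, and the code assumes the character is not a lone surrogate.

```csharp
using System.Globalization;
using System.Text;

public static class AccentInspection
{
    // True only for combining marks such as U+0301 (combining acute accent).
    public static bool IsCombiningMark(char c)
    {
        return CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark;
    }

    // True when the canonical decomposition of the character contains a combining mark.
    // Assumption: c is not a lone surrogate, otherwise Normalize throws.
    public static bool HasDiacritic(char c)
    {
        foreach (char part in c.ToString().Normalize(NormalizationForm.FormD))
        {
            if (IsCombiningMark(part))
            {
                return true;
            }
        }
        return false;
    }
}
```

Here IsCombiningMark('\u00E9') is false, while HasDiacritic('\u00E9') is true because FormD turns the character into 'e' followed by U+0301.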

So you can sort your list like that:

string[] myStrings = getStrings(); // wherever your strings come from
var ordered = myStrings.OrderBy(s => new string(s.Select(c => 
    IsAccentedLetter(c) ? ' ' : c).ToArray()), StringComparer.Create(culture, false));

The lambda expression takes a string and returns the same string with the accented letters replaced by a space.
OrderBy then sorts your enumeration by these strings and so "ignores" the accented letters.

UPDATE: If you have a working method RemoveDiacritics(string s) that returns the strings with the accented letters replaced as you want, you may simply call OrderBy like this:

string[] myStrings = getStrings();
var ordered = myStrings.OrderBy(RemoveDiacritics, StringComparer.Create(culture, false));
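
A different technique from the key transformation above is to leave the strings untouched and ask the culture's comparer to ignore combining marks through CompareOptions.IgnoreNonSpace. The following is only a sketch under that assumption: AccentInsensitiveOrdering, CreateComparer and Sort are hypothetical names, and the exact results depend on the globalization data available to the runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class AccentInsensitiveOrdering
{
    // Builds a comparer that delegates to the culture's CompareInfo and
    // asks it to ignore nonspacing (combining) characters such as accents.
    public static IComparer<string> CreateComparer(CultureInfo culture)
    {
        if (culture == null) throw new ArgumentNullException(nameof(culture));

        CompareInfo compareInfo = culture.CompareInfo;
        return Comparer<string>.Create(
            (a, b) => compareInfo.Compare(a, b, CompareOptions.IgnoreNonSpace));
    }

    // Sorts the original strings; no key transformation is needed.
    public static List<string> Sort(IEnumerable<string> items, CultureInfo culture)
    {
        return items.OrderBy(s => s, CreateComparer(culture)).ToList();
    }
}
```

A call such as AccentInsensitiveOrdering.Sort(myStrings, new CultureInfo("pt-PT")) would then replace the RemoveDiacritics key selector, at the cost of relying on the platform's collation behavior.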
Hornbook answered 28/1, 2016 at 13:6 Comment(2)
That logic gives me the error 'TSource' does not contain a definition for 'Select' and no extension method 'Select' accepting a first argument of type 'TSource' could be found (are you missing a using directive or an assembly reference?). Besides, you are replacing the accented letter with a white space, and I want the accented letter to be treated like a non-accented letter. - Chaetopod
@Chaetopod oh, this was a misunderstanding: my source was meant to be your original list. I did not intend to implement my own ToOrderBy extension at all; this is not necessary! In your solution, you don't need your ToOrderBy, you can just call mystrings.OrderBy(RemoveDiacritics, StringComparer.Create(culture, false)) - Inn