What is the shortest way in .NET to sort strings starting with 1, 10 and 2 and respect the number ordering?
M

8

11

I need to sort file names as follows: 1.log, 2.log, 10.log

But when I use OrderBy(fn => fn) it will sort them as: 1.log, 10.log, 2.log

I obviously know that this could be done by writing another comparer, but is there a simpler way to change from lexicographical order to natural sort order?

Edit: the objective is to obtain the same ordering as when selecting "order by name" in Windows Explorer.

Monumental answered 26/8, 2011 at 12:32 Comment(0)
P
7

You can use the Win32 CompareStringEx function. On Windows 7 and later it supports the sorting you need. You will have to use P/Invoke:

static readonly Int32 NORM_IGNORECASE = 0x00000001;
static readonly Int32 NORM_IGNORENONSPACE = 0x00000002;
static readonly Int32 NORM_IGNORESYMBOLS = 0x00000004;
static readonly Int32 LINGUISTIC_IGNORECASE = 0x00000010;
static readonly Int32 LINGUISTIC_IGNOREDIACRITIC = 0x00000020;
static readonly Int32 NORM_IGNOREKANATYPE = 0x00010000;
static readonly Int32 NORM_IGNOREWIDTH = 0x00020000;
static readonly Int32 NORM_LINGUISTIC_CASING = 0x08000000;
static readonly Int32 SORT_STRINGSORT = 0x00001000;
static readonly Int32 SORT_DIGITSASNUMBERS = 0x00000008; 

static readonly String LOCALE_NAME_USER_DEFAULT = null;
static readonly String LOCALE_NAME_INVARIANT = String.Empty;
static readonly String LOCALE_NAME_SYSTEM_DEFAULT = "!sys-default-locale";

[DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
static extern Int32 CompareStringEx(
  String localeName,
  Int32 flags,
  String str1,
  Int32 count1,
  String str2,
  Int32 count2,
  IntPtr versionInformation,
  IntPtr reserved,
  Int32 param
);

You can then create an IComparer that uses the SORT_DIGITSASNUMBERS flag:

class LexicographicalComparer : IComparer<String> {

  readonly String locale;

  public LexicographicalComparer() : this(CultureInfo.CurrentCulture) { }

  public LexicographicalComparer(CultureInfo cultureInfo) {
    if (cultureInfo.IsNeutralCulture)
      this.locale = LOCALE_NAME_INVARIANT;
    else
      this.locale = cultureInfo.Name;
  }

  public Int32 Compare(String x, String y) {
    // CompareStringEx returns 1, 2, or 3 (0 signals failure). Subtract 2 to get -1, 0, or 1.
    return CompareStringEx( 
      this.locale, 
      SORT_DIGITSASNUMBERS, // Add other flags if required.
      x, 
      x.Length, 
      y, 
      y.Length, 
      IntPtr.Zero, 
      IntPtr.Zero, 
      0) - 2; 
  }

}

You can then use the IComparer in various sorting APIs:

var names = new [] { "2.log", "10.log", "1.log" };
var sortedNames = names.OrderBy(s => s, new LexicographicalComparer());

You can also use StrCmpLogicalW which is the function used by Windows Explorer. It has been available since Windows XP:

[DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
static extern Int32 StrCmpLogical(String x, String y);

class LexicographicalComparer : IComparer<String> {

  public Int32 Compare(String x, String y) {
    return StrCmpLogical(x, y);
  }

}

This is simpler, but it gives you less control over the comparison.

Pulpit answered 26/8, 2011 at 13:59 Comment(0)
R
4

If your file names always consist only of digits, you can use Path.GetFileNameWithoutExtension() to discard the file extension and Convert.ToInt32() (or similar) to convert the file names to integers for comparison purposes:

var ordered = yourFileNames.OrderBy(
    fn => Convert.ToInt32(Path.GetFileNameWithoutExtension(fn)));
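
Convert.ToInt32() throws on names such as config.txt or 4service.log. A minimal sketch of a more forgiving variant, assuming you want purely numeric names first (ordered by value) and all other names after them in ordinal order; the class and method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class NumericNameSort
{
    // Hypothetical helper: names whose base name parses as an int come first,
    // ordered by value; all remaining names follow in ordinal order.
    public static IEnumerable<string> OrderNumericFirst(IEnumerable<string> fileNames)
    {
        return fileNames
            .Select(fn =>
            {
                int number;
                // TryParse returns false instead of throwing on "config" or "4service".
                bool isNumeric = int.TryParse(
                    Path.GetFileNameWithoutExtension(fn), out number);
                return new { Name = fn, IsNumeric = isNumeric, Number = number };
            })
            .OrderBy(x => x.IsNumeric ? 0 : 1)
            .ThenBy(x => x.Number)
            .ThenBy(x => x.Name, StringComparer.Ordinal)
            .Select(x => x.Name);
    }
}
```

This only avoids the exception; it does not give non-numeric names such as 4service.log a natural ordering among themselves.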

In the general case, or if you're looking for a more "standard" way to do this, you can P/Invoke StrCmpLogicalW(), which Explorer uses to sort file names in its views. However, doing that will force you to implement an IComparer<string> if you want to use it with OrderBy().

Receptive answered 26/8, 2011 at 12:39 Comment(3)
Thanks, this is what I had initially written; however I am hoping for a more standard way to do this comparison, if any... - Monumental
I was also thinking of this way. But if there is even a single file like config.txt in the log folder, this would crash - Vega
This will, however, blow up very nicely if there are trailing characters in the filename (e.g. 4service.log) - Rockwell
V
2

You could just remove all the non-digit characters, parse to int and then sort:

Regex r = new Regex(@"[^\d]");
OrderBy(fn => int.Parse(r.Replace(fn, "")));
Vega answered 26/8, 2011 at 12:35 Comment(13)
Probably not very fast, but the most succinct solution that doesn't require mutating the file names into other things. - Bridie
This is weak. It puts 1.10 after 10.1 - Kordula
@Chris it sorts 100000 elements in 0.2 seconds. Hopefully he won't have that many logfiles anyway - Vega
@David Why would he have such a file? I would assume that this is just rolling files - Vega
@David Heffernan sounds like the regex just needs to be something like \d*.\d* (my regex sucks) - Bridie
@Chris nah, that wouldn't work. Probably something like [^\d\.]. - Vega
@David added support for that - Vega
@Oskar Kjellin won't that break if there's text like 1.10.10? - Bridie
@Chris True, removed it. I don't think that there is any need for that kind of sorting anyhow given how the question is about logfiles - Vega
That was why my suggestion was to just stop after the first decimal sequence. - Bridie
@Chris Yeah, but your regex would match the digits, my regex matches everything but the digits and replaces them with empty strings. Not sure how to make it remove anything that isn't a match - Vega
So something like ^[\d\.\d] or whatever the proper regex would be for that - Bridie
@ChrisMarisic let us continue this discussion in chat - Vega
S
2

The simplest (not necessarily fastest/optimal) way would be IMHO to left-pad them all to some predefined maximum length with zeroes. I.e.

var data = new[] { "1.log", "10.log", "2.log" };
data.OrderBy(x => x.PadLeft(10, '0')).Dump();
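
Padding the whole string only lines the numbers up when the non-numeric part has the same length in every name (2.text would sort after 10.log, for example). A hedged variation is to pad each run of digits instead; this sketch assumes no digit run is longer than the chosen width, and the names PaddedSort and PadDigitRuns are hypothetical:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class PaddedSort
{
    // Left-pads every run of digits with zeros so that plain string
    // comparison orders the runs by numeric value.
    // Assumption: no digit run is longer than `width` characters.
    public static string PadDigitRuns(string name, int width = 10)
    {
        return Regex.Replace(name, @"\d+", m => m.Value.PadLeft(width, '0'));
    }
}
```

It can be used as a sort key, for example data.OrderBy(x => PaddedSort.PadDigitRuns(x), StringComparer.Ordinal), which leaves the original names untouched.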
Syndetic answered 26/8, 2011 at 12:37 Comment(0)
T
0

It would be easier if you wanted lexicographical order.

String comparison is always done character by character.

How would you deal with that without looking at the whole number?

No, a separate comparer is the only solution.
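
Such a comparer does not have to be long. What follows is a minimal managed sketch, not a reproduction of Explorer's ordering: it assumes ASCII digits only, compares all other characters case-insensitively by code point, and treats digit runs that differ only in leading zeros as equal. The name NaturalComparer is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a natural-order comparer (ASCII digits only).
sealed class NaturalComparer : IComparer<string>
{
    static bool IsDigit(char c) { return c >= '0' && c <= '9'; }

    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        int i = 0, j = 0;
        while (i < x.Length && j < y.Length)
        {
            if (IsDigit(x[i]) && IsDigit(y[j]))
            {
                // Delimit the digit run in each string.
                int startX = i, startY = j;
                while (i < x.Length && IsDigit(x[i])) i++;
                while (j < y.Length && IsDigit(y[j])) j++;

                // Compare by value without parsing: drop leading zeros,
                // then a longer run is larger; equal lengths compare ordinally.
                string runX = x.Substring(startX, i - startX).TrimStart('0');
                string runY = y.Substring(startY, j - startY).TrimStart('0');
                if (runX.Length != runY.Length)
                    return runX.Length < runY.Length ? -1 : 1;
                int c = string.CompareOrdinal(runX, runY);
                if (c != 0) return c;
            }
            else
            {
                // Non-digit characters: case-insensitive code point comparison.
                int c = char.ToUpperInvariant(x[i])
                    .CompareTo(char.ToUpperInvariant(y[j]));
                if (c != 0) return c;
                i++;
                j++;
            }
        }

        // One string is a prefix of the other: the shorter one sorts first.
        return (x.Length - i).CompareTo(y.Length - j);
    }
}
```

It plugs into LINQ the same way as any other comparer, for example names.OrderBy(s => s, new NaturalComparer()).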

Treblinka answered 26/8, 2011 at 12:34 Comment(0)
M
0

No, I don't think so. I guess you have to write it yourself as long as your data is just a string. If you turn your data into something like

struct LogDescription
{
   public int LogBase { get; set; }
   public override string ToString()
   { return string.Format("{0}.log", LogBase); }
}

you can sort by the LogBase property.

Mouthy answered 26/8, 2011 at 12:35 Comment(0)
K
0

You can do something like this if you can guarantee that your names have the format NUMBER.VALUE:

var q = strings.Select(s => s.Split(new[] {'.'}, 2))
    .Select(s => new
    {
        Number = Convert.ToInt32(s[0]),
        Name = s[1]
    })
    .OrderBy(s => s.Number)
    .Select(s => string.Format("{0}.{1}", s.Number, s.Name));
Kaleighkalends answered 26/8, 2011 at 12:38 Comment(0)
