'4' and '４' clash in primary key but not in filesystem

There is a DataTable with a primary key that stores information about files. Two of the files have names that differ only in the symbols '4' and '４' (0xFF14, "Fullwidth Digit Four"). The DataTable refuses to hold both because the uniqueness check fails, yet in the Windows filesystem they coexist without any issues.

The behavior does not seem to depend on locale settings: I changed "Region&Language->Formats->Format" from English to Japanese, and also changed "Language for non-Unicode programs". The locale was printed as "jp-JP" and as "en-GB", always with the same result.

Questions:

  1. What would be the least intrusive way to fix it? I could switch to plain containers instead of System.Data.*, but I'd like to avoid that. Is it possible to define a custom comparer for the column, or otherwise improve the uniqueness check? Enabling case sensitivity (which would fix this case) would cause other issues.
  2. Is there any chance that some global setting would fix it without rebuilding the software?

The demo program with failure:

using System;
using System.Data;

namespace DataTableUniqueness
{
    class Program
    {
        static void Main(string[] args)
        {
            var changes = new DataTable("Rows");

            var column = new DataColumn { DataType = Type.GetType("System.String"), ColumnName = "File" };
            changes.Columns.Add(column);
            var primKey = new DataColumn[1];
            primKey[0] = column;
            changes.PrimaryKey = primKey;

            changes.Rows.Add("4.txt");
            try
            {
                changes.Rows.Add("４.txt"); // fullwidth four (U+FF14): throws the exception
            }
            catch (Exception e)
            {
                Console.WriteLine("Exception: {0}", e);
            }
        }
    }
}

The exception

Exception: System.Data.ConstraintException: Column 'File' is constrained to be unique.  Value '4.txt' is already present.
   at System.Data.UniqueConstraint.CheckConstraint(DataRow row, DataRowAction action)
   at System.Data.DataTable.RaiseRowChanging(DataRowChangeEventArgs args, DataRow eRow, DataRowAction eAction, Boolean fireEvent)
   at System.Data.DataTable.SetNewRecordWorker(DataRow row, Int32 proposedRecord, DataRowAction action, Boolean isInMerge, Boolean suppressEnsurePropertyChanged, Int32 position, Boolean fireEvent, Exception& deferredException)
   at System.Data.DataTable.InsertRow(DataRow row, Int64 proposedID, Int32 pos, Boolean fireEvent)
   at System.Data.DataRowCollection.Add(Object[] values)


Owe answered 16/5, 2018 at 12:52 Comment(5)
That's an interesting behaviour; maybe changes.Rows.Add("someFile") parses Unicode characters into something else? – Salliesallow
@mjwills, there is no underlying database, just in-memory data, as in the example. – Owe
This doesn't answer your question at all, but as a work-around (if you don't find a solution) you may choose to reference your files by a unique ID instead, using msdn.microsoft.com/en-us/library/aa364952(VS.85).aspx, i.e. let them exist on disk with any kind of funky filename you or your users wish to choose, but add them to the DataTable and later look them up by the ID Windows already has for them. – Aliaalias
Perhaps the ASCII representation of the value in the extra bytes is a non-printable char, thus the space. When compared in the DataTable, the value resolves to '4'? – Lee
Apparently it is because "４".Normalize(NormalizationForm.FormKC) and "４".Normalize(NormalizationForm.FormKD) both equal "4". The question now is how to disable that... – Dialecticism
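
The normalization claim in the last comment can be checked directly. The sketch below is only an illustration: the class and helper names are hypothetical, and the idea that a case-insensitive DataTable compares strings with width-insensitive options is an assumption inferred from the behaviour in the question, not something the thread confirms.

```csharp
using System;
using System.Globalization;
using System.Text;

static class WidthCheck
{
    // Hypothetical helper: true when a and b compare equal under the given options.
    public static bool EqualUnder(string a, string b, CompareOptions options)
    {
        return string.Compare(a, b, CultureInfo.InvariantCulture, options) == 0;
    }

    static void Main()
    {
        const string ascii = "4.txt";
        const string fullwidth = "\uFF14.txt"; // starts with U+FF14, Fullwidth Digit Four

        // Compatibility normalization folds the fullwidth digit into the ASCII one.
        Console.WriteLine(fullwidth.Normalize(NormalizationForm.FormKC) == ascii);

        // A width-insensitive comparison treats the two names as equal.
        Console.WriteLine(EqualUnder(ascii, fullwidth,
            CompareOptions.IgnoreCase | CompareOptions.IgnoreWidth));

        // Ignoring case alone is not expected to hide the width difference,
        // and an ordinal comparison certainly does not.
        Console.WriteLine(EqualUnder(ascii, fullwidth, CompareOptions.IgnoreCase));
        Console.WriteLine(string.Equals(ascii, fullwidth, StringComparison.Ordinal));
    }
}
```

If the assumption holds, the goal is a comparison that still ignores case but does not ignore width.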

By using DataType = typeof(object) you "disable" the string normalization. String equality is still used for comparison. I don't know if there are other side effects.
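
A minimal sketch of that first suggestion, reusing the table layout from the question. The class and method names are hypothetical. The two case-variant rows are there to expose the side effect: names that differ only in case are also expected to be treated as distinct keys.

```csharp
using System;
using System.Data;

static class ObjectColumnDemo
{
    // Builds a one-column table keyed on "File"; the column type is object, not string.
    public static DataTable BuildTable()
    {
        var table = new DataTable("Rows");
        var column = new DataColumn { DataType = typeof(object), ColumnName = "File" };
        table.Columns.Add(column);
        table.PrimaryKey = new[] { column };
        return table;
    }

    static void Main()
    {
        var table = BuildTable();
        table.Rows.Add("4.txt");
        table.Rows.Add("\uFF14.txt"); // fullwidth four; per the answer this row is accepted

        // Caveat: these two are expected to be accepted as well, i.e. the key
        // effectively becomes case sensitive.
        table.Rows.Add("A.txt");
        table.Rows.Add("a.txt");

        Console.WriteLine(table.Rows.Count);
    }
}
```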

More complex solution: implement a "wrapper" for the string class:

public class MyString : IEquatable<MyString>, IComparable, IComparable<MyString>
{
    public static readonly StringComparer Comparer = StringComparer.InvariantCultureIgnoreCase;
    public readonly string Value;

    public MyString(string value)
    {
        Value = value;
    }

    public static implicit operator MyString(string value)
    {
        return new MyString(value);
    }

    public static implicit operator string(MyString value)
    {
        return value != null ? value.Value : null;
    }

    public override int GetHashCode()
    {
        return Comparer.GetHashCode(Value);
    }

    public override bool Equals(object obj)
    {
        if (obj == null || !(obj is MyString))
        {
            return false;
        }

        return Comparer.Equals(Value, ((MyString)obj).Value);
    }

    public override string ToString()
    {
        return Value != null ? Value.ToString() : null;
    }

    public bool Equals(MyString other)
    {
        if (other == null)
        {
            return false;
        }

        return Comparer.Equals(Value, other.Value);
    }

    public int CompareTo(object obj)
    {
        if (obj == null)
        {
            return 1;
        }

        return CompareTo((MyString)obj);
    }

    public int CompareTo(MyString other)
    {
        if (other == null)
        {
            return 1;
        }

        return Comparer.Compare(Value, other.Value);
    }
}

And then:

var changes = new DataTable("Rows");

var column = new DataColumn { DataType = typeof(MyString), ColumnName = "File" };
changes.Columns.Add(column);
var primKey = new DataColumn[1];
primKey[0] = column;
changes.PrimaryKey = primKey;

changes.Rows.Add((MyString)"a");
changes.Rows.Add((MyString)"4.txt");
try
{
    changes.Rows.Add((MyString)"４.txt"); // fullwidth four; should no longer throw, since the comparer ignores case but not width
}
catch (Exception e)
{
    Console.WriteLine("Exception: {0}", e);
}

var row = changes.Rows.Find((MyString)"A");
Allometry answered 16/5, 2018 at 13:17 Comment(5)
This also seems to enable case sensitivity, but it looks like a direction worth exploring further. – Owe
@Owe There is a second possibility: implementing a "wrapper" around the string class (I've given an example). Note that it isn't very clear how the collation (the case-insensitive name comparison) of NTFS works (I wasn't able to find the documentation). From what I remember, NTFS saves the collation it uses in the file system, so depending on where in the world you formatted your disk, the sorting of files and which names count as unique could differ. But I don't know how to retrieve it. This is especially important with the four Turkish I's (I ı İ i), – Allometry
where in Turkish, the uppercase of i is İ (with dot). – Allometry
I am not 100% sure, but there is a special string comparison in .NET which I hope decides equality the same way the filesystem does. Thanks for the detailed example, I will try it. – Owe
@Allometry My experience is that NTFS does no normalisation, which would fit with the behaviour described by the OP. You're right about collation and case folding, though. It does indeed remember (forever) the system locale in effect when the volume was formatted. – Addition
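
The comments do not name the .NET comparison that is hoped to match the filesystem. The sketch below assumes StringComparer.OrdinalIgnoreCase as a candidate and shows only how it treats the names from the question. The class and method names are hypothetical, and whether this comparer matches NTFS on a given volume is not established here.

```csharp
using System;

static class FileNameKeys
{
    // Assumption: OrdinalIgnoreCase as a stand-in for filesystem name matching.
    public static readonly StringComparer Comparer = StringComparer.OrdinalIgnoreCase;

    // True when the two names would collide as keys under this comparer.
    public static bool SameKey(string a, string b)
    {
        return Comparer.Equals(a, b);
    }

    static void Main()
    {
        Console.WriteLine(SameKey("A.txt", "a.txt"));      // True: case is ignored
        Console.WriteLine(SameKey("4.txt", "\uFF14.txt")); // False: width is not ignored
    }
}
```

To try it with the MyString wrapper from the answer, the only change needed should be the Comparer field.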
