Applying ACL silently failing (sometimes)

I have an application running on multiple servers, applying ACLs.

The problem is that when more than one server applies ACLs on the same folder structure (e.g. three levels deep), usually only levels one and three end up with the ACLs applied, and no exception is thrown.

I've created a test with parallel tasks (to simulate the different servers):

// Requires: System, System.Collections.Generic, System.Diagnostics, System.IO, System.Linq,
// System.Security.AccessControl, System.Security.Principal, System.Text, System.Threading.Tasks
// and Microsoft.VisualStudio.TestTools.UnitTesting.
[TestMethod]
public void ApplyACL()
{
    var baseDir = Path.Combine(Path.GetTempPath(), "ACL-PROBLEM");

    if (Directory.Exists(baseDir))
    {
        Directory.Delete(baseDir, true);
    }

    var paths = new[]
    {
        Path.Combine(baseDir, "LEVEL-1"),
        Path.Combine(baseDir, "LEVEL-1", "LEVEL-2"),
        Path.Combine(baseDir, "LEVEL-1", "LEVEL-2", "LEVEL-3")
    };

    //create folders and files, so the ACL takes some time to apply
    foreach (var dir in paths)
    {
        Directory.CreateDirectory(dir);

        for (int i = 0; i < 1000; i++)
        {
            var id = string.Format("{0:000}", i);
            File.WriteAllText(Path.Combine(dir, id + ".txt"), id);
        }
    }

    var sids = new[]
    {
        "S-1-5-21-448539723-725345543-1417001333-1111111",
        "S-1-5-21-448539723-725345543-1417001333-2222222",
        "S-1-5-21-448539723-725345543-1417001333-3333333"
    };

    var taskList = new List<Task>();
    for (int i = 0; i < paths.Length; i++)
    {
        taskList.Add(CreateTask(i + 1, paths[i], sids[i]));        
    }

    Parallel.ForEach(taskList, t => t.Start());

    Task.WaitAll(taskList.ToArray());

    var output = new StringBuilder();
    var failed = false;
    for (int i = 0; i < paths.Length; i++)
    {
        var ok = Directory.GetAccessControl(paths[i])
                          .GetAccessRules(true, false, typeof(SecurityIdentifier))
                          .OfType<FileSystemAccessRule>()
                          .Any(f => f.IdentityReference.Value == sids[i]);

        if (!ok)
        {
            failed = true;
        }
        output.AppendLine(paths[i].Remove(0, baseDir.Length + 1) + " --> " + (ok ? "OK" : "ERROR"));
    }

    Debug.WriteLine(output);

    if (failed)
    {
        Assert.Fail();
    }
}

private static Task CreateTask(int i, string path, string sid)
{
    return new Task(() =>
    {
        var start = DateTime.Now;
        Debug.WriteLine("Task {0} start:  {1:HH:mm:ss.fffffff}", i, start);
        var fileSystemAccessRule = new FileSystemAccessRule(new SecurityIdentifier(sid), 
                                                            FileSystemRights.Modify | FileSystemRights.Synchronize,
                                                            InheritanceFlags.ContainerInherit | InheritanceFlags.ObjectInherit,
                                                            PropagationFlags.None,
                                                            AccessControlType.Allow);

        var directorySecurity = Directory.GetAccessControl(path);
        directorySecurity.ResetAccessRule(fileSystemAccessRule);
        Directory.SetAccessControl(path, directorySecurity);

        Debug.WriteLine("Task {0} finish: {1:HH:mm:ss.fffffff} ({2} ms)", i, DateTime.Now, (DateTime.Now - start).TotalMilliseconds);
    });
}

I'm getting the same problem: usually (but not always) only levels one and three have the ACLs applied.

Why is that and how can I fix this?

Sideman answered 21/11, 2017 at 12:38 Comment(4)
What happens when you use SetAccessRule instead of ResetAccessRule? – Crosspiece
Same problem with SetAccessRule or ResetAccessRule. – Sideman
I think a better way to approach the problem would be for each service to delegate the call to a central service, or to use RabbitMQ and let a worker do the actual job. See this thread #33985394. – Crosspiece
Already thought about that, and it seems like the way to go: having a single ACL applier. – Sideman

Directory.SetAccessControl internally calls the Win32 API function SetSecurityInfo: https://msdn.microsoft.com/en-us/library/windows/desktop/aa379588.aspx

The important part of the above documentation:

If you are setting the discretionary access control list (DACL) or any elements in the system access control list (SACL) of an object, the system automatically propagates any inheritable access control entries (ACEs) to existing child objects, according to the ACE inheritance rules.

The enumeration of child objects (CodeFuller already described this) is done in the low-level function SetSecurityInfo itself. To be more precise, this function calls into the system DLL NTMARTA.DLL, which does all the dirty work. The background of this is inheritance, which is really a "pseudo inheritance" done for performance reasons: every object stores not only its "own" ACEs, but also the inherited ones (those that are grayed out in Explorer). All of this inheritance is resolved when the ACL is set, not at runtime during ACL resolution / access checking.
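
To make this concrete, here is a small sketch (my own illustration, not part of the original answer) that reads the rules of one of the child folders from the test above; the parent's ACE shows up on the child with IsInherited == true because it was physically written there when the parent's ACL was set:

// Reads the child's DACL and shows which entries were physically copied
// onto it by the inheritance propagation described above.
var child = Path.Combine(Path.GetTempPath(), "ACL-PROBLEM", "LEVEL-1", "LEVEL-2");
var childSecurity = Directory.GetAccessControl(child);

foreach (FileSystemAccessRule rule in
         childSecurity.GetAccessRules(true, true, typeof(SecurityIdentifier)))
{
    // IsInherited == true: the ACE was stamped onto this folder when the
    // parent's ACL was written; it is not resolved at access-check time.
    Debug.WriteLine("{0} {1} (inherited: {2})",
                    rule.IdentityReference.Value,
                    rule.FileSystemRights,
                    rule.IsInherited);
}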

This old design decision of Microsoft is also the cause of the following problem (Windows admins should know it):

If you move a directory tree to another location in the file system where a different ACL is set, the ACLs of the objects in the moved tree will not change. In other words, the inherited permissions become wrong; they no longer match the parent's ACL. Whether a child picks up these inheritable ACEs is not controlled by InheritanceFlags, but by SetAccessRuleProtection.

To add to CodeFuller's answer:

>>After enumeration is completed, internal directory security record is assigned to directory.

This enumeration is not just a pure read of the sub-objects; the ACL of every sub-object will be SET.

So the problem is inherent to the inner workings of Windows ACL handling: SetSecurityInfo checks the parent directory for all ACEs which should be inherited, then recurses and applies these inheritable ACEs to all sub-objects.

I know about this because I have written a tool which sets the ACLs of complete file systems (with millions of files) and which uses what we call a "managed folder". We can have very complex ACLs with automatically calculated list permissions. For setting the ACL on the files and folders I use SetKernelObjectSecurity. This API should not normally be used for file systems, since it does not handle the inheritance stuff, so you have to do that yourself. But if you know what you are doing, and you do it correctly, it is the only reliable way to set the ACL on a file tree in every situation. In fact, there can be situations (broken / invalid ACL entries in child objects) where SetSecurityInfo fails to set these objects correctly.
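
For illustration only, here is a rough sketch of what going through SetKernelObjectSecurity could look like from C#. The P/Invoke declarations, the constants and the FILE_FLAG_BACKUP_SEMANTICS trick for opening a directory handle are my assumptions, not code from the tool mentioned above, and the sketch deliberately skips all of the inheritance handling you would still have to implement yourself:

using System;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;
using System.Security.AccessControl;
using Microsoft.Win32.SafeHandles;

static class RawAclSetter
{
    const uint WRITE_DAC = 0x00040000;                   // access right needed to change the DACL
    const uint OPEN_EXISTING = 3;
    const uint FILE_FLAG_BACKUP_SEMANTICS = 0x02000000;  // required to open a handle to a directory
    const uint DACL_SECURITY_INFORMATION = 0x00000004;

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern SafeFileHandle CreateFile(string fileName, uint desiredAccess, uint shareMode,
        IntPtr securityAttributes, uint creationDisposition, uint flagsAndAttributes, IntPtr templateFile);

    [DllImport("advapi32.dll", SetLastError = true)]
    static extern bool SetKernelObjectSecurity(SafeFileHandle handle, uint securityInformation,
        byte[] securityDescriptor);

    // Writes exactly the DACL contained in 'security' to 'path'. Nothing is
    // propagated to child objects, so the caller has to handle inheritance itself.
    public static void SetDacl(string path, FileSystemSecurity security)
    {
        using (var handle = CreateFile(path, WRITE_DAC, (uint)(FileShare.ReadWrite | FileShare.Delete),
                                       IntPtr.Zero, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, IntPtr.Zero))
        {
            if (handle.IsInvalid)
                throw new Win32Exception(Marshal.GetLastWin32Error(), "CreateFile failed for " + path);

            if (!SetKernelObjectSecurity(handle, DACL_SECURITY_INFORMATION,
                                         security.GetSecurityDescriptorBinaryForm()))
                throw new Win32Exception(Marshal.GetLastWin32Error(), "SetKernelObjectSecurity failed for " + path);
        }
    }
}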

And now to the code from Anderson Pimentel:

It should be clear from the above that the parallel setting can only work if the inheritance is blocked at each directory level.
However, it does not work to just call

dirSecurity.SetAccessRuleProtection(true, true);

in the task, since this call may come too late.

I got the code working when the above statement is called before starting the tasks.
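
A minimal sketch of that change, applied to the test from the question (my wiring of it, not the author's exact code): block inheritance at every level before any task is started, so the parallel SetAccessControl calls no longer propagate into each other's directories.

// Run this before creating and starting the tasks from the question.
foreach (var dir in paths)
{
    var security = Directory.GetAccessControl(dir);

    // isProtected = true: stop inheriting ACEs from the parent folder.
    // preserveInheritance = true: keep copies of the already inherited ACEs as explicit ones.
    security.SetAccessRuleProtection(true, true);

    Directory.SetAccessControl(dir, security);
}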

The bad news is that this call, done from C#, also does a complete recursion.

So it seems that there is no really compelling solution in C#, besides using P/Invoke to call the low-level security functions directly.

But that’s another story.

And to the initial problem where different servers are setting the ACL:

If we know the intent behind it and what you want the resulting ACL to be, we can perhaps find a way.

Let me know.

Granada answered 26/11, 2017 at 23:1 Comment(0)

It's a funny puzzle.

I've launched your test and the problem reproduces on almost every run. The ACL is often not applied for LEVEL-3, too.

However, the problem does not reproduce if the tasks are not run in parallel. Also, if the directories do not contain those 1000 files, the problem reproduces much less often.

Such behavior is very similar to a classic race condition.

I haven't found any explicit information on this topic, but it seems that applying ACLs on overlapping directory trees is not a thread-safe operation.

To confirm this we would need to analyze the implementation of SetAccessControl() (or rather the underlying Windows API call). But let's try to imagine what it might be.

  1. SetAccessControl() is called for a given directory and DirectorySecurity record.
  2. It creates some internal structure (file system object) and fills it with the provided data.
  3. Then it starts enumerating the child objects (directories and files). Such an enumeration is partly confirmed by the tasks' execution times: about 500 ms for task 3, 1000 ms for task 2 and 1500 ms for task 1.
  4. After the enumeration is completed, the internal directory security record is assigned to the directory.
  5. But in parallel the same is done for SetAccessControl() called on the parent directory, which finally overwrites the record created in step 4.

Of course, the described flow is just an assumption. We need NTFS or Windows internals experts to confirm it.

But the observed behavior almost certainly indicates a race condition. Just avoid such parallel applying of ACLs on overlapping directory trees and sleep well.
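
For example (a hypothetical variation of the test from the question, not part of this answer), starting the same tasks strictly one after another makes the check pass, which supports the race-condition explanation:

// Sequential variant: each SetAccessControl finishes (including its recursion
// into the child objects) before the next one starts, so nothing gets overwritten.
foreach (var task in taskList)
{
    task.Start();
    task.Wait();
}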

Convey answered 23/11, 2017 at 16:52 Comment(3)
That's what I think, too, but I'm afraid I cannot avoid this race condition, since I have multiple servers that can be working on overlapping directory trees with a huge number of files/directories (although I only need to apply the ACLs at the third level). – Sideman
I simplified the code, getting rid of DirectoryInfo and going straight to the Directory class (to avoid any internal cache objects it may be using), and it really does seem to overwrite a child directory's ACL with the ones from its parent. – Sideman
Bounty awarded because you pointed me in the right direction: I thought the race condition was causing ACLs not to be applied, but instead they were indeed being applied, and then overwritten. – Sideman

Introduce a lock. You have the shared file system available, so use a .NET file lock whenever a process makes changes to a folder:

using (new FileStream(lockFile, FileMode.Open, FileAccess.Read, FileShare.None))
{
    // file locked
}

In your code, add this on initialization:

var lockFile = Path.Combine(baseDir, ".lock"); // just create a file
File.WriteAllText(lockFile, "lock file");

and pass the well-known lock file to your tasks. Then, in each of your processes, wait for the file to become unlocked:

private static Task CreateTask(int i, string path, string sid, string lockFile)
{
    return new Task(() =>
    {
        var start = DateTime.Now;
        Debug.WriteLine("Task {0} start:  {1:HH:mm:ss.fffffff}", i, start);

        Task.WaitAll(WaitForFileToUnlock(lockFile, () =>
        {
            var fileSystemAccessRule = new FileSystemAccessRule(new SecurityIdentifier(sid),
                FileSystemRights.Modify | FileSystemRights.Synchronize,
                InheritanceFlags.ContainerInherit | InheritanceFlags.ObjectInherit,
                PropagationFlags.None,
                AccessControlType.Allow);

            var directorySecurity = Directory.GetAccessControl(path);
            directorySecurity.ResetAccessRule(fileSystemAccessRule);
            Directory.SetAccessControl(path, directorySecurity);
        }));

        Debug.WriteLine("Task {0} finish: {1:HH:mm:ss.fffffff} ({2} ms)", i, DateTime.Now, (DateTime.Now - start).TotalMilliseconds);
    });
}

private static async Task WaitForFileToUnlock(string lockFile, Action runWhenUnlocked)
{
    while (true)
    {
        try
        {
            using (new FileStream(lockFile, FileMode.Open, FileAccess.Read, FileShare.None))
            {
                runWhenUnlocked();
            }

            return;
        }
        catch (IOException)
        {
            await Task.Delay(100);
        }
    }
}

With those changes the unit test passes.

You can add further locks at the various levels to make the process more efficient - something like a hierarchical locking scheme.
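
As a hypothetical illustration of such a hierarchical scheme (the names and layout are my assumptions, not part of the answer above), each process could acquire the lock files of the target directory and of all its ancestors up to the base folder, always in top-down order, so that overlapping subtrees serialize while disjoint subtrees stay parallel:

// Returns the lock files to acquire (top-down) before changing 'path'.
// A process working on LEVEL-1 and one working on LEVEL-1\LEVEL-2 then compete
// for the LEVEL-1 lock, while unrelated subtrees never block each other.
static IEnumerable<string> LockFilesFor(string baseDir, string path)
{
    var locks = new List<string>();
    for (var current = path;
         current != null && current.StartsWith(baseDir, StringComparison.OrdinalIgnoreCase);
         current = Path.GetDirectoryName(current))
    {
        locks.Add(Path.Combine(current, ".lock"));
    }

    locks.Reverse(); // acquire from the base folder downwards to avoid lock-ordering deadlocks
    return locks;
}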

Campus answered 29/11, 2017 at 11:6 Comment(2)
Very good suggestion, though I'm afraid I could not implement it, since my application does not have write permission on the file system. All I can do is apply ACLs. But that definitely gave me some ideas. – Sideman
You don't have to use a file lock on the same file system – you can create a custom share between your servers. Unless, of course, you don't have any access on the execution server either. You could also use something like Azure Storage for file locking and more advanced concurrency scenarios: azure.microsoft.com/en-us/blog/… – Campus
