How to implement a Thread Safe HashTable (PhoneBook) Data Structure in Swift?
Asked Answered
E

3

12

I am trying to implement a Thread-Safe PhoneBook object. The phone book should be able to add a person, and look up a person based on their name and phoneNumber. From an implementation perspective this simply involves two hash tables, one associating name -> Person and another associating phone# -> Person.

The caveat is I want this object to be threadSafe. This means I would like to be able to support concurrent lookups in the PhoneBook while ensuring only one thread can add a Person to the PhoneBook at a time. This is the basic reader-writers problem, and I am trying to solve this using GrandCentralDispatch and dispatch barriers. I am struggling to solve this though as I am running into issues.. Below is my Swift playground code:

//: Playground - noun: a place where people can play

import UIKit
import PlaygroundSupport

PlaygroundPage.current.needsIndefiniteExecution = true

public class Person: CustomStringConvertible {
    public var description: String {
        get {
            return "Person: \(name), \(phoneNumber)"
        }
    }

    public var name: String
    public var phoneNumber: String
    private var readLock = ReaderWriterLock()

    public init(name: String, phoneNumber: String) {
        self.name = name
        self.phoneNumber = phoneNumber
    }


    public func uniquePerson() -> Person {
        let randomID = UUID().uuidString
        return Person(name: randomID, phoneNumber: randomID)
    }
}

public enum Qos {
    case threadSafe, none
}

public class PhoneBook {

    private var qualityOfService: Qos = .none
    public var nameToPersonMap = [String: Person]()
    public var phoneNumberToPersonMap = [String: Person]()
    private var readWriteLock = ReaderWriterLock()


    public init(_ qos: Qos) {
        self.qualityOfService = qos
    }

    public func personByName(_ name: String) -> Person? {
        var person: Person? = nil
        if qualityOfService == .threadSafe {
            readWriteLock.concurrentlyRead { [weak self] in
                guard let strongSelf = self else { return }
                person = strongSelf.nameToPersonMap[name]
            }
        } else {
            person = nameToPersonMap[name]
        }

        return person
    }

    public func personByPhoneNumber( _ phoneNumber: String) -> Person? {
        var person: Person? = nil
        if qualityOfService == .threadSafe {
            readWriteLock.concurrentlyRead { [weak self] in
                guard let strongSelf = self else { return }
                person = strongSelf.phoneNumberToPersonMap[phoneNumber]
            }
        } else {
            person = phoneNumberToPersonMap[phoneNumber]
        }

        return person
    }

    public func addPerson(_ person: Person) {
        if qualityOfService == .threadSafe {
            readWriteLock.exclusivelyWrite { [weak self] in
                guard let strongSelf = self else { return }
                strongSelf.nameToPersonMap[person.name] = person
                strongSelf.phoneNumberToPersonMap[person.phoneNumber] = person
            }
        } else {
            nameToPersonMap[person.name] = person
            phoneNumberToPersonMap[person.phoneNumber] = person
        }
    }

}


// A ReaderWriterLock implemented using GCD and OS Barriers.
public class ReaderWriterLock {

    private let concurrentQueue = DispatchQueue(label: "com.ReaderWriterLock.Queue", attributes: DispatchQueue.Attributes.concurrent)
    private var writeClosure: (() -> Void)!

    public func concurrentlyRead(_ readClosure: (() -> Void)) {
        concurrentQueue.sync {
            readClosure()
        }
    }

    public func exclusivelyWrite(_ writeClosure: @escaping (() -> Void)) {
        self.writeClosure = writeClosure
        concurrentQueue.async(flags: .barrier) { [weak self] in
            guard let strongSelf = self else { return }
            strongSelf.writeClosure()
        }
    }

}

// MARK: Testing the synchronization and thread-safety

for _ in 0..<5 {
    let iterations = 1000
    let phoneBook = PhoneBook(.none)

    let concurrentTestQueue = DispatchQueue(label: "com.PhoneBookTest.Queue", attributes: DispatchQueue.Attributes.concurrent)
    for _ in 0..<iterations {
        let person = Person(name: "", phoneNumber: "").uniquePerson()
        concurrentTestQueue.async {
            phoneBook.addPerson(person)
        }
    }

    sleep(10)
    print(phoneBook.nameToPersonMap.count)
}

To test my code I run 1000 concurrent threads that simply add a new Person to the PhoneBook. Each Person is unique so after the 1000 threads complete I am expecting the PhoneBook to contain a count of 1000. Everytime I perform a write I perform a dispatch_barrier call, update the hash tables, and return. To my knowledge this is all we need to do; however, after repeated runs of the 1000 threads I get the number of entries in the PhoneBook to be inconsistent and all over the place:

Phone Book Entries: 856
Phone Book Entries: 901
Phone Book Entries: 876
Phone Book Entries: 902
Phone Book Entries: 912

Can anyone please help me figure out what is going on? Is there something wrong with my locking code or even worse something wrong with how my test is constructed? I am very new to this multi-threaded problem space, thanks!

Elum answered 5/3, 2018 at 0:24 Comment(3)
I think your problem is that sleep(10) is a race condition. In a real-world application that needed to wait until all work was done, you wouldn't use that. Try sleep(30) as an experiment and see if your results improve.Worldling
How would I say wait until all work is done in a real-life scenario? I too feel sketch about the sleep. The sleep(30) did not work either..Elum
By the way, I assume this model was just to explore thread safety, but I might suggest a slightly different structure, namely an array of [Person] and then have methods that return arrays of people for a given phone number or given name (e.g. using filter). Many of us have multiple contacts with the same phone number (e.g. a shared business line or even families with the same home phone). I even have a few entries for different people with the same name (e.g. I have two different “Paul Williams” who are really different people). Do whatever you want, but just a thought.Zolner
Z
26

The problem is your ReaderWriterLock. You are saving the writeClosure as a property, and then asynchronously dispatching a closure that calls that saved property. But if another exclusiveWrite came in during the intervening period of time, your writeClosure property would be replaced with the new closure.

In this case, it means that you can be adding the same Person multiple times. And because you're using a dictionary, those duplicates have the same key, and therefore don't result in you're seeing all 1000 entries.

You can actually simplify ReaderWriterLock, completely eliminating that property. I’d also make concurrentRead a generic, returning the value (just like sync does), and rethrowing any errors (if any).

public class ReaderWriterLock {
    private let queue = DispatchQueue(label: "com.domain.app.rwLock", attributes: .concurrent)
    
    public func concurrentlyRead<T>(_ block: (() throws -> T)) rethrows -> T {
        return try queue.sync {
            try block()
        }
    }
    
    public func exclusivelyWrite(_ block: @escaping (() -> Void)) {
        queue.async(flags: .barrier) {
            block()
        }
    }
}

A couple of other, unrelated observations:

  1. By the way, this simplified ReaderWriterLock happens to solves another concern. That writeClosure property, which we've now removed, could have easily introduced a strong reference cycle.

    Yes, you were scrupulous about using [weak self], so there wasn't any strong reference cycle, but it was possible. I would advise that wherever you employ a closure property, that you set that closure property to nil when you're done with it, so any strong references that closure may have accidentally entailed will be resolved. That way a persistent strong reference cycle is never possible. (Plus, the closure itself and any local variables or other external references it has will be resolved.)

  2. You're sleeping for 10 seconds. That should be more than enough, but I'd advise against just adding random sleep calls (because you never can be 100% sure). Fortunately, you have a concurrent queue, so you can use that:

    concurrentTestQueue.async(flags: .barrier) { 
        print(phoneBook.count) 
    }
    

    Because of that barrier, it will wait until everything else you put on that queue is done.

  3. Note, I did not just print nameToPersonMap.count. This array has been carefully synchronized within PhoneBook, so you can't just let random, external classes access it directly without synchronization.

    Whenever you have some property which you're synchronizing internally, it should be private and then create a thread-safe function/variable to retrieve whatever you need:

    public class PhoneBook {
    
        private var nameToPersonMap = [String: Person]()
        private var phoneNumberToPersonMap = [String: Person]()
    
        ...
    
        var count: Int {
            return readWriteLock.concurrentlyRead {
                nameToPersonMap.count
            }
        }
    }
    
  4. You say you're testing thread safety, but then created PhoneBook with .none option (achieving no thread-safety). In that scenario, I'd expect problems. You have to create your PhoneBook with the .threadSafe option.

  5. You have a number of strongSelf patterns. That's rather unswifty. It is generally not needed in Swift as you can use [weak self] and then just do optional chaining.

Pulling all of this together, here is my final playground:

PlaygroundPage.current.needsIndefiniteExecution = true

public class Person {
    public let name: String
    public let phoneNumber: String
    
    public init(name: String, phoneNumber: String) {
        self.name = name
        self.phoneNumber = phoneNumber
    }
    
    public static func uniquePerson() -> Person {
        let randomID = UUID().uuidString
        return Person(name: randomID, phoneNumber: randomID)
    }
}

extension Person: CustomStringConvertible {
    public var description: String {
        return "Person: \(name), \(phoneNumber)"
    }
}

public enum ThreadSafety { // Changed the name from Qos, because this has nothing to do with quality of service, but is just a question of thread safety
    case threadSafe, none
}

public class PhoneBook {
    
    private var threadSafety: ThreadSafety
    private var nameToPersonMap = [String: Person]()        // if you're synchronizing these, you really shouldn't expose them to the public
    private var phoneNumberToPersonMap = [String: Person]() // if you're synchronizing these, you really shouldn't expose them to the public
    private var readWriteLock = ReaderWriterLock()
    
    public init(_ threadSafety: ThreadSafety) {
        self.threadSafety = threadSafety
    }
    
    public func personByName(_ name: String) -> Person? {
        if threadSafety == .threadSafe {
            return readWriteLock.concurrentlyRead { [weak self] in
                self?.nameToPersonMap[name]
            }
        } else {
            return nameToPersonMap[name]
        }
    }
    
    public func personByPhoneNumber(_ phoneNumber: String) -> Person? {
        if threadSafety == .threadSafe {
            return readWriteLock.concurrentlyRead { [weak self] in
                self?.phoneNumberToPersonMap[phoneNumber]
            }
        } else {
            return phoneNumberToPersonMap[phoneNumber]
        }
    }
    
    public func addPerson(_ person: Person) {
        if threadSafety == .threadSafe {
            readWriteLock.exclusivelyWrite { [weak self] in
                self?.nameToPersonMap[person.name] = person
                self?.phoneNumberToPersonMap[person.phoneNumber] = person
            }
        } else {
            nameToPersonMap[person.name] = person
            phoneNumberToPersonMap[person.phoneNumber] = person
        }
    }
    
    var count: Int {
        return readWriteLock.concurrentlyRead {
            nameToPersonMap.count
        }
    }
}

// A ReaderWriterLock implemented using GCD concurrent queue and barriers.

public class ReaderWriterLock {
    private let queue = DispatchQueue(label: "com.domain.app.rwLock", attributes: .concurrent)
    
    public func concurrentlyRead<T>(_ block: (() throws -> T)) rethrows -> T {
        return try queue.sync {
            try block()
        }
    }
    
    public func exclusivelyWrite(_ block: @escaping (() -> Void)) {
        queue.async(flags: .barrier) {
            block()
        }
    }
}


for _ in 0 ..< 5 {
    let iterations = 1000
    let phoneBook = PhoneBook(.threadSafe)
    
    let concurrentTestQueue = DispatchQueue(label: "com.PhoneBookTest.Queue", attributes: .concurrent)
    for _ in 0..<iterations {
        let person = Person.uniquePerson()
        concurrentTestQueue.async {
            phoneBook.addPerson(person)
        }
    }
    
    concurrentTestQueue.async(flags: .barrier) {
        print(phoneBook.count)
    }
}

Personally, I'd be inclined to take it a step further and

  • move the synchronization into a generic class; and
  • change the model to be an array of Person object, so that:
    • The model supports multiple people with the same or phone number; and
    • You can use value types if you want.

For example:

public struct Person {
    public let name: String
    public let phoneNumber: String
    
    public static func uniquePerson() -> Person {
        return Person(name: UUID().uuidString, phoneNumber: UUID().uuidString)
    }
}

public struct PhoneBook {
    
    private var synchronizedPeople = Synchronized([Person]())
    
    public func people(name: String? = nil, phone: String? = nil) -> [Person]? {
        return synchronizedPeople.value.filter {
            (name == nil || $0.name == name) && (phone == nil || $0.phoneNumber == phone)
        }
    }
    
    public func append(_ person: Person) {
        synchronizedPeople.writer { people in
            people.append(person)
        }
    }
    
    public var count: Int {
        return synchronizedPeople.reader { $0.count }
    }
}

/// A structure to provide thread-safe access to some underlying object using reader-writer pattern.

public class Synchronized<T> {
    /// Private value. Use `public` `value` computed property (or `reader` and `writer` methods)
    /// for safe, thread-safe access to this underlying value.
    
    private var _value: T
    
    /// Private reader-write synchronization queue
    
    private let queue = DispatchQueue(label: Bundle.main.bundleIdentifier! + ".synchronized", qos: .default, attributes: .concurrent)
    
    /// Create `Synchronized` object
    ///
    /// - Parameter value: The initial value to be synchronized.
    
    public init(_ value: T) {
        _value = value
    }
    
    /// A threadsafe variable to set and get the underlying object, as a convenience when higher level synchronization is not needed        
    
    public var value: T {
        get { reader { $0 } }
        set { writer { $0 = newValue } }
    }
    
    /// A "reader" method to allow thread-safe, read-only concurrent access to the underlying object.
    ///
    /// - Warning: If the underlying object is a reference type, you are responsible for making sure you
    ///            do not mutating anything. If you stick with value types (`struct` or primitive types),
    ///            this will be enforced for you.
    
    public func reader<U>(_ block: (T) throws -> U) rethrows -> U {
        return try queue.sync { try block(_value) }
    }
    
    /// A "writer" method to allow thread-safe write with barrier to the underlying object
    
    func writer(_ block: @escaping (inout T) -> Void) {
        queue.async(flags: .barrier) {
            block(&self._value)
        }
    }
}
Zolner answered 15/3, 2018 at 18:44 Comment(8)
Simply amazing answer. Thank you so much this shed a lot of light for me on whats going on... Honestly, it was a rookie mistake for me to not see the closure issue :P Thanks again.Elum
Accepted the answerElum
Awesome answer! Any reason to not declare Synchronized as a struct instead of a class?Summertime
@Summertime - Yeah, because because of writer: You can't have mutating method in struct that performs the mutation asynchronously.Zolner
@rob: Ah, that makes sense. Thanks for the clarification.Summertime
@Zolner any reason why you have provided access to _value both via a computed property and method in the synchronized class?Prudenceprudent
@Prudenceprudent Yes. Sometimes all you need is a synchronized accessor method, in which case the accessors result in much more natural code. E.g., maybe you’re just resetting a synchronized integer. foo.value = 0 feels a heck of a lot natural than foo.writer { $0 = 0 } or foo.writer { value in value = 0 }. But let’s say I was incrementing the value, I can’t use the accessor (because incrementing is a multi-step operation, reading the value, incrementing it, and writing it) and foo.value += 1 is not thread safe, as innocent as it might seem. You have to foo.writer { $0 += 1 } or equivalent.Zolner
But if you find providing both is confusing, then eliminate the computed property and lose its cleaner syntax. (But you do not, IMHO, want to eliminate the method, because more often than not, you really do need that higher level of synchronization to achieve thread safety.)Zolner
A
0

You might use a NSCache class. The documentation claims that it's thread safe:

You can add, remove, and query items in the cache from different threads without having to lock the cache yourself.

Here is an article that describes quite useful tricks related to the NSCache

Amarillis answered 3/9, 2022 at 11:59 Comment(0)
S
-3

I don’t think you are using it wrong :).

The original (on macos) generates:

0  swift                    0x000000010c9c536a PrintStackTraceSignalHandler(void*) + 42
1  swift                    0x000000010c9c47a6 SignalHandler(int) + 662
2  libsystem_platform.dylib 0x00007fffbbdadb3a _sigtramp + 26
3  libsystem_platform.dylib 000000000000000000 _sigtramp + 1143284960
4  libswiftCore.dylib       0x0000000112696944 _T0SSwcp + 36
5  libswiftCore.dylib       0x000000011245fa92 _T0s24_VariantDictionaryBufferO018ensureUniqueNativeC0Sb11reallocated_Sb15capacityChangedtSiF + 1634
6  libswiftCore.dylib       0x0000000112461fd2 _T0s24_VariantDictionaryBufferO17nativeUpdateValueq_Sgq__x6forKeytF + 1074

If you remove the ‘.concurrent’ from your ReaderWriter queue, "the problem disappears”.© If you restore the .concurrent, but change the async invocation in the writer side to be sync:

swift(10504,0x70000896f000) malloc: *** error for object 0x7fcaa440cee8: incorrect checksum for freed object - object was probably modified after being freed.

Which would be a bit astonishing if it weren’t swift? I dug in, replaced your ‘string’ based array with an Int one by interposing a hash function, replaced the sleep(10) with a barrier dispatch to flush any laggardly blocks through, and that made it more reproducibly crash with the somewhat more helpful:

x(10534,0x700000f01000) malloc: *** error for object 0x7f8c9ee00008: incorrect checksum for freed object - object was probably modified after being freed.

But when a search of the source revealed no malloc or free, perhaps the stack dump is more useful.

Anyways, best way to solve your problem: use go instead; it actually makes sense.

Swob answered 5/3, 2018 at 19:29 Comment(3)
Removing ".concurrent" makes the problem disappear because then you are leveraging a serial qeue. A serial queue by default need not use a ReadWrite Lock because threads are executed serially one at a time, thats why it just "works." Also, u cant synchronously dispatch with a barrier thats just not how the API works, hence why ur getting the crash. I dont understand your solutions, it does not suffice to tell me to use "Go." The question aims to help me get this working in Swift.Elum
I think you had good intentions but unfortunately people have downvoted. I will have to as well.Elum
Removing .concurrent not only changes it so it’s no longer a reader-writer (thereby losing the benefits that this confers), but it does not actually solve the root problem, either. Take his original code, make the queue serial and the problem will persist. The problem is the closure property that is called asynchronously. I would bet that when you were tidying up his code, that you removed that unnecessary property, and this is what really eliminated the problem.Zolner

© 2022 - 2024 — McMap. All rights reserved.