Depending on what your dictionary looks like, you might not need the locks at all, if you can get each thread to build independent subtrees. If this isn't an online algorithm, presort the words by prefix (first letter if you have < 26 threads, first and second if you have more or you know that the data isn't balanced, for example, 90% of the words start with A). Basically, this would be an O(n) operation, where you do one pass to count how many words there are that start with a given letter, then one pass to sort (along the lines of a radix sort by your chosen prefix). Then, divide the prefixes between the threads and have each thread build these independent sub trees. Finally, have one thread add each of these subtrees to the root. I'll walk through an example below.
Your dictionary:
Bark
Apple
Cookie
And
Baby
Corn
Blue
Cake
Bacon
After sorting:
Apple
And
Bark
Baby
Blue
Bacon
Corn
Cookie
Cake
Then we divide prefixes among the threads. For this example we have 3 threads, which get the prefixes [A][B][C], and build the following trees:
A --| B -------| C -------|
P N |-- A ---| L O ---| A
P D R B C U O R K
L K Y O E K N E
E N I
E
And then you have one thread combine these at the root like:
|----------- Root------------------|
A --| B -------| C -------|
P N |-- A ---| L O ---| A
P D R B C U O R K
L K Y O E K N E
E N I
E
I hope that made sense.
Advantages to this method:
The threads work essentially independently, and you don't have the overhead of having to deal with acquiring and releasing locks.
Drawbacks to this method:
If you don't know anything about the dictionary, there may be a severe workload imbalance, and in the worst case (Say, all the words start with 'A') it reverts to basically being a single thread building a tree. There are a few ways to make this better, for example, you can add some checks when you sort such that if the workload is severely imbalanced when dealing with a single letter prefix, to resort by the first 2 letters, but really you can't guarantee that it will be balanced.
You also may have idle threads at times if say you have 20 threads and sort by first letter, then you there will be some 6 threads that have to do two subtrees, while 14 of them sit idle for half the time. You may be able to further subdivide the subtrees to deal with this, but that's extra time spent on a preprocessing step.
Anyway, no guarantees that this is faster than your method, but its something to consider.