The Ω(n log n) lower bound applies to comparison sorts, i.e. algorithms that may only compare elements with each other to decide whether one should come before or after the other, or whether they are equal. That is a good model for general-purpose sorting because it works for pretty much any type of data you might want to sort: numbers, strings, instances of user-defined classes, and so on. The data type could even just be one that a key function maps to some other type supporting comparisons, or you could accept a comparator function to do the comparisons with.
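As a minimal sketch of those two options, here is Python's built-in (comparison-based) sort used with a key function and with a comparator; the `people` data and the sort-by-last-name criterion are just illustrative assumptions:

```python
from functools import cmp_to_key

# Hypothetical data: (first name, last name) pairs, sorted by last name.
people = [("Ada", "Lovelace"), ("Alan", "Turing"), ("Grace", "Hopper")]

# Option 1: a key function maps each element to something comparable.
by_key = sorted(people, key=lambda p: (p[1], p[0]))

# Option 2: a comparator compares two elements directly.
def compare(a, b):
    ka, kb = (a[1], a[0]), (b[1], b[0])
    return -1 if ka < kb else (1 if ka > kb else 0)

by_cmp = sorted(people, key=cmp_to_key(compare))
```

Either way, the sort itself only ever asks "does this element come before that one?", which is exactly the comparison model the lower bound is about.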
Note that Ω(n log n) here is a lower bound on the number of comparisons, not on the running time. If each comparison takes more than O(1) time, say because you're comparing long strings that share long common prefixes, then the running time is O(cn log n), where each comparison is done in O(c) time. Comparing two strings of length w, for example, takes O(w) time in the worst case.
If you only need a sorting algorithm for a specific type of data, then you may be able to do better, because you can exploit operations specific to that type. For example, when sorting integers, you can use the array elements themselves as indices into another array; this gives the counting sort algorithm, which runs in O(n + r) time, where r is the range of the array elements.
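A minimal sketch of counting sort, assuming the values are non-negative integers below some known bound r:

```python
def counting_sort(arr, r):
    """Sort integers in the range [0, r) in O(n + r) time.
    Assumes every element x satisfies 0 <= x < r."""
    counts = [0] * r
    for x in arr:                       # O(n): tally each value
        counts[x] += 1
    out = []
    for value, count in enumerate(counts):  # O(r): emit each value
        out.extend([value] * count)
    return out
```

Note that no element is ever compared with another, which is how the algorithm sidesteps the comparison lower bound; the price is the O(r) term, which is only acceptable when the range is small.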
If the sorting keys are like strings, in the sense that they are (or can be mapped to) sequences such that comparing keys is equivalent to lexicographically comparing those sequences, then indeed you can use a trie to sort an array containing that data type. Congratulations: you have independently reinvented the radix sort algorithm, which can be implemented using tries. Its running time is O(wn), not O(n), because it takes O(w) time to insert a string of length w into the trie and you have to do that n times.
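A minimal sketch of that trie-based approach, assuming plain Python strings as keys (this is MSD radix sort in trie form, not a tuned implementation):

```python
def trie_sort(strings):
    """Sort strings via a trie in O(wn) time for a fixed alphabet:
    each insertion walks O(w) nodes, and an in-order traversal
    emits the strings in lexicographic order."""
    END = object()  # marker meaning "a string terminates at this node"
    root = {}
    for s in strings:                   # n insertions, O(w) each
        node = root
        for ch in s:
            node = node.setdefault(ch, {})
        node[END] = node.get(END, 0) + 1  # count duplicates

    out = []
    def walk(node, prefix):
        if END in node:                 # shorter strings come first
            out.extend([prefix] * node[END])
        # Visiting children in character order yields lexicographic output;
        # for a fixed alphabet this per-node sort is O(1).
        for ch in sorted(k for k in node if k is not END):
            walk(node[ch], prefix + ch)
    walk(root, "")
    return out
```

The traversal visits each trie node a constant number of times, and the trie has O(wn) nodes in the worst case, hence the O(wn) total.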
So, if the elements aren't strings, or "string-like" in the above sense, then radix sort simply isn't applicable. If the elements are strings or "string-like", then radix sort works but it takes O(wn) time instead of O(cn log n).
This means radix sort isn't strictly better, and it is probably worse when the common prefixes of the strings are short relative to the strings themselves, which is often the case. For random strings, regular string comparison takes O(1) time on average, in which case O(n log n) beats radix sort's O(wn) whenever w grows faster than log n. Concretely, with n = 2^20 ≈ 10^6 random strings, radix sort only wins when the strings average fewer than about log₂ n = 20 characters.
In practical applications, one should also consider the hidden constants in the asymptotic analysis. Comparison sorts like Timsort have low hidden constants because they access array elements sequentially, which results in fewer cache misses compared to walking a tree whose nodes won't be consecutive in memory.