The two terms aren't clearly defined, and it's not a black/white thing.
Memory models can be extremely weak, extremely strong, or anywhere in between.
It basically refers to the guarantees a memory model offers about concurrent memory accesses.
Naively, you would expect a write made on one thread to be immediately visible to all other threads. And you would expect events to appear in the same order on all threads as well.
But under a weaker memory model, neither of those guarantees may hold.
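To make that concrete, here's a small C++ sketch (the answer isn't tied to any language, and the names `payload` and `flag` are made up for illustration) of the classic message-passing pattern: one thread writes a value and then sets a flag, another thread waits for the flag and then reads the value. With only relaxed atomics, a weak memory model is allowed to make the flag visible before the payload:

```cpp
// Minimal sketch: under relaxed ordering, seeing flag == 1 does NOT
// guarantee seeing payload == 42 on a weak memory model.
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> payload{0};
std::atomic<int> flag{0};

void writer() {
    payload.store(42, std::memory_order_relaxed);
    flag.store(1, std::memory_order_relaxed);   // may become visible first
}

void reader() {
    while (flag.load(std::memory_order_relaxed) == 0) { /* spin */ }
    // On a weak model this can legitimately print 0 instead of 42.
    std::printf("payload = %d\n", payload.load(std::memory_order_relaxed));
}

int main() {
    std::thread t1(writer), t2(reader);
    t1.join(); t2.join();
}
```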
Sequential consistency is the term for a memory model which guarantees that events are seen in the same order across all threads. So a memory model which ensures sequential consistency is pretty strong.
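For example, here's a sketch of the classic "store buffering" test, written with C++ `std::atomic` purely for illustration. Under sequential consistency there is a single order of all four operations that every thread agrees on, so it's impossible for both threads to read 0; weaker orderings do allow that outcome:

```cpp
// Store buffering: with memory_order_seq_cst, r1 == 0 && r2 == 0 is
// impossible, because all threads agree on one total order of operations.
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1, r2;

int main() {
    std::thread t1([] { x.store(1, std::memory_order_seq_cst);
                        r1 = y.load(std::memory_order_seq_cst); });
    std::thread t2([] { y.store(1, std::memory_order_seq_cst);
                        r2 = x.load(std::memory_order_seq_cst); });
    t1.join(); t2.join();
    assert(!(r1 == 0 && r2 == 0));   // ruled out by sequential consistency
}
```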
A weaker guarantee is causal consistency: the guarantee that events are never observed before the events they depend on.
In other words, if you first write a value `x` to some address `A`, and then write a second value `y` to the same address, then no thread will ever read the value `x` after having read `y`. Because the two writes are to the same address, the second depends on the first, and it would violate causal consistency if not all threads observed them in the same order.
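In C++ terms this particular same-address guarantee is called coherence, and even the weakest ordering (`memory_order_relaxed`) provides it. A small sketch, with variable names chosen to mirror the example above:

```cpp
// Coherence on a single location: all threads agree on the order of writes
// to the *same* atomic variable, so once a reader has seen y it will never
// see x again.
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> A{0};
constexpr int x = 1, y = 2;   // two values written in this order

int main() {
    std::thread writer([] {
        A.store(x, std::memory_order_relaxed);
        A.store(y, std::memory_order_relaxed);
    });
    std::thread observer([] {
        int first  = A.load(std::memory_order_relaxed);
        int second = A.load(std::memory_order_relaxed);
        // Forbidden outcome: seeing y first and the older x afterwards.
        assert(!(first == y && second == x));
    });
    writer.join(); observer.join();
}
```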
But this says nothing about what should happen with unrelated events. The result of writing a third value to a different memory address could be observed at absolutely any time by other threads (so different threads may observe those events in different orders, unlike under sequential consistency).
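This is essentially the "independent reads of independent writes" (IRIW) pattern. A sketch in C++, again just for illustration, using acquire/release as a stand-in for a causal-style model: the two readers are allowed to disagree about which of the two unrelated writes happened first, an outcome sequential consistency would forbid:

```cpp
// IRIW: two threads write to two unrelated locations; two other threads
// read them in opposite orders. Under acquire/release the outcome
// r1 == 1, r2 == 0, r3 == 1, r4 == 0 is legal (the readers disagree about
// which write came first). Under seq_cst it is impossible.
#include <atomic>
#include <thread>

std::atomic<int> a{0}, b{0};
int r1, r2, r3, r4;

int main() {
    std::thread w1([] { a.store(1, std::memory_order_release); });
    std::thread w2([] { b.store(1, std::memory_order_release); });
    std::thread rd1([] { r1 = a.load(std::memory_order_acquire);
                         r2 = b.load(std::memory_order_acquire); });
    std::thread rd2([] { r3 = b.load(std::memory_order_acquire);
                         r4 = a.load(std::memory_order_acquire); });
    w1.join(); w2.join(); rd1.join(); rd2.join();
}
```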
There are plenty of other such levels of "consistency", some stronger, some weaker, offering all sorts of subtle guarantees about what you can rely on.
Fundamentally, a stronger memory model is going to offer more guarantees about the order in which events are observed, and will normally guarantee behavior closer to what you'd intuitively expect.
But a weaker model allows more room for optimization, and in particular, it scales better with more cores (because less synchronization is required).
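A simple example of that trade-off, sketched in C++ (the counter name is made up): a statistics counter bumped by many threads only needs the increments to be atomic, not ordered with respect to anything else, so a relaxed increment asks the hardware for no cross-core ordering at all; only the final total matters.

```cpp
// Relaxed counter: atomicity without ordering. The final sum, read after
// all threads have joined, is still exact (here 8 * 100000 = 800000).
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

std::atomic<long> hits{0};

int main() {
    std::vector<std::thread> workers;
    for (int i = 0; i < 8; ++i)
        workers.emplace_back([] {
            for (int j = 0; j < 100000; ++j)
                hits.fetch_add(1, std::memory_order_relaxed);  // no ordering needed
        });
    for (auto& t : workers) t.join();
    std::printf("hits = %ld\n", hits.load());
}
```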
Sequential consistency is basically free on a single-core CPU and doable on a quad-core, but it would be prohibitively expensive on a 32-core system, a system with 4 physical CPUs, or a shared-memory system spanning multiple physical machines.
The more cores you have, and the further apart they are, the harder it is to ensure that they all observe events in the same order. So compromises are made, and you settle for a weaker memory model which makes looser guarantees.