As mentioned, the default is to use a TLAB. The behaviour is described in this glossary as follows:
TLAB
Thread-local allocation buffer. Used to allocate heap space quickly without synchronization. Compiled code has a "fast path" of a few instructions which tries to bump a high-water mark in the current thread's TLAB, successfully allocating an object if the bumped mark falls before a TLAB-specific limit address.
There are further details on sizing in this blog, and all the details you could want in this blog.
In short, allocation is thread-local unless the TLAB is full, in which case the thread has to go to the shared heap, and that is a CAS operation rather than a lock.
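To make the fast path / slow path split concrete, here is a minimal sketch of the idea in plain Java. It is not how HotSpot implements it (the real thing lives in the VM and in compiled code). The names `TlabSketch`, `SharedPool`, `Tlab`, `reserve` and `allocate` are made up for illustration, and "addresses" are just `long` offsets into an imaginary heap.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative model only: all names here are hypothetical, not JVM APIs.
final class TlabSketch {

    // The shared heap: one "top" pointer that every thread may advance.
    static final class SharedPool {
        private final AtomicLong top = new AtomicLong(0);
        private final long end;

        SharedPool(long size) { this.end = size; }

        // Slow path: reserve a chunk with a CAS retry loop (no lock).
        // Returns the chunk's start offset, or -1 if the pool is exhausted.
        long reserve(long size) {
            while (true) {
                long old = top.get();
                long next = old + size;
                if (next > end) return -1;
                if (top.compareAndSet(old, next)) return old;
                // Another thread won the race; re-read top and retry.
            }
        }
    }

    // One instance per thread, so its fields need no synchronization.
    static final class Tlab {
        private final SharedPool pool;
        private final long tlabSize;
        private long mark;   // high-water mark inside the current buffer
        private long limit;  // end of the current buffer
        int refills = 0;     // how often we had to hit the shared pool

        Tlab(SharedPool pool, long tlabSize) {
            this.pool = pool;
            this.tlabSize = tlabSize;
        }

        // Returns the offset of the new "object", or -1 if out of memory.
        long allocate(long size) {
            long newMark = mark + size;
            if (newMark <= limit) {      // fast path: bump the mark
                long addr = mark;
                mark = newMark;
                return addr;
            }
            if (size > tlabSize) {       // too big for any TLAB
                return pool.reserve(size);
            }
            // TLAB full: get a fresh one from the shared pool.
            // Simplification: the unused tail of the old buffer is dropped.
            long start = pool.reserve(tlabSize);
            if (start < 0) return -1;
            refills++;
            mark = start + size;
            limit = start + tlabSize;
            return start;
        }
    }

    public static void main(String[] args) {
        SharedPool pool = new SharedPool(1 << 20);
        Tlab tlab = new Tlab(pool, 1024);
        for (int i = 0; i < 1000; i++) {
            tlab.allocate(16);
        }
        System.out.println("shared-pool refills: " + tlab.refills);
    }
}
```

The point of the sketch is that only `reserve` touches shared state, and it is reached once per buffer rather than once per object, so almost all allocations are a compare and an add on thread-private fields.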
Another complicating factor could be this bug, which describes false sharing in card marking. That is not a lock as such, but it will hurt performance (if that is why you're asking about locking). It looks like this is fixed in Java 7, though.