A (program) transformation is a function that given a program representation instance, computes another.
The program representation can be arbitrary, but is often an abstract syntax tree (AST) or a graph (e.g., UML); you can even include byte-code as a program representation. Like mathematical functions, transformations can be "partial" (that is, only work under some [possibly complicated] conditions).
I personally like the terminology transform to refer to the function itself, and transformation to refer to the act or result of applying the transform to get a new representation.
In general, a (global) program transformation may affect the entire representation even if it is huge, but typically individual transformations only modify a small part leaving most of the program representation alone. Abstractly,
the entire program representation instance is processed by the transformation to produce another entirely new program representation instance. Since representation instances tend to be big, this is often implemented by having the transformation simply modify the existing representation instance.
Such "small" transforms you can think of as having additional parameters that focus them on the specific part of the representation to which they will make a change.
Like mathematical functions, transformations compose to produce "bigger" transformations (that are also partial, as the conditions compose too). Usually you write a set of transformations to transform a program representation in its entirety, since no one transformation will process the entire representation instance in one step. The fact that you can compose them allows you to write lots of "little" transformations which collectively achieve your purpose, so you get a kind of modularity in the semantic translation, which is why people like the idea of program transformations.
Like mathematical functions, you can implement such transformations by writing procedural code. Such code inspects bits of the original model, and produce changes to the model as they run, but this is generally awkward.
So such transformations are instead often written in a so-called "declarative" form as rules which contain a pairs of patterns and a condition. Each rule is interpreted as, "if you see the left side pattern, and the condition matches, then change the program representation to match the right side pattern". Pattern variables allow the pattern to designate chunks of the original program representation to pass through the transformation untouched (normally to be processed by some other transformation). While these rules are called "declarative" (because they don't look like conventional code), they just represent some equivalent functions and so are not declarative in the intended sense. Rules tend to be a lot more readable than the equivalent procedural code, often because the patterns are written in the surface syntax of the source and target representations.
As a practical matter, individual transformations only apply to particular places in the representation, and the order in which they are applied ("compose") matters. To handle this, (program) transformation tools often provide a way to "metaprogram" to control the point of focus and the rule application order.
These ideas apply to so-called "model-driven development" which is just transformations applied to an arguably high-level model to generate low-level code, or to transform low-level code to other low-level code. You can even use these ideas to build reverse-engineering tools, e.g., map low-level code to some abstract model. Our DMS Software Reengineering Toolkit is a program transformation tool, have both procedural transforms and source-to-source rewrites, used for all of those purposes.