Byte: Today, a byte is almost always 8 bits. However, that wasn't always the case, and historically no standard dictated this. Since 8 bits is a convenient size to work with (a power of two, large enough to hold one character), it became the de facto standard.
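C reflects this history: the language does not fix the byte at 8 bits. It exposes the byte width as the CHAR_BIT macro from <limits.h>, which the standard only requires to be at least 8. The following is a minimal sketch assuming a C11 compiler. The helper bits_in is a hypothetical name used for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT: number of bits in a byte on this platform */
#include <stddef.h>   /* size_t */

/* The C standard only guarantees a byte of at least 8 bits, not exactly 8. */
static_assert(CHAR_BIT >= 8, "C requires bytes of at least 8 bits");

/* Hypothetical helper: bits occupied by an object that is n_bytes bytes long. */
static size_t bits_in(size_t n_bytes)
{
    return n_bytes * CHAR_BIT;
}
```

On mainstream platforms CHAR_BIT is 8, so bits_in(sizeof(int)) typically gives 32. Code that must run on unusual hardware, such as some DSPs, should use CHAR_BIT rather than a hard-coded 8.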
Word: The natural size in which a processor handles data (its register size). The most common word sizes today are 8, 16, 32 and 64 bits, but other sizes are possible. For example, there were 36-bit machines such as the PDP-10, and even 12-bit machines such as the PDP-8.
On most CPUs, the byte is the smallest addressable unit of memory. If you want to set or clear a single bit, you first need to fetch the byte containing it from memory, change the bit, and then write the whole byte back to memory.
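The fetch/modify/write-back sequence can be sketched in C. C has no way to address a single bit, so the smallest thing you can load and store is an unsigned char. The helpers set_bit and clear_bit are hypothetical names. The three separate statements make each step explicit. A compiler will usually merge them into fewer instructions, but the byte is still read and written as a whole.

```c
#include <assert.h>   /* assert */
#include <limits.h>   /* CHAR_BIT */

/* Set bit `bit` (0 = least significant) of the byte at *p. */
static void set_bit(unsigned char *p, unsigned bit)
{
    assert(bit < CHAR_BIT);
    unsigned char b = *p;                   /* 1. fetch the whole byte */
    b = (unsigned char)(b | (1u << bit));   /* 2. change one bit */
    *p = b;                                 /* 3. write the whole byte back */
}

/* Clear bit `bit` (0 = least significant) of the byte at *p. */
static void clear_bit(unsigned char *p, unsigned bit)
{
    assert(bit < CHAR_BIT);
    unsigned char b = *p;                   /* 1. fetch */
    b = (unsigned char)(b & ~(1u << bit));  /* 2. change */
    *p = b;                                 /* 3. write back */
}
```

For example, starting from a byte of 0x00, set_bit(&x, 3) leaves 0x08 and a following set_bit(&x, 0) leaves 0x09. In everyday code you would simply write x |= 1u << 3, which the compiler turns into the same read-modify-write of the byte.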
By contrast, one definition of a word is the biggest chunk of bits a processor can process (for example, add or subtract) in a single operation, which is typically the width of an integer register. That definition is a bit fuzzy, since some processors have different register sizes for different tasks (integer vs. floating-point processing, for example) or can access only part of a register. In practice, the word size is the largest register size that the majority of operations work with.
A few processors also have a pointer size that differs from their word size. For example, the 8086 is a 16-bit processor, which means its registers are 16 bits wide. Its addresses, however, are 20 bits wide. They are calculated from a 16-bit segment and a 16-bit offset: the segment is shifted left by 4 bits (multiplied by 16) and then added to the offset.
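The 8086's real-mode address calculation can be written as a small C function. It is a sketch of the arithmetic only, and phys_addr_8086 is a hypothetical name. Because the 8086 has 20 address lines, any carry beyond bit 19 is dropped, so the result is masked to 20 bits and wraps at 1 MiB.

```c
#include <stdint.h>   /* uint16_t, uint32_t */

/* Physical address of an 8086 real-mode segment:offset pair.
   Both inputs are 16-bit values; the result fits in 20 bits. */
static uint32_t phys_addr_8086(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;  /* segment * 16 + offset */
    return linear & 0xFFFFFu;                             /* keep only 20 address bits */
}
```

For example, 0x1234:0x0010 and 0x1235:0x0000 both map to physical address 0x12350. This shows that many different segment:offset pairs can refer to the same byte, which is a side effect of building a 20-bit address out of two overlapping 16-bit parts.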
In some manuals and APIs, the term "word" may be stuck at a legacy size. It can then differ from the processor's actual word size once the platform has evolved to support larger registers. For example, the Intel and AMD x86 manuals still use "word" to mean 16 bits, with DWORD (double word, 32 bits) and QWORD (quad word, 64 bits) as the larger sizes. This naming carries over into some APIs, such as Microsoft's WinAPI with its WORD and DWORD types.
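Because these names describe fixed widths rather than the native word size, they map naturally onto C's fixed-width integer types. The typedef names below are hypothetical. They were chosen so they do not clash with the WORD and DWORD types from <windows.h>, which follow the same 16/32-bit convention. The static_asserts check at compile time that the widths match the x86 naming whatever the host machine's register size is.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* uint16_t, uint32_t, uint64_t */

/* Hypothetical typedefs mirroring the x86 manuals' naming.
   "word" stays 16 bits even on a 64-bit processor. */
typedef uint16_t x86_word;   /* word:        16 bits */
typedef uint32_t x86_dword;  /* double word: 32 bits */
typedef uint64_t x86_qword;  /* quad word:   64 bits */

static_assert(sizeof(x86_word)  * CHAR_BIT == 16, "word is 16 bits");
static_assert(sizeof(x86_dword) * CHAR_BIT == 32, "dword is 32 bits");
static_assert(sizeof(x86_qword) * CHAR_BIT == 64, "qword is 64 bits");
```

On an x86-64 machine the native word is 64 bits, yet an x86_word still wraps from 0xFFFF to 0. The name records a historical size, not the current register width.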