I have found the answers and glad to share:
What is optimal/non-optimal substructure? A problem is said to have optimal substructure if an optimal solution can be constructed efficiently from optimal solutions of its subproblems. This property is used to determine the usefulness of dynamic programming and greedy algorithms for a problem
What is local and global optimum? Local optimum of an optimization problem is a solution that is optimal (either maximal or minimal) within a neighboring set of candidate solutions.
Global optimum - is the optimal solution among all possible solutions, not just those in a particular neighborhood of values.
How to prove that Greedy algorithm yields global optimum?
Usually, global optimum can be proven by induction. Typically, a greedy algorithm is used to solve a problem with optimal substructure if it can be proved by induction that this is optimal at each step. Otherwise, providing the problem exhibits overlapping subproblems as well, dynamic programming is used.
To prove that an optimization problem can be solved using a greedy algorithm, we need to prove that the problem has the following:
Optimal substructure property: an optimal global solution contains the optimal solutions of all its subproblems.
Greedy choice property: a global optimal solution can be obtained by greedily selecting a locally optimal choise.
Matroids can be used as well in some case used to mechanically prove that a particular problem can be solved with a greedy approach.
And finally, some good examples of greedy algorithms.