I think you may have misunderstood the idea of Requests vs Limits; I would recommend you take a look at the docs before you make that decision.
In brief:
Request is how much of a resource is virtually allocated to the container. It is a guarantee that the container can use that amount when it needs to, but it does not mean the amount is reserved exclusively for the container. With that said, if you request 200MiB of RAM but only use 100MiB, the other 100MiB will be "borrowed" by other containers when they consume all of their requested memory, and will be "claimed back" when your container needs it.
Limit, in simple terms, is how much the container can consume in total (requested plus borrowed from other containers) before it is shut down for consuming too many resources.
- If a Container exceeds its memory limit, it will probably be terminated.
- If a Container exceeds its memory request, it is likely that its Pod will be evicted whenever the node runs out of memory.
In simple terms, the limit is an absolute value; it should be equal to or higher than the request. Good practice is to avoid setting limits higher than requests for all containers, doing it only for workloads that really need it. The reason is that when most containers can consume more resources (e.g. memory) than they requested, Pods suddenly start being evicted from the node in an unpredictable way, which is worse than having a fixed limit for each one.
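To make it concrete, here is a minimal sketch of how requests and limits are declared in a Pod spec (the name, image and values below are placeholders for illustration, not a recommendation):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app              # hypothetical name
spec:
  containers:
  - name: app
    image: nginx                 # placeholder image
    resources:
      requests:
        memory: "200Mi"          # guaranteed amount, used by the scheduler
        cpu: "250m"              # 0.25 of one core's CPU time
      limits:
        memory: "400Mi"          # above this the container is OOM-killed
        cpu: "500m"              # above this the container is throttled
```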
There is also a nice post in the Docker docs about resource limits.
The scheduling rule is the same for CPU and memory: K8s will only assign a Pod to a node if the node has enough allocatable CPU and memory to fit all the resources requested by the containers within the Pod.
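If you want to check what a node can still fit, kubectl shows the node's capacity, its allocatable resources and the requests/limits already committed by the running Pods:

```bash
# <node-name> is a placeholder; see the Capacity, Allocatable and
# "Allocated resources" sections of the output.
kubectl describe node <node-name>
```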
The execution rule is a bit different:
Memory is a limited resource on the node, and its capacity is an absolute limit: containers can't consume more memory than the node has.
CPU, on the other hand, is measured as CPU time. When you reserve CPU capacity, you are telling how much CPU time a container can use; if the container needs more time than requested, it can be throttled and sent to an execution queue until the other containers have consumed their allocated time or finished their work. In summary it is very similar to memory, but it is very unlikely that a container gets killed for consuming too much CPU. A container will be able to use more CPU when the other containers do not use the full CPU time allocated to them. The main issue is that when a container uses more CPU than was allocated, the throttling will degrade the performance of the application, and at a certain point it might stop working properly. If you do not provide limits, the container can start affecting other workloads on the node.
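If you suspect a container is being throttled, a rough way to check is reading the CFS statistics from the container's cgroup (the exact path depends on whether the node uses cgroup v1 or v2); just a sketch:

```bash
# cgroup v1: nr_throttled and throttled_time tell how often the
# container hit its CPU quota and how long it was paused in total.
cat /sys/fs/cgroup/cpu/cpu.stat

# cgroup v2 exposes the same counters in the unified hierarchy:
cat /sys/fs/cgroup/cpu.stat
```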
Regarding the values to use, there is no right value or right formula; each application requires a different approach, and only by measuring multiple times can you find the right values. The advice I would give you is to identify the min and the max and settle somewhere in the middle, then keep monitoring to see how it behaves; if you feel it is wasting/lacking resources you can reduce/increase towards an optimal value. If the service is something crucial, start with higher values and reduce afterwards.
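For the measuring part, assuming the metrics-server is installed in your cluster, kubectl top gives you the actual consumption to compare against what you requested:

```bash
# Current CPU/memory usage per Pod (requires metrics-server):
kubectl top pods

# Per-container breakdown inside each Pod:
kubectl top pods --containers

# Node-level view, to compare usage against allocatable capacity:
kubectl top nodes
```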
Regarding the readiness check, you should not use it as a parameter to decide these values; instead, you can delay the readiness using the initialDelaySeconds parameter in the probe, to give extra time for the Pod's containers to start.
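As a sketch, the delay looks like this in the probe definition (the endpoint, port and timings are placeholders):

```yaml
readinessProbe:
  httpGet:
    path: /healthz          # hypothetical endpoint
    port: 8080              # hypothetical port
  initialDelaySeconds: 30   # extra time before the first check
  periodSeconds: 10         # how often to re-check afterwards
```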
PS: I quoted the terms "borrow" and "claimed back" because the container is not actually borrowing from another container. In general, the node has a pool of memory and gives a chunk of memory to a container when it needs it, so the memory is not technically borrowed from another container but from the pool.