I'm a newbie programmer building a startup that I (naturally) hope will generate a large amount of traffic. I am hosting my Django project on dotcloud, which runs on Amazon EC2. I have some streaming media (HTTP though, not RTMP), so the dotcloud guys recommended I go with a CDN. I am also using Amazon S3 for storage, so I decided to go with Amazon CloudFront as my CDN.
The time has come to turn my attention to caching, and I am lost and confused. I am completely new to the concept. The entire extent of my knowledge comes from a tutorial I just read (http://www.mnot.net/cache_docs/) and a confusing weekend spent consulting Google. Most troubling of all, I am not even sure what I need to do for my site.
What is the difference between a CDN and a proxy server?
Is it possible I might want to use a caching service (e.g. memcached, redis), a CDN (CloudFront), AND a proxy server (squid)?
Our site is DB-driven and produces dynamically generated lists specific to user locations. Can such a site be cached? (The lists themselves are filterable via AJAX, so the URL might remain the same while producing largely different results. For instance, example.com/some_url/ might generate a list of 40 objects, with only 10 appearing on the page. By clicking on a filter, the user could end up with 10 different objects while still at /some_url/.)
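To make that scenario concrete, here is a rough Python sketch of the behaviour I mean. It is not my actual Django code: `OBJECTS`, `some_url_view`, and the `category` filter are made-up stand-ins, and the filter value stands in for whatever the AJAX request sends.

```python
# Hypothetical stand-in for the real view; every name here is made up.
# 40 objects exist for this URL, but only 10 are shown at a time.
OBJECTS = [
    {"id": i, "category": "even" if i % 2 == 0 else "odd"}
    for i in range(40)
]


def some_url_view(path, category=None, page_size=10):
    """Return the objects displayed for a request to `path`.

    `category` stands in for a filter sent via AJAX. Note that `path`
    is the same whether or not a filter is applied.
    """
    matches = [
        obj for obj in OBJECTS
        if category is None or obj["category"] == category
    ]
    return matches[:page_size]


# Same URL, two different result sets.
unfiltered = some_url_view("/some_url/")
filtered = some_url_view("/some_url/", category="odd")

print([obj["id"] for obj in unfiltered])
print([obj["id"] for obj in filtered])
```

So anything that caches purely on the URL would, as far as I can tell, see both of these as the same request, which is part of what confuses me.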
What are the best practices for a high traffic, rich content site?
How can I learn about this? Everywhere I look seems to take for granted some basics that I just don't have as a part of my own foundation yet.
I'm not certain I'm asking the right questions; I'm just feeling very lost. I've now built 95% of my site and thought I was just ironing out the details, but caching seems like another major undertaking. Any guidance, advice, or encouragement would be much appreciated!