Recommended structure for high traffic website [closed]
K

5

13

I'm rewriting a big website that needs a very solid architecture. Here are my questions, and pardon me for mixing apples and oranges, and probably kiwi too :) I did a lot of research and ended up totally confused.

Main question: Which approach would you take in building a big website expected to grow in every way?

  1. Single entry point, pages data in the database, pulled by associating GET variable with database entry (?pageid=whatever)

  2. Single entry point, pages data in separate files, included based on GET variable (?pageid=whatever would include whatever.php)

  3. MVC (Alright guys, I'm all for it, but I can't grasp the concept despite checking all the tutorials and frameworks out there. Do they store the "view" in the database? It seems to me from the examples that if you have 1000 pages of the same kind they can be shaped by 1 model, but will I still need 1000 "view" files?)

  4. PAC - this sounds even more logical to me, but I didn't find many resources. If this is a good way to go, can you recommend any books or links?

  5. DAL/DAO/DDD - I learned about these terms by diligently reading through Stack Overflow before posting this question. Not sure if they belong on this list.

  6. Sit down and create my own architecture (likely to do if nobody enlightens me here:)

  7. Something not mentioned...

Thanks.

Krucik answered 30/11, 2010 at 17:22 Comment(5)
I am a big fan of the MVC design pattern; here is a tutorial that I think will clarify some of your questions: php-html.net/tutorials/model-view-controller-in-php Whittle
If you are planning to make your own architecture, give me a call =D After being sorely disappointed with Drupal I've been considering making something with more power. If anyone out there is a Drupal fan, feel free to contact me as well. I'll gladly share my bad experiences. If you'd rather figure out my problem first hand, try to create a content type for a table with variable columns.Sulfanilamide
None of the things you mention here has anything to do with handling high traffic. You can choose whichever you wish, though some of the points are just lame. Also keep in mind that 99% of the people who say the word "MVC" here don't have the slightest idea of what it is.Titan
Just because MVC isn't native to PHP and implementations vary, doesn't mean it isn't a good idea. Abstracting the view, especially, is a monumentally good idea. A close second is the usefulness of abstracting access to and logic operating on your data.Splitting
MVC is a way of structuring code. It has nothing to do with the language itself; rather, it is a way you write your code. You can write code using the MVC pattern in any language.Posting
D
7

Scalability/availability (in other words, high traffic) for websites is addressed by none of the items you mention. That is especially true of points 1 and 2; storing the page definitions in a database is an absolute no-no. MVC and other similar patterns are more for code clarity and maintenance, not for scalability.

An important piece of missing information is what kind of concurrent hits/sec are you expecting? Sometimes, people who haven't built high-traffic websites are surprised at the hit rates that actually constitute a "scalability nightmare".

There are books on how to design scalable architectures, so an SO post will not be able to do the topic justice, but some very top-level concepts, in no particular order, are:

  • Scalability is best handled first by looking at hardware-based solutions. A beefy server with an array of SSD disks can go a long way.
  • Make static anything that can be static. Serve as much as you can from the web server, not the DB. For example, a lot of pages on websites dynamically generate data lists from data stores that very rarely or never really change.
  • Cache output that changes infrequently, and tune the cache refresh.
  • Build dynamic pages to be stateless or asynchronous. Look into CQRS and Event Sourcing for patterns that favor/facilitate scaling.
  • Tune your queries. The DB is usually the big bottleneck since it is a shared resource. Lots of web app builders use ORMs that create poor queries.
  • Tune your database engine. Backups, replication, sweeping, logging, all of these require just a little bit of resource from your engine. Tuning it can lead to a faster DB that buys you time from a scale-out.
  • Reduce the number of HTTP requests from clients. Each HTTP connection has overhead. Check your pages and see if you can increase the payload in each request so as to reduce the overall number of individual requests.
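To make the "cache output that changes infrequently" point concrete, here is a minimal sketch of a time-based output cache. It is written in Python only for illustration, and `OutputCache`, `render_product_list`, and the 300-second TTL are hypothetical names and values, not anything from the answer. It lives in one process; a shared store such as memcached would play the same role across several web servers.

```python
import time
from typing import Callable, Dict, Tuple


class OutputCache:
    """Hypothetical helper: keep rendered output for ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        # key -> (time the entry was stored, rendered output)
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get_or_render(self, key: str, render: Callable[[], str]) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # still fresh: skip the expensive render
        output = render()
        self._entries[key] = (now, output)
        return output


render_count = 0


def render_product_list() -> str:
    """Stand-in for a page that would normally query the database."""
    global render_count
    render_count += 1
    return "<ul><li>Widget</li><li>Gadget</li></ul>"


cache = OutputCache(ttl_seconds=300)
first = cache.get_or_render("/products", render_product_list)
second = cache.get_or_render("/products", render_product_list)  # cache hit
```

Tuning the cache refresh then comes down to choosing `ttl_seconds` per page: the longer stale output is acceptable, the fewer renders reach the database.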

At this point, you've optimized the behavior on one server, and you have to "scale out". Now things get very complicated very fast: load-balancing scenarios of various types (sharding, DNS-driven, dumb balancing, etc.), separating read data from write data on different DBs, going to a virtualization solution like Google Apps, offloading static content to a big CDN service, using a language like Erlang or Scala and parallelizing your app, etc.
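As one small illustration of "separating read data from write data", the sketch below routes each SQL statement to a connection name. It is an assumption-laden toy in Python: `ConnectionRouter` and the `db-primary` / `db-replica-*` names are made up, statements are classified only by their first keyword, and replication lag and transactions (where reads should usually stay on the primary) are ignored.

```python
import itertools
from typing import List


class ConnectionRouter:
    """Hypothetical router: writes go to the primary, reads rotate over replicas."""

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        # Fall back to the primary when no replica is configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def for_query(self, sql: str) -> str:
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT":
            return next(self._replicas)  # round-robin across read replicas
        return self.primary  # INSERT/UPDATE/DELETE/DDL all hit the primary


router = ConnectionRouter("db-primary", ["db-replica-1", "db-replica-2"])
targets = [
    router.for_query(query)
    for query in (
        "SELECT * FROM pages WHERE id = 1",
        "UPDATE pages SET title = 'x' WHERE id = 1",
        "SELECT * FROM pages WHERE id = 2",
    )
]
```

In a real deployment the returned name would select an actual connection pool; the point here is only that the read/write decision can sit in one place.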

Despair answered 30/11, 2010 at 21:29 Comment(6)
re #4, one cannot overemphasize the merits of memcached and, secondly, APC which helps address #2 (avoiding recompiles, etc)Splitting
Actually, in #2, the technique involves specifically and fully bypassing active caching subsystems. I've built sites that have fully static pages from database data where the creation of that page is triggered by updates to the table. The table is never read from, except to create the related page. If the table isn't updated for months, the related page is untouched. Caches always involve some amount of "polling", which is just more resource hits.Despair
Wow! I really have to thank each and every one of you guys who posted here, so much really fantastic and useful info. I have a question for alphadogg, in relation to "storing the page definitions in a database is an absolute no-no" - does that include HTML code too? The site I'm working on will have a CMS with a kind of WYSIWYG editor, meaning some HTML tags would be there.Krucik
By "page definitions", I meant HTML. If you store the stuff in a given mypage.htm, and all associated mypage.css, mypage1.inc, mypage2.inc, etc., pages as records in a database, you have taken a step backwards in scalability. Caching is the real-time way to avoid going to the DB. But, even caching still has overhead and resource consumption. Make as much static as you can, even if it means that sometimes, you have to manipulate database data into a static page.Despair
For example, in a system I built, there was a frequently-accessed report. The underlying data was refreshed in a batch every night. Part of the job was to spit out a static HTML report, instead of having users dynamically access the data. Now, you could have cached the first access and forced a cache refresh nightly. Or, I avoided the cache entirely by thinking a little about how that data was accessed. That way, the report could actually be distributed via a CDN, for example.Despair
BTW, that system was not MVCed, DDDed or IOCed, which is taken as religion these days. None of those would have made it noticeably faster. It was good procedural code. Rethinking the real bottleneck did.Despair
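The static-report idea from the comments above (regenerate the page when the data changes, never on read) could be sketched roughly as follows. This is a Python illustration under stated assumptions, not the commenter's actual system: the `report` table, its columns, and the function names are hypothetical, and SQLite stands in for the real database.

```python
import html
import sqlite3
import tempfile
from pathlib import Path
from typing import List, Tuple


def write_report_page(conn: sqlite3.Connection, out_path: Path) -> None:
    """Read the table once and write a fully static HTML file."""
    rows = conn.execute("SELECT name, total FROM report ORDER BY name").fetchall()
    items = "".join(f"<li>{html.escape(name)}: {total}</li>" for name, total in rows)
    page = f"<html><body><h1>Nightly report</h1><ul>{items}</ul></body></html>"
    tmp_path = out_path.with_suffix(".tmp")
    tmp_path.write_text(page, encoding="utf-8")
    tmp_path.replace(out_path)  # swap in the finished file in one step


def refresh_report(conn: sqlite3.Connection,
                   rows: List[Tuple[str, int]], out_path: Path) -> None:
    """Batch job: replace the data, then regenerate the page as part of the same job."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM report")
        conn.executemany("INSERT INTO report (name, total) VALUES (?, ?)", rows)
    write_report_page(conn, out_path)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (name TEXT PRIMARY KEY, total INTEGER)")
out_path = Path(tempfile.mkdtemp()) / "report.html"
refresh_report(conn, [("north", 12), ("south", 7)], out_path)
```

Visitors only ever receive `report.html` from the web server or a CDN; the database is touched once per batch, no matter how many times the report is viewed.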
P
2

Single entry point, pages data in the database, pulled by associating GET variable with database entry (?pageid=whatever)

Potential nightmare for maintenance, and also for development if you have a team of more than 2-3 people. You would need to create a set of strict rules for everyone to adhere to, an effort that would be much better spent on MVC. The same goes for 2.

MVC (Alright guys, I'm all for it, but I can't grasp the concept despite checking all the tutorials and frameworks out there. Do they store the "view" in the database? It seems to me from the examples that if you have 1000 pages of the same kind they can be shaped by 1 model, but will I still need 1000 "view" files?)

It depends on how many page layouts there are. Most MVC frameworks allow you to work with structured views (i.e. main page views and sub-views). Think of a view as an HTML template for the web page. How many templates and sub-templates you need is exactly how many views you'll have. I believe most websites can get away with up to 50 main views and up to 100 sub-views, but those are very large sites. Looking at some sites I run, it's more like 50 views in total.
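To show why 1000 pages of the same kind do not need 1000 view files, here is a small Python sketch using the standard library's `string.Template`. The layout, the article sub-view, and the page data are all invented for illustration; a real framework would load templates from files and page data from a model.

```python
import html
from string import Template

# One main view (layout) and one sub-view, shared by every article page.
LAYOUT = Template("<html><head><title>$title</title></head><body>$body</body></html>")
ARTICLE_VIEW = Template("<article><h1>$title</h1><p>$text</p></article>")


def render_article(page: dict) -> str:
    """Fill the same two templates with whatever data the model returned."""
    title = html.escape(page["title"])
    body = ARTICLE_VIEW.substitute(title=title, text=html.escape(page["text"]))
    return LAYOUT.substitute(title=title, body=body)


# Stand-in for 1000 rows fetched through a model.
pages = {
    page_id: {"title": f"Article {page_id}", "text": f"Text of article {page_id}."}
    for page_id in range(1, 1001)
}
rendered = {page_id: render_article(page) for page_id, page in pages.items()}
```

The number of views tracks the number of distinct layouts, not the number of pages: here two templates serve all 1000 articles.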

DAL/DAO/DDD - I learned about these terms by diligently reading through Stack Overflow before posting this question. Not sure if they belong on this list.

It does. DDD is great if you need meta-views or meta-models, say, if all your models are quite similar in structure and differ only in the database tables used, and your views map almost 1:1 to models. In that case, it is a good time for DDD. A good example is ERP software where you don't need a separate design for each database table and can use one uniform way to do all the CRUD operations. In this case you could probably get away with one model and a couple of views, all generated dynamically at run-time using a meta-model that maps database columns, types and rules to the logic of the programming language. But please note that it takes some time and effort to build a quality DDD engine so that your application doesn't look like a hacked-up MS Access program.
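The "one model driven by a meta-model" idea described above might look something like this in miniature. It is a Python sketch with SQLite as a stand-in database; `META`, `GenericModel`, and the `customer` / `product` tables are hypothetical, and only two column types and two CRUD operations are covered.

```python
import sqlite3
from typing import Any, Dict, Optional

# Meta-model: table name -> column name -> Python type (the "rules").
META = {
    "customer": {"name": str, "credit_limit": int},
    "product": {"sku": str, "price_cents": int},
}
SQL_TYPES = {str: "TEXT", int: "INTEGER"}


class GenericModel:
    """One model class for every table described in META."""

    def __init__(self, conn: sqlite3.Connection, table: str):
        # Table and column names come only from META, never from user input,
        # so interpolating them into SQL below is safe in this sketch.
        self.conn = conn
        self.table = table
        self.columns = META[table]

    def create_table(self) -> None:
        cols = ", ".join(f"{n} {SQL_TYPES[t]}" for n, t in self.columns.items())
        self.conn.execute(f"CREATE TABLE {self.table} (id INTEGER PRIMARY KEY, {cols})")

    def insert(self, values: Dict[str, Any]) -> int:
        for name, py_type in self.columns.items():
            if not isinstance(values.get(name), py_type):
                raise TypeError(f"{self.table}.{name} must be {py_type.__name__}")
        names = list(self.columns)
        marks = ", ".join("?" for _ in names)
        cur = self.conn.execute(
            f"INSERT INTO {self.table} ({', '.join(names)}) VALUES ({marks})",
            [values[n] for n in names],
        )
        return cur.lastrowid

    def find(self, row_id: int) -> Optional[Dict[str, Any]]:
        names = list(self.columns)
        row = self.conn.execute(
            f"SELECT {', '.join(names)} FROM {self.table} WHERE id = ?", (row_id,)
        ).fetchone()
        return dict(zip(names, row)) if row else None


conn = sqlite3.connect(":memory:")
customers = GenericModel(conn, "customer")
customers.create_table()
customer_id = customers.insert({"name": "Acme", "credit_limit": 5000})
found = customers.find(customer_id)
```

Adding a new table means adding an entry to `META`, not writing a new model class, which is the trade the answer describes: less per-table code in exchange for the effort of building the generic engine well.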

Sit down and create my own architecture (likely to do if nobody enlightens me here:)

If you're building a public-facing website, you're most likely going to do well with MVC. A very good starting point is the CodeIgniter video tutorials. They helped me understand what MVC really is and how to use it far better than any HOWTO or manual I read. And they only take 29 minutes altogether:

http://codeigniter.com/tutorials/

Enjoy.

Phosphoroscope answered 30/11, 2010 at 21:28 Comment(2)
MVC and DDD are not directly related to the ability to handle high traffic.Despair
To quote the OP: a big website expected to grow in every way. This means that maintenance is important. Both MVC and DDD are directly related to that. Ability to handle high traffic is completely different and one should look into stuff like PageSpeed, YSlow, PHP-APC, nginx, load-balancing, mysql optimization, etc. but that is not what OP is asking. Maybe he should change the title of the question a little bit.Isleana
G
1

I'm a fan of MVC because I've found it easier to scale your team when everything has a place and is nice and compartmentalized. It takes some getting used to, but the easiest way to get a handle on it is to dive in.

That said, definitely check your local library to see if they have the O'Reilly book on scaling, which is a good place to start: http://oreilly.com/catalog/9780596102357

Gambrill answered 30/11, 2010 at 17:43 Comment(0)
P
1

If you're creating a "big" website and don't fully grasp MVC or a web framework, then a CMS might be a better route, since you can expand it with plugins as you see fit. With this route you can worry more about the content and page structure than about the platform, as long as you pick an appropriate CMS.

Perforated answered 30/11, 2010 at 17:49 Comment(3)
CMS is hardly a good idea for high traffic websites.Isleana
Disagree. backendbattles.com/backend/drupal Perforated
If caching can be implemented in the CMS, it can probably scale well.Herrle
K
1

I would suggest creating a mock app with a few of the web MVC frameworks in the wild and picking the one with which development went smoothly enough. Building your code on a solid base is fundamental if you want to grasp the concepts of MVC and be able to add new functionality to your website easily.

Kirman answered 30/11, 2010 at 18:55 Comment(0)
