In a single-page app, what is the right way to deal with wrong URLs (404 errors)?

I am currently writing a web application using AngularJS, but I think this question applies to any client-side JavaScript framework that does routing on the client side (as Angular does).

In a single-page app, what is the right way to deal with wrong URLs?

Looking at a few major sites, I see that Gmail will redirect to the inbox if you type any random URL below https://mail.google.com/mail/. This happens server-side (with an HTTP 30x redirect) or client-side, depending on whether the wrong path is before or after the # character. On the other hand, Twitter shows a real HTTP 404 for any invalid URL. A third option would be to show a "soft" 404, a purely client-side error page.

These solutions seem appropriate for different situations. Twitter wants the links to Twitter users and tweets to be real links, so people can share them, post them in news articles, etc., so it is important that invalid links be recognized as such (if I have a broken link to a tweet on my website, a simple crawl will tell me that). In Gmail, on the other hand, you are not expected to share links into your inbox, and I'm not even sure the links are really permanent/persistent: it seems the URL updating mostly serves the purpose of browser history navigation within the single-page app. The third approach of giving soft errors might be appropriate for situations similar to Gmail, but where there is no reasonable "default" page.

After this long introduction, here are some specific questions:

  • Is it ever acceptable to give a "soft" error page instead of a 404 error, or should a single-page app always redirect to a real 404 if a URL is invalid?
  • Gmail's code may be perfectly bug-free, but if it did have a bug leading to invalid links that end up redirecting back to the inbox, that might be even more confusing for users than an error page. For most web apps out there, which are not as well tested as Gmail, would it be better to show an error page?
  • To implement real 404s for single-page apps, it seems necessary to duplicate the routing logic on the server side. Is there any way around this?
  • When redirecting to a 404, I think the user should be able to see the URL that caused the error, possibly in the URL bar. With the HTML5 history API, I think this can be accomplished by simply triggering a reload of the current page (with the wrong URL), combined with the server-side routing mentioned above. For browsers that do not support this, or when using hashbang notation, this does not seem possible. What's the best way to support all browsers?
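
One possible way to limit the duplication mentioned in the third bullet is to keep the route table in a single module that both the client-side router and the server load. The sketch below is only an illustration: the route patterns, `isKnownPath` and `statusForPath` are made-up names, and it checks the shape of a URL only, not whether the resource behind it exists.

```javascript
// Hypothetical route table, shared by the client-side router and the server.
// A ":name" segment matches any single non-empty path segment.
const ROUTES = ["/", "/inbox", "/messages/:id", "/users/:name/settings"];

// Turn one route pattern into an anchored regular expression.
function compileRoute(pattern) {
  const source = pattern
    .split("/")
    .filter(Boolean) // drop the empty strings produced by leading/trailing "/"
    .map((segment) =>
      segment.startsWith(":")
        ? "/[^/]+"
        : "/" + segment.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
    )
    .join("");
  return new RegExp("^" + source + "/?$");
}

const COMPILED_ROUTES = ROUTES.map(compileRoute);

// True if the path has the shape of a known route.
function isKnownPath(path) {
  return COMPILED_ROUTES.some((re) => re.test(path));
}

// Server side: status code to send along with the app shell.
function statusForPath(path) {
  return isKnownPath(path) ? 200 : 404;
}
```

On the server, `statusForPath(requestPath)` would decide between a 200 and a real 404 before the app shell is sent; on the client, the same `isKnownPath` could back the router's "otherwise" case.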
Urolith answered 8/2, 2013 at 18:39 Comment(8)
Does your website even work without JavaScript? Are you using history.pushState to update the URLs via JavaScript, or fragments in the URL? - Pundit
Also, why are you talking about redirecting to a 404, why not just show one? - Pundit
@markus The site I am currently working on does not work without JavaScript. But I do want deep-linking to work, so users can share links to inside the site (typically, this would be by email). I am using hashbang notation for now, but AngularJS makes it easy to switch to HTML5 pushState if I want/need to. - Urolith
@MarkusUnterwaditzer: about redirecting vs showing a soft 404: that's part of the question. In some cases showing a 404 client-side is fine. But I like the fact that an HTTP 404 has known semantics that an automated tool can understand (for testing, for checking links, etc). - Urolith
There is no definitive answer for this. Armin Ronacher wrote an article about the approach used by Battlelog: render the site server-side first, then use JavaScript to render every other click: lucumr.pocoo.org/2011/11/15/modern-web-applications-are-here - Pundit
In your case I don't think you need to care about the Googlebot anymore, since making your site JavaScript-only already mostly excludes it. The Googlebot can interpret JavaScript to some extent, but I don't think it would recognize the 404 pages of a JavaScript-only site as such. - Pundit
The content in the app will not be visible without authentication, so I don't care about indexing in this specific case (as in the Gmail example in a way, but with multiple users sharing an "inbox"). - Urolith
Well, then in your case just showing a 404-ish message will be enough. - Pundit

If you care about SEO, one of the ways that angular.io was able to solve this problem (at least with Google anyway) is by using a noindex meta tag "to indicate soft-404 status which will prevent crawlers from crawling the content of the page". Apparently it can be added to the document via JavaScript.
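
As a rough sketch of adding the tag from JavaScript (the function name `markAsSoft404` and the idea of calling it from the router's "no route matched" handler are assumptions, not something angular.io is confirmed to do this way):

```javascript
// Add <meta name="robots" content="noindex"> to the page, reusing an
// existing robots meta tag if there is one.
// `doc` defaults to the browser's global document; a stand-in object can be
// passed in for testing outside a browser.
function markAsSoft404(doc = document) {
  let meta = doc.querySelector('meta[name="robots"]');
  if (!meta) {
    meta = doc.createElement("meta");
    meta.setAttribute("name", "robots");
    doc.head.appendChild(meta);
  }
  meta.setAttribute("content", "noindex");
  return meta;
}
```

The client-side router would call `markAsSoft404()` only when it ends up on its "not found" view, so valid pages never carry the tag.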

Alternatively, using JavaScript, you can redirect to a page that will respond with an actual HTTP 404 status code. Google understands JavaScript redirects just fine. Your original /does-not-exist page, when redirected to /404-error?from=does-not-exist, will be associated with the 404 status code returned by the server. The URL structure does not matter, only the status code and the redirect are important here.
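
A minimal sketch of that redirect, assuming the server has a `/404-error` route that really answers with status 404 (the path and the `from` parameter follow the example above; the function names are hypothetical):

```javascript
// Build the URL of the server route that is assumed to respond with HTTP 404.
function notFoundUrl(badPath) {
  return "/404-error?from=" + encodeURIComponent(badPath);
}

// Called by the client-side router when no route matches.
// `redirect` defaults to a real browser redirect; it can be swapped out
// for testing outside a browser.
function redirectToNotFound(
  badPath,
  redirect = (url) => window.location.replace(url)
) {
  redirect(notFoundUrl(badPath));
}
```

`location.replace` is used here rather than `location.assign` so that the invalid URL does not stay in the history and send the user straight back to the error page when they press Back.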

Your other options are SSR (Nuxt.js, Next.js, Angular Universal, etc.) or pre-rendering (prerender.io, puppeteer, etc.). Google calls the latter dynamic rendering: you respond to search bot requests with a pre-rendered version, while human users get your normal client-side rendered app.
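
The server-side decision behind dynamic rendering can be sketched roughly as below. The user-agent pattern is deliberately short and only illustrative, and `chooseRenderer` is a made-up name; a real setup would rely on whatever bot list its pre-rendering tool ships with.

```javascript
// Illustrative, incomplete pattern for crawler user agents.
const BOT_PATTERN = /googlebot|bingbot/i;

// Decide which variant of the page the server should send.
function chooseRenderer(userAgent) {
  return BOT_PATTERN.test(userAgent || "") ? "prerendered" : "client";
}
```

Either variant can still be sent with a 404 status when the requested path is invalid; the choice of renderer only affects the body.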

Penelope answered 20/11, 2018 at 20:19 Comment(4)
"... where you respond to search bot requests with a pre-rendered version while human users get your normal client-side rendered app." Is it OK from the SEO point of view if the user requests e.g. my-app.com/not-existent-path/blah/blah/blah and my server responds with a page having a 404 HTTP status code, but then, after the 404 page renders, the user presses a "Go to homepage" button which only changes some contents of the page and uses the JS history API without making a new request to the server? I.e., the page was rendered with a 404, and afterwards the app changed the URL through the history API. - Downwash
That sounds okay. If you change the URL with the JS History API, it doesn't matter whether you fetch new content from the server, from a cache or from somewhere else: it has no effect on SEO whatsoever, because search bots won't be clicking on your "Go to homepage" link anyway. They will instead make a new request to the URL in that link. The JS History API is only for your fellow human users. - Penelope
For anybody who stumbles upon this, here is an interesting talk about how to handle soft 404s: youtube.com/watch?v=vjj8B4sq0UI&t=30m15s (31:40 min mark). It's a presentation made for the JavaScript fwdays conference, and there is an interesting explanation as to why the noindex meta tag might cause unwanted side effects. - Existentialism
@Rose, thanks for the video. To be fair, this is only an issue if you add the noindex meta tag within the response itself. However, if you add it via JavaScript then it should not be an issue. To play it safe, I would not add the noindex meta tag by default (as angular.io does) but instead add it only when needed. - Penelope

tl;dr: Drop hashbang support and opt for PJAX-like behavior if you care about SEO.

Are you making an app or a website? If it is a website, you need to return a 404 so that you don't confuse Google. It needs to be a real 404, not just a "page not found" message (i.e. a 200 with the message "page not found" is very bad). Also, which browsers do you care to support?

My opinion is that the whole hashbang server-side rendering approach should be avoided (i.e. the nasty Google SEO #! hack). Either use real pushState or, for browsers that don't support pushState, reload the whole page when the URL changes (a real URL change, not a hash change).

Now the reason this matters is that a #! URL should never return a 404: it doesn't make sense, and it's impossible to mimic on the server side because the server never gets what comes after the #! without running JavaScript.

Thus, if you really care about SEO, I would do something like PJAX: only use true pushState for routing, and otherwise just fall back to old web 1.0 behavior. Consequently, the links I recommend you share, the ones that can truly be a 404, should not contain #! (a traditional # is fine so long as the contents of the page don't change drastically).
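
The "pushState or fall back to web 1.0" idea could look roughly like this. It is a sketch under assumptions: `navigate` and `render` are made-up names, and `env` stands in for the browser's `window` so the logic can be exercised outside a browser.

```javascript
// Navigate inside the app when pushState is available; otherwise do a
// plain full page load of the same URL and let the server answer
// (with a real 404 if the URL is invalid).
function navigate(url, render, env = window) {
  const canPush = Boolean(
    env.history && typeof env.history.pushState === "function"
  );
  if (canPush) {
    env.history.pushState({}, "", url); // update the address bar, no reload
    render(url);                        // let the client-side router draw the view
    return "pushState";
  }
  env.location.assign(url);             // old-style navigation
  return "reload";
}
```

Because both branches use the same real URL, a shared link behaves the same in either kind of browser, which is not the case with #! URLs.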

Finally, 404s are mostly not the problem; 30X (redirect) responses are. That's because the browser follows redirects automatically, so your JavaScript AJAX calls never see the 30X itself (they get the response of the redirect target instead, i.e. a 200). To handle 30X responses, you have to send a header back with every response that says which URL was finally served (i.e. what you were redirected to), so that you don't mess up the pushState history.
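
On the client, using such a header might look like the sketch below. The header name `X-Final-URL` is hypothetical (the server would have to be set up to send it), and `response` is anything with a `headers.get(name)` method.

```javascript
// Pick the URL that should go into the pushState history after an AJAX
// page load: the URL the server reports it actually served, or the
// requested one if the server sent no such header.
function urlToPush(requestedUrl, response) {
  const finalUrl = response.headers.get("X-Final-URL"); // hypothetical header
  return finalUrl || requestedUrl;
}
```

With the fetch API, the standard `response.redirected` and `response.url` properties expose the same information without a custom header, and `XMLHttpRequest` has `responseURL`.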

Of course, if you need to support hashbang like Twitter used to (and they are the ones that even killed hashbang), you can leverage Google Sitemaps and rel=nofollow to try to mitigate bad SEO.

Lawmaker answered 9/2, 2013 at 15:57 Comment(4)
PJAX looks interesting for someone building from scratch. But the AngularJS framework supports pushState out of the box, so I guess it would not be needed. Or does PJAX do anything more? - Urolith
What I am building now is an app that will not be indexed by search engines. But I am interested in understanding this issue more generally. - Urolith
I was not aware of the problem with pushState and 30x responses. Good to know. Any pointers to docs/examples/tutorials on this? - Urolith
Specifically, pjax-container seems to be conceptually the same as the AngularJS ng-view. - Urolith
