How do I throw a real 404 or 301 with an Angular pushstate URL

I'm using $routeProvider and $locationProvider to handle pushState URLs in a single-page app (SPA), something like this:

angular.module('pets', [])

  .config(function($routeProvider, $locationProvider) {
    $locationProvider.html5Mode(true);
    $routeProvider.when('/pet/:petId', {
      controller: 'petController'
    });
  })

  .controller('petController', function($scope, petService, $routeParams){
    petService.get('/api/pets/' + $routeParams.petId).success(function(data) {
      $scope.pet = data;
    });
  });

The URL is used to pull content from the server which may or may not exist.

If this was an ordinary multipage website, a request for missing content would trigger a 404 header response from the server, and a request for moved content would trigger a 301. This would alert Google to the missing or moved content.

Say for example I hit a URL like this:

http://example.com/pet/123456

and say there is no such pet in the database, how can my SPA return a 404 for that content?

Failing this, is there some other way to correctly alert the user or search engine that the requested URL doesn't exist? Is there some other solution I'm not considering?
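For the user-facing half of this (as opposed to the crawler half), the error branch of the controller above could at least flag the missing pet so the view can render a not-found message. Here is a standalone sketch of such a callback, with the Angular wiring omitted so it can be shown on its own; the function name and the scope flags are made up for illustration, and the `(data, status)` signature matches the old `$http` `.success`/`.error` style used above:

```javascript
// Sketch: error callback for the petService call, factored out of the
// controller. When the API answers 404, clear the pet and set a flag the
// view can use to show a "no such pet" template. Names are illustrative.
function makePetErrorHandler($scope) {
  return function (data, status) {
    if (status === 404) {
      $scope.pet = null;
      $scope.notFound = true; // view toggles a not-found template on this
    }
  };
}
```

Wired up as `petService.get(...).success(...).error(makePetErrorHandler($scope))`, this informs the user, but not a search engine, which is the crux of the question.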

Codification answered 6/1, 2015 at 11:14 Comment(4)
If the server returns a status code outside the 200 range, the get call should reject and be passed to the error callback: .success(cb).error(cb). Is that what you are looking for? – Astra
@DavinTryon – not really, I can catch the 404 from the AJAX request no problem; the question is what to do with it then. In a multi-page app I would return a 404 status code in the response header. Obviously I can't do this in a SPA. For a user I can simply render some nice 404 text, but how do I alert a crawler that the page is not there? – Codification
I guess the question could then be widened to any AJAX call? In other words, how do you indicate to a crawler that an AJAX call (that returns content to be rendered) is returning a 404? – Astra
I don't think you can do this. I tried to think creatively. One possible option: if the server kept track of removed URLs (possibly in a database), it could return a 404 if the requested URL matched a removed path. If it wasn't removed, it would return your index page. Obviously this would only work for external links to your site. – Hyperplasia
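The idea in the last comment could be sketched as a small server-side decision function. This is a minimal sketch assuming a Node.js server; `removedPaths` and `movedPaths` are hypothetical data structures (a 301 case for moved content is added since the question mentions it), not part of any framework:

```javascript
// Sketch of the comment's idea: the server keeps a record of removed and
// moved URLs and answers deep links accordingly before falling through to
// serving the SPA's index page. All names here are illustrative.
function decideStatus(path, removedPaths, movedPaths) {
  if (movedPaths[path]) {
    return { status: 301, location: movedPaths[path] }; // moved content
  }
  if (removedPaths.indexOf(path) !== -1) {
    return { status: 404 }; // known-removed content
  }
  return { status: 200 }; // not on the blacklist: serve the SPA index
}
```

As the comment notes, this only helps for URLs the server already knows were removed or moved; it cannot catch arbitrary never-existed URLs like /pet/123456.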

The real question is: does http://example.com/pet/123456 return anything at all?

If your starting point is http://example.com/ and there's a link to http://example.com/pet/123456, then Angular will call petController, which in turn makes an AJAX call to http://example.com/api/pets/123456.

A crawler wouldn't do that but instead would try to call http://example.com/pet/123456 directly.

So your server must be able to handle that call. If there is no pet with the id 123456, it should return a 404. Problem solved. If there is, it should return the SPA, and the application should then handle the situation accordingly.

Introduce answered 9/1, 2015 at 15:15 Comment(5)
That works fine when hitting the URL directly, but when hitting it via pushState, only the API call will be made. The SPA is already loaded. – Codification
Ah, no, you're right. The crawler will hit the URL directly. It's then up to the server to return the SPA or a 404 header. – Codification
Finally a proper and correct response! I would only add that if, for example, you have an architecture where the front end is served by a different server than the API, it is not that trivial to work out when to return a 404. Obviously, Angular cannot do it, since it works in a browser. Thus, you have three options: 1. make sure not to have 404s at all; 2. ignore them and redirect the user to e.g. the front page; 3. use some way to prerender your pages (e.g. prerender.io has a proper mechanism for 404s: prerender.io/documentation/best-practices). Worth noting is that only the third option is SEO-friendly. – Cynewulf
The modern crawler will execute JavaScript. – Lindell
@Lindell It's not a matter of being modern. Not even all indexers of the most popular search engines execute JavaScript, to say nothing of the gazillions of other crawlers. It's a bit like saying modern users use hardware A or browser B; that's not how it works. Executing scripts also doesn't solve the problem I was addressing: the server has to return an appropriate response for deep links, no matter what. – Introduce

According to this answer to "How do search engines deal with AngularJS applications?", you should use a headless browser to process crawler requests and serve back snapshots of the page with the appropriate response code. https://developers.google.com/webmasters/ajax-crawling/docs/html-snapshot

The Google example does not cover the 301, 302, or 404 cases. However, their code could be modified to analyze the content of the snapshot and change the response code.

I found that prerender.io offers this service, though it is not free; they do have a free plan for sites with fewer than 250 pages. Prerender asks that, in case of a 404 or 301, you add a meta tag to the DOM:

<meta name="prerender-status-code" content="404">

This meta tag is then detected by their headless browser, and the response code is changed accordingly.
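For illustration, the tag could be injected from the client when the API reports a missing resource. The `prerender-status-code` meta name is the one from the answer above; the function name and the injectable `doc` parameter are made up here so the sketch can be tested without a real browser:

```javascript
// Sketch: append a prerender status meta tag to the page <head> so a
// prerendering service can pick it up when it snapshots the page.
// `doc` stands in for the browser's `document` and is passed in for
// testability; in an Angular app you would inject $document or similar.
function setPrerenderStatus(doc, statusCode) {
  var meta = doc.createElement('meta');
  meta.setAttribute('name', 'prerender-status-code');
  meta.setAttribute('content', String(statusCode));
  doc.head.appendChild(meta);
  return meta;
}
```

In the controller's error branch you would call something like `setPrerenderStatus(document, 404)` so the snapshot carries the tag.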

Brisket answered 7/7, 2015 at 21:56 Comment(3)
This info has been officially deprecated by Google. Their crawler can crawl pages without needing a headless-browser solution. – Imprisonment
@Failpunk thank you! That is certainly good news for single-page websites out there :) – Brisket
@Failpunk: Removing the pre-render support for your site would be a mistake, since there are more bots out there than Google's alone. – Furthest

Try this:

angular.module('pets', [])

  .config(function($routeProvider, $locationProvider) {
    $locationProvider.html5Mode(true);
    $routeProvider.when('/pet/:petId', {
      controller: 'petController'
    }).otherwise({ templateUrl: '/404.html' }); // render a 404 view
  })
Cardiovascular answered 8/1, 2015 at 13:57 Comment(2)
I really don't think that's going to work. The router will match the first pattern regardless of whether the later AJAX call succeeds; the AJAX callback will be executed long after the router has completed. – Codification
The only conceivable mechanism here would be to do a synchronous request in the router. – Codification
