How do I let search crawlers properly index pages with infinite scroll?

I have a website on which I implement infinite scroll: when a user reaches the end of a page, an AJAX call is made and new content is appended to the bottom of the page. This, however, means that all content after the first "page break" is unreachable by search crawlers. For example, I have a page that lists all items with the "infographic" tag. There are actually several dozen such items, but crawlers can only see the first 10, because the other items are loaded based on the position of the content relative to the browser window. Since crawlers don't have browser windows, those items are never loaded at all.

What is the proper way, then, to let search crawlers access the full content of web pages with infinite scroll, while also allowing users to enjoy the infinite scroll and the lack of pagination?

Turbinate asked 28/5, 2012 at 11:39 Comment(0)

Make a View All page

Make another page that lists everything and links to the items normally, i.e. the same items contained on the page with the infinite scroll. Then place a small link (maybe at the very bottom) on the infinite-scroll page pointing to it. In other words, if your page lists products, the link should say "Show All Products" or similar; if the page lists blog articles, the link should be something like "All Articles". Humans might not like the long load, but for Google it doesn't matter that the page is large: it will download it and follow the links in it normally.

Furthermore, if your pagination iterates through tens of thousands of items, you can break your View All page down into sections, similar to how a blog archive or a product catalog works. The point is that you provide an alternative means of access for humans without JavaScript (and for those who really want to see everything), while at the same time letting Google and other search engines crawl your full inventory of pages.

Finally, as a secondary measure, add a /sitemap.xml file that indexes every article/product/inventory item. See http://www.sitemaps.org/
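
For illustration, a minimal /sitemap.xml in the sitemaps.org format might look like the following; the URLs and dates are placeholders for your own item pages:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- one <url> entry per article/product page -->
  <url>
    <loc>http://www.example.com/infographics/item-1</loc>
    <lastmod>2012-05-28</lastmod>
  </url>
  <url>
    <loc>http://www.example.com/infographics/item-2</loc>
    <lastmod>2012-05-28</lastmod>
  </url>
</urlset>
```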

You can watch an official Google Webmaster video titled "Pagination and SEO" about the view-all concept, pagination, canonical URLs, and Google's rel="next" and rel="prev" attributes.

http://www.youtube.com/watch?v=njn8uXTWiGg
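
For illustration, the rel attributes covered in that video go in the <head> of each paginated page; a rough sketch, assuming hypothetical /products?page=N pages and a /products/view-all page:

```html
<!-- In the <head> of http://www.example.com/products?page=2 -->
<link rel="prev" href="http://www.example.com/products?page=1">
<link rel="next" href="http://www.example.com/products?page=3">

<!-- Optionally, point paginated pages at the View All page as the canonical version -->
<link rel="canonical" href="http://www.example.com/products/view-all">
```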

Organist answered 11/10, 2012 at 14:22 Comment(3)
I totally agree with the suggestion of Anthony above, however, one word of warning: your sitemap pages (All Products, All Articles, All Whatever) should not link to more than 100 pages at a time, otherwise the page's weight / SEO rank will go down. If you have more than 100 links (100 articles), consider breaking All Whatever into multiple pages.Misanthropy
It's /sitemap.xml, not sitemap.html, if you are using sitemaps.org, and the limit is 50,000 links or 10 MB per sitemap.xml.Organist
In my comment above I was referring to creating an HTML sitemap alongside the XML sitemap, which can have up to 50,000 links. I believe that for best results you would want to create both a sitemap.xml and a directory listing of all of your articles/content pages, etc.Misanthropy

Along the lines of graceful degradation, you shouldn't rely on JavaScript for something as important as pagination. I would probably implement a normal pagination system first (that search engines can index), and then use JS to hide the pagination links and implement the infinite scroll solution.
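
One possible sketch of that approach, assuming hypothetical element IDs and URLs: the server renders a real next-page link that crawlers and no-JS users can follow, and the script only enhances it.

```typescript
// Progressive enhancement: the server renders a normal pagination link, e.g.
//   <a id="next-page" rel="next" href="/tag/infographic?page=2">Next page</a>
// Crawlers and no-JS users follow that link; this script hides it and
// turns it into infinite scroll for everyone else.

const list = document.getElementById("item-list");
const nextLink = document.getElementById("next-page") as HTMLAnchorElement | null;

if (list && nextLink) {
  nextLink.style.display = "none"; // hide the pagination, but keep it in the HTML
  let loading = false;

  window.addEventListener("scroll", async () => {
    const nearBottom =
      window.innerHeight + window.scrollY >= document.body.offsetHeight - 200;
    if (!nearBottom || loading || !nextLink.href) return;
    loading = true;

    // Fetch the next paginated page and append its items to the current list.
    const response = await fetch(nextLink.href);
    const doc = new DOMParser().parseFromString(await response.text(), "text/html");
    doc.querySelectorAll("#item-list > li").forEach((item) => {
      list.appendChild(document.adoptNode(item));
    });

    // Point the hidden link at the following page (page 3, 4, ...),
    // or drop it when there are no more pages.
    const newNext = doc.getElementById("next-page") as HTMLAnchorElement | null;
    if (newNext) {
      nextLink.href = newNext.href;
    } else {
      nextLink.removeAttribute("href");
    }
    loading = false;
  });
}
```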

Stav answered 9/10, 2012 at 6:17 Comment(2)
Definitely the correct way to handle this. The accepted answer will just bring the server to its knees, not to mention be easy pickings for a DDoS attack.Anticipant
I agree with Victor, and so does Google; this is the official Google advice for making fault-tolerant and indexable infinite scroll: googlewebmastercentral.blogspot.com.ar/2014/02/…Naman

The proper way is to allow no-JS pagination. Usually what most websites do is insert a pagination button at the bottom of the feed. As the user scrolls down, the auto-pagination is triggered from that button's action, and the button may be hidden while that occurs. What this means is that an HTML element in the document triggers the auto-pagination; it is not pure JavaScript. If this button is, let's say, an anchor tag, which also delivers HTML, then web crawlers will have access to it. And here enters the concept of graceful degradation mentioned by @Victor Stanciu: always provide an HTML fallback response on top of your standard JS response.
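
A minimal sketch of that pattern, with made-up element IDs and URLs: the feed ends with a real anchor that works without JavaScript, and a script intercepts it and "presses" it automatically whenever it scrolls into view.

```typescript
// The feed ends with a real anchor that works without JavaScript, e.g.
//   <a id="load-more" href="/articles?page=2">Older articles</a>
// Crawlers simply follow the href; for JS-enabled browsers the click is
// intercepted and triggered automatically when the anchor becomes visible.

// Placeholder for the AJAX loading logic (fetch the href, append the items,
// advance the href), as sketched in the earlier answer.
async function appendNextPage(link: HTMLAnchorElement): Promise<void> {
  const response = await fetch(link.href);
  console.log("loaded", link.href, response.status); // append the new items here
}

const loadMore = document.getElementById("load-more") as HTMLAnchorElement | null;

if (loadMore) {
  // Intercept the normal navigation and load the next page in place instead.
  loadMore.addEventListener("click", (event) => {
    event.preventDefault();
    void appendNextPage(loadMore);
  });

  // Auto-pagination: "press" the button whenever it scrolls into view.
  new IntersectionObserver((entries) => {
    if (entries.some((entry) => entry.isIntersecting)) {
      loadMore.click();
    }
  }).observe(loadMore);
}
```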

Cossack answered 11/10, 2012 at 11:19 Comment(0)

Also: the more content you have on a given page, the less weight the crawlers will give to each word, so you might end up getting no hits if there is too much content on each page.

So rather work with normal pagination and friendly URLs, as exizt also suggested.

Lepage answered 11/10, 2012 at 11:17 Comment(0)

While it sounds like a good idea, I can see this having a negative effect on your PageRank. The amount of information that the crawler would have to get through is going to cause the rank of the links in the content to fall, which defeats the original point of letting the crawler in there.

A lot of what you want should be done with your sitemap and meta tags. As long as the crawlers can still access the content through individual page calls, you should be OK.

Coxswain answered 11/10, 2012 at 14:13 Comment(0)
