I have a PHP page that renders a book of, let's say, 100 pages. Each page has a specific URL (e.g. /my-book/page-one, /my-book/page-two, etc.).
When flipping the pages, I change the URL through the history API, using url.js.
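To make the page-flipping step concrete, here is a minimal sketch of a URL change on flip. It calls the native history.pushState directly rather than url.js, and both the helper names (pageUrl, flipTo) and the /book/page/N pattern are assumptions for illustration only.

```javascript
// Hypothetical helper: maps a page number to its URL.
// The /book/page/N pattern is an assumption for this sketch.
function pageUrl(pageNumber) {
  return `/book/page/${pageNumber}`;
}

// Pushes the URL of the newly shown page onto the session history.
// In a browser, pass window.history; any object with a pushState
// method works, which keeps the function testable outside a browser.
function flipTo(pageNumber, historyLike) {
  const url = pageUrl(pageNumber);
  historyLike.pushState({ page: pageNumber }, "", url);
  return url;
}

// Browser usage (not executed here):
// flipTo(4, window.history);
```

Note that this only changes the address bar; the HTML already in the document stays exactly as the server rendered it.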
Since all the book content is rendered on the server side, the problem is that the content is indexed by search engines (I'm referring to Google in particular), but the URLs are wrong (e.g. it finds a snippet from page-two, but the URL is page-one).
How can I stop search engines (at least Google) from indexing all the content on the page, so that they index only the visible book page?
Would it work if I rendered the content in a different way, for example <div data-page-number="1" data-content="Lorem ipsum..."></div>, and then converted it to the needed format on the JavaScript side? That would make the page slower, and in fact I'm not sure that Google won't index the content changed by JavaScript.
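The data-content idea could be sketched roughly like this. The function name revealPage is hypothetical, and the sketch only shows the client-side transformation; it says nothing about how a crawler would treat the result.

```javascript
// Hypothetical sketch: copy the text stored in data-content into the
// element for the current page and leave every other page empty.
// `pageElements` can be a NodeList from querySelectorAll, or any
// iterable of objects exposing `dataset` and `textContent`.
function revealPage(pageElements, pageNumber) {
  for (const el of pageElements) {
    // data-page-number is exposed as dataset.pageNumber (a string).
    const isCurrent = Number(el.dataset.pageNumber) === pageNumber;
    el.textContent = isCurrent ? el.dataset.content : "";
  }
}

// Browser usage (not executed here):
// revealPage(document.querySelectorAll("div[data-page-number]"), 3);
```

Assigning through textContent treats the stored value as plain text, so any markup inside data-content would need separate handling.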
The code looks like this:
<div data-page="1">Page 1</div>
<div data-page="2">Page 2</div>
<div data-page="3" class="current-page">Page 3</div>
<div data-page="4">Page 4</div>
<div data-page="5">Page 5</div>
The only visible div is the .current-page one. The same content is served on multiple URLs because that's needed so the user can flip between pages.
For example, /book/page/3 will render this piece of HTML, while /book/page/4 renders the same thing, the only difference being the current-page class, which is added to the 4th element.
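To illustrate why every URL carries the same text, here is a small sketch of the rendering described above. The real page is rendered by PHP; renderBook is a hypothetical JavaScript stand-in used only to show the effect.

```javascript
// Hypothetical stand-in for the server-side renderer: emits every page
// of the book and marks only the requested one with "current-page".
function renderBook(currentPage, pageCount) {
  const divs = [];
  for (let n = 1; n <= pageCount; n++) {
    const cls = n === currentPage ? ' class="current-page"' : "";
    divs.push(`<div data-page="${n}"${cls}>Page ${n}</div>`);
  }
  return divs.join("\n");
}

const page3 = renderBook(3, 5); // what /book/page/3 would return
const page4 = renderBook(4, 5); // what /book/page/4 would return

// Once the class attribute is removed, the two responses are identical.
const stripClass = (html) => html.replace(' class="current-page"', "");
const sameText = stripClass(page3) === stripClass(page4);
console.log(sameText); // true
```

A crawler that reads the full markup therefore sees the text of all pages under each URL, which matches the mismatched snippets described in the question.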
Google did index the different URLs, but it did so incorrectly: for example, the snippet Page 5 links to /book/page/2, which shows the user Page 2 (not Page 5).
How can I tell Google (and other search engines) that I only want the content in the .current-page element indexed?
robots.txt to tell Google. AFAIK Google respects it. Most probably it would be better to build a sitemap.xml and tell Google what to index and what not. You can also use Google's Webmaster Tools to push the changes and see how Google is crawling your site. – Hollis
Hello World on page 42 (under the URL /my-book/page/42). It's very possible that Google indexes this content under another URL (and obviously another page), for example /my-book/page/7. That happens because I serve the same content on multiple URLs. I have no idea how this can be fixed... – Wolfson