Internationalization and Search Engine Optimization
S

5

19

I'd like to internationalize my site such that it's accessible in many languages. The language setting will be detected in the request data automatically, and can be overridden in the user's settings / stored in the session.

My question is about which URLs I should use for the different language versions of the same page. Let's say we're just looking at the index page of http://www.example.com/, which defaults to English. Now if a French speaker loads the index page, should I simply keep the URL as http://www.example.com/, or should I have it redirect to http://www.example.com/fr/?

I'm trying to figure out what benefits or consequences this has in terms of SEO. I don't want the French version of the site showing up in google.com if it prevents the English version of the same pages from showing up there, but I would like it to show up in google.fr.

Socage answered 1/12, 2009 at 19:28 Comment(0)
D
43

There are a lot of things to consider from a search standpoint when you start localizing your website into multiple languages. Generally, you want to make sure that you're not being too clever about the user's intentions. Things like auto-detecting the language and storing it in a cookie can be good in some scenarios, but if they become a requirement for your localizations to work correctly, then you can run into issues with search engines (and real people too).

For search engines, you'll want to make sure that they can find and access all of your content in all the different languages without POST requests (no drop-down forms), JavaScript, Flash or cookies, because search engines generally don't use these technologies.

It turns out that this is often good for real customers as well. If you rely on browser settings or IP detection, then some of your real customers who are either borrowing a friend's computer or traveling in a foreign country might get stuck in the wrong language (Microsoft Bing actually had this problem for a while).

Here are some best practices to keep in mind:

  • Each language should be contained under some root in your information architecture. The best option is to acquire the TLD for each specific region (mysite.fr). This sometimes isn't feasible, so the second option is to use a sub-domain (fr.mysite.com), and the third option is to use a sub-folder (mysite.com/fr). That makes it easiest for us to look at a set of pages in aggregate and best determine a language/region. Don't make it a parameter (mysite.com/products/iphone?lang=en&region=us); that is the most difficult case for us to detect.

  • We have language classifiers (artificial intelligence nets) that try to determine what language/region a page is written for. So make sure you have enough clues on your page as to what the language is. E.g. if the page is French, make sure the meta description tag is also in French, as are the <h1> tags and the title, and make sure you have a solid couple of sentences in French. Many sites mix languages and have very little actual French on the page.

  • Telephone numbers, mailing addresses and the name of the geographic location are also great clues for search engines in identifying the region/language of a page. Use these well (and make sure they are actual text on the page, not images).

  • Use Google Webmaster Tools to specify the language and region of your pages. Create an account, verify your site, and then you can specify which region and language different parts of your website are targeted for.
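To make the difference between these URL layouts concrete, here is a minimal Python sketch of how a language could be read from the URL structure alone, without cookies or parameters. The SUPPORTED set and the function name are made up for illustration, and treating a country-code TLD as a language code is a simplifying assumption (a region and a language are not the same thing).

```python
from urllib.parse import urlsplit

# Hypothetical set of language codes the site supports
SUPPORTED = {"en", "fr", "de"}


def language_from_url(url, default="en"):
    """Return the language implied by the URL structure alone."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").lower().split(".")

    # 1. Country-code TLD, e.g. mysite.fr (naively read as a language code)
    if labels[-1] in SUPPORTED:
        return labels[-1]

    # 2. Sub-domain, e.g. fr.mysite.com
    if len(labels) > 2 and labels[0] in SUPPORTED:
        return labels[0]

    # 3. Sub-folder, e.g. mysite.com/fr/products
    first_segment = parts.path.strip("/").split("/")[0].lower()
    if first_segment in SUPPORTED:
        return first_segment

    # Query parameters are deliberately ignored
    return default
```

With this sketch, http://mysite.fr/, http://fr.mysite.com/ and http://mysite.com/fr/ all resolve to fr, while a URL that only carries ?lang=fr falls back to the default.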

Misinformation: the lang attribute, or any language tags you may have heard about, are currently not used by any search engine. When we (Microsoft Bing) did an analysis of these last year, the most common 'standard' lang tag people were using only showed up on 0.000125% of pages on the web, which is not enough to be useful!

Vanessa Fox (she built Google's webmaster center and created the sitemap protocol) recently wrote a particularly good article about how Google thinks about localization, and what that means for site architecture. I recommend checking it out here: http://www.ninebyblue.com/blog/making-geotargeted-content-findable-for-the-right-searchers/

Divisive answered 1/12, 2009 at 23:39 Comment(1)
Very elaborate response, I can't tell you how much I appreciate all this! I've been using Google Webmaster Tools, but I never noticed the language settings in there, as I've never needed them. I'll search for them now. I'm also using the CakePHP framework, which has allowed me to prepare all my strings for I18n; they just need the .PO files created, along with the code required to handle the language code sub-domain/sub-folder/whatever I decide to use. - Socage
M
3

This is how I solved the problem on my personal website as an exercise in i18n:

  • When a user arrives at, e.g., brazzy.de/index.php, the site tries to determine the language from a cookie (if present) or the browser settings (Accept-Language header), defaults to English, and does not redirect.
  • Every page has links to the different language versions of that page (IMO the most important factor for user convenience, and also makes sure search engines can easily find the different versions).
  • These links lead to e.g. brazzy.de/en/index.php, which is in my case rewritten to brazzy.de/index.php?lang=en - this ensures that search engines see distinct URLs for the different language versions.
  • Visiting such a subdirectory sets the language cookie to that language
  • The pages without a language-specific URL (i.e. where the language depends on client data) use e.g. <link rel="canonical" href="/en/"> to tell the search engine at which language-specific URL that page can be found.
  • Use XML sitemaps to further make sure search engines can find all pages and all different language versions.
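The lookup order described in the bullets above (language-specific URL first, then cookie, then Accept-Language, then English) could be sketched roughly as follows in Python. The SUPPORTED tuple, the function names and the parameter names are assumptions for illustration, not the code the site actually runs.

```python
SUPPORTED = ("en", "fr", "de")  # hypothetical list of available languages
DEFAULT = "en"


def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a tag without q= has the highest weight
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        weighted.append((quality, tag.strip().lower()))
    # The sort is stable, so tags with equal weight keep their header order
    weighted.sort(key=lambda pair: pair[0], reverse=True)
    return [tag for quality, tag in weighted if quality > 0]


def resolve_language(path, cookie_lang=None, accept_language=""):
    """Pick a language: URL prefix first, then cookie, then browser header."""
    first_segment = path.strip("/").split("/")[0].lower()
    if first_segment in SUPPORTED:
        return first_segment
    if cookie_lang in SUPPORTED:
        return cookie_lang
    for tag in parse_accept_language(accept_language):
        primary = tag.split("-")[0]  # "fr-ca" -> "fr"
        if primary in SUPPORTED:
            return primary
    return DEFAULT
```

Because the URL prefix wins over everything else, a crawler that sends neither cookies nor a useful header still gets a stable language for every language-specific URL.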
Means answered 4/12, 2009 at 12:44 Comment(2)
All of this seems like great advice, but I'm stuck on one bullet. Can you elaborate a little more on this one? => "The pages without a language-specific URL (i.e. where the language depends on client data) use e.g. <link rel="canonical" href="/en/"> to tell the search engine at which language-specific URL that page can be found." - Tied
@naomik: let's say a search engine visits /gallery.php and the page defaults to English. It should then contain <link rel="canonical" href="/en/gallery.php"> in the header so that the search engine knows that this is the canonical URL of the content it's seeing. This will prevent problems if the search engine ever sees a different language version under the same URL. - Means
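As a rough illustration of the canonical tag and the per-page language links discussed in this answer, the markup could be generated along these lines in Python. The helper names are hypothetical, and the /<lang>/<page> URL layout is assumed from the examples in the answer.

```python
from html import escape


def canonical_link(path, lang):
    """Build a canonical <link> tag pointing at the language-specific URL."""
    href = "/" + lang + "/" + path.lstrip("/")
    return '<link rel="canonical" href="%s">' % escape(href, quote=True)


def language_links(path, languages):
    """Build plain <a> links to every language version of the same page."""
    links = []
    for lang in languages:
        href = "/" + lang + "/" + path.lstrip("/")
        links.append('<a href="%s">%s</a>' % (escape(href, quote=True), escape(lang)))
    return links
```

For a request to /gallery.php that was served in English, canonical_link("/gallery.php", "en") yields the tag from the comment above, and language_links gives crawlable links (plain GET requests, no forms or scripts) to the other versions.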
C
1

Since the pages will have different content, I would provide a different URL for the different languages.

The major search engines are smart enough to figure out the language of a page given its contents. Having a language code in the URL (like fr) should also provide the search engines with a hint.

Comforter answered 1/12, 2009 at 19:32 Comment(1)
Plus, you should set the lang attribute on the <html> tag. But you knew that :). tlt.its.psu.edu/suggestions/international/web/tips/langtag.html - Mcpherson
A
0

Two answers: how I do it, and how I gather SEOisers think you're supposed to do it.

My site has mostly English and a couple of German pages, and I plan to have more German pages, and possibly some Spanish pages. I have the root page be language-agnostic, have navigation links to German pages (where they exist) beneath their English equivalents, and use URLs that differ between English and German (e.g., /services.html vs. /leistungen.html).

This is good UI and supposedly lousy SEO, since the different languages are all tangled up without a way for search engines to disentangle them, which may have bad consequences when search quality metrics are calculated.

The SEO-right thing is to maintain a distinct hierarchy, possibly of the form www.site.tld/lang/, but better lang.site.tld/, each with a separate sitemap.xml file.
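A separate sitemap per language hierarchy could be produced with something like the following Python sketch. The host name and page list in the usage note are placeholders, and only the required urlset, url and loc elements of the sitemap protocol are emitted.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, paths):
    """Return a minimal sitemap document for one language hierarchy."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url, "loc")
        loc.text = base_url.rstrip("/") + "/" + path.lstrip("/")
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(urlset, encoding="unicode")
```

Calling build_sitemap("http://de.site.tld", ["/leistungen.html"]) gives the content for the German hierarchy; the English one would be built the same way from its own host and page list.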

I care more about visitors than search engines, so I will continue to do the Wrong Thing.

Attar answered 4/12, 2009 at 12:14 Comment(0)
A
-2

Here is my idea about handling that case.

  1. Internationalize your pages using PHP.
  2. Create a folder for each language of your website, such as root/lang/en (for English).
  3. Create a .htaccess file (it must reside in the /lang/ folder) containing the URL rewrite rules. Here is what to put inside that .htaccess file:

  <IfModule mod_rewrite.c>
        RewriteEngine On
        # In a per-directory .htaccess the pattern is matched relative to /lang/
        RewriteRule ^([a-z-]+)/?$ index.php?lang=$1 [QSA,NC]
  </IfModule>

  1. <IfModule mod_rewrite.c>: checks whether the rewrite module is loaded in Apache; if it is not, the enclosed directives are skipped instead of causing an error.

  2. RewriteEngine On: enables the rewrite engine so that the rules that follow are processed.

  3. ^([a-z-]+)/?$ matches a request for a language subfolder of /lang/ (such as en). The request is rewritten to index.php, and the captured name is passed as the lang parameter, which index.php uses to pick the localization.
  4. [QSA] appends the query string of the original request to the rewritten URL, so existing parameters are kept alongside lang.

  5. [NC] makes the pattern match case-insensitively.
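To sanity-check the intended effect of such a rule outside Apache, the mapping can be imitated in a few lines of Python. This is only an approximation under stated assumptions: the pattern ^([a-z-]+)/?$ is assumed to be evaluated relative to the /lang/ folder, and Python's re module stands in for Apache's regex engine.

```python
import re
from urllib.parse import parse_qsl, urlencode

# [NC] corresponds to case-insensitive matching
RULE = re.compile(r"^([a-z-]+)/?$", re.IGNORECASE)


def rewrite(relative_path, query_string=""):
    """Return the rewritten target, or None when the rule does not match."""
    match = RULE.match(relative_path)
    if match is None:
        return None
    params = [("lang", match.group(1))]
    # [QSA] keeps the parameters of the original request
    params += parse_qsl(query_string)
    return "index.php?" + urlencode(params)
```

A request for fr becomes index.php?lang=fr, while index.php itself does not match the pattern (it contains a dot), so it is not rewritten a second time.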

Accentual answered 12/9, 2015 at 10:57 Comment(0)
