There are a lot of things to consider from a search standpoint when you start localizing your website into multiple languages. Generally, you want to avoid being too clever about guessing the user's intentions. Things like auto-detecting the language and storing it in a cookie can be good in some scenarios, but if they become a requirement for your localizations to work correctly, then you can run into issues with search engines (and real people too).
For search engines, you'll want to make sure that they can find and access all of your content in all the different languages without POST requests (no drop-down forms), JavaScript, Flash or cookies, because search engine crawlers generally don't use these technologies.
It turns out that this is often good for real customers as well. If you rely on browser settings or IP detection, then some of your real customers who are either borrowing a friend's computer or traveling in a foreign country might get stuck in the wrong language (Microsoft Bing actually had this problem for a while).
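One way to keep detection optional is to let the URL alone decide which language is served and to treat the browser setting only as a hint (for example, to show a "view this page in French?" link). The following is a minimal sketch under stated assumptions, not a definitive implementation: the language list, the default, and the function names are all hypothetical, and the header parsing deliberately ignores q-values.

```python
# Hypothetical set of languages the site is published in, plus a fallback.
SUPPORTED = ("en", "fr", "de")
DEFAULT = "en"


def language_from_path(path):
    """Return the language named by the first path segment, or None."""
    first = path.strip("/").split("/", 1)[0].lower()
    return first if first in SUPPORTED else None


def suggested_language(accept_language):
    """Return the first supported language listed in an Accept-Language
    header value. Simplification: q-values are ignored, order is trusted."""
    for part in accept_language.split(","):
        code = part.split(";", 1)[0].strip().lower()[:2]
        if code in SUPPORTED:
            return code
    return None


def resolve(path, accept_language=""):
    """The URL decides the language that is served; the header only
    produces an optional suggestion and never overrides the URL."""
    lang = language_from_path(path) or DEFAULT
    hint = suggested_language(accept_language)
    return lang, (hint if hint != lang else None)
```

With this shape, a crawler that sends no cookies and no Accept-Language header still gets the French page at /fr/..., and a traveler on a borrowed computer is never locked into the wrong language.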
Here are some best practices to keep in mind:
Each language should be contained under some root in your information architecture. The best option is to acquire the country-specific TLD (mysite.fr) for each region your website targets. This sometimes isn't feasible, so the second option is to use a sub-domain (fr.mysite.com), and the third option is to use a sub-folder (mysite.com/fr). Any of these makes it easy for us to look at a set of pages in aggregate and determine their language/region. Don't make it a parameter (mysite.com/products/iphone?lang=en&region=us); that is the most difficult case for us to detect.
We have language classifiers (artificial intelligence nets) that try to determine what language/region a page is written for, so make sure you have enough clues on your page as to what the language is. E.g. if the page is French, make sure the meta description tag is also in French, as are the <h1> tags and the title, and make sure you have a solid couple of sentences in French. Many sites mix languages and have very little actual French on the page.
Telephone numbers, mailing addresses and the name of the geographic location are also great clues for search engines in identifying the region/language of a page. Use these well (and make sure they are actual text on the page, not images).
Use Google Webmaster Tools to specify the language and region of your pages. Create an account, verify your site, and then you can specify which region and language different parts of your website are targeted for.
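To make the URL layouts from the first practice concrete, here is a small sketch that reports which of the four layouts a given URL uses. It is only an illustration: the code sets, the function name, and the "lang" parameter name are assumptions, and it says nothing about how any search engine actually classifies pages.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical examples; a real site would list its own regions and languages.
COUNTRY_TLDS = {"fr", "de"}
LANGS = {"en", "fr", "de"}


def locale_layout(url):
    """Return which layout carries the locale in this URL:
    'country TLD', 'sub-domain', 'sub-folder', 'parameter' or 'none'."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").lower().split(".")
    segments = [s for s in parts.path.split("/") if s]

    if labels[-1] in COUNTRY_TLDS:
        return "country TLD"      # mysite.fr
    if len(labels) > 2 and labels[0] in LANGS:
        return "sub-domain"       # fr.mysite.com
    if segments and segments[0].lower() in LANGS:
        return "sub-folder"       # mysite.com/fr
    if "lang" in parse_qs(parts.query):
        return "parameter"        # ?lang=en&region=us (hardest to detect)
    return "none"
```

The first three layouts all put the locale at the root of a URL hierarchy, which is what lets a set of pages be looked at in aggregate; the parameter form can appear on any page in any position, which is why it is discouraged above.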
Misinformation
- The lang attribute, or any other language tags you may have heard about, are currently not used by any search engine. When we (Microsoft Bing) did an analysis of these last year, the most common 'standard' lang tag people were using only showed up on 0.000125% of pages on the web - not enough to be useful!
Vanessa Fox (she built Google's Webmaster Central and created the Sitemaps protocol) recently wrote a particularly good article about how Google thinks about localization and what that means for site architecture. I recommend checking it out here: http://www.ninebyblue.com/blog/making-geotargeted-content-findable-for-the-right-searchers/