How can we use Selenium Webdriver in colab.research.google.com?
E

13

71

I want to use Selenium WebDriver with Chrome in colab.research.google.com for fast processing. I was able to install Selenium using !pip install selenium, but the Chrome webdriver needs a path to webdriverChrome.exe. How am I supposed to use it?

P.S.- colab.research.google.com is an online platform which provides GPU for fast computational problems related to deep learning. Please refrain from solutions such as webdriver.Chrome(path).

Enlightenment answered 26/6, 2018 at 15:23 Comment(4)
I think I mentioned "colab.research.google.com". I know how webdriver works on a local machine. But since Colab is an online platform that provides GPUs for fast machine learning processing, I want to use webdriver on that platform.Enlightenment
A same problem is in this link: #54328154Silicic
seems like it was asked 7 days agoEnlightenment
@Dimanjan Hey, I have stopped trying this. The use case was scrapped, so I did not explore it further.Enlightenment
K
125

Google Colab was recently upgraded, and since Ubuntu 20.04+ no longer distributes chromium-browser outside of a snap package, you can install a compatible version from the Debian buster repository:

%%shell
# Ubuntu no longer distributes chromium-browser outside of snap
#
# Proposed solution: https://askubuntu.com/questions/1204571/how-to-install-chromium-without-snap

# Add debian buster
cat > /etc/apt/sources.list.d/debian.list <<'EOF'
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster.gpg] http://deb.debian.org/debian buster main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster-updates.gpg] http://deb.debian.org/debian buster-updates main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-security-buster.gpg] http://deb.debian.org/debian-security buster/updates main
EOF

# Add keys
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys DCC9EFBF77E11517
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 648ACFD622F3D138
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 112695A0E562B32A

apt-key export 77E11517 | gpg --dearmour -o /usr/share/keyrings/debian-buster.gpg
apt-key export 22F3D138 | gpg --dearmour -o /usr/share/keyrings/debian-buster-updates.gpg
apt-key export E562B32A | gpg --dearmour -o /usr/share/keyrings/debian-security-buster.gpg

# Prefer debian repo for chromium* packages only
# Note the double-blank lines between entries
cat > /etc/apt/preferences.d/chromium.pref << 'EOF'
Package: *
Pin: release a=eoan
Pin-Priority: 500


Package: *
Pin: origin "deb.debian.org"
Pin-Priority: 300


Package: chromium*
Pin: origin "deb.debian.org"
Pin-Priority: 700
EOF

# Install chromium and chromium-driver
apt-get update
apt-get install chromium chromium-driver

# Install selenium
pip install selenium

Then you can run selenium like this:

from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
# chromedriver is found on PATH; newer Selenium releases reject a positional driver path
wd = webdriver.Chrome(options=chrome_options)
wd.get("https://www.website-url.com")
Kilo answered 7/1, 2019 at 16:7 Comment(4)
Do you have a similar suggestion for the gecko driver?Silicic
Hi, could you please look at the question in this link: #54328154Silicic
cp: cannot stat '/usr/lib/chromium-browser/chromedriver': No such file or directoryConcise
I am facing the same issue on a Colab page, cf.: colab.research.google.com/drive/…Bilinear
N
38

This one worked in Colab:

!pip install selenium
!apt-get update 
!apt install chromium-chromedriver

from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
# `chrome_options=` was deprecated in favor of `options=`; chromedriver is found on PATH
driver = webdriver.Chrome(options=chrome_options)
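
For readers wondering what the three flags are for, here is a small sketch that keeps them in one place with a note on each. The names `COLAB_CHROME_FLAGS` and `apply_colab_flags` are illustrative and not part of Selenium; the only Selenium call assumed is `add_argument`, which the answer already uses. The explanations are a summary of common practice, not official documentation.

```python
# Flags the answer passes to Chrome, with the usual reason each one is needed
# in a container such as Colab.
COLAB_CHROME_FLAGS = {
    "--headless": "Colab has no display, so Chrome must run without a window",
    "--no-sandbox": "the notebook runs as root, and Chrome will not start as root with its sandbox on",
    "--disable-dev-shm-usage": "containers often have a small /dev/shm, so Chrome should not rely on it",
}


def apply_colab_flags(options, flags=COLAB_CHROME_FLAGS):
    """Add every flag to any object exposing add_argument() (e.g. ChromeOptions)."""
    for flag in flags:
        options.add_argument(flag)
    return options


# Hypothetical usage in Colab (needs selenium plus a browser and driver installed):
#   from selenium import webdriver
#   driver = webdriver.Chrome(options=apply_colab_flags(webdriver.ChromeOptions()))
```

Keeping the flags in a dictionary makes it easy to drop or add one while debugging a start-up failure.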
Nakano answered 11/4, 2020 at 13:59 Comment(1)
Dear Shaina Raza, I am facing the very same issue on a Colab page, cf. my Colab page: colab.research.google.com/drive/…Bilinear
C
24

I made my own library to make it easy.

!pip install kora -q
from kora.selenium import wd
wd.get("https://www.website.com")

PS: I forget how I searched and experimented until it worked. But I first wrote and shared it in this gist in Dec 2018.

Cordovan answered 16/7, 2020 at 7:12 Comment(10)
Doesn't work on google colab: gives the same error: WebDriverException: Message: Service chromedriver unexpectedly exited. Status code was: -6Flicker
@AhmadM. It should be working now. URL was not correct at first.Cordovan
@Cordovan is there a support for the tor browser in your library?Enlightenment
@johnmich No. Only chrome-selenium is supported.Cordovan
@Cordovan WebDriverWait is supported there?Glossography
It worked nicely a few months ago, but now it doesn't with selenium 4.1.0 : wd.get("https://www.twitter.com") --> JavaScript is not available.. Is there a way to enable javascript in the webdriver ?Immoralist
I am also getting an error trying to run this in databricks: "WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see chromedriver.chromium.org/home"Pasadena
Is this abandoned?Phosphate
Hi there @αԋɱҽԃ αмєяιcαη, this thing drives me almost nuts: I have tried a lot to get the Selenium Webdriver in colab.research.google.com up and running, but at the moment I am stuck. The script does not start and produces no output on my Colab page, cf.: colab.research.google.com/drive/… See the thread: #77859973 Dear αԋɱҽԃ αмєяιcαη, you have helped me quite a lot so far and I am a long-term fan and follower of yours! ;)Bilinear
I get this error when reproducing the above in google colab: WebDriver.__init__() got multiple values for argument 'options'Remark
H
13

I don't have enough reputation to comment. :(

However, @Thomas's answer still works as of 06.10.2021 with one simple change, since right off the bat you'll get DeprecationWarning: use options instead of chrome_options

Working code below:

!pip install selenium
!apt-get update # to update ubuntu to correctly run apt install
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# chromedriver was copied to /usr/bin above, so it is found on PATH
wd = webdriver.Chrome(options=options)
wd.get("https://mcmap.net/q/67782/-how-can-we-use-selenium-webdriver-in-colab-research-google-com")
wd.title
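
The `cp` step above exists so that `chromedriver` can be found on PATH. A quick way to check that before starting Selenium is sketched below with the standard library only. The helper name `locate_driver` is made up for this example, and the extra directory is simply the one used in the answer, which may not exist on every runtime.

```python
import os
import shutil


def locate_driver(name="chromedriver", extra_dirs=("/usr/lib/chromium-browser",)):
    """Return the full path of an executable called `name`, or None.

    Looks on PATH first, then in `extra_dirs` (the directory from the answer
    above; adjust it for your runtime).
    """
    found = shutil.which(name)
    if found:
        return found
    for directory in extra_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


path = locate_driver()
print(path or "chromedriver not found; install it or copy it onto PATH")
```

If the function returns a path outside PATH, that path is what you would hand to Selenium explicitly.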
Hemichordate answered 6/10, 2021 at 11:38 Comment(2)
If you don't have the reputation to comment, you should instead edit q/a and write answers until you have the reputation.Jointure
I mean write actual answers to other questions, that aren't practically comments.Jointure
S
6

You can use Google-Colab-Selenium

It will download Google Chrome, set everything up, and add the required WebDriver options to the Selenium instance.

Here's a Google Colab Notebook to test it out for yourself: https://colab.research.google.com/drive/1MUFonUP4nlgtYoPIglnr0HsUsqljz64A

Streamer answered 12/11, 2023 at 17:30 Comment(0)
B
4

Google Colab is using Ubuntu 20.04 now, and you can't install chromium-browser without snap. But you can install it using the .deb files for Ubuntu 18.04 at security.ubuntu.com/ubuntu/pool/universe/c/chromium-browser/.

I made a Python script for this purpose. It finds the latest version of chromium-browser and chromedriver for 18.04 and installs them on your Google Colab runtime, which runs Ubuntu 20.04.

The site's links are updated regularly. You don't need the Debian repository or apt keys.

import os
import re
import subprocess
import requests

# The deb files we need to install
deb_files_startstwith = [
    "chromium-codecs-ffmpeg-extra_",
    "chromium-codecs-ffmpeg_",
    "chromium-browser_",
    "chromium-chromedriver_"
]

def get_latest_version() -> str:
    # A request to security.ubuntu.com for getting latest version of chromium-browser
    # e.g. "112.0.5615.49-0ubuntu0.18.04.1_amd64.deb"
    url = "http://security.ubuntu.com/ubuntu/pool/universe/c/chromium-browser/"
    r = requests.get(url)
    if r.status_code != 200:
        raise Exception("status_code code not 200!")
    text = r.text

    # Find latest version
    pattern = r'<a\shref="chromium\-browser_([^"]+.ubuntu0\.18\.04\.1_amd64\.deb)'
    latest_version_search = re.search(pattern, text)
    if latest_version_search:
        latest_version = latest_version_search.group(1)
    else:
        raise Exception("Can not find latest version!")
    return latest_version

def download(latest_version: str, quiet: bool):
    deb_files = []
    for deb_file in deb_files_startstwith:
        deb_files.append(deb_file + latest_version)

    for deb_file in deb_files:
        url = f"http://security.ubuntu.com/ubuntu/pool/universe/c/chromium-browser/{deb_file}"

        # Download deb file
        if quiet:
            command = f"wget -q -O /content/{deb_file} {url}"
        else:
            command = f"wget -O /content/{deb_file} {url}"
        print(f"Downloading: {deb_file}")
        # os.system(command)
        !$command

        # Install deb file
        if quiet:
            command = f"apt-get install /content/{deb_file} >> apt.log"
        else:
            command = f"apt-get install /content/{deb_file}"
        print(f"Installing: {deb_file}\n")
        # os.system(command)
        !$command

        # Delete deb file from disk
        os.remove(f"/content/{deb_file}")

def check_chromium_installation():
    try:
        # Ask only for the version so the browser does not try to open a window
        subprocess.call(["chromium-browser", "--version"])
        print("Chromium installation successful.")
    except FileNotFoundError:
        print("Chromium installation failed!")

def install_selenium_package(quiet: bool):
    if quiet:
        !pip install selenium -qq >> pip.log
    else:
        !pip install selenium

def main(quiet: bool):
    # Get the latest version of chromium-browser for ubuntu 18.04
    latest_version = get_latest_version()
    # Download and install chromium-browser for ubuntu 20.04
    download(latest_version, quiet)
    # Check whether the installation was successful
    check_chromium_installation()
    # Finally install selenium package
    install_selenium_package(quiet)

if __name__ == '__main__':
    quiet = True # verboseness of wget and apt
    main(quiet)
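
One thing to be aware of: `re.search` returns the first matching link in the listing, which is the latest version only if the page happens to list it first. A possible alternative, sketched here under that assumption, is to collect every match and pick the numerically highest one. The sample listing is made up for illustration, and `version_key` and `pick_latest` are hypothetical helpers, not part of the script above.

```python
import re

# Made-up directory listing in the style of the Ubuntu pool page.
SAMPLE_LISTING = """
<a href="chromium-browser_111.0.5563.64-0ubuntu0.18.04.1_amd64.deb">...</a>
<a href="chromium-browser_112.0.5615.49-0ubuntu0.18.04.1_amd64.deb">...</a>
<a href="chromium-browser_99.0.4844.51-0ubuntu0.18.04.1_amd64.deb">...</a>
"""

# Same pattern as in the script, written as a raw string.
PATTERN = r'<a\shref="chromium\-browser_([^"]+.ubuntu0\.18\.04\.1_amd64\.deb)'


def version_key(suffix: str):
    """Turn '112.0.5615.49-0ubuntu...' into (112, 0, 5615, 49) for numeric sorting."""
    upstream = suffix.split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def pick_latest(listing: str) -> str:
    """Return the file-name suffix with the highest upstream version."""
    candidates = re.findall(PATTERN, listing)
    if not candidates:
        raise ValueError("no chromium-browser .deb links found")
    return max(candidates, key=version_key)


print(pick_latest(SAMPLE_LISTING))
```

Comparing tuples of integers avoids the trap of string comparison, where "99" would sort after "112".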

Then try Selenium:

from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
# chromedriver is found on PATH; newer Selenium releases reject a positional driver path
wd = webdriver.Chrome(options=chrome_options)
wd.get("https://www.google.com")
print(f"Page title: {wd.title}")

Update 14.02.2024

Ubuntu 18.04 LTS has reached end of life (no more updates), and Google Colab has been updated to Ubuntu 22.04 LTS.

While searching for an alternative, I found that Linux Mint has its own Chromium repo instead of snap. Linux Mint 21.3 (Virginia) is based on Ubuntu 22.04 LTS and will get updates until 2027. Since both are based on Ubuntu 22.04 LTS, we can use Linux Mint's Chromium deb file to install Chromium on Google Colab. I tested it and it works.

I changed the script to scrape the Chromium browser from the Linux Mint packages site and chromedriver from the Chrome for Testing page.

import os
import shutil
import re
import subprocess
import urllib.parse
import zipfile
import requests


"""
Scrapes and installs chromium from linux mint 21.3(virginia) packages site.
Link: http://packages.linuxmint.com/pool/upstream/c/chromium/
Scrapes and installs chromedriver from Chrome for Testing page.
Link: https://googlechromelabs.github.io/chrome-for-testing/
"""

class CantGetLatestChromiumVersionError(Exception):
    """Happens when regex failed"""

class ChromiumInstallationFailedException(Exception):
    """
    Happens when the deb package was not installed
    Check the downloaded chromium deb file
    """

class CantGetChromeDriverError(Exception):
    """Happens when regex failed"""

main_url = "http://packages.linuxmint.com/pool/upstream/c/chromium/"
work_dir = "/content"

def get_chromium_latest_version() -> str:
    # A request to packages.linuxmint.com for getting latest version of chromium
    # e.g. "chromium_121.0.6167.160~linuxmint1+virginia_amd64.deb"
    r = requests.get(main_url)
    if r.status_code != 200:
        raise Exception("status_code code not 200!")
    text = r.text

    # Find latest version
    pattern = r'<a\shref="(chromium_[^"]+linuxmint1%2Bvirginia_amd64\.deb)'
    latest_version_search = re.search(pattern, text)
    if latest_version_search:
        latest_version = latest_version_search.group(1)
    else:
        raise CantGetLatestChromiumVersionError("Failed to get latest chromium version!")
    return latest_version

def install_chromium(latest_version: str, deb_file: str, quiet: bool):
    # Full url of deb file
    url = f"{main_url}{latest_version}"

    # Download deb file
    if quiet:
        command = f"wget -q -O {work_dir}/{deb_file} {url}"
    else:
        command = f"wget -O {work_dir}/{deb_file} {url}"
    print(f"Downloading: {deb_file}")
    # os.system(command)
    !$command

    # Install deb file
    if quiet:
        command = f"apt-get install {work_dir}/{deb_file} >> apt.log"
    else:
        command = f"apt-get install {work_dir}/{deb_file}"
    print(f"Installing: {deb_file}")
    # os.system(command)
    !$command

def check_chromium_installation(deb_file: str):
    try:
        # Ask only for the version so the browser does not try to open a window
        subprocess.call(["chromium", "--version"])
        print("Chromium installation successful.\n")
        # If the installation succeeded we can remove the deb file
        # Delete deb file from disk
        os.remove(f"{work_dir}/{deb_file}")
    except FileNotFoundError:
        raise ChromiumInstallationFailedException("Chromium Installation Failed!")

def get_chromedriver_url(deb_file: str) -> str:
    # Get content of chromedriver page
    url = "https://googlechromelabs.github.io/chrome-for-testing/"
    r = requests.get(url)
    if r.status_code != 200:
        raise Exception("status_code code not 200!")
    text = r.text

    # Get chromium version from deb file's name
    version_number = deb_file.split("chromium_")[-1].split(".")[0]

    # Example: https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/121.0.6167.85/linux64/chromedriver-linux64.zip
    pattern = f'https://[^<]+/{version_number}[^<]+/linux64/chromedriver-linux64.zip'
    # Find latest version
    chromedriver_url_search = re.search(pattern, text)
    if chromedriver_url_search:
        chromedriver_url = chromedriver_url_search.group()
        return chromedriver_url
    else:
        raise CantGetChromeDriverError("Failed to get chromedriver!")

def install_chromedriver(deb_file: str, quiet: bool):
    url = get_chromedriver_url(deb_file)
    file_name = url.split("/")[-1]
    # Download chromedriver
    chromedriver_zip = f"{work_dir}/{file_name}"
    if quiet:
        command = f"wget -q -O {chromedriver_zip} {url}"
    else:
        command = f"wget -O {chromedriver_zip} {url}"
    print(f"Downloading: {file_name}")
    # os.system(command)
    !$command

    # Extract chromedriver from zip
    with zipfile.ZipFile(chromedriver_zip) as zpf:
        zpf.extract(member="chromedriver-linux64/chromedriver", path=work_dir)

    # Remove chromedriver-linux64.zip file
    os.remove(chromedriver_zip)

    # Move extracted chromedriver binary file to /usr/bin directory
    source = f"{work_dir}/chromedriver-linux64/chromedriver"
    destination = "/usr/bin/chromedriver"
    os.rename(source, destination)

    # Make chromedriver binary executable
    os.system(f"chmod +x {destination}")

    # Remove empty chromedriver-linux64 folder
    shutil.rmtree(f"{work_dir}/chromedriver-linux64")

    print("Chromedriver installed")

def install_selenium_package(quiet: bool):
    if quiet:
        !pip install selenium -qq >> pip.log
    else:
        !pip install selenium

def main(quiet: bool):
    # Get the latest version of chromium from linux mint packages site
    latest_version = get_chromium_latest_version()
    # Name of the deb file
    deb_file = urllib.parse.unquote(latest_version, "utf-8")
    # Download and install chromium for ubuntu 22.04
    install_chromium(latest_version, deb_file, quiet)
    # Check whether the installation was successful
    check_chromium_installation(deb_file)
    # Install chromedriver
    install_chromedriver(deb_file, quiet)
    # Finally install selenium package
    install_selenium_package(quiet)

if __name__ == '__main__':
    quiet = True # verboseness of wget and apt
    main(quiet)
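
To make the string handling in the script easier to follow, here is a worked example of the two values it derives from the scraped link: the real file name (URL-decoding `%2B` to `+`) and the Chromium major version used to find a matching chromedriver. The inputs are the sample file name and URL from the script's own comments; `major_version` is a helper introduced only for this illustration.

```python
import re
from urllib.parse import unquote

# Link as it appears in the Linux Mint listing ('+' is URL-encoded as %2B).
href = "chromium_121.0.6167.160~linuxmint1%2Bvirginia_amd64.deb"

# File name on disk after decoding.
deb_file = unquote(href)


def major_version(deb_name: str) -> str:
    """'chromium_121.0.6167.160~...' -> '121'."""
    return deb_name.split("chromium_")[-1].split(".")[0]


# A fragment of page text containing a chromedriver download URL
# (the example URL from the comment in the script).
page_text = (
    "<code>https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/"
    "121.0.6167.85/linux64/chromedriver-linux64.zip</code>"
)

# Same idea as the script: any URL whose version directory starts with the major version.
pattern = rf"https://[^<]+/{major_version(deb_file)}[^<]+/linux64/chromedriver-linux64\.zip"
match = re.search(pattern, page_text)

print(deb_file)
print(major_version(deb_file))
print(match.group() if match else "no matching chromedriver URL")
```

Note that only the major version is matched, so the browser (121.0.6167.160) and the driver (121.0.6167.85) can differ in their later version components.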

To test it:

from selenium import webdriver

# Chromium browser options
options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")

# Start webdriver
driver = webdriver.Chrome(options=options)

# Go to https://www.google.com and print page title
driver.get("https://www.google.com")
print(f"Page title: {driver.title}")
driver.quit()
Boyle answered 27/4, 2023 at 13:34 Comment(1)
IMPORTANT: As of Selenium 4.10, you need to use "Service": from selenium import webdriver , from selenium.webdriver.chrome.service import Service , service = Service(executable_path=r'/path/to/chromedriver') , options as in the answer above , wd = webdriver.Chrome(service=service, options=chrome_options)Bedcover
O
3

To use Selenium in Google Colab, follow these steps in the Colab notebook:

!pip install kora -q

How to use it inside Colab:

from kora.selenium import wd
wd.get("enter any website here")

You can also use it with Beautiful Soup:

import bs4 as soup
wd.get("enter any website here")
# Name the parser explicitly to avoid bs4's "no parser was explicitly specified" warning
html = soup.BeautifulSoup(wd.page_source, "html.parser")
Oshaughnessy answered 2/12, 2021 at 12:26 Comment(2)
You should credit the person who wrote the kora library.Phosphate
Good day, dear Mohamed TOUATI, many thanks for the hint. I am eager to know how to fix the issues on Colab, as I am facing the very same issue on a Colab page, cf. my Colab page: colab.research.google.com/drive/…Bilinear
S
0

Colab and Selenium: how can data be extracted from whoscored.com?

#    https://www.whoscored.com

# install chromium, its driver, and selenium
!apt update
!apt install chromium-chromedriver
!pip install selenium
# set options to be headless, ..
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# open it, go to a website, and get results
wd = webdriver.Chrome(options=options)
wd.get("https://www.whoscored.com")
print(wd.page_source)  # results
Sapphism answered 16/10, 2022 at 17:3 Comment(0)
G
0

Install Library

!pip install selenium
!apt-get update
!apt install chromium-chromedriver

And set up a chrome driver

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Set the path to the chromedriver executable
chromedriver_path = '/usr/bin/chromedriver'

# Set the Chrome driver options
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

# Start the Chrome driver
driver = webdriver.Chrome(service=Service(executable_path=chromedriver_path), options=options)

# Navigate to a website
driver.get('https://www.example.com')

# Quit the driver
driver.quit()
Gentility answered 4/3, 2023 at 8:53 Comment(0)
M
0

If you get an error like "WebDriverException: Message: Service chromedriver unexpectedly exited. Status code was: 1":

In the notebook page, press Ctrl + Shift + P, choose "Use fallback runtime version", and then try again.
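
Before switching runtimes, it may help to confirm whether the driver binary starts at all. The sketch below runs a command with `--version` and reports its exit code and output. `probe` is a hypothetical helper, and the Python interpreter stands in for the driver so the snippet runs anywhere; in Colab you would pass "chromedriver" instead.

```python
import subprocess
import sys


def probe(command):
    """Run `command --version`; return (exit_code, output) or (None, reason)."""
    try:
        result = subprocess.run(
            [command, "--version"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except FileNotFoundError:
        return None, f"{command!r} is not installed or not on PATH"
    except subprocess.TimeoutExpired:
        return None, f"{command!r} did not answer within 30 seconds"
    return result.returncode, (result.stdout or result.stderr).strip()


# The Python interpreter stands in for the driver here;
# in Colab you would call probe("chromedriver") instead.
code, output = probe(sys.executable)
print(code, output)
```

A non-zero exit code from the driver itself points at the installation rather than at your Selenium code.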

Morten answered 24/3, 2023 at 6:55 Comment(0)
S
0

Updated answer

%%shell
# Set up for running Selenium in Google Colab.
# You don't need to run this cell in a Jupyter notebook or another local Python setup.
sudo apt -y update
sudo apt install -y wget curl unzip
wget http://archive.ubuntu.com/ubuntu/pool/main/libu/libu2f-host/libu2f-udev_1.1.4-1_all.deb
dpkg -i libu2f-udev_1.1.4-1_all.deb
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
dpkg -i google-chrome-stable_current_amd64.deb
CHROME_DRIVER_VERSION=`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`
wget -N https://chromedriver.storage.googleapis.com/$CHROME_DRIVER_VERSION/chromedriver_linux64.zip -P /tmp/
unzip -o /tmp/chromedriver_linux64.zip -d /tmp/
chmod +x /tmp/chromedriver
mv /tmp/chromedriver /usr/local/bin/chromedriver
pip install selenium
pip install chromedriver-autoinstaller

import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')

import time
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
import chromedriver_autoinstaller

# setup chrome options
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless') # ensure GUI is off
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')

# set path to chromedriver as per your configuration
chromedriver_autoinstaller.install()

# set the target URL
url = "put-url-here-to-scrape"

# imports for explicit waits
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# set up the webdriver
driver = webdriver.Chrome(options=chrome_options)

# quit the driver
driver.quit()

This is copied from a solution by GitHub user goljavi in this thread: https://github.com/googlecolab/colabtools/issues/3347

Sebrinasebum answered 19/9, 2023 at 18:27 Comment(0)
W
0

These commands will ensure that you have the necessary packages, including Google Chrome and ChromeDriver, set up for Selenium in Google Colab.

%%shell
# Install dependencies
sudo apt -y update
sudo apt install -y wget curl unzip

# Install libu2f-udev
wget http://archive.ubuntu.com/ubuntu/pool/main/libu/libu2f-host/libu2f-udev_1.1.4-1_all.deb
dpkg -i libu2f-udev_1.1.4-1_all.deb

# Install Google Chrome
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
dpkg -i google-chrome-stable_current_amd64.deb

# Download and install ChromeDriver
CHROME_DRIVER_VERSION=`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`
wget -N https://chromedriver.storage.googleapis.com/$CHROME_DRIVER_VERSION/chromedriver_linux64.zip -P /tmp/
unzip -o /tmp/chromedriver_linux64.zip -d /tmp/
chmod +x /tmp/chromedriver
mv /tmp/chromedriver /usr/local/bin/chromedriver

# Install Selenium
pip install selenium

from selenium import webdriver

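# Set up Chrome options
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')  # Colab has no display, so run Chrome headless
chrome_options.add_argument('--no-sandbox')  # Colab runs as root; Chrome will not start as root with the sandbox on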

# Create a WebDriver instance
driver = webdriver.Chrome(options=chrome_options)

# Now, you can use the 'driver' object to interact with the web page.

# Example: Open Google
driver.get("https://www.google.com")

# Example: Print the title of the page
print("Title:", driver.title)

# Close the WebDriver when done
driver.quit()
Wellordered answered 25/12, 2023 at 8:45 Comment(0)
P
-6

You can get rid of the .exe file by using WebDriverManager, so instead of this

System.setProperty("webdriver.gecko.driver", "driverpath/.exe");
WebDriver driver = new FirefoxDriver();

you will be writing this

WebDriverManager.firefoxdriver().setup();
WebDriver driver = new FirefoxDriver();

All you need to do is add the dependency to the POM file (I'm assuming you are using Maven or some other build tool). Please see my full answer about how to use this in this link: Using WebdriverManager

Plast answered 26/6, 2018 at 18:31 Comment(1)
You run python on Colab, not javascript. Please give a python answer.Cordovan
