I am using tldextract to extract domain names from URLs. Here, 'url' is one of the columns in the data frame 'df'. Passing a single value of 'url' as a parameter works, but I am not able to pass the entire column. The URL passed below is 'https://www.google.com/search?source=hp&ei=7iE'.
listed = tldextract.extract(df['url'][0])
dom_name = listed.domain
print(dom_name)
Output: google
What I want is to create a new column in the data frame named 'Domain' containing the domain name extracted from each URL.
Something like:
df['Domain'] = tldextract.extract(df['url'])
But this isn't working.
Here is the code:
# IMPORTING PANDAS
import pandas as pd
from IPython.display import display
import tldextract
# Read data sample
df = pd.read_csv("bookcsv.csv")
df['Domain'] = df['url'].apply(lambda url: tldextract.extract(url).domain)
Here is the input data:
The dataframe looks like this. I am not able to put the data directly here, so I am posting a snapshot.