How to pass an entire column as a parameter to tldextract function?

tldextract is used to extract domain names from URLs. Here, 'url' is one of the columns in the data frame 'df'. I can pass a single value of 'url' as a parameter, but I am not able to pass the entire column. The URL being passed here is 'https://www.google.com/search?source=hp&ei=7iE'

listed = tldextract.extract(df['url'][0])
dom_name = listed.domain
print(dom_name)

Output: google

What I want is to create a new column in the data frame named 'Domain' having the extracted domain names from the URL.

Something like:

df['Domain'] = tldextract.extract(df['url'])

But this isn't working.

Here is the code:

# IMPORTING PANDAS
import pandas as pd
from IPython.display import display

import tldextract

# Read data sample
df = pd.read_csv("bookcsv.csv")

df['Domain'] = df['url'].apply(lambda url: tldextract.extract(url).domain)

Here is the input data:

I am not able to paste the data directly here, so I am posting a snapshot of what the dataframe looks like.

Bloke answered 15/7, 2018 at 11:11 Comment(0)

Using apply will apply the function to every element in the column and keep everything neatly lined up.

df['Domain'] = df['url'].apply(lambda url: tldextract.extract(url).domain)

Here's the full code I used for testing:

import pandas as pd, tldextract

df = pd.DataFrame([{'url':'https://google.com'}]*12)
df['Domain'] = df['url'].apply(lambda url: tldextract.extract(url).domain)
print(df)

Output:

                   url  Domain
0   https://google.com  google
1   https://google.com  google
2   https://google.com  google
3   https://google.com  google
4   https://google.com  google
5   https://google.com  google
6   https://google.com  google
7   https://google.com  google
8   https://google.com  google
9   https://google.com  google
10  https://google.com  google
11  https://google.com  google
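
If the column holds missing values (NaN) or other non-string entries, tldextract.extract may raise TypeError: expected string or bytes-like object. One possible workaround is to guard each value before extracting. The sketch below is only an illustration: safe_domain and fake_extract are hypothetical names, and fake_extract is a stand-in so the snippet runs without tldextract installed. In real use you would pass tldextract.extract instead.

```python
from types import SimpleNamespace


def safe_domain(value, extract):
    """Return extract(value).domain, or None when value is not a non-empty string.

    safe_domain is a hypothetical helper, not part of tldextract or pandas.
    extract is expected to behave like tldextract.extract.
    """
    if not isinstance(value, str) or not value.strip():
        return None  # NaN, None, numbers and blank strings are skipped
    return extract(value).domain


# Stand-in extractor (an assumption for this sketch only);
# replace it with tldextract.extract in real code.
def fake_extract(url):
    return SimpleNamespace(domain="google")


values = ["https://google.com", float("nan"), None]
domains = [safe_domain(v, fake_extract) for v in values]
print(domains)
```

With a data frame, this could be used as df['Domain'] = df['url'].apply(safe_domain, extract=tldextract.extract), since Series.apply forwards extra keyword arguments to the function. Rows that are not valid strings would then get None instead of raising an error.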
Argufy answered 15/7, 2018 at 11:13 Comment(2)
It shows the same error: TypeError: expected string or bytes-like object - Bloke
@Bloke I ran a test on the code and it seems to work fine. Can I see your full code and input data? - Argufy

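@Neil is close, but you don't strictly need the lambda function to call tldextract.extract, because apply accepts the function directly. Note that this yields the full ExtractResult for each row, so the domain attribute still has to be pulled out afterwards, for example with operator.attrgetter.

from operator import attrgetter

import pandas as pd, tldextract

df = pd.DataFrame([{'url':'https://google.com'}]*12)
# apply(tldextract.extract) returns ExtractResult objects; attrgetter picks out .domain
df['Domain'] = df['url'].apply(tldextract.extract).map(attrgetter('domain'))
print(df)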
Murguia answered 22/8, 2024 at 21:47 Comment(0)
