I'm trying to find a way to list all registered domains under a top-level domain (TLD), i.e. everything under .com, .net, etc. All the tools I find only apply to finding subdomains under a domain.
The information you seek isn't openly available. However, there are a few options you can try:
You can ask the respective registries directly for access to their zone files. However, the process can take weeks, and some registries choose not to offer access at all. For newer gTLDs you can apply through ICANN's Centralized Zone Data Service. You might need to provide a good reason to access the full lists. A zone file can only be pulled once a day, though, so for more up-to-date information the only option is a paid service.
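Once you have a zone file, pulling the delegated names out of it is straightforward. The following is a minimal sketch, not a definitive parser: `zone_domains` is a hypothetical helper name, and it assumes each record is on one line with five whitespace-separated columns (owner, TTL, class, type, data) and fully qualified owner names. Zone files that use `$ORIGIN`, relative names, or omitted fields would need a real zone parser instead.

```shell
# Hypothetical helper: list the unique names delegated directly under a TLD.
# Assumes one record per line: owner TTL class type data (an assumption
# about the file layout, not something every registry guarantees).
# Usage: zone_domains com < com.zone > com-domains.txt
zone_domains() {
  tld="$1"
  awk -v tld="$tld" '
    tolower($4) == "ns" {              # only delegation (NS) records
      name = tolower($1)
      sub(/\.$/, "", name)             # drop the trailing root dot
      n = split(name, labels, ".")
      # skip the TLD apex itself; keep the label directly under the TLD
      if (n >= 2 && labels[n] == tld) print labels[n-1] "." labels[n]
    }' | sort -u
}
```

Only NS records are considered here because a delegation is what marks a name as present in the zone; glue A/AAAA records for name servers are ignored.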
Whois API offers a download of the entire whois database for major TLDs (.com, .net, .org, .us, .biz, .mobi, etc.). It also provides an archived historical whois database in both parsed and raw formats, downloadable as CSV files, as well as a daily download of newly registered domains.
A similar, popular question exists already, but the answers and links are a bit outdated.
Probably the best approximation you can get, and a very good one at that, is this:
$ wget https://data.commoncrawl.org/crawl-data/CC-MAIN-2023-40/cc-index.paths.gz
$ gunzip -c cc-index.paths.gz | while IFS= read -r line; do
    wget -nc "https://data.commoncrawl.org/$line"
  done
$ zgrep -oh -E '([a-zA-Z0-9.-]|%2D)+\.hr' *.gz | tee domains.txt
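This downloads the Common Crawl index files and scans them for anything that looks like a domain name under the .hr TLD. The results are surprisingly good: you get almost anything that is publicly visible via HTTP or HTTPS. Domains that were never crawled, for example ones with no website, will be missing.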
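The raw matches contain many duplicates, subdomains, and URL-encoded hyphens, so some cleanup helps. Here is a rough sketch of one way to do it: `normalize_domains` is a hypothetical helper name, and it assumes names are registered directly under the TLD. If the TLD also takes registrations under second-level suffixes (com.hr, for example), this collapses them to the suffix itself, and you would need a public suffix list to do better.

```shell
# Hypothetical helper: reduce raw matches to unique names directly under a TLD.
# Assumption: registrations sit directly under the TLD (no public suffix list).
# Usage: normalize_domains hr < domains.txt > registered.txt
normalize_domains() {
  tld="$1"
  tr 'A-Z' 'a-z' |                     # host names are case-insensitive
    sed -e 's/%2d/-/g' |               # decode URL-encoded hyphens
    awk -F. -v tld="$tld" '
      # keep the label directly under the TLD, and only if it is a valid label
      NF >= 2 && $NF == tld && $(NF-1) ~ /^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/ {
        print $(NF-1) "." $NF
      }' |
    sort -u
}
```

With this, `www.Example.hr` and `example.hr` both collapse to `example.hr`, and fragments such as `-.hr` are dropped because they are not valid labels.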
You can look for new datasets here: Common Crawl datasets