Find all domains under a TLD

I'm trying to find a way to list all registered domains under a top-level domain (TLD), i.e. everything under .com, .net, etc. All the tools I can find only apply to finding subdomains of a given domain.

Rapids answered 27/8, 2015 at 19:20 Comment(1)
If you're looking for a comprehensive tool to search for domains under any TLD, check out this domain search tool: register.domains/en/dm/search-domains It offers extensive functionality and an easy-to-use interface. – Sunlight

The information you seek isn't openly available. However, there are a few options you can try:

You might want to try inquiring at the respective registries directly about getting access to the zone files. The process can take weeks, though, and some registries choose not to offer access at all. For newer gTLDs you can apply at ICANN's Centralized Zone Data Service (CZDS); you may need to provide a good reason to access the full lists. A zone file can only be pulled once a day, so for more up-to-date information the only option is a paid service.
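
Once a registry grants you a zone file, extracting the registered names is mostly a matter of pulling the owner names off the NS records. A rough sketch, assuming a gzipped zone file (the name example.zone.gz is made up) with one record per line and the owner name in the first column; registries differ slightly in layout, so adjust the field handling to the file you actually get:

$ # Hypothetical file name/layout: one record per line, owner name in column 1.
$ zcat example.zone.gz \
    | awk 'tolower($0) ~ /[[:space:]]ns[[:space:]]/ {print tolower($1)}' \
    | sed 's/\.$//' \
    | sort -u > registered-domains.txt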

Whois API offers downloads of the entire whois database for major gTLDs (.com, .net, .org, .us, .biz, .mobi, etc.). It also provides archived historical whois databases, in both parsed and raw format, as downloadable CSV files, as well as a daily download of newly registered domains.

A similar, popular question exists already, but its answers and links are a bit outdated.

Astrodome answered 13/11, 2015 at 1:5 Comment(1)
Seems like this information might be easier to come by in 2019 now that we have the certificate transparency network. – Nambypamby
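
Expanding on the comment above: one hedged way to tap certificate transparency logs is crt.sh's JSON interface. A whole-TLD query like the one below will almost certainly time out on crt.sh's public instance, so treat it as an illustration of the pattern rather than a working bulk export (`%25` is the URL-encoded `%` wildcard; requires curl and jq):

$ # Sketch only: far too broad to complete in practice, but the same pattern
$ # works for narrower name searches.
$ curl -s 'https://crt.sh/?q=%25.hr&output=json' \
    | jq -r '.[].name_value' \
    | tr '[:upper:]' '[:lower:]' \
    | sed 's/^\*\.//' \
    | sort -u > ct-domains.txt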

Probably the best approximation you can get, and a very good one at that, is this:

$ # Fetch the list of cdx index shards for the CC-MAIN-2023-40 crawl.
$ wget https://data.commoncrawl.org/crawl-data/CC-MAIN-2023-40/cc-index.paths.gz
$ # Download every file listed in it (skipping any already present).
$ gunzip -c cc-index.paths.gz | while IFS= read -r line; do
    wget -nc "https://data.commoncrawl.org/$line"
done
$ # Pull anything that looks like a .hr hostname out of the compressed indexes.
$ zgrep -ohE '([a-zA-Z0-9.-]|%2D)+\.hr' *.gz | tee domains.txt

This pulls Common Crawl index data and scans it for anything that looks like a domain name under the .hr TLD. The results are surprisingly good at capturing anything that is publicly visible via HTTP or HTTPS.

You can look for new datasets here: Common Crawl datasets

Molybdous answered 26/11, 2023 at 14:35 Comment(3)
Great tip! Can you clarify how much storage (RAM and HDD) one needs to run this? The cc-index.paths.gz file alone has a size of ~230 GB and I'm not sure if gunzip would read from the compressed file or needs some TB to first uncompress it. – Moonlit
@JörgRech gunzip will do it on the fly through the pipe, with no additional storage required for the decompression, so the only storage you need is the initial 260-ish GB for the cdx files. cc-index.paths.gz itself is under a kilobyte. I additionally filtered all the results by trying to resolve each line against DNS. – Lavonia
Actually, if you're tight on space you could rework that loop to process each cdx gzip as it comes in instead of downloading them all first. That way you could get away with under 2 GB of disk storage. – Lavonia
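
For reference, a rough sketch of that space-saving variant, streaming each cdx shard through the filter (same crawl, TLD, and pattern as in the answer; curl and gunzip stand in for the stored files, so nothing is kept on disk):

$ # Stream each compressed index shard and filter it on the fly.
$ gunzip -c cc-index.paths.gz | grep '\.gz$' | while IFS= read -r line; do
    curl -s "https://data.commoncrawl.org/$line" | gunzip -c \
      | grep -oE '([a-zA-Z0-9.-]|%2D)+\.hr'
done | sort -u > domains.txt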
