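How to Collect Subdomains with Python

There are four common ways to collect subdomains with Python: using third-party libraries, querying API services, writing a brute-force script, and crawling web pages. Choose among them according to your requirements and scenario. The sections below cover each method in turn, starting with third-party libraries.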

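Collecting Subdomains with Third-Party Libraries

Several mature open-source tools can do the collection work for us; commonly used ones include Sublist3r, dnsrecon, and Amass. Sublist3r and dnsrecon are written in Python, while Amass is written in Go and is normally run as a separate command-line program. These tools can gather a large number of subdomains quickly. The rest of this section uses Sublist3r as the example.

1. Install Sublist3r

Sublist3r is an open-source Python tool built specifically for enumerating a website's subdomains. Install it with pip first: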

pip install sublist3r

2. Collect subdomains with Sublist3r

Once it is installed, you can call Sublist3r from a Python script. Below is a simple example:

import sublist3r


def collect_subdomains(domain):
    # Positional arguments: target domain, number of threads.
    # savefile=None means the results are not written to a file.
    subdomains = sublist3r.main(
        domain, 40, savefile=None, ports=None, silent=True,
        verbose=False, enable_bruteforce=False, engines=None)
    return subdomains


if __name__ == '__main__':
    domain = 'example.com'
    subdomains = collect_subdomains(domain)
    for subdomain in subdomains:
        print(subdomain)

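This script calls sublist3r.main to collect subdomains. Its arguments include the target domain, the thread count, and whether to enable brute forcing. The function returns the subdomains as a list, which the script prints one per line.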
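
The returned list is often worth cleaning before it is used elsewhere. The following is a minimal sketch, assuming the raw names may contain duplicates, mixed case, trailing dots, or wildcard entries (what a given tool actually returns depends on the tool and the engines it queries). normalize_subdomains is a hypothetical helper written for this article, not part of Sublist3r.

```python
def normalize_subdomains(raw_names, domain):
    """Return a sorted, de-duplicated list of names that belong to `domain`."""
    domain = domain.lower().strip(".")
    cleaned = set()
    for name in raw_names:
        name = name.strip().lower().rstrip(".")
        # Drop a leading wildcard label such as "*.dev.example.com".
        if name.startswith("*."):
            name = name[2:]
        # Keep only names that sit under the target domain.
        if name.endswith("." + domain):
            cleaned.add(name)
    return sorted(cleaned)


# Hand-written sample input, not real tool output.
sample = ["WWW.example.com", "mail.example.com.", "www.example.com",
          "*.dev.example.com", "notexample.com", "example.com.evil.org"]
print(normalize_subdomains(sample, "example.com"))
```

Matching on "." + domain rather than on the bare domain keeps look-alike names such as notexample.com out of the result.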

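Getting Subdomain Information Through APIs

Besides third-party libraries, public API services can also supply subdomain data. They usually hold large datasets and are simple to query. Commonly used ones include VirusTotal, Censys, and Shodan.

1. Use the VirusTotal API

VirusTotal is a threat intelligence platform with an extensive API that can return subdomain information. You need to apply for an API key before using it. Below is an example script that fetches subdomains through the VirusTotal API: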

import requests


def get_subdomains(domain, api_key):
    url = 'https://www.virustotal.com/vtapi/v2/domain/report'
    # Let requests build and URL-encode the query string.
    params = {'apikey': api_key, 'domain': domain}
    response = requests.get(url, params=params, timeout=10)
    # Fail early on HTTP error responses.
    response.raise_for_status()
    data = response.json()
    subdomains = data.get('subdomains', [])
    return subdomains


if __name__ == '__main__':
    api_key = 'your_api_key_here'
    domain = 'example.com'
    subdomains = get_subdomains(domain, api_key)
    for subdomain in subdomains:
        print(subdomain)
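
An API response does not always have the shape a script expects, so it can help to separate the parsing step from the network call. The sketch below assumes only that the decoded JSON is a dictionary that may or may not carry a 'subdomains' list, as in the script above. extract_subdomains and both sample payloads are hypothetical and hand-written; they are not real VirusTotal output.

```python
def extract_subdomains(payload):
    """Pull the 'subdomains' list out of a decoded JSON report, defensively."""
    if not isinstance(payload, dict):
        return []
    names = payload.get("subdomains") or []
    # Keep only non-empty strings and ignore anything unexpected.
    return [name for name in names if isinstance(name, str) and name]


# Hand-written sample payloads (assumed shapes, not real API output).
found = {"subdomains": ["www.example.com", "mail.example.com", None, ""]}
missing = {"message": "no data for this domain"}

print(extract_subdomains(found))
print(extract_subdomains(missing))
```

Keeping the parser free of network calls also makes it easy to test against saved responses.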

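Brute-Forcing Subdomains with a Script

Subdomain brute forcing enumerates candidate names from a wordlist and can uncover subdomains that other sources miss. A Python script can do this by trying each word in the list as a subdomain label.

1. Prepare the wordlist

First prepare a wordlist file of common subdomain labels, for example subdomains.txt, with the following content: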

www
mail
ftp
blog
test
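
Wordlists collected from different places are not always tidy. As a minimal sketch, the hypothetical helper load_wordlist below reads such a file and tolerates blank lines, duplicates, mixed case, and several words on one line. The temporary file exists only so that the example runs on its own.

```python
import os
import tempfile


def load_wordlist(path):
    """Read subdomain labels from `path`, skipping blanks and duplicates."""
    labels = []
    seen = set()
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            # split() with no arguments handles spaces, tabs and newlines.
            for word in line.split():
                word = word.lower()
                if word not in seen:
                    seen.add(word)
                    labels.append(word)
    return labels


# Demo with a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("www mail\nftp\n\nWWW\nblog test\n")
demo_labels = load_wordlist(tmp.name)
os.remove(tmp.name)
print(demo_labels)
```

The order of first appearance is preserved, so a wordlist sorted by likelihood keeps its ordering.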

2. Write the brute-force script

Below is a simple brute-force script that uses the requests library to check whether each candidate subdomain exists:

import requests


def brute_force_subdomains(domain, dict_file):
    with open(dict_file, 'r') as file:
        # Strip whitespace and skip blank lines in the wordlist.
        words = [line.strip() for line in file if line.strip()]
    valid_subdomains = []
    for word in words:
        host = f'{word}.{domain}'
        url = f'http://{host}'
        try:
            response = requests.get(url, timeout=3)
            # Only a 200 response counts as a hit.
            if response.status_code == 200:
                valid_subdomains.append(host)
        except requests.RequestException:
            # DNS failures, refused connections and timeouts mean no hit.
            pass
    return valid_subdomains


if __name__ == '__main__':
    domain = 'example.com'
    dict_file = 'subdomains.txt'
    subdomains = brute_force_subdomains(domain, dict_file)
    for subdomain in subdomains:
        print(subdomain)

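This script reads the words from the wordlist, joins each one with the domain to form a full URL, and sends an HTTP request to check whether the subdomain exists. A subdomain is treated as valid only when the request succeeds with status code 200, so hosts that do not serve HTTP on port 80, or that answer with another status code, are not reported.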
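
An alternative to the HTTP check is to test whether each candidate name resolves in DNS, which also finds hosts that run no web server. The following is a sketch of that variant, not the method used in the script above. It uses the standard library's socket.getaddrinfo and a thread pool; resolves, dns_brute_force, and fake_resolver are hypothetical names, and the fake resolver replaces real DNS lookups so that the demo runs offline.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def resolves(host, resolver=socket.getaddrinfo):
    """Return True if `host` resolves to at least one address."""
    try:
        return bool(resolver(host, None))
    except (OSError, UnicodeError):
        # socket.gaierror is a subclass of OSError; UnicodeError covers
        # labels that cannot be encoded as a hostname.
        return False


def dns_brute_force(domain, words, resolver=socket.getaddrinfo, workers=10):
    """Return the candidates '<word>.<domain>' that resolve, in input order."""
    candidates = [f"{word}.{domain}" for word in words]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda h: resolves(h, resolver), candidates))
    return [host for host, ok in zip(candidates, results) if ok]


# Offline demo: a fake resolver stands in for real DNS lookups.
def fake_resolver(host, port):
    if host in {"www.example.com", "mail.example.com"}:
        return [("placeholder",)]
    raise socket.gaierror("name not found")


print(dns_brute_force("example.com", ["www", "ftp", "mail"],
                      resolver=fake_resolver))
```

To query real DNS, call dns_brute_force without the resolver argument. If the target zone uses wildcard DNS, every candidate resolves, so the results need a further check in that case.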

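Discovering Subdomains with a Web Crawler

A web crawler fetches pages automatically and can be used to discover a site's subdomains. We can write a Python crawler that extracts subdomain names from the pages of the target site.

1. Parse pages with BeautifulSoup

BeautifulSoup is a powerful HTML parsing library that makes it easy to extract information from a page. The crawler below uses it to pull subdomains out of the links on the target site's home page: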

import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse


def crawl_subdomains(domain):
    url = f'http://{domain}'
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.text, 'html.parser')
    subdomains = set()
    for link in soup.find_all('a'):
        href = link.get('href')
        if not href:
            continue
        # Relative links have no hostname and are skipped.
        host = urlparse(href).hostname
        # Keep only hosts under the target domain.
        if host and host.endswith('.' + domain):
            subdomains.add(host)
    return list(subdomains)


if __name__ == '__main__':
    domain = 'example.com'
    subdomains = crawl_subdomains(domain)
    for subdomain in subdomains:
        print(subdomain)

This script fetches the target site's page over HTTP, parses the HTML with BeautifulSoup, and extracts the hostname from every link that points to the target domain.
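
Subdomains also show up outside anchor tags, for example in script and image sources. The sketch below widens the extraction to every href and src attribute and resolves relative URLs against the page address. To stay self-contained it swaps in the standard library's html.parser instead of BeautifulSoup. HostCollector, extract_hosts, and the sample HTML are hypothetical and written for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class HostCollector(HTMLParser):
    """Collect hostnames under `domain` from href and src attributes."""

    def __init__(self, base_url, domain):
        super().__init__()
        self.base_url = base_url
        self.domain = domain.lower()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in ("href", "src") or not value:
                continue
            try:
                # urljoin resolves relative and protocol-relative URLs.
                host = urlparse(urljoin(self.base_url, value)).hostname
            except ValueError:
                continue  # Malformed URL, ignore it.
            if host and host.endswith("." + self.domain):
                self.hosts.add(host)


def extract_hosts(html, base_url, domain):
    """Return the sorted hostnames under `domain` referenced by `html`."""
    collector = HostCollector(base_url, domain)
    collector.feed(html)
    return sorted(collector.hosts)


# Hand-written sample page, not fetched from a real site.
sample_html = """
<a href="https://blog.example.com/post/1">Blog</a>
<a href="/about">About</a>
<script src="//static.example.com/app.js"></script>
<img src="https://cdn.other.org/logo.png">
<a href="https://other.org/?ref=example.com">Elsewhere</a>
"""
print(extract_hosts(sample_html, "https://www.example.com/", "example.com"))
```

Because relative links are resolved against the base URL, the host that served the page appears in the result as well.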

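Combining Several Methods

To make the results more complete and accurate, combine several methods. For example, build an initial list with third-party libraries and APIs, then supplement and verify it with brute forcing and crawling.

1. Combined collection script

Below is a script that combines the methods above: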

import sublist3r
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse


def collect_subdomains(domain):
    # Collect subdomains with Sublist3r
    subdomains = set(sublist3r.main(
        domain, 40, savefile=None, ports=None, silent=True,
        verbose=False, enable_bruteforce=False, engines=None))
    # Collect subdomains with the VirusTotal API
    api_key = 'your_api_key_here'
    url = 'https://www.virustotal.com/vtapi/v2/domain/report'
    params = {'apikey': api_key, 'domain': domain}
    response = requests.get(url, params=params, timeout=10)
    data = response.json()
    subdomains.update(data.get('subdomains', []))
    # Brute-force subdomains from the wordlist
    dict_file = 'subdomains.txt'
    with open(dict_file, 'r') as file:
        for line in file:
            word = line.strip()
            if not word:
                continue  # Skip blank lines.
            host = f'{word}.{domain}'
            try:
                response = requests.get(f'http://{host}', timeout=3)
                if response.status_code == 200:
                    subdomains.add(host)
            except requests.RequestException:
                pass
    # Extract subdomains from links on the home page
    response = requests.get(f'http://{domain}', timeout=10)
    soup = BeautifulSoup(response.text, 'html.parser')
    for link in soup.find_all('a'):
        href = link.get('href')
        if not href:
            continue
        # Relative links have no hostname and are skipped.
        host = urlparse(href).hostname
        if host and host.endswith('.' + domain):
            subdomains.add(host)
    return list(subdomains)


if __name__ == '__main__':
    domain = 'example.com'
    subdomains = collect_subdomains(domain)
    for subdomain in subdomains:
        print(subdomain)
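
When results from several methods are merged, it can be useful to record which method reported each name, since a subdomain confirmed by more than one source is more likely to be real. The following is a minimal sketch of that bookkeeping. merge_sources is a hypothetical helper, and the sample lists are hand-written stand-ins for the output of each method.

```python
def merge_sources(**sources):
    """Map each subdomain to the sorted list of sources that reported it."""
    merged = {}
    for source_name, names in sources.items():
        for name in names:
            # Normalize case and trailing dots so duplicates collapse.
            key = name.strip().lower().rstrip(".")
            if key:
                merged.setdefault(key, set()).add(source_name)
    return {name: sorted(found_by)
            for name, found_by in sorted(merged.items())}


# Hand-written sample results, not real tool or API output.
report = merge_sources(
    sublist3r=["www.example.com", "mail.example.com"],
    virustotal=["WWW.example.com", "dev.example.com"],
    brute_force=["www.example.com"],
)
for name, found_by in report.items():
    print(name, found_by)
```

Names reported by a single source can then be queued for a further check, such as a DNS lookup, before they are trusted.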