How to crawl all followers of a Weibo account with Python

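To collect all followers of a Weibo account, you can combine the Weibo API with web scraping in Python. First obtain access to the Weibo API, then use scraping tools such as Selenium or BeautifulSoup to extract the data the API does not return. The sections below walk through each step.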

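I. Obtaining access to the Weibo API

1. Register a Weibo Open Platform account

To use the Weibo API, first register a developer account on the Weibo Open Platform and create an application. The application gives you the credentials needed to call the API.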

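2. Get an access token

After creating the application, obtain an access token through OAuth authorization; the token is the credential used for API calls. The following example code obtains one: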

import requests

APP_KEY = 'your_app_key'
APP_SECRET = 'your_app_secret'
CALLBACK_URL = 'your_callback_url'

# Step 1: send the user to the authorization page.
url = f'https://api.weibo.com/oauth2/authorize?client_id={APP_KEY}&redirect_uri={CALLBACK_URL}&response_type=code'
print(f'Please go to this URL and authorize the app: {url}')

# After the user authorizes the app, they are redirected to CALLBACK_URL with a `code` query parameter.
code = input('Please enter the code you received: ')

# Step 2: exchange the authorization code for an access token.
token_url = 'https://api.weibo.com/oauth2/access_token'
data = {
    'client_id': APP_KEY,
    'client_secret': APP_SECRET,
    'grant_type': 'authorization_code',
    'redirect_uri': CALLBACK_URL,
    'code': code
}
response = requests.post(token_url, data=data, timeout=10)
response.raise_for_status()  # fail early on an HTTP error
access_token = response.json().get('access_token')
print(f'Access Token: {access_token}')
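
The snippet above builds the authorization URL by string interpolation, so characters such as `:`, `/`, or `&` in the callback URL are not escaped, and the user has to copy the code out of the redirect by hand. A minimal sketch of both steps with the standard library follows; `build_authorize_url` and `extract_code` are illustrative names, not part of any Weibo SDK.

```python
from urllib.parse import urlencode, urlparse, parse_qs

AUTHORIZE_ENDPOINT = 'https://api.weibo.com/oauth2/authorize'

def build_authorize_url(app_key, callback_url):
    """Return the authorization URL with all query values percent-encoded."""
    query = urlencode({
        'client_id': app_key,
        'redirect_uri': callback_url,
        'response_type': 'code',
    })
    return f'{AUTHORIZE_ENDPOINT}?{query}'

def extract_code(redirected_url):
    """Return the `code` query parameter from the redirect URL, or None."""
    values = parse_qs(urlparse(redirected_url).query).get('code')
    return values[0] if values else None
```

With these helpers, the user can paste the full address they were redirected to, and `extract_code` pulls the code out of it.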

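II. Fetching follower information through the API

1. Call the follower list endpoint

Weibo provides an endpoint for a user's follower list, GET /friendships/followers, which you can call to retrieve follower information.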

import requests

def get_followers(access_token, uid, cursor=0, count=200):
    """Fetch one page of followers for the user identified by uid."""
    url = 'https://api.weibo.com/2/friendships/followers.json'
    params = {
        'access_token': access_token,
        'uid': uid,
        'count': count,
        'cursor': cursor
    }
    response = requests.get(url, params=params, timeout=10)
    return response.json()

# Example usage (access_token comes from the OAuth step)
uid = 'target_user_id'
followers = get_followers(access_token, uid)
print(followers)
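
`get_followers` returns whatever JSON the server sends, so an error response would be printed as if it were data. The sketch below checks a payload before using it. It assumes that a failed call returns a JSON object carrying `error` and `error_code` keys and that each user object has a `screen_name` field; verify both against the responses you actually receive. `WeiboAPIError`, `ensure_ok`, and `follower_names` are hypothetical names introduced for illustration.

```python
class WeiboAPIError(RuntimeError):
    """Raised when a response payload looks like an API error."""

def ensure_ok(payload):
    """Return payload unchanged, or raise if it carries an error marker."""
    if not isinstance(payload, dict):
        raise WeiboAPIError(f'unexpected payload type: {type(payload).__name__}')
    # Assumption: error responses include 'error' and/or 'error_code'.
    if 'error_code' in payload or 'error' in payload:
        raise WeiboAPIError(f"{payload.get('error_code')}: {payload.get('error')}")
    return payload

def follower_names(payload):
    """List the display names in one page; users lacking the field are skipped."""
    users = ensure_ok(payload).get('users', [])
    # Assumption: the display name is stored under 'screen_name'.
    return [u['screen_name'] for u in users if 'screen_name' in u]
```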

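III. Handling paginated data

Because each API call returns a limited number of followers, you need to page through the results until all followers have been retrieved.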

import time

def get_all_followers(access_token, uid):
    """Follow next_cursor from page to page until the API reports no more pages."""
    followers = []
    cursor = 0
    while True:
        data = get_followers(access_token, uid, cursor)
        users = data.get('users')
        if not users:
            break  # error payload or empty page
        followers.extend(users)
        cursor = data.get('next_cursor', 0)
        if not cursor:
            break  # a next_cursor of 0 (or a missing one) marks the last page
        time.sleep(1)  # pause between requests to avoid hitting the rate limit
    return followers

# Example usage
all_followers = get_all_followers(access_token, uid)
print(f'Total followers: {len(all_followers)}')
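
The same cursor loop can be written as a generator that takes the page-fetching function as an argument, which makes it testable without network access. This is a sketch under stated assumptions: `iter_followers` is a hypothetical helper, the `id` key used for de-duplication is assumed to exist on user objects, and `max_pages` is an added safety cap rather than anything the API requires.

```python
def iter_followers(fetch_page, max_pages=1000):
    """Yield each follower once, following next_cursor across pages.

    fetch_page(cursor) must return a dict shaped like the API response,
    with a 'users' list and a 'next_cursor' value.
    """
    seen_ids = set()
    cursor = 0
    for _ in range(max_pages):  # hard cap so a bad cursor cannot loop forever
        data = fetch_page(cursor)
        for user in data.get('users') or []:
            user_id = user.get('id')  # assumption: users carry an 'id' field
            if user_id is not None:
                if user_id in seen_ids:
                    continue  # the same account can appear on two pages
                seen_ids.add(user_id)
            yield user
        cursor = data.get('next_cursor') or 0
        if cursor == 0:
            break
```

With the function from the previous section, a call could look like `list(iter_followers(lambda c: get_followers(access_token, uid, c)))`; add a `time.sleep` inside the lambda's target if you need to throttle requests.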

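IV. Supplementing the data with web scraping

Some information may not be available through the API; in that case, web scraping can fill the gaps. Libraries such as Selenium and BeautifulSoup can be used for this.

1. Install dependencies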

pip install selenium beautifulsoup4

2. Simulate login with Selenium

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

def login_weibo(username, password):
    """Open the login page, submit the credentials, and return the driver."""
    driver = webdriver.Chrome()
    driver.get('https://weibo.com/login.php')
    time.sleep(3)  # wait for the login form to load
    # These selectors depend on the login page's markup; adjust them if the page differs.
    driver.find_element(By.ID, 'loginname').send_keys(username)
    driver.find_element(By.NAME, 'password').send_keys(password)
    driver.find_element(By.CSS_SELECTOR, 'a.W_btn_a').click()
    time.sleep(5)  # wait for login to complete
    return driver

# Example usage
username = 'your_username'
password = 'your_password'
driver = login_weibo(username, password)
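
The fixed `time.sleep` calls in the login function either waste time or fail when the page loads slowly. One alternative is to poll for a condition with a timeout. The helper below is a generic, standard-library sketch of that idea; `wait_until` is a hypothetical name, and Selenium's own `WebDriverWait` serves the same purpose inside a real script.

```python
import time

def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout} seconds')
        time.sleep(interval)
```

For example, `wait_until(lambda: driver.find_elements(By.ID, 'loginname'))` would return as soon as the username field is present, because `find_elements` returns an empty list until then.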

3. Scrape the followers page

import time
from bs4 import BeautifulSoup

def get_fans(driver, uid):
    """Parse the followers currently rendered on the user's fans page."""
    fans = []
    driver.get(f'https://weibo.com/{uid}/fans')
    time.sleep(3)  # give the page time to render
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    fan_elements = soup.find_all('li', {'class': 'follow_item S_line2'})
    for element in fan_elements:
        link = element.find('a', {'class': 'S_txt1'})
        if link is None:
            continue  # skip list items without a profile link
        fans.append({
            'name': link.text.strip(),
            'profile_url': link.get('href'),
        })
    return fans

# Example usage
fans = get_fans(driver, uid)
print(fans)
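
The `href` values scraped from a page are often relative (`/u/123`) or protocol-relative (`//weibo.com/u/123`), and the same follower can be picked up more than once. The sketch below cleans the scraped list before it is stored. Whether Weibo's markup actually uses relative links is an assumption, and `normalize_fans` and `BASE_URL` are hypothetical names introduced here.

```python
from urllib.parse import urljoin

BASE_URL = 'https://weibo.com/'  # assumption: links are relative to this origin

def normalize_fans(fans, base_url=BASE_URL):
    """Make profile URLs absolute and drop repeated entries, keeping order."""
    seen = set()
    result = []
    for fan in fans:
        href = fan.get('profile_url')
        if not href:
            continue  # nothing to link to
        absolute = urljoin(base_url, href)
        if absolute in seen:
            continue  # already recorded this profile
        seen.add(absolute)
        result.append({
            'name': (fan.get('name') or '').strip(),
            'profile_url': absolute,
        })
    return result
```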

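V. Storing and processing the data

To make the follower data easier to manage and analyze, store it in a database such as MySQL or MongoDB.

1. Install the database dependency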

pip install pymysql

2. Store the data in MySQL

import pymysql

def store_followers_to_db(followers):
    """Insert followers into the `followers` table (columns: name, profile_url).

    The table must already exist in the `weibo` database.
    """
    connection = pymysql.connect(
        host='localhost',
        user='your_db_user',
        password='your_db_password',
        database='weibo',
        charset='utf8mb4',
        cursorclass=pymysql.cursors.DictCursor
    )
    try:
        with connection.cursor() as cursor:
            sql = "INSERT INTO followers (name, profile_url) VALUES (%s, %s)"
            for follower in followers:
                cursor.execute(sql, (follower['name'], follower['profile_url']))
        connection.commit()  # commit once, after all rows are inserted
    finally:
        connection.close()

# Example usage
store_followers_to_db(fans)
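
The MySQL function above assumes the `followers` table already exists and inserts duplicates if it runs twice. The sketch below shows one possible schema and a duplicate-safe insert. It deliberately swaps in Python's built-in `sqlite3` so that it runs without a database server; the column layout and the `store_followers_sqlite` name are assumptions for illustration. In MySQL, the comparable approach is a UNIQUE index on `profile_url` combined with `INSERT IGNORE`.

```python
import sqlite3

def store_followers_sqlite(followers, db_path=':memory:'):
    """Insert followers into a local SQLite table, skipping known profile URLs."""
    connection = sqlite3.connect(db_path)
    # Hypothetical schema: the UNIQUE constraint is what makes re-runs safe.
    connection.execute(
        'CREATE TABLE IF NOT EXISTS followers ('
        ' id INTEGER PRIMARY KEY AUTOINCREMENT,'
        ' name TEXT NOT NULL,'
        ' profile_url TEXT NOT NULL UNIQUE)'
    )
    rows = [(f['name'], f['profile_url']) for f in followers]
    with connection:  # commits on success, rolls back on error
        connection.executemany(
            'INSERT OR IGNORE INTO followers (name, profile_url) VALUES (?, ?)',
            rows,
        )
    return connection
```

The function returns the open connection so the caller can query the table and close it when done.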

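VI. Summary

With the steps above, you can use Python together with the Weibo API and web scraping to collect follower information for a Weibo account and store it in a database for management and analysis. The key steps are obtaining API access, calling the follower list endpoint, handling paginated data, supplementing the API data with Selenium-based scraping, and storing the results in a database. Together, these steps let you collect and manage Weibo follower information in a complete and systematic way.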