How to Scrape Stock Data with Python

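There are several ways to collect stock data with Python: using a third-party library, calling a data API, or scraping web pages directly. Calling an API is the recommended approach, because it returns structured data, does not break when a page layout changes, and is easier to maintain. Commonly used sources include Alpha Vantage, Yahoo Finance, and Quandl. This article walks through each approach, with an emphasis on retrieving data through APIs.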

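I. Using the Alpha Vantage API

Alpha Vantage offers a broad set of financial data APIs that are easy to use and include a free tier.

1. Register and obtain an API key

First, visit the Alpha Vantage website (https://www.alphavantage.co/), register an account, and obtain an API key.

2. Install the required Python libraries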

pip install requests pandas

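3. Write the code to fetch the data

import requests
import pandas as pd

def get_stock_data(symbol, api_key, interval='1min'):
    # Query the intraday time series endpoint for the given symbol.
    url = 'https://www.alphavantage.co/query'
    params = {'function': 'TIME_SERIES_INTRADAY', 'symbol': symbol,
              'interval': interval, 'apikey': api_key}
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()
    data = response.json()
    # The series key depends on the requested interval, e.g. 'Time Series (1min)'.
    df = pd.DataFrame(data[f'Time Series ({interval})']).T
    df.columns = ['open', 'high', 'low', 'close', 'volume']
    df = df.astype(float)  # the API returns every value as a string
    df.index = pd.to_datetime(df.index)
    return df

api_key = 'YOUR_API_KEY'
symbol = 'AAPL'
df = get_stock_data(symbol, api_key)
print(df.head())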
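
When a request is rejected or rate limited, the JSON response may not contain the time series key at all, and indexing into it fails with an unhelpful KeyError. The sketch below separates parsing from fetching so the failure can be reported clearly. The function name parse_intraday is hypothetical, and the failure fields it checks ("Error Message", "Note", "Information") are assumptions about the response shape rather than a documented contract. The sample payload is hand-written for illustration, not real market data.

```python
import pandas as pd

def parse_intraday(payload: dict, interval: str = "1min") -> pd.DataFrame:
    """Convert an Alpha Vantage-style intraday payload into a numeric DataFrame."""
    key = f"Time Series ({interval})"
    if key not in payload:
        # Assumed failure shapes: problems are reported in one of these fields.
        reason = (payload.get("Error Message") or payload.get("Note")
                  or payload.get("Information") or "unexpected response format")
        raise ValueError(f"No '{key}' in response: {reason}")
    df = pd.DataFrame.from_dict(payload[key], orient="index")
    # Strip the numeric prefixes ("1. open" -> "open") rather than trusting column order.
    df.columns = [name.split(". ", 1)[-1] for name in df.columns]
    df = df[["open", "high", "low", "close", "volume"]].astype(float)
    df.index = pd.to_datetime(df.index)
    return df.sort_index()

# Hand-written sample payload (illustrative values only).
sample = {
    "Time Series (1min)": {
        "2024-01-02 09:31:00": {"1. open": "185.10", "2. high": "185.40",
                                "3. low": "185.00", "4. close": "185.30",
                                "5. volume": "1200"},
        "2024-01-02 09:30:00": {"1. open": "185.00", "2. high": "185.20",
                                "3. low": "184.90", "4. close": "185.10",
                                "5. volume": "3400"},
    }
}
frame = parse_intraday(sample)
print(frame)
```

Keeping the parser free of network calls also makes it easy to test against saved responses.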

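II. Using the Yahoo Finance API

Yahoo Finance is another very popular data source. Its data can be retrieved with yfinance, an open-source library that is not affiliated with Yahoo.

1. Install the yfinance library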

pip install yfinance

2. Write the code to fetch the data

import yfinance as yf

def get_stock_data(symbol, start, end):
    # history() returns a DataFrame indexed by date with OHLCV columns.
    stock = yf.Ticker(symbol)
    df = stock.history(start=start, end=end)
    return df

symbol = 'AAPL'
start = '2020-01-01'
end = '2023-01-01'  # the end date is exclusive
df = get_stock_data(symbol, start, end)
print(df.head())
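
Daily bars are often aggregated to a coarser period before analysis. The following is a minimal sketch of converting daily OHLCV rows into weekly bars with pandas, assuming the frame uses the column names Open, High, Low, Close, and Volume and a DatetimeIndex. The helper name to_weekly and the synthetic prices are hypothetical, used so the example runs without network access.

```python
import pandas as pd

def to_weekly(daily: pd.DataFrame) -> pd.DataFrame:
    """Aggregate daily OHLCV rows into weekly bars (weeks ending on Sunday)."""
    rules = {"Open": "first", "High": "max", "Low": "min",
             "Close": "last", "Volume": "sum"}
    # dropna() removes weeks that contain no trading days.
    return daily.resample("W").agg(rules).dropna()

# Synthetic data: 10 business days starting Monday 2024-01-01.
dates = pd.date_range("2024-01-01", periods=10, freq="B")
opens = pd.Series(range(100, 110), index=dates, dtype=float)
daily = pd.DataFrame({"Open": opens, "High": opens + 2, "Low": opens - 2,
                      "Close": opens + 1, "Volume": 1000}, index=dates)

weekly = to_weekly(daily)
print(weekly)
```

Each weekly bar takes the first open, the highest high, the lowest low, the last close, and the summed volume of its days.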

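III. Using the Quandl API

Quandl (now part of Nasdaq Data Link) provides a wide range of financial datasets through its API.

1. Register and obtain an API key

First, visit the Quandl website (https://www.quandl.com/), register an account, and obtain an API key.

2. Install the quandl library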

pip install quandl

3. Write the code to fetch the data

import quandl

def get_stock_data(symbol, api_key, start, end):
    quandl.ApiConfig.api_key = api_key
    # The WIKI prices dataset stopped updating in 2018,
    # so only request dates before that.
    df = quandl.get(f"WIKI/{symbol}", start_date=start, end_date=end)
    return df

api_key = 'YOUR_API_KEY'
symbol = 'AAPL'
start = '2015-01-01'
end = '2018-01-01'
df = get_stock_data(symbol, api_key, start, end)
print(df.head())

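IV. Scraping Web Pages Directly

If no API meets your needs, you can scrape the data from web pages directly. The example below uses Sina Finance.

1. Install the required Python libraries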

pip install requests beautifulsoup4

2. Write the code to scrape the data

import requests
from bs4 import BeautifulSoup
import pandas as pd

def get_stock_data(symbol):
    url = f'http://finance.sina.com.cn/realstock/company/{symbol}/nc.shtml'
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    table = soup.find('table', id='FundHoldSharesTable')
    if table is None:
        # The table may be missing if the page layout changed
        # or the data is loaded by JavaScript.
        raise ValueError('Table not found in the page HTML')
    rows = table.find_all('tr')
    data = []
    for row in rows[1:]:  # skip the header row
        cols = [col.text.strip() for col in row.find_all('td')]
        if cols:  # ignore rows without data cells
            data.append(cols)
    df = pd.DataFrame(data, columns=['Date', 'Open', 'High', 'Low', 'Close', 'Volume'])
    return df

symbol = 'sh600519'
df = get_stock_data(symbol)
print(df.head())
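
The parsing step can be developed and checked against a saved or hand-written HTML snippet before pointing it at a live page. Below is a minimal sketch under that assumption. The function name table_to_frame, the table id "quotes", and the sample rows are all hypothetical and do not reflect the real structure of any Sina Finance page.

```python
import pandas as pd
from bs4 import BeautifulSoup

def table_to_frame(html: str, table_id: str) -> pd.DataFrame:
    """Turn an HTML table into a DataFrame, using its first row as the header."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        raise ValueError(f"No table with id '{table_id}' found")
    rows = table.find_all("tr")
    if not rows:
        raise ValueError("Table has no rows")
    header = [c.get_text(strip=True) for c in rows[0].find_all(["th", "td"])]
    body = [[c.get_text(strip=True) for c in row.find_all("td")]
            for row in rows[1:]]
    # Keep only rows whose cell count matches the header.
    body = [cells for cells in body if len(cells) == len(header)]
    return pd.DataFrame(body, columns=header)

# Hand-written sample HTML (illustrative values only).
sample_html = """
<table id="quotes">
  <tr><th>Date</th><th>Close</th><th>Volume</th></tr>
  <tr><td>2024-01-02</td><td>10.50</td><td>1200</td></tr>
  <tr><td>2024-01-03</td><td>10.80</td><td>900</td></tr>
  <tr><td colspan="3">footnote</td></tr>
</table>
"""
quotes = table_to_frame(sample_html, "quotes")
print(quotes)
```

Reading column names from the header row avoids hard-coding a list that can drift out of sync with the page.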

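V. Data Cleaning and Analysis

Once the data has been retrieved, it usually needs to be cleaned and analyzed. Some common operations follow.

1. Data cleaning

Data cleaning covers handling missing values, duplicate rows, and anomalous values.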

# Drop rows with missing values
df = df.dropna()
# Drop duplicate rows
df = df.drop_duplicates()
# Drop anomalous rows with non-positive prices or volume
df = df[(df['Close'] > 0) & (df['Volume'] > 0)]
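
Scraped tables arrive as text, so the numeric comparisons above only work after the columns are converted to numbers. The sketch below shows one way to do that conversion before cleaning, assuming values may contain thousands separators or placeholder strings such as "--". The helper name clean_prices and the sample rows are hypothetical.

```python
import pandas as pd

def clean_prices(raw: pd.DataFrame, numeric_cols=("Close", "Volume")) -> pd.DataFrame:
    """Convert text columns to numbers, then drop missing, duplicate and invalid rows."""
    df = raw.copy()
    for col in numeric_cols:
        text = df[col].astype(str).str.replace(",", "", regex=False)
        # Unparseable values such as "--" become NaN and are dropped below.
        df[col] = pd.to_numeric(text, errors="coerce")
    df = df.dropna().drop_duplicates()
    return df[(df["Close"] > 0) & (df["Volume"] > 0)]

# Hand-written sample rows (illustrative values only).
raw = pd.DataFrame({
    "Date":   ["2024-01-02", "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"],
    "Close":  ["10.5", "10.5", "11.0", "--", "11.2"],
    "Volume": ["1,200", "1,200", "900", "800", "0"],
})
cleaned = clean_prices(raw)
print(cleaned)
```

In this sample the duplicate row, the row with an unparseable price, and the zero-volume row are removed, leaving two rows.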

2. Data visualization

Visualization makes the data easier to understand at a glance. The matplotlib and seaborn libraries can be used for plotting.

import matplotlib.pyplot as plt
import seaborn as sns

# Set the plot style
sns.set_theme(style='whitegrid')

# Plot the closing price trend
plt.figure(figsize=(14, 7))
plt.plot(df['Close'])
plt.title('Stock Closing Price')
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()

# Plot the trading volume trend
plt.figure(figsize=(14, 7))
plt.plot(df['Volume'])
plt.title('Stock Volume')
plt.xlabel('Date')
plt.ylabel('Volume')
plt.show()

3. Data analysis

Data analysis includes computing basic technical indicators such as moving averages and MACD.

# Compute the 20-day and 50-day simple moving averages
df['MA20'] = df['Close'].rolling(window=20).mean()
df['MA50'] = df['Close'].rolling(window=50).mean()

# Plot the moving averages
plt.figure(figsize=(14, 7))
plt.plot(df['Close'], label='Close')
plt.plot(df['MA20'], label='MA20')
plt.plot(df['MA50'], label='MA50')
plt.title('Stock Moving Averages')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()

# Compute MACD (adjust=False gives the conventional recursive EMA)
df['EMA12'] = df['Close'].ewm(span=12, adjust=False).mean()
df['EMA26'] = df['Close'].ewm(span=26, adjust=False).mean()
df['MACD'] = df['EMA12'] - df['EMA26']
df['Signal'] = df['MACD'].ewm(span=9, adjust=False).mean()

# Plot MACD and its signal line
plt.figure(figsize=(14, 7))
plt.plot(df['MACD'], label='MACD')
plt.plot(df['Signal'], label='Signal')
plt.title('MACD')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()
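
The MACD steps above can be wrapped in a reusable function that also returns the histogram (MACD minus signal), which is commonly plotted alongside the two lines. This is a minimal sketch: the function name macd, its parameter names, and the synthetic price series are hypothetical, and the 12/26/9 spans are the same ones used above.

```python
import pandas as pd

def macd(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9) -> pd.DataFrame:
    """Return the MACD line, signal line and histogram for a closing price series."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    line = ema_fast - ema_slow
    sig = line.ewm(span=signal, adjust=False).mean()
    return pd.DataFrame({"MACD": line, "Signal": sig, "Hist": line - sig})

# Synthetic series: a steadily rising price and a flat price.
rising = pd.Series(range(1, 61), dtype=float)
flat = pd.Series([50.0] * 60)

rising_macd = macd(rising)
flat_macd = macd(flat)
print(rising_macd.tail())
```

On a steadily rising series the fast average stays above the slow one, so the MACD line is positive; on a flat series all three columns stay at zero.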