Bilibili API Comment Scraping in Practice: 5 Efficient Techniques and a Complete Guide to Error Handling

张开发
2026/5/5 1:22:30, 15-minute read
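[Free download] bilibili-api: calls for commonly used Bilibili APIs, supporting videos, bangumi, users, channels, audio and more.
Original repository: https://github.com/MoyuScript/bilibili-api
Project mirror: https://gitcode.com/gh_mirrors/bi/bilibili-api

Bilibili API is a key tool for Python developers who need Bilibili data, and it offers a rich set of comment endpoints. This article takes a close look at advanced use of the comment interfaces, from basic calls to practical optimizations, to help you build a stable, reliable scraping system. Knowing how to call these interfaces well not only speeds up data retrieval but also lets you cope with the many ways a request can fail.

The Core Challenges of Comment Scraping and How to Address Them

In practice, scraping Bilibili comments involves three core challenges: interface stability, the complexity of authentication, and the completeness of paginated data. The modern interface design of the bilibili-api library lets us address each of them systematically.

1. Old vs. New Interfaces: Why Choose get_comments_lazy

bilibili-api provides two comment-fetching interfaces, and you should pick the one that fits your scenario:

- Legacy interface get_comments: kept for compatibility with older projects; page-number pagination with simple logic; more likely to trigger 403 errors and less stable; gradually being deprecated upstream.
- Modern interface get_comments_lazy: cursor-based pagination that supports incremental fetching; higher stability and success rate; recommended for all new projects; supports many resource types through an enum.

The following example pages through all comments of a video with the lazy interface:

```python
from bilibili_api import comment, Credential, sync


async def fetch_video_comments_lazy(video_aid: int) -> list:
    """Fetch all comments of a video with the cursor-based lazy interface."""
    credential = Credential(
        sessdata="your SESSDATA",
        bili_jct="your bili_jct",
    )
    all_comments = []
    offset = ""  # an empty offset requests the first page
    while True:
        try:
            response = await comment.get_comments_lazy(
                oid=video_aid,
                type_=comment.CommentResourceType.VIDEO,
                offset=offset,
                credential=credential,
            )
        except Exception as e:
            print(f"Failed to fetch comments: {e}")
            break

        # Collect the comments on the current page ("replies" can be null)
        replies = response.get("replies") or []
        all_comments.extend(replies)

        # Read the offset of the next page from the cursor
        cursor = response.get("cursor", {})
        offset = cursor.get("pagination_reply", {}).get("next_offset", "")

        # Stop when there is no next offset or the API reports the end
        if not offset or cursor.get("is_end", False):
            break
    return all_comments


# Example: fetch all comments of video av170001
comments = sync(fetch_video_comments_lazy(170001))
```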
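The cursor loop above can be exercised without any network access. The sketch below is a hypothetical helper, paginate_cursor, that accepts any async page fetcher returning the same fields the loop relies on (replies, cursor.is_end, cursor.pagination_reply.next_offset), deduplicates replies by rpid across pages, and caps the number of pages as a safety net. The fake fetcher and its data are made up for illustration; in real use you would pass a function that wraps comment.get_comments_lazy.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Any coroutine function that maps an offset string to one page of results
PageFetcher = Callable[[str], Awaitable[dict]]


async def paginate_cursor(fetch_page: PageFetcher, max_pages: int = 1000) -> List[dict]:
    """Follow next_offset until the cursor reports the end (hypothetical helper)."""
    collected: List[dict] = []
    seen_rpids = set()
    offset = ""  # empty offset = first page
    for _ in range(max_pages):  # hard cap guards against a cursor that never ends
        page = await fetch_page(offset)
        for reply in page.get("replies") or []:
            if reply["rpid"] not in seen_rpids:  # skip replies repeated across pages
                seen_rpids.add(reply["rpid"])
                collected.append(reply)
        cursor = page.get("cursor", {})
        offset = cursor.get("pagination_reply", {}).get("next_offset", "")
        if not offset or cursor.get("is_end", False):
            break
    return collected


# Fake three-page response keyed by offset (made-up data for illustration)
FAKE_PAGES: Dict[str, dict] = {
    "": {"replies": [{"rpid": 1}, {"rpid": 2}],
         "cursor": {"is_end": False, "pagination_reply": {"next_offset": "p2"}}},
    "p2": {"replies": [{"rpid": 2}, {"rpid": 3}],
           "cursor": {"is_end": False, "pagination_reply": {"next_offset": "p3"}}},
    "p3": {"replies": None,
           "cursor": {"is_end": True, "pagination_reply": {}}},
}


async def fake_fetch(offset: str) -> dict:
    return FAKE_PAGES[offset]


result = asyncio.run(paginate_cursor(fake_fetch))
print([r["rpid"] for r in result])  # [1, 2, 3]
```

Keeping the fetcher as a parameter separates the stopping logic from the HTTP call, so the two termination conditions (empty next_offset, is_end) can be tested on their own.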
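2. Matching Resource Types Precisely: The Key to Avoiding 404 Errors

The two core parameters of the comment interface are the resource ID (oid) and the resource type (type_). A mismatch between them is a common cause of 404 errors. bilibili-api provides an enum of all resource types:

```python
from bilibili_api.comment import CommentResourceType

# List every resource type supported by the installed version
for name, member in CommentResourceType.__members__.items():
    print(name, member.value)

# Commonly used values include VIDEO = 1, ARTICLE = 12, DYNAMIC_DRAW = 11,
# DYNAMIC = 17, AUDIO = 14, AUDIO_LIST = 19 and MANGA = 22. Other members
# (e.g. CHEESE, BLACK_ROOM, ACTIVITY) may differ between versions, so read
# the printed output instead of hard-coding their numbers.
```

How to obtain the resource ID for each type:

```python
from bilibili_api import dynamic, sync, video


async def resolve_oids() -> None:
    # Video: the comment oid is the numeric aid (AV number)
    v = video.Video(bvid="BV1xx411c7mD")
    video_info = await v.get_info()
    video_aid = video_info["aid"]

    # Column (article): the comment oid is the cv number itself
    article_cvid = 9762979

    # Dynamic: image dynamics (DYNAMIC_DRAW) are keyed by their rid, not the dynamic ID.
    # get_rid() is provided by Dynamic in recent versions; verify it in yours.
    d = dynamic.Dynamic(dynamic_id=116859542)
    dynamic_rid = await d.get_rid()

    print(video_aid, article_cvid, dynamic_rid)


sync(resolve_oids())
```

3. Authentication in Depth: From Anonymous Access to Full Permissions

bilibili-api supports several levels of authentication, and choosing the right one is key to retrieving data successfully.

Anonymous access (no authentication)
- Pros: no cookies needed; quick and simple
- Limits: public data only; the number of comments returned is limited
- Use case: fetching a small number of comments from public videos

Basic authentication (SESSDATA only)

```python
from bilibili_api import Credential

credential = Credential(sessdata="your SESSDATA")
# Can read the comment data visible to the logged-in user
```

Full authentication (SESSDATA + bili_jct)

```python
from bilibili_api import Credential

credential = Credential(
    sessdata="your SESSDATA",
    bili_jct="your bili_jct",
    buvid3="your buvid3",
)
# Also supports interactive operations such as liking, replying and deleting
```

Where to find the credentials:
1. Log in to Bilibili in a browser and press F12 to open the developer tools.
2. Go to Application → Cookies → https://bilibili.com.
3. Copy the values of SESSDATA and bili_jct.
4. Save them to a config file or to environment variables.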
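The last step above suggests keeping cookies out of source code. Here is a minimal sketch, assuming the hypothetical environment variable names BILI_SESSDATA, BILI_JCT and BILI_BUVID3 (choose your own), that collects whichever values are set into keyword arguments for Credential and reports which of the three authentication levels they correspond to. It uses only the standard library so it runs without bilibili-api installed; the last comment shows how the result would be passed on.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical environment variable names -> Credential keyword arguments
ENV_TO_KWARG = {
    "BILI_SESSDATA": "sessdata",
    "BILI_JCT": "bili_jct",
    "BILI_BUVID3": "buvid3",
}


def load_credential_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the non-empty cookie values from the environment."""
    env = os.environ if env is None else env
    return {kw: env[name] for name, kw in ENV_TO_KWARG.items() if env.get(name)}


def auth_level(kwargs: Mapping[str, str]) -> str:
    """Classify the levels described above: anonymous, basic or full."""
    if "sessdata" not in kwargs:
        return "anonymous"
    if "bili_jct" not in kwargs:
        return "basic"  # read access as the logged-in user
    return "full"  # bili_jct is also needed for likes, replies and deletions


kwargs = load_credential_kwargs({"BILI_SESSDATA": "abc", "BILI_JCT": ""})
print(kwargs, auth_level(kwargs))  # {'sessdata': 'abc'} basic
# With bilibili-api installed: credential = Credential(**load_credential_kwargs())
```

Empty variables are treated as unset, so a blank BILI_JCT in a deployment file degrades to basic authentication instead of sending an empty token.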
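4. Error Handling End to End: From 403 to Network Failures

Error handling in Bilibili API calls needs a systematic strategy. The bilibili_api/exceptions/ module provides a complete exception hierarchy: ResponseCodeException (the API returned a non-zero code) and NetworkException (an HTTP-level failure) both derive from ApiException. The version below retries only transient failures with exponential back-off. Permission (-403) and not-found (-404) errors are re-raised immediately, because retrying them cannot succeed; Bilibili usually reports an intercepted, rate-limited request with code -412.

```python
import asyncio

from bilibili_api import comment
from bilibili_api.comment import CommentResourceType
from bilibili_api.exceptions import NetworkException, ResponseCodeException
from tenacity import retry, retry_if_exception, stop_after_attempt, wait_exponential


def _is_retryable(exc: BaseException) -> bool:
    """Retry network errors and rate limiting; -403/-404 will not fix themselves."""
    if isinstance(exc, NetworkException):
        return True
    return isinstance(exc, ResponseCodeException) and exc.code == -412


@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10),
    retry=retry_if_exception(_is_retryable),
    reraise=True,  # raise the last original exception instead of tenacity.RetryError
)
async def robust_comment_fetch(oid: int, type_: CommentResourceType, credential=None) -> dict:
    """Fetch one page of comments, retrying transient errors."""
    try:
        return await comment.get_comments_lazy(oid=oid, type_=type_, credential=credential)
    except ResponseCodeException as e:
        # Log the API error code, then re-raise so tenacity decides whether to retry
        if e.code == -403:
            print("Insufficient permissions: check your credentials")
        elif e.code == -404:
            print("Resource not found: check that oid and type_ match")
        elif e.code == -412:
            print("Request intercepted (rate limited): retrying after a back-off")
        else:
            print(f"Unexpected error code: {e.code}, message: {e.msg}")
        raise
    except NetworkException as e:
        print(f"Network error: {e}")
        raise


# Usage example
async def main() -> None:
    try:
        page = await robust_comment_fetch(
            oid=170001,
            type_=CommentResourceType.VIDEO,
            credential=None,  # or a Credential object from section 3
        )
    except Exception as e:
        print(f"Fetch ultimately failed: {e}")


asyncio.run(main())
```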
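To see what wait_exponential(multiplier=1, min=2, max=10) actually produces, the sketch below re-implements the delay formula, assuming tenacity computes it as multiplier × 2^(attempt − 1) clamped to [min, max], where attempt is the 1-based number of the attempt that just failed. backoff_delay and schedule are hypothetical helpers for illustration, not tenacity APIs.

```python
from typing import List


def backoff_delay(attempt: int, multiplier: float = 1, min_s: float = 2, max_s: float = 10) -> float:
    """Delay in seconds after a failed 1-based attempt, assuming tenacity's
    wait_exponential formula: multiplier * 2 ** (attempt - 1), clamped to [min_s, max_s]."""
    raw = multiplier * 2 ** (attempt - 1)
    return max(min_s, min(raw, max_s))


def schedule(max_attempts: int, **kwargs: float) -> List[float]:
    """Waits between consecutive attempts: there is one fewer wait than attempts."""
    return [backoff_delay(n, **kwargs) for n in range(1, max_attempts)]


print(schedule(3))  # [2, 2]
print(schedule(6))  # [2, 2, 4, 8, 10]
```

With stop_after_attempt(3), only the waits after attempts 1 and 2 are used, so a persistently rate-limited call waits 2 s twice before giving up. Because min=2 clamps the first two raw values (1 s and 2 s), the 4 s and 8 s steps only come into play if the attempt limit is raised.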
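5. Performance Optimization in Practice: Async Concurrency and Caching

Large-scale comment scraping needs performance tuning, and bilibili-api offers several tools for it.

Async concurrency control:

```python
import asyncio
from typing import List

from bilibili_api import comment


async def batch_fetch_comments(video_ids: List[int], max_concurrent: int = 5) -> List[dict]:
    """Fetch comment data for several videos with bounded concurrency."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def fetch_one(video_id: int) -> dict:
        async with semaphore:  # at most max_concurrent requests in flight
            return await comment.get_comments_lazy(
                oid=video_id,
                type_=comment.CommentResourceType.VIDEO,
            )

    tasks = [fetch_one(vid) for vid in video_ids]
    results = await asyncio.gather(*tasks, return_exceptions=True)

    # Separate failures from successful responses
    valid_results = []
    for vid, result in zip(video_ids, results):
        if isinstance(result, Exception):
            print(f"Failed to fetch video {vid}: {result}")
        else:
            valid_results.append(result)
    return valid_results
```

Note that each call returns only the first page of a video's comments; combine it with the cursor loop from section 1 to fetch everything.

Connection pool configuration (AioHTTPClient):

```python
from bilibili_api.clients import AioHTTPClient

# Initialize a connection pool to improve connection reuse.
# Assumption: whether init_pool() exists with these keyword arguments depends on
# the installed bilibili-api version; check bilibili_api/clients before relying on it.
AioHTTPClient.init_pool(limit=10, ttl_dns_cache=300)
```

Caching strategy: the version below keeps responses in a plain in-memory dictionary keyed by (oid, type, offset), each with an expiry time.

```python
import asyncio
import time
from typing import Dict, Tuple

from bilibili_api import comment
from bilibili_api.comment import CommentResourceType


class CommentCache:
    """Minimal in-memory TTL cache for comment pages."""

    def __init__(self, cache_timeout: int = 3600):
        self.timeout = cache_timeout
        # (oid, type value, offset) -> (expiry timestamp, response)
        self._store: Dict[Tuple[int, int, str], Tuple[float, dict]] = {}

    async def get_comments_with_cache(
        self, oid: int, type_: CommentResourceType, offset: str = ""
    ) -> dict:
        """Fetch comments, serving fresh cached pages when available."""
        key = (oid, type_.value, offset)
        hit = self._store.get(key)
        if hit and hit[0] > time.monotonic():
            return hit[1]  # cache hit that has not expired

        # Cache miss or expired entry: call the API and store the result
        result = await comment.get_comments_lazy(oid=oid, type_=type_, offset=offset)
        self._store[key] = (time.monotonic() + self.timeout, result)
        return result


# Using the cache
async def main() -> None:
    cache_manager = CommentCache()
    comments = await cache_manager.get_comments_with_cache(
        oid=170001, type_=CommentResourceType.VIDEO
    )


asyncio.run(main())
```

Case Study: Building a Comment-Monitoring System

Combining the techniques above, we can build a complete comment-monitoring system:

```python
import asyncio
from datetime import datetime
from typing import List, Optional, Set

from bilibili_api import comment, Credential


class CommentMonitor:
    def __init__(self, credential: Optional[Credential] = None):
        self.credential = credential

    async def monitor_video_comments(self, video_aid: int, interval: int = 300):
        """Poll a video's latest comments and report the ones not seen before."""
        seen: Optional[Set[int]] = None  # None until the first poll sets a baseline
        while True:
            try:
                response = await comment.get_comments_lazy(
                    oid=video_aid,
                    type_=comment.CommentResourceType.VIDEO,
                    credential=self.credential,
                )
                current = {
                    r["rpid"]: r["content"]["message"]
                    for r in (response.get("replies") or [])
                }
                if seen is None:
                    # First poll: record existing comments without reporting them
                    seen = set(current)
                else:
                    new_ids = [rpid for rpid in current if rpid not in seen]
                    if new_ids:
                        print(f"[{datetime.now()}] Video {video_aid}: {len(new_ids)} new comment(s)")
                        for rpid in new_ids:
                            print(f"  Comment ID: {rpid}, content: {current[rpid][:50]}...")
                    seen.update(current)
            except Exception as e:
                print(f"[{datetime.now()}] Monitoring error: {e}")
            await asyncio.sleep(interval)

    async def batch_monitor(self, video_ids: List[int], interval: int = 600):
        """Monitor several videos concurrently."""
        tasks = [self.monitor_video_comments(vid, interval) for vid in video_ids]
        await asyncio.gather(*tasks)


# Usage example
async def main():
    credential = Credential(sessdata="your SESSDATA", bili_jct="your bili_jct")
    monitor = CommentMonitor(credential)
    # Monitor popular videos
    hot_videos = [170001, 418788911, 123456789]
    await monitor.batch_monitor(hot_videos, interval=300)


# Start monitoring
asyncio.run(main())
```

Advanced: Analyzing and Processing Comment Data

Once the comments are collected, we can analyze them further:

```python
import json
from collections import Counter
from typing import Dict, List


class CommentAnalyzer:
    def __init__(self, comments: List[Dict]):
        self.comments = comments

    def get_top_users(self, top_n: int = 10):
        """Return the users who posted the most comments."""
        user_counter = Counter(reply["member"]["uname"] for reply in self.comments)
        return user_counter.most_common(top_n)

    def analyze_sentiment_patterns(self):
        """Simple keyword-based sentiment analysis."""
        # Keywords stay in Chinese because they are matched against Chinese comments
        positive_keywords = ["好", "赞", "支持", "喜欢", "棒"]  # good, praise, support, like, great
        negative_keywords = ["差", "烂", "不喜欢", "垃圾", "失望"]  # poor, awful, dislike, trash, disappointed

        positive_count = 0
        negative_count = 0
        for reply in self.comments:
            message = reply["content"]["message"].lower()
            # Check negatives first: "不喜欢" (dislike) contains the positive keyword "喜欢" (like)
            if any(keyword in message for keyword in negative_keywords):
                negative_count += 1
            elif any(keyword in message for keyword in positive_keywords):
                positive_count += 1

        total = len(self.comments)

        def rate(count: int) -> float:
            return count / total if total > 0 else 0

        return {
            "total_comments": total,
            "positive_rate": rate(positive_count),
            "negative_rate": rate(negative_count),
            "neutral_rate": rate(total - positive_count - negative_count),
        }

    def export_to_json(self, filename: str):
        """Export the comment data to a JSON file."""
        with open(filename, "w", encoding="utf-8") as f:
            json.dump(self.comments, f, ensure_ascii=False, indent=2)
        print(f"Comment data exported to {filename}")


# Usage example (comments is the list returned by fetch_video_comments_lazy)
analyzer = CommentAnalyzer(comments)
top_users = analyzer.get_top_users(5)
sentiment = analyzer.analyze_sentiment_patterns()
analyzer.export_to_json("comments_analysis.json")
```

Summary and Best Practices

With this practical guide you now have the advanced techniques for Bilibili API's comment interfaces. Keep these key points in mind:

- Interface choice: always prefer get_comments_lazy, which offers better stability and cursor-based pagination.
- Authentication: pick the authentication level you need; full cookie authentication is recommended for production.
- Error handling: implement complete exception handling, especially for 403, 404 and rate-limit errors.
- Performance: use async concurrency and caching to speed up data retrieval.
- Resource matching: make sure oid and type_ match to avoid 404 errors.

Bilibili API's comment interfaces give developers powerful data-retrieval capabilities, and combined with Python's async ecosystem they make it possible to build efficient, stable collection systems. Whether you are doing content analysis, user research or community monitoring, these techniques will help you make better use of Bilibili comment data.

Further learning resources:
- Official example docs: docs/examples/comment.md
- Core module source: bilibili_api/comment.py
- Exception handling module: bilibili_api/exceptions/
- Utility functions directory: utils/

With these techniques you can handle Bilibili comment data with more confidence and build stronger data-analysis applications. Remember: keeping the request rate reasonable and configuring credentials correctly are the keys to long-term stable operation.

Disclosure: parts of this article were generated with AI assistance (AIGC) and are for reference only.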
