How to scrape novel content with Python

扬州沐宇科技
2023-09-12 09:11:03
python

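To scrape novel content with Python, use the requests library to send an HTTP request and fetch the novel site's HTML, then parse that HTML with BeautifulSoup and extract the links to the novel's chapters. Send a further request with requests for each chapter's HTML, and finally extract the chapter text with a regular expression or with BeautifulSoup.
Below is a simple example: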
```python
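import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def get_novel_content(url):
    # Send an HTTP request to fetch the novel's index page
    response = requests.get(url, timeout=10)
    response.encoding = 'utf-8'
    html = response.text

    # Parse the HTML with BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')

    # Extract the chapter links (anchors whose href contains "chapter")
    chapter_links = soup.find_all('a', href=re.compile("chapter"))

    # Scrape the chapters one by one
    for link in chapter_links:
        # Build the full chapter URL; urljoin handles relative and absolute hrefs
        chapter_url = urljoin(url, link['href'])

        # Send an HTTP request to fetch the chapter page
        chapter_response = requests.get(chapter_url, timeout=10)
        chapter_response.encoding = 'utf-8'
        chapter_html = chapter_response.text

        # Extract the chapter title and content with regular expressions.
        # The original patterns were lost when this page was rendered; the
        # <h1> and <div id="content"> tags below are assumed placeholders
        # and must be adapted to the target site's HTML.
        title_match = re.search(r'<h1>(.*?)</h1>', chapter_html, re.S)
        content_match = re.search(r'<div id="content">(.*?)</div>', chapter_html, re.S)
        if not title_match or not content_match:
            continue  # Skip pages that do not match the assumed structure

        # Print the chapter title and content
        print(title_match.group(1))
        print(content_match.group(1))
        print('------------------------------')


# Example: scrape the novel 《斗破苍穹》
novel_url = 'http://www.xxxx.com/'  # URL of the novel site
get_novel_content(novel_url)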
```
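
The article mentions BeautifulSoup as an alternative to regular expressions for pulling out the chapter text. A minimal sketch of that variant follows. The function name `extract_chapter`, the `<h1>` title tag, the `<div id="content">` container and the sample HTML are all assumptions made for illustration; a real site will likely use different markup.

```python
from bs4 import BeautifulSoup


def extract_chapter(chapter_html):
    """Return (title, content) from a chapter page, or None if not found."""
    soup = BeautifulSoup(chapter_html, 'html.parser')
    title_tag = soup.find('h1')                   # assumed title element
    content_tag = soup.find('div', id='content')  # assumed content container
    if title_tag is None or content_tag is None:
        return None
    title = title_tag.get_text(strip=True)
    # Join the text fragments with newlines so <br/> breaks are preserved
    content = content_tag.get_text('\n', strip=True)
    return title, content


# Hypothetical chapter page used only to exercise the function
sample = ('<html><body><h1>Chapter 1</h1>'
          '<div id="content">First line.<br/>Second line.</div>'
          '</body></html>')
print(extract_chapter(sample))
```

Compared with a regular expression, this approach tends to tolerate attribute reordering and nested tags better, and returning `None` lets the caller skip pages that do not match the expected structure.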

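Note that the scraping code differs from one novel site to another and must be adjusted to the target site's HTML structure. In addition, when scraping a site, comply with the applicable laws and regulations and with the site's crawler rules, and avoid placing excessive load on the target site.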
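
One way to act on the crawler-rules and load advice is to check robots.txt before each request and pause between requests. The sketch below uses only the standard library. The helper names `make_robot_checker` and `polite_fetch_all`, the sample robots.txt rules, the URLs and the stub `fetch` function are hypothetical, and a one-second default delay is an assumption rather than a recommendation from the article.

```python
import time
from urllib.robotparser import RobotFileParser


def make_robot_checker(robots_txt, user_agent='*'):
    """Build a function that reports whether a URL may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)


def polite_fetch_all(urls, fetch, allowed, delay_seconds=1.0):
    """Fetch each allowed URL in turn, pausing between requests."""
    results = {}
    for url in urls:
        if not allowed(url):
            continue  # Respect the site's disallow rules
        results[url] = fetch(url)
        time.sleep(delay_seconds)  # Spread requests out to limit load
    return results


# Hypothetical rules and URLs, with a stub in place of a real HTTP call
SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
"""
allowed = make_robot_checker(SAMPLE_ROBOTS)
urls = ['http://www.xxxx.com/chapter/1.html',
        'http://www.xxxx.com/private/2.html']
pages = polite_fetch_all(urls, fetch=lambda u: 'stub page',
                         allowed=allowed, delay_seconds=0)
print(list(pages))
```

In a real crawl, the rules would come from the site itself, for example via `RobotFileParser.set_url(...)` followed by `read()`, and `fetch` would wrap `requests.get`.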
