How to scrape novel content with Python
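To scrape novel content with Python, use the requests library to send an HTTP request and fetch the novel site's HTML, then parse that HTML with BeautifulSoup and extract the novel's chapter links. Send a further request with requests for each chapter page, and finally extract the chapter text with regular expressions or with BeautifulSoup.
Here is a simple example: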
```python
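import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def get_novel_content(url):
    # Send an HTTP request to fetch the index page
    response = requests.get(url, timeout=10)
    response.encoding = 'utf-8'
    html = response.text
    # Parse the HTML with BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')
    # Extract the chapter links (every <a> whose href contains "chapter")
    chapter_links = soup.find_all('a', href=re.compile("chapter"))
    # Scrape the chapters one by one
    for link in chapter_links:
        # Build the full chapter URL; urljoin handles relative and absolute hrefs
        chapter_url = urljoin(url, link['href'])
        # Send an HTTP request to fetch the chapter page
        chapter_response = requests.get(chapter_url, timeout=10)
        chapter_response.encoding = 'utf-8'
        chapter_html = chapter_response.text
        # Extract the chapter title and content with regular expressions.
        # The tags used here (<h1> and <div id="content">) are assumed
        # placeholders; replace them with the tags the target site uses.
        title_match = re.search(r'<h1>(.*?)</h1>', chapter_html)
        content_match = re.search(r'<div id="content">(.*?)</div>', chapter_html, re.S)
        if title_match is None or content_match is None:
            continue  # the page layout does not match; skip this chapter
        chapter_title = title_match.group(1)
        chapter_content = content_match.group(1)
        # Print the chapter title and content
        print(chapter_title)
        print(chapter_content)
        print('------------------------------')


# Example: scrape the novel 《斗破苍穹》
novel_url = 'http://www.xxxx.com/'  # URL of the novel site
get_novel_content(novel_url)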
```
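The last step can also be done with BeautifulSoup instead of regular expressions. The following is a minimal sketch of that variant. The function name `extract_chapter` and the page layout (title in `<h1>`, body in `<div id="content">`) are assumptions made for illustration, not the structure of any particular site, and the sample HTML is invented so the snippet runs without network access.

```python
from bs4 import BeautifulSoup


def extract_chapter(chapter_html):
    """Return (title, text) for one chapter page, or None if the layout differs."""
    soup = BeautifulSoup(chapter_html, "html.parser")
    # Assumed layout: title in <h1>, chapter body in <div id="content">.
    title_tag = soup.find("h1")
    body_tag = soup.find("div", id="content")
    if title_tag is None or body_tag is None:
        return None
    title = title_tag.get_text(strip=True)
    # separator="\n" puts each text fragment (e.g. between <br/> tags) on its own line.
    text = body_tag.get_text(separator="\n", strip=True)
    return title, text


# Invented sample page, used only to show the call.
sample = (
    '<html><body><h1>Chapter 1</h1>'
    '<div id="content">First line.<br/>Second line.</div>'
    '</body></html>'
)
print(extract_chapter(sample))
```

Compared with a regular expression, this approach tends to tolerate extra attributes and whitespace in the tags, and it returns plain text rather than raw HTML.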
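Note that the scraping code differs from one novel site to another and has to be adjusted to the HTML structure of the target site. In addition, when scraping a site, comply with the applicable laws and regulations and with the site's crawler rules, and avoid putting excessive load on the target site.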
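One way to follow a site's crawler rules and limit load is to check its robots.txt and pause between requests. The sketch below uses the standard library's `urllib.robotparser`. The helper names `make_robot_rules` and `polite_fetch`, the one-second default delay, and the sample robots.txt text are assumptions for illustration. `fetch` is any function that downloads a URL; a stand-in is used here so the snippet runs offline.

```python
import time
from urllib.robotparser import RobotFileParser


def make_robot_rules(robots_txt):
    """Parse the text of a robots.txt file into a RobotFileParser."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def polite_fetch(urls, fetch, rules, delay_seconds=1.0, user_agent="*"):
    """Call fetch(url) for each permitted URL, pausing after every request."""
    results = {}
    for url in urls:
        if not rules.can_fetch(user_agent, url):
            continue  # skip pages the site asks crawlers not to visit
        results[url] = fetch(url)
        time.sleep(delay_seconds)  # spread requests out to limit load
    return results


# Invented robots.txt content and a stand-in fetch function (no network access).
rules = make_robot_rules("User-agent: *\nDisallow: /private/\n")
pages = polite_fetch(
    ["http://www.xxxx.com/chapter/1.html", "http://www.xxxx.com/private/x.html"],
    fetch=lambda url: "<html>...</html>",
    rules=rules,
    delay_seconds=0,
)
print(list(pages))
```

Against a live site, the rules would normally be loaded with `rules.set_url(...)` followed by `rules.read()`, and `fetch` would wrap `requests.get`.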