How to Use Scrapy for Data Caching
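Scrapy provides a built-in HTTP cache that saves each downloaded response to the local file system. Later runs can then reuse the stored responses instead of downloading the same data again, which saves bandwidth and time. The steps below show how to set it up: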
- Set the cache-related options in settings.py:
# Enable the HTTP cache
HTTPCACHE_ENABLED = True
# Cache directory
HTTPCACHE_DIR = 'httpcache'
# Cache expiration time in seconds (0 means cached responses never expire)
HTTPCACHE_EXPIRATION_SECS = 0
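- Write the spider as usual. Once the cache is enabled it works at the downloader level, so the spider needs no cache-specific code: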
import scrapy


class MySpider(scrapy.Spider):
    name = 'myspider'
    start_urls = ['http://example.com']

    def parse(self, response):
        # The response may come from the cache; parsing is the same either way
        for item in response.css('div.item'):
            yield {
                'title': item.css('a::text').get(),
                'link': item.css('a::attr(href)').get(),
            }
- When the spider runs, responses are cached automatically in the configured directory. To override cache settings for a single run, or to clear the cache, use the following commands:
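# Enable the cache for this run only
scrapy crawl myspider -s HTTPCACHE_ENABLED=True
# Treat cached responses older than one hour as expired
scrapy crawl myspider -s HTTPCACHE_EXPIRATION_SECS=3600
# Clear the cache by deleting its directory (a relative HTTPCACHE_DIR is resolved under the project's .scrapy directory)
rm -rf .scrapy/httpcache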
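Clearing the cache comes down to deleting the cache directory, and this can be scripted. The following is a minimal sketch, not part of Scrapy itself: `clear_http_cache` is a hypothetical helper name, and it assumes the default layout in which a relative `HTTPCACHE_DIR` such as `'httpcache'` is stored inside the project's `.scrapy` data directory.

```python
import shutil
from pathlib import Path


def clear_http_cache(project_root, cache_dir="httpcache"):
    """Delete the on-disk HTTP cache and report whether anything was removed.

    Assumption: a relative HTTPCACHE_DIR lives under <project_root>/.scrapy,
    which is Scrapy's default project data directory.
    """
    cache_path = Path(project_root) / ".scrapy" / cache_dir
    if not cache_path.is_dir():
        return False  # nothing has been cached yet
    shutil.rmtree(cache_path)  # removes cached responses for all spiders
    return True
```

Calling `clear_http_cache(".")` from the project root (the directory containing scrapy.cfg) removes the cached responses of every spider, so the next crawl downloads everything again.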
With these steps, you can use Scrapy's HTTP cache to speed up crawling and save resources.
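To make the meaning of `HTTPCACHE_EXPIRATION_SECS` concrete, here is a simplified model of the expiration rule: a value of 0 means cached responses never expire, while a positive value causes entries older than that many seconds to be downloaded again. The function `is_expired` and its parameters are illustrative names, not Scrapy's API.

```python
import time


def is_expired(stored_at, expiration_secs, now=None):
    """Return True if a response cached at `stored_at` should be refetched.

    Simplified model of the setting: 0 disables expiration, and a positive
    value expires entries older than that many seconds.
    """
    if now is None:
        now = time.time()
    age = now - stored_at
    return 0 < expiration_secs < age


# An entry stored two hours before `now`, checked against two settings.
stored_at = 1_000_000.0
now = stored_at + 7200
print(is_expired(stored_at, 0, now))     # False: 0 means never expire
print(is_expired(stored_at, 3600, now))  # True: older than one hour
```

With the settings.py value of 0, a page cached during development keeps being reused until the cache directory is deleted. Passing a positive value such as 3600 on the command line limits reuse to responses cached within the last hour.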