How to implement data caching and persistence in Scrapy

扬州沐宇科技
2024-05-14 11:56:20
scrapy

Scrapy offers several ways to cache and persist scraped data, including:

  1. Use the built-in feed exports: Scrapy ships with several feed formats (JSON, CSV, XML, and others) that write the scraped data to a local file, which persists it.
# Configure the feed output in settings.py
# Note: FEED_FORMAT and FEED_URI are deprecated since Scrapy 2.1 in favor of FEEDS
FEEDS = {
    'output.json': {'format': 'json'},
}
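As a sketch of a fuller configuration, the `FEEDS` setting (available since Scrapy 2.1) maps each output path to its own options, so a single crawl can write several formats. The file names below are illustrative, and the `overwrite` key assumes Scrapy 2.4 or later.

```python
# settings.py fragment: a sketch, not a required layout
FEEDS = {
    # JSON output; "utf8" keeps non-ASCII text readable instead of \u escapes
    "output.json": {"format": "json", "encoding": "utf8", "overwrite": True},
    # A second feed written from the same crawl
    "output.csv": {"format": "csv"},
}
```

The same result can usually be had without touching settings.py by passing an output file on the command line, for example `scrapy crawl myspider -o output.json`, where `myspider` is a placeholder spider name.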
  2. Use an Item Pipeline: you can write a custom Item Pipeline that processes and stores data during the crawl. Implementing the process_item() method lets you save scraped data to a database or another storage medium.
# Write a custom Item Pipeline
class MyPipeline:
    def process_item(self, item, spider):
        # Save the item data to the database
        return item

# Enable the pipeline in settings.py
ITEM_PIPELINES = {
   'myproject.pipelines.MyPipeline': 300,
}
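The pipeline above leaves the actual storage step as a comment. One possible way to fill it in, using only the standard library, is sketched below. The class name `SQLitePipeline`, the `items.db` path, and the single-table layout are assumptions made for illustration, not part of Scrapy.

```python
import json
import sqlite3


class SQLitePipeline:
    """Hypothetical pipeline that stores each item as a JSON row in SQLite."""

    def __init__(self, db_path="items.db"):
        # db_path is an assumed default, not a Scrapy setting
        self.db_path = db_path

    def open_spider(self, spider):
        # Called once when the spider starts: open the connection, create the table
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS items (spider TEXT, data TEXT)"
        )

    def close_spider(self, spider):
        # Called once when the spider finishes: commit and release the connection
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        # Serialize the item to JSON so any field layout fits one column
        self.conn.execute(
            "INSERT INTO items (spider, data) VALUES (?, ?)",
            (spider.name, json.dumps(dict(item), ensure_ascii=False)),
        )
        return item  # hand the item on to any later pipelines
```

It would be enabled the same way as MyPipeline, by adding its import path to ITEM_PIPELINES. Committing only in close_spider keeps the sketch short; a long crawl would probably commit in batches so that a crash does not lose everything collected so far.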
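  3. Use a third-party storage library: Scrapy can also be combined with third-party storage backends (such as MongoDB or MySQL) to save the scraped data to a database.
# Install the third-party storage library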
pip install pymongo

# Configure MongoDB storage in settings.py
MONGO_URI = 'mongodb://localhost:27017'
MONGO_DATABASE = 'mydatabase'

# Write a custom Item Pipeline
import pymongo

class MongoPipeline:
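    def open_spider(self, spider):
        # Read the connection settings from the running spider's settings;
        # a bare `settings` name is not defined in this module
        self.client = pymongo.MongoClient(spider.settings.get('MONGO_URI'))
        self.db = self.client[spider.settings.get('MONGO_DATABASE')]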

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        self.db[spider.name].insert_one(dict(item))
        return item

# Enable the pipeline in settings.py
ITEM_PIPELINES = {
   'myproject.pipelines.MongoPipeline': 300,
}
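The pipeline above looks up its connection settings when the spider opens. Another common pattern is the `from_crawler` class method, which Scrapy calls to construct the pipeline and which receives the crawler and its settings. The sketch below assumes the same MONGO_URI and MONGO_DATABASE settings; the `"items"` fallback database name is an assumption for illustration.

```python
class MongoPipeline:
    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the pipeline; crawler.settings
        # exposes the values defined in settings.py
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI"),
            mongo_db=crawler.settings.get("MONGO_DATABASE", "items"),
        )

    def open_spider(self, spider):
        # Imported here so the class can be loaded even where pymongo is absent
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # One collection per spider, named after the spider
        self.db[spider.name].insert_one(dict(item))
        return item
```

Passing the settings through the constructor also makes the class easier to exercise outside a crawl, since it can be built directly with a test URI.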

Using the approaches above, you can implement data caching and persistence in Scrapy and make sure that scraped data is not lost.
