Python crawler all-purpose code: what web crawler software is there? (Part 2)

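If a field has a default value or is auto-incremented, its assignment is commented out by default; uncomment it if you need it. As you can see, the id field of my table is commented out here.
If the item has many fields and you don't want to assign them one by one, you can create it as follows: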

feapder create -i report 1
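The trailing 1 tells feapder to generate an item that accepts dict-style (keyword-argument) assignment. The generated entity class looks like this: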
from feapder import Item


class ReportItem(Item):
    """
    This class was generated by feapder.
    command: feapder create -i report 1.
    """

    __table_name__ = "report"

    def __init__(self, *args, **kwargs):
        self.count = kwargs.get('count')
        self.emRatingName = kwargs.get('emRatingName')  # rating name
        self.emRatingValue = kwargs.get('emRatingValue')  # rating code
        self.encodeUrl = kwargs.get('encodeUrl')  # link
        # self.id = kwargs.get('id')
        self.indvInduCode = kwargs.get('indvInduCode')  # industry code
        self.indvInduName = kwargs.get('indvInduName')  # industry name
        self.lastEmRatingName = kwargs.get('lastEmRatingName')  # previous rating name
        self.lastEmRatingValue = kwargs.get('lastEmRatingValue')  # previous rating code
        self.orgCode = kwargs.get('orgCode')  # institution code
        self.orgName = kwargs.get('orgName')  # institution name
        self.orgSName = kwargs.get('orgSName')  # institution short name
        self.predictNextTwoYearEps = kwargs.get('predictNextTwoYearEps')
        self.predictNextTwoYearPe = kwargs.get('predictNextTwoYearPe')
        self.predictNextYearEps = kwargs.get('predictNextYearEps')
        self.predictNextYearPe = kwargs.get('predictNextYearPe')
        self.predictThisYearEps = kwargs.get('predictThisYearEps')
        self.predictThisYearPe = kwargs.get('predictThisYearPe')
        self.publishDate = kwargs.get('publishDate')  # publication date
        self.ratingChange = kwargs.get('ratingChange')  # rating change
        self.researcher = kwargs.get('researcher')  # researcher
        self.stockCode = kwargs.get('stockCode')  # stock code
        self.stockName = kwargs.get('stockName')  # stock short name
        self.title = kwargs.get('title')  # report title

This way, when a request returns JSON data, it can be assigned directly, for example:
response_data = {"title": "test"}  # simulated response data
item = ReportItem(**response_data)

Storing data in the database automatically is just as simple: after parsing, assign the data to an Item and yield it:
import json  # module-level import used by parse()


def parse(self, request, response):
    html = response.content.decode("utf-8")
    if len(html):
        # The endpoint returns JSONP: drop the "datatable1351846(" prefix and the trailing ")"
        content = html.replace('datatable1351846(', '')[:-1]
        content_json = json.loads(content)
        print(content_json)
        for obj in content_json['data']:
            result = ReportItem()
            result['orgName'] = obj['orgName']  # institution name
            result['orgSName'] = obj['orgSName']  # institution short name
            result['publishDate'] = obj['publishDate']  # publication date
            result['predictNextTwoYearEps'] = obj['predictNextTwoYearEps']  # EPS forecast, year after next
            result['title'] = obj['title']  # report title
            result['stockName'] = obj['stockName']  # stock name
            result['stockCode'] = obj['stockCode']  # stock code
            result['orgCode'] = obj['orgCode']  # institution code
            result['predictNextTwoYearPe'] = obj['predictNextTwoYearPe']  # P/E forecast, year after next
            result['predictNextYearEps'] = obj['predictNextYearEps']  # EPS forecast, next year
            result['predictNextYearPe'] = obj['predictNextYearPe']  # P/E forecast, next year
            result['predictThisYearEps'] = obj['predictThisYearEps']  # EPS forecast, this year
            result['predictThisYearPe'] = obj['predictThisYearPe']  # P/E forecast, this year
            result['indvInduCode'] = obj['indvInduCode']  # industry code
            result['indvInduName'] = obj['indvInduName']  # industry name
            result['lastEmRatingName'] = obj['lastEmRatingName']  # previous rating name
            result['lastEmRatingValue'] = obj['lastEmRatingValue']  # previous rating code
            result['emRatingValue'] = obj['emRatingValue']  # rating code
            result['emRatingName'] = obj['emRatingName']  # rating name
            result['ratingChange'] = obj['ratingChange']  # rating change
            result['researcher'] = obj['researcher']  # researcher
            result['encodeUrl'] = obj['encodeUrl']  # link
            result['count'] = int(obj['count'])  # research reports on this stock in the past month
            yield result

Once returned, the item flows into the framework's ItemBuffer. Every 0.05 seconds, or as soon as 5,000 items have accumulated, ItemBuffer writes the buffered items to the database in one batch. The table name is the class name with the Item suffix removed, in lowercase; for example, ReportItem data goes into the report table.
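The flush rule described above (write once a time interval has passed or a count threshold is reached) can be illustrated with a small standalone buffer. This is a hypothetical sketch, not feapder's ItemBuffer: the class BatchBuffer, the helper table_name_for and the flush callback (standing in for the database write) are illustrative names, and the defaults simply reuse the 0.05-second and 5,000-item figures quoted above.

```python
import time
from typing import Callable, Dict, List


def table_name_for(item_cls_name: str) -> str:
    """Hypothetical helper: strip a trailing 'Item' and lowercase (ReportItem -> report)."""
    if item_cls_name.endswith("Item"):
        item_cls_name = item_cls_name[: -len("Item")]
    return item_cls_name.lower()


class BatchBuffer:
    """Hypothetical time/count-triggered write buffer (not feapder's ItemBuffer)."""

    def __init__(self, flush: Callable[[str, List[dict]], None],
                 max_items: int = 5000, interval: float = 0.05):
        self.flush = flush            # called as flush(table, rows); stands in for the DB write
        self.max_items = max_items    # count threshold
        self.interval = interval      # time threshold in seconds
        self.pending: Dict[str, List[dict]] = {}
        self.count = 0
        self.last_flush = time.monotonic()

    def add(self, item_cls_name: str, row: dict) -> None:
        # Group rows by target table so each table gets one batch write.
        table = table_name_for(item_cls_name)
        self.pending.setdefault(table, []).append(row)
        self.count += 1
        # Flush when either threshold is hit (the clock is only checked on add).
        if (self.count >= self.max_items
                or time.monotonic() - self.last_flush >= self.interval):
            self.drain()

    def drain(self) -> None:
        # Write every non-empty table batch, then reset the buffer state.
        for table, rows in self.pending.items():
            if rows:
                self.flush(table, rows)
        self.pending = {}
        self.count = 0
        self.last_flush = time.monotonic()
```

One design caveat: this sketch only checks the clock when a new item arrives, so a stream that goes quiet can leave rows pending until drain() is called; a production buffer would typically also flush from a background timer.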
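The parse() method above unwraps the JSONP response with a hard-coded callback name (datatable1351846) and assumes the last character is the closing parenthesis, so a changed callback or a trailing semicolon would break json.loads. A more tolerant approach, sketched here as an assumption rather than anything feapder provides, strips any callback(...) wrapper with a regular expression and copies fields by name from a list instead of one assignment per line. The names unwrap_jsonp, iter_rows and FIELDS are hypothetical, and FIELDS lists only a subset of the ReportItem fields.

```python
import json
import re
from typing import Any, Dict, Iterator

# Hypothetical pattern: "callbackName(...)" with an optional trailing semicolon.
_JSONP = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.S)


def unwrap_jsonp(text: str) -> Any:
    """Return the decoded JSON payload, with or without a JSONP wrapper."""
    match = _JSONP.match(text)
    payload = match.group(1) if match else text  # plain JSON passes through unchanged
    return json.loads(payload)


# Hypothetical subset of the ReportItem fields; extend with the rest as needed.
FIELDS = ("orgName", "orgCode", "stockName", "stockCode", "title", "publishDate")


def iter_rows(text: str) -> Iterator[Dict[str, Any]]:
    """Yield one dict per record in the payload's "data" list."""
    for obj in unwrap_jsonp(text).get("data") or []:
        row = {name: obj.get(name) for name in FIELDS}
        # parse() above casts "count" with int(); do the same when it is present.
        if obj.get("count") is not None:
            row["count"] = int(obj["count"])
        yield row
```

Inside parse(), the loop could then shrink to "for row in iter_rows(html): yield ReportItem(**row)", relying on the keyword-argument constructor that feapder create -i report 1 generates.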
