Based on a keyword given as input, I am trying to get from
How can I get the articles and their links from a JS object in Python?
Answer from a uj5u.com user:
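response.text gives you a string. If you remove /*O_o*/\ngoogle.search.cse.api12760( from the start and ); from the end, you are left with ordinary JSON, which you can convert to a Python dictionary with json.loads(). You can then use [key] to get data out of the dictionary.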
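The stripping step described above can also be sketched without hard-coding the callback name. This is only a minimal sketch: parse_jsonp is a hypothetical helper name, the sample string is made up, and it assumes the text before the payload contains no parentheses of its own.

```python
import json


def parse_jsonp(text):
    """Return the JSON payload wrapped in a JSONP response as Python data."""
    # Assumption: the payload sits between the first '(' and the last ')'.
    start = text.index('(') + 1
    end = text.rindex(')')
    return json.loads(text[start:end])


# Made-up sample shaped like the response described above.
sample = (
    '/*O_o*/\n'
    'google.search.cse.api12760('
    '{"results": [{"title": "T", "url": "https://example.com/"}]}'
    ');'
)
data = parse_jsonp(sample)
print(data['results'][0]['url'])
```

With a real response you would pass response.text instead of the sample string.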
Minimal working example:
import json

import requests

# Query parameters for the search request.
params = (
    ('rsz', 'filtered_cse'),
    ('num', '10'),
    ('hl', 'en'),
    ('source', 'gcsc'),
    ('gss', '.com'),
    ('cselibv', 'cc267ab8871224bd'),
    ('cx', '000299513257099441687:fkkgoogvtaw'),
    ('q', 'multi-label text classification'),
    ('safe', 'off'),
    ('cse_tok', 'AJvRUv1dd6NHqw5GKAoRSg3lLILE:1636278007905'),
    ('sort', ''),
    ('exp', 'csqr,cc,4618906'),
    ('callback', 'google.search.cse.api12760'),
)

response = requests.get('https://cse.google.com/cse/element/v1', params=params)

# Cut off the callback prefix at the start and the trailing ');' at the end.
start = len('''/*O_o*/
google.search.cse.api12760(''')
end = len(');')
text = response.text[start:-end]

# What remains is plain JSON.
data = json.loads(text)
#print(data)

for item in data['results']:
    #print('keys:', item.keys())
    print('title:', item['title'])
    print('url:', item['url'])
    #print('content:', item['content'])
    #print('title:', item['titleNoFormatting'])
    #meta = item['richSnippet']['metatags']
    #if 'author' in meta:
    #    print('author:', meta['author'])
    print('---')
Result:
title: Large-Scale <b>Multi</b>-<b>Label Text Classification</b> on EU Legislation - ACL ...
url: https://www.aclweb.org/anthology/P19-1636/
---
title: <b>Label</b>-Specific Document Representation for <b>Multi</b>-<b>Label Text</b> ...
url: https://www.aclweb.org/anthology/D19-1044/
---
title: Initializing neural networks for hierarchical <b>multi</b>-<b>label text</b> ...
url: https://www.aclweb.org/anthology/W17-2339
---
title: TaxoClass: Hierarchical <b>Multi</b>-<b>Label Text Classification</b> Using Only ...
url: https://www.aclweb.org/anthology/2021.naacl-main.335/
---
title: NeuralClassifier: An Open-source Neural Hierarchical <b>Multi</b>-<b>label</b> ...
url: https://www.aclweb.org/anthology/P19-3015/
---
title: Extreme <b>Multi</b>-<b>Label</b> Legal <b>Text Classification</b>: A Case Study in EU ...
url: https://www.aclweb.org/anthology/W19-2209
---
title: Hierarchical Transfer Learning for <b>Multi</b>-<b>label Text Classification</b> ...
url: https://www.aclweb.org/anthology/P19-1633/
---
title: Global Model for Hierarchical <b>Multi</b>-<b>Label Text Classification</b> - ACL ...
url: https://www.aclweb.org/anthology/I13-1006
---
title: Hierarchical <b>Multi</b>-<b>label Classification</b> of <b>Text</b> with Capsule Networks ...
url: https://www.aclweb.org/anthology/P19-2045
---
title: Improving Pretrained Models for Zero-shot <b>Multi</b>-<b>label Text</b> ...
url: https://www.aclweb.org/anthology/2021.naacl-main.83.pdf
---
By the way:
If you print item.keys(), you will see what else you can get:
'cacheUrl', 'clicktrackUrl', 'content', 'contentNoFormatting',
'title', 'titleNoFormatting', 'formattedUrl', 'unescapedUrl', 'url',
'visibleUrl', 'richSnippet', 'breadcrumbUrl'
Or you can use a for-loop to display all the keys and values:
for item in data['results']:
    for key, value in item.items():
        print(f'{key}: {value}')
        print('---')
    print('===================================')
Some of them may contain nested dictionaries, for example item['richSnippet']['metatags']['author'].
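Reading such a nested value directly raises KeyError when one of the levels is absent, so a defensive lookup may be useful. The sketch below is only an illustration: get_author is a hypothetical helper, the items are made up, and it assumes (as the 'author' in meta check in the example hints) that not every result carries these keys.

```python
def get_author(item):
    """Return the author from a result's metatags, or None if any level is absent."""
    rich = item.get('richSnippet') or {}
    meta = rich.get('metatags') or {}
    return meta.get('author')


# Made-up items shaped like entries of data['results'].
items = [
    {'title': 'A', 'richSnippet': {'metatags': {'author': 'Jane Doe'}}},
    {'title': 'B', 'richSnippet': {}},
    {'title': 'C'},
]
for item in items:
    print(item['title'], '->', get_author(item))
```

In the real loop you would call get_author(item) for each item in data['results'].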
