Scraping LeetCode's Daily Question
Approach 1 (sending a GraphQL query to fetch the data)
Reference: https://www.cnblogs.com/ZhaoxiCheung/p/9333476.html
Analyzing the API
The main data is served from the graphql/ endpoint:
https://leetcode-cn.com/graphql/
Request for the popular problems on the home page
![Request for the popular problems on the home page](https://img.uj5u.com/2021/11/05/281160050757272.png)
Request for checking whether a problem has been accepted (AC)
![Request for checking whether a problem has been accepted (AC)](https://img.uj5u.com/2021/11/05/281160050757273.png)
Request for the daily question
![Request for the daily question](https://img.uj5u.com/2021/11/05/281160050757274.png)
Constructing a GraphQL query to fetch the data
Under Headers, the Request Payload contains a query field. This is the key piece of information we need in order to construct our own GraphQL query.
![The query field in the Request Payload](https://img.uj5u.com/2021/11/05/281160050757275.png)
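As a minimal sketch of what such a payload looks like, the snippet below builds the JSON body for the daily-question request. The field names (operationName, variables, query) and the query text are the ones used in the Python script later in this article; the helper name build_payload is hypothetical and only serves the illustration.

```python
import json

# Query text for the daily question, as used later in this article.
QUESTION_OF_TODAY = (
    "query questionOfToday { todayRecord { question { questionFrontendId "
    "questionTitleSlug __typename } lastSubmission { id __typename } "
    "date userStatus __typename }}"
)


def build_payload(operation_name, query, variables=None):
    """Return the JSON text of a GraphQL request body (hypothetical helper)."""
    body = {
        "operationName": operation_name,
        "variables": variables or {},
        "query": query,
    }
    return json.dumps(body)


payload = build_payload("questionOfToday", QUESTION_OF_TODAY)
print(payload)
```

Comparing the printed text with the Request Payload shown in the browser is a quick way to check that the query was copied correctly.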
Analyzing the API with Postman
Rather than writing code straight away, we first use Postman to see how the problem data can be fetched. In the browser's Network panel, right-click the graphql request, then choose Copy -> Copy as cURL (bash).
![Copy as cURL (bash) in the Network panel](https://img.uj5u.com/2021/11/05/281160050757276.png)
Next, open Postman, click Import under File in the top-left corner, and find the Raw text tab.
![The Raw text tab in Postman's import dialog](https://img.uj5u.com/2021/11/05/281160050757277.png)
Paste the copied cURL command into Raw text and click Continue; the request can then be inspected in Postman.
![The imported request in Postman](https://img.uj5u.com/2021/11/05/281160050757278.png)
I ran into a small problem at this step: I mistook "Copy all as cURL" for "Copy as cURL", which made Postman fail to parse the input.
The result of parsing the cURL command looks like this:
![The parsed cURL command in Postman](https://img.uj5u.com/2021/11/05/281160050757279.png)
The parsed result resembles the query field we saw under Headers, although a few details need to be changed.
Of course, if you would rather not paste the copied cURL command, you can write the Header and Body yourself in Postman. Note that the Content-Type here is application/graphql, and the GraphQL text in the Body is built by following the query field in the Request Payload.
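The two body styles that appear in this article can be contrasted with a short sketch: the Java code below posts the bare query text with Content-Type application/graphql, while the Python script posts a JSON object that wraps the query. The snippet only constructs the request objects and sends nothing; the two function names and the shortened sample query are assumptions made for illustration.

```python
import json
import urllib.request

GRAPHQL_URL = "https://leetcode-cn.com/graphql"
# Shortened sample query, for illustration only.
QUERY = 'query { question(titleSlug: "two-sum") { questionId difficulty } }'


def raw_graphql_request(query):
    """Body is the bare query text; Content-Type is application/graphql."""
    return urllib.request.Request(
        GRAPHQL_URL,
        data=query.encode("utf-8"),
        headers={"Content-Type": "application/graphql"},
        method="POST",
    )


def json_graphql_request(query, variables=None):
    """Body is a JSON object wrapping the query; Content-Type is application/json."""
    body = json.dumps({"query": query, "variables": variables or {}})
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


raw = raw_graphql_request(QUERY)
wrapped = json_graphql_request(QUERY)
# urllib stores header names capitalized as "Content-type".
print(raw.get_header("Content-type"), raw.data)
print(wrapped.get_header("Content-type"), wrapped.data)
```

Whichever style is chosen, the Content-Type header and the body format need to agree with each other.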

Using Java's Jsoup and okhttp libraries to send the HTTP requests and parse the JSON data
package com.example.leetcode_card.utils;

import com.alibaba.fastjson.JSONObject;
import okhttp3.*;
import org.jsoup.Connection;
import org.jsoup.Jsoup;

import java.io.IOException;

public class GraphqlUtil {
    private static final String BASE_URL = "https://leetcode-cn.com";
    private static final String questionUrl = "https://leetcode-cn.com/problems/two-sum/description/";
    private static final String GRAPHQL_URL = "https://leetcode-cn.com/graphql";

    public GraphqlUtil() {
    }

    // Fetch the id, translated title, translated content and difficulty of the problem with the given title slug
    public static String getContent(String title) throws IOException {
        // Load a problem page first to obtain the cookies sent with the GraphQL request
        Connection.Response response = Jsoup.connect(questionUrl)
                .method(Connection.Method.GET)
                .execute();
        // Note: the value sent as the CSRF token is read from the "aliyungf_tc" cookie
        String csrftoken = response.cookie("aliyungf_tc");
        String __cfduid = response.cookie("__cfduid");
        OkHttpClient client = new OkHttpClient.Builder()
                .followRedirects(false)
                .followSslRedirects(false)
                .build();
        String query = "query{ question(titleSlug:\"%s\") { questionId translatedTitle translatedContent difficulty } }";
        String postBody = String.format(query, title);
        assert csrftoken != null;
        Request request = new Request.Builder()
                .addHeader("Content-Type", "application/graphql")
                .addHeader("Referer", questionUrl)
                .addHeader("Cookie", "__cfduid=" + __cfduid + ";" + "csrftoken=" + csrftoken)
                .addHeader("x-csrftoken", csrftoken)
                .url(GRAPHQL_URL)
                .post(RequestBody.create(MediaType.parse("application/graphql; charset=utf-8"), postBody))
                .build();
        Response response1 = client.newCall(request).execute();
        // The JSON response carries Chinese text as \\uXXXX escapes, so it is decoded separately
        return unicodetoString(response1.body().string());
    }

    // Get the (English) title slug of the daily question, used to build the full API request
    public static String getTitle() throws IOException {
        Connection.Response response = Jsoup.connect(questionUrl)
                .method(Connection.Method.GET)
                .execute();
        String csrftoken = response.cookie("aliyungf_tc");
        String __cfduid = response.cookie("__cfduid");
        OkHttpClient client = new OkHttpClient.Builder()
                .followRedirects(false)
                .followSslRedirects(false)
                .build();
        // Query string used to fetch the title of the daily question
        String postBody = "query questionOfToday { todayRecord { question { questionFrontendId questionTitleSlug __typename } lastSubmission { id __typename } date userStatus __typename }}";
        assert csrftoken != null;
        Request request = new Request.Builder()
                .addHeader("Content-Type", "application/graphql")
                .addHeader("Referer", questionUrl)
                .addHeader("Cookie", "__cfduid=" + __cfduid + ";" + "csrftoken=" + csrftoken)
                .addHeader("x-csrftoken", csrftoken)
                .url(GRAPHQL_URL)
                .post(RequestBody.create(MediaType.parse("application/graphql; charset=utf-8"), postBody))
                .build();
        Response response1 = client.newCall(request).execute();
        String titleInfo = unicodetoString(response1.body().string());
        // Parse the title slug out of the response
        JSONObject jsonObject = JSONObject.parseObject(titleInfo);
        return jsonObject.getJSONObject("data")
                .getJSONArray("todayRecord")
                .getJSONObject(0)
                .getJSONObject("question")
                .getString("questionTitleSlug");
    }

    // Decode \\uXXXX escape sequences into the characters they represent
    public static String unicodetoString(String unicode) {
        if (unicode == null || "".equals(unicode)) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        int i = -1;
        int pos = 0;
        while ((i = unicode.indexOf("\\u", pos)) != -1) {
            sb.append(unicode.substring(pos, i));
            if (i + 5 < unicode.length()) {
                pos = i + 6;
                sb.append((char) Integer.parseInt(unicode.substring(i + 2, i + 6), 16));
            } else {
                // Truncated escape at the end of the input: keep it verbatim and stop,
                // otherwise pos never advances and the loop does not terminate
                pos = i;
                break;
            }
        }
        sb.append(unicode.substring(pos));
        return sb.toString();
    }
}
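A side note on the decoding step: a JSON parser generally converts \uXXXX escapes into characters while parsing, so if the response is parsed as JSON anyway, a separate decoding pass may not be needed. The sketch below illustrates this in Python with a made-up response fragment; the field layout mirrors the question query above, but the sample body itself is an assumption and not captured output.

```python
import json

# Made-up response fragment with the Chinese title written as \uXXXX escapes,
# the way it would appear in the raw response text.
raw_body = '{"data": {"question": {"translatedTitle": "\\u4e24\\u6570\\u4e4b\\u548c"}}}'

# json.loads turns the escapes into real characters while parsing.
parsed = json.loads(raw_body)
title = parsed["data"]["question"]["translatedTitle"]
print(title)
```

The raw text still contains backslash sequences, while the parsed value is an ordinary four-character string.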
Maven dependencies used:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>LeetcodeSpider</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.jsoup/jsoup -->
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.14.3</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/com.squareup.okhttp3/okhttp -->
        <dependency>
            <groupId>com.squareup.okhttp3</groupId>
            <artifactId>okhttp</artifactId>
            <version>4.9.2</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient -->
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.5.12</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/top.jfunc.common/converter -->
        <dependency>
            <groupId>top.jfunc.common</groupId>
            <artifactId>converter</artifactId>
            <version>1.8.0</version>
        </dependency>
    </dependencies>
</project>
Approach 2 (scraping the GraphQL endpoint with a Python crawler)
Reference: https://blog.csdn.net/malloc_can/article/details/113004579
# coding=utf-8
from datetime import datetime
import requests
import json
import smtplib
from email.mime.text import MIMEText

base_url = 'https://leetcode-cn.com'

# Get the title slug (English) of today's daily question
response = requests.post(base_url + "/graphql", json={
    "operationName": "questionOfToday",
    "variables": {},
    "query": "query questionOfToday { todayRecord { question { questionFrontendId questionTitleSlug __typename } lastSubmission { id __typename } date userStatus __typename }}"
})
leetcodeTitle = json.loads(response.text).get('data').get('todayRecord')[0].get("question").get('questionTitleSlug')

# Get all the information about today's daily question
url = base_url + "/problems/" + leetcodeTitle
response = requests.post(base_url + "/graphql",
                         json={"operationName": "questionData", "variables": {"titleSlug": leetcodeTitle},
                               "query": "query questionData($titleSlug: String!) { question(titleSlug: $titleSlug) { questionId questionFrontendId boundTopicId title titleSlug content translatedTitle translatedContent isPaidOnly difficulty likes dislikes isLiked similarQuestions contributors { username profileUrl avatarUrl __typename } langToValidPlayground topicTags { name slug translatedName __typename } companyTagStats codeSnippets { lang langSlug code __typename } stats hints solution { id canSeeDetail __typename } status sampleTestCase metaData judgerAvailable judgeType mysqlSchemas enableRunCode envInfo book { id bookName pressName source shortDescription fullDescription bookImgUrl pressImgUrl productUrl __typename } isSubscribed isDailyQuestion dailyRecordStatus editorType ugcQuestionId style __typename }}"})
# Parse the JSON response and take the question object
jsonText = json.loads(response.text).get('data').get("question")
# Problem number
no = jsonText.get('questionFrontendId')
# Problem title (Chinese)
leetcodeTitle = jsonText.get('translatedTitle')
# Difficulty level
level = jsonText.get('difficulty')
# Problem statement
context = jsonText.get('translatedContent')
# print(leetcodeTitle)
# print(context)
# print(level)
# print(no)
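# Morning-quote API (TianAPI; apply for a free key yourself).
# The URL is left blank here and must be filled in with your own endpoint.
response = requests.get("")
# Use a separate name so that the json module is not shadowed
quote_data = json.loads(response.text)
# Extract the quote text
ana = quote_data.get('newslist')[0].get('content')
# Emoticon image link
face_url = 'http://wx3.sinaimg.cn/large/007hyfXLly1g0uj7x5jpaj301o02a0sw.jpg'
# Date the script started running (could be moved to a config file)
begin_time = datetime(2020, 12, 23)
# Number of days the script has been running
info = "<span style='color:cornflowerblue'>This script has been running for {0} days</span>".format(
    (datetime.today() - begin_time).days)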
# Render all the data as HTML
htmlText = """ <head>
    <meta charset=UTF-8>
    <link rel="stylesheet">
    <style>
        code {
            color: blue;
            font-size: larger;
        }
    </style>
    </link>
</head>
<body>
<div> </B><BR></B><FONT
        style="FONT-SIZE: 12pt; FILTER: shadow(color=#af2dco); WIDTH: 100%; COLOR: #730404; LINE-HEIGHT: 100%; FONT-FAMILY: 華文行楷"
        size=6><span style="COLOR: cornflowerblue">Morning quote: </span>""" + ana + """</FONT><img width="40px" src='""" + face_url + """'>
    <div>
        <h3>LeetCode Daily Question</h3>
        <h4>""" + no + '.' + leetcodeTitle + '.' + level + """</h4>""" + context + "Link to this problem: <a href='" + url + "'>" + url + "</a></div>" + info + "</div></body>"
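The script imports smtplib and MIMEText, but the listing ends before they are used. Presumably the HTML is meant to be sent by e-mail; the sketch below shows one way that step could look. It is not the original author's code: the addresses, the subject line, the SMTP host and the helper name build_message are all placeholders, and the actual send is left commented out.

```python
from email.mime.text import MIMEText


def build_message(html_text, sender, recipient, subject):
    """Wrap an HTML string in a MIME message (hypothetical helper)."""
    msg = MIMEText(html_text, "html", "utf-8")
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    return msg


# Placeholder values; in the script above, html_text would be htmlText.
message = build_message(
    "<h3>LeetCode Daily Question</h3>",
    "sender@example.com",
    "reader@example.com",
    "LeetCode Daily Question",
)
print(message.as_string())

# Sending is not run here; host, port and credentials are assumptions:
# import smtplib
# with smtplib.SMTP_SSL("smtp.example.com", 465) as server:
#     server.login("sender@example.com", "app-password")
#     server.sendmail("sender@example.com", ["reader@example.com"], message.as_string())
```

Building the message separately from sending it makes it possible to inspect the generated mail before any credentials are involved.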
