Contents
- [Reading notes] Blockchain management and machine learning adaptation for IoT environment in 5G and beyond networks: A systematic review
- -1. Q&A
- 0. Background of the paper
- 1. Contributions of the paper
- 2. How to write a survey? (How this paper is written)
- 3. Other related surveys
- 4. Prerequisite knowledge
- 4.1 Blockchain
- 4.2 Machine Learning
- 5. BC + ML + IoT
- 5.1 Blockchain for machine learning
- 5.1.1 Trustless machine learning contracts
- 5.1.2 Distributed trust in ML computations
- 5.1.3 Verifiable open repository of ML models
- 5.1.4 Privacy preservation
- 5.1.5 Cryptographic security on ML data
- 5.2 ML for blockchain
- 5.2.1 Resource management and computational offloading
- 5.2.2 Predicting cryptocurrency price
- 5.2.3 ML for anomaly detection and attack prevention on blockchain
- 5.2.4 Reducing the anonymity of the network
- 5.2.5 Classification of blockchain-related data
- 6. Challenges of ML + BC + IoT
- 7. The authors' conclusion
- 8. My summary
- References
- Article information
- Cover information
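[Reading notes] Blockchain management and machine learning adaptation for IoT environment in 5G and beyond networks: A systematic review
This paper is published in a CCF rank-C venue. The authors are from the Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab, India.
-1. Q&A
- What is the difference between 5G and B5G?
A: 5G mainly addresses familiar issues such as high-definition video and transmission rates, whereas B5G (Beyond-5G) aims to mature specific application scenarios and technologies, for example industry use in telemedicine, intelligent transportation, and Industry 4.0.
0. Background of the paper
Big data analytics + the need for security and privacy preservation in IoT applications = the integration of machine learning and blockchain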
Keeping in view of the constraints and challenges with respect to big data analytics along with security and privacy preservation for 5G and B5G applications, the integration of machine learning and blockchain, two of the most promising technologies of the modern era is inevitable.
Introduction to IoT devices
- What is an IoT device?
  Over the last decade, Internet of Things (IoT) has revolutionized the whole world leading to various technological trends starting from Industry 1.0 to Industry 5.0, AR/VR/MR, smart factories, tactile Internet, smart transportation, smart plants, etc. It is an interconnection of various devices monitored and controlled using the Internet in order to provide ubiquitous computing services to the end-users.
- Problems with IoT devices
  Because of constraints such as heterogeneity of devices, resource constraints, power storage, security, and data management, constant revolutions are foreseen in IoT over the years. Among these, security and privacy are the most crucial, keeping in view the data access restrictions at various levels in different applications [1].
- The rapid growth in the number of IoT devices
  Moreover, with an increase in the number of IoT devices, the data generated by these devices is increasing exponentially in recent years. As per the report [2], the number of IoT devices connected to the Internet at the end of Nov. 2019 was 26.6 billion and is expected to reach 75 billion by the year 2025.
- Privacy problems in IoT devices
  Moreover, all IoT applications carry sensitive information, for which security and privacy preservation are of utmost importance. Also, devices are reluctant to transfer their data for training purposes in an open environment such as the Internet because of privacy concerns [3].
Why do IoT systems need machine learning?
- Also, IoT system needs to be autonomous so that it can learn from the gathered data and make context-based decisions [4]. In such an environment, machine learning (ML) can be an effective tool in understanding the patterns, analyzing, processing, and making intelligent decisions.
- The ever-growing market for IoT demands the usage of ML-based models for accuracy and precision in the decision-making process. Implementing ML in IoT applications can significantly improve data analytics and real-time decision-making. Applications of ML in various IoT use-cases (e.g., smart transportation, smart grid, etc.) include network optimization, resource allocation, congestion avoidance [6].
Development of the machine learning market
Fig. 1? shows the global ML market share from the year 2017 to 2024 [5]. Technology advancements in ML and deep learning (DL) have changed the way a computer can process information automatically.
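Applications of AI in IoT (reading this part, it is clear that the authors cite too few references; several statements that call for supporting cases from other work have none)
- For example, autonomous controllers based upon Artificial Intelligence (AI) can be used to optimize energy usage [7].
- predictive models for energy consumption including Markov’s decision process and NN’s can be incorporated with IoT-enabled devices [8].
To explain why I think the authors cite too little: the long passage below makes many claims but references no related literature.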
support vector machine (SVM) provides effective data classification for blockchain peers and other transactional entities. Moreover, the supervised ML algorithms such as — random forest, gradient boost, etc. are used to reduce anonymity in the blockchain network. Recently, NN’s are also exploited to predict the price of cryptocurrency. With various computing models, ML can ease data verification, validation process and helps in identification of anomalies and malicious attacks in the blockchain network. Resource management, classification of transactional entities, and managing offloading tasks are some other applications of ML for blockchain.
Motivating blockchain technology
With the centralized authority, threats of privacy preservation, false authentication, data tampering prevails. Also, the reliability of data is very important for ML algorithms in order to obtain accurate results. Even a small security loophole in the ML algorithm can generate high false rate for certain events. Moreover, the computations of ML models are dependent on the trusted third party (TTP) (e.g., a cloud service provider) for many security applications which may raise serious privacy concerns. Hence, there is a demand for decentralized framework based ML.
Growth of blockchain companies and of the blockchain + IoT market
Fig. 1(b) represents the percentage of startups in different industries focusing on blockchain in the year 2021 [9]. As per a report in [10], IoT blockchain based spending is expected to reach $573M by 2023 as compared to $174M in the year 2018 (Refer Fig. 1(a)).
Examples of where blockchain can be used in IoT
Also, blockchain technology can provide many benefits to 5G IoT networks including secure authentication, secure communication, secure network coding, and resource configuration framework [11,12].
What does blockchain do for machine learning?
Moreover, blockchain can improve the performance of ML algorithms as it provides digitally signed data from reliable, trusted, and secure sources. The distributed computing powers can be utilized for developing a better and secure prediction model.
the adoption of ML in blockchain helps to analyze the existing issues in blockchain technology, enabling to enhance the security and privacy of the whole network.
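The figure above summarizes the paper's contributions. In my view: 1) if I drew this figure, I would add the references to it; 2) ML and blockchain do not really sit in an upper/lower layer relationship, so they should be drawn separately.
When I draw figures like this in the future, I can also look for more base-station information and images of this kind; they look quite polished.
What does blockchain do for 5G and B5G?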
With blockchain, 5G and B5G services can be more scalable as they support efficient solutions for spectrum sharing and resource management [14].
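1. Contributions of the paper
The paper presents a comprehensive taxonomy for the integration of blockchain and machine learning in an IoT environment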
Then, we presented a comprehensive taxonomy for integration of blockchain and machine learning in an IoT environment.
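The paper explores the use of federated learning, reinforcement learning, and deep learning algorithms in blockchain-based applications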
We also explored federated learning, reinforcement learning, deep learning algorithms usage in blockchain based applications.
Finally, it gives recommendations for future use cases of these technologies in 5G and B5G
Finally, we provide recommendations for future use cases of these emerging technologies in 5G and B5G technologies.
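2. How to write a survey? (How this paper is written)
- Writing method
- How to organize each surveyed article
- Structure of the paper
Section 1.1 presents the survey methodology; Section 2 discusses other surveys on ML and BC; Section 3 covers ML and BC; Section 4 discusses ML + BC and classifies it into ML for blockchain and blockchain for ML; Section 5 gives the challenges; Section 6 gives the conclusion.
3. Other related surveys
Most existing surveys cover ML and BC separately.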
Existing literature work reveals that blockchain and ML are surveyed mostly in isolation or with their applications in several vertical domains.
Other related surveys
- ML
  Specifically, the survey of ML models for big data analysis can be found in [16–18].
- BC
  Meanwhile, multiple notable works such as [19–21] provide the concepts, advantages, challenges, and future research directions of blockchain technology.
- BC + IoT
  The more recent survey articles in the context of blockchain applications for IoT have been presented in [22–25]
- ML + IoT
  whereas authors of [26–28] discuss the applications of ML models in various fields of IoT
- BC + ML + IoT
- Several studies were put forward addressing the integration of Artificial Intelligence (AI) and blockchain.
  - For example, the authors of [29] presented a review article on the integration of AI and blockchain by discussing applications of blockchain for AI as well as AI for blockchain.
  - Likewise, Salah et al. [30] present the review on the literature and summarize the existing blockchain applications and protocols facilitating AI domain. Along with this, open research challenges of implementing blockchain for AI are also discussed by the authors.
- However, only a few research efforts have been made on the integration of ML and blockchain, in order to provide decision-making service in an intelligent way while assuring security and privacy.
  - For example, Vyas et al. [31] discussed the role of blockchain in improving the accuracy of ML results for healthcare applications. However, authors presented a short survey article and in-depth knowledge cannot be gained with this article.
  - In the same way, Acheampong [32] presented an overview of the basic concepts of blockchain and ML by discussing the impact of blockchain in ML community.
  - More recently, authors in [33] conducted an intensive survey that focuses on a specific application of ML for blockchain, i.e., anomaly detection. Also, this article reviews the application of blockchain for privacy preservation in learning process.
- Applying ML to BC
  - In contrast, authors of [15] presented a review to discuss the applications of ML in blockchain technology. Specifically, authors have reviewed ML for blockchain applications such as transaction entity classification, Bitcoin price prediction, computing power allocations, cryptocurrency price prediction, and portfolio management
  - In another work, Nguyen et al. [34] presented a small section that discusses the efficiency of ML in improving blockchain cloud of things (BCOT) framework.
  - Very recently, Rane et al. [35] presented in-depth survey on available ML algorithms for predicting Bitcoin prices and concluded that existing schemes only achieve accuracy of 60%–70%.
  - Recently, Liu et al. [36] present a survey article that discusses overview, benefits, applications, open issues, and challenges while combining blockchain and ML. (By the authors' own description, this article is about combining ML and BC, so it is odd that it is listed in this paragraph.)
The authors summarize the surveys found above in a table; I think this kind of summary works well.
4. Prerequisite knowledge
4.1 Blockchain
Types of blockchain
- private
- public
- consortium
Role and development of smart contracts in blockchain
The applications of smart contract are not only limited to cryptocurrency but can be extended to many applications including voting systems, inventory management, automation of payments, automation of claims and blind auctions, etc.
- Solidity: Solidity [42] is the most popular high-level programming language used for implementing smart contracts on the Ethereum platform. This language is influenced by C++, Python, and JavaScript.
- Serpent [43] is inspired from the Python language which focuses on delivering high productivity and automating tasks
- After Solidity, Vyper [44] is the next most popular language for Ethereum virtual machine (EVM) having syntax inspired from Python.
- LLL (Lisp like language) is the first low-level language developed after the assembler for EVM and it is a tiny wrapper over coding around the assembler itself. LLL provides direct access to memory in an execution environment and can be easily optimized for speed.
Why must IoT use decentralization?
Moreover, with an ever-increasing deployment of IoT objects, security is of prime concern. Cloud computing has been widely used to support IoT for management, processing, and storage.
- However, its centralized nature raises security questions. Sensitive IoT data managed by centralized servers can be shared with anybody without the user’s consent, thus leading to privacy breaches [45].
- Also, the intermediaries decrease the efficiency of interactions among system components. Also, with an increase in the number of IoT devices, current centralized devices providing security services including authentication and authorization will turn into a bottleneck.
- Moreover, the security vulnerability because of centralization is an easy target for Distributed denial-of-service (DDoS) attacks.
- Additionally, to ensure data integrity, the presence of publicly verifiable audits without involving a TTP is desirable. In this context, blockchain can mitigate security and privacy risks with its capabilities such as transparency, immutability, anonymity, decentralization, and operational resilience [4].
How can the compute and storage constraints of IoT scenarios be addressed?
- However, to support resource-constrained nature of IoT devices blockchain provides the concept of Simplified payment verification, in which nodes need not to store complete blockchain data rather only block headers. In this context, Le and Mutka [46] proposed a lightweight method to validate blockchain data using bloom filter (probabilistic data structure).
- Similarly, authors in [47] presented a proposal that integrates blockchain with constrained IoT devices. The evaluation of the proposal is carried out in terms of memory, processing time, and power consumption.
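To make the Bloom filter idea concrete, here is a minimal Python sketch of the kind of probabilistic membership check a storage-constrained node could keep instead of full block data. It is not the method of [46]: the class name, the filter size, the number of hash functions, and the transaction ids are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, kept simple for readability

    def _positions(self, item: str):
        # Derive num_hashes positions by salting SHA-256 with a counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


light_node_filter = BloomFilter()
for tx_id in ["tx-001", "tx-002", "tx-003"]:  # hypothetical transaction ids
    light_node_filter.add(tx_id)

print(light_node_filter.might_contain("tx-002"))  # True: it was added
```

The filter stores a fixed number of bits no matter how many items are added, which is what makes it attractive for constrained devices; the price is a false-positive rate that grows as the filter fills up.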
Problems to solve when applying blockchain in IoT scenarios (the authors' summary)
However, high computation, storage costs, high energy demands, communication hurdles, mobility of devices, and latency are some of the challenges faced while integrating blockchain with IoT. In an IoT network, devices generate gigabytes of data in real-time. Due to lack of storage blockchain might appear unsuitable for IoT networks. The limited resource IoT devices are also unsuitable for highly computational PoW consensus algorithm. Hence, the scalability issue of integrating blockchain and IoT needs an immediate effective solution. Also, different characteristics of IoT network such as heterogeneity, wireless communication and mobility complicate the security challenge. Moreover, the transparency supported by IoT can affect the privacy of data. Last but not the least, lack of regulations and standards can influence the future of blockchain and IoT.
4.2 Machine Learning
An introduction to machine learning; worth a skim, although it is mostly the standard material.
ML is a branch of AI that makes programming machines to perform particular tasks by learning. With time, ML models have been able to exceed humans in various problems. Particularly, previous experience is used to execute assigned tasks. ML algorithms have proved their significance in various areas such as transportation, image processing, marketing, etc. ML includes various models to solve different types of problems. The most commonly used ML models involve SVM, Artificial Neural Networks (ANN), decision trees, etc. to name a few.
Building a new ML model involves two steps, i.e., training and testing in order to perform tasks of prediction, classification, clustering, etc. on new dataset. Indeed, data is an important source in ML. The data is required in preprocessing and training any ML model. First, the ML model is trained with a training dataset. With the increase in size of training data, the efficiency of ML classifier also increases [48]. Next, after the training phase, the accuracy of the prediction is evaluated with a new dataset. In case of acceptable accuracy, the ML model is deployed otherwise it is trained again.
Recently, a popular subcategory of ML named deep learning (DL) has emerged to imitate the human thinking process. The fundamentals of DL have been originated from cognitive theories that are used to create NN structure. Popular applications of DL include object detection, face recognition, and traffic flow prediction to name a few [49].
Supervised learning, unsupervised learning, and reinforcement learning (RL) are three categorizations of learning styles in ML algorithms. In supervised learning, the machine is trained with well labeled data, i.e., the data is already mapped with the correct answer. Next, the machine is fed with a completely new set of data to generate correct results from analyzing the labeled data from training phase. Furthermore, supervised learning is divided into two categories that include classification and regression. SVM, decision trees, nearest neighbor, etc. are popular algorithms under this category.
In contrast, unsupervised learning is training the machine with input data that is not labelled or classified. Specifically, the aim is to group unsorted data as per similarity and difference such as pattern detection and descriptive modeling. Clustering and association are two categories of unsupervised learning [50]. K-means clustering and Principal Component Analysis (PCA) are popular algorithms under this category.
In RL, an agent is employed to interact with the environment in order to find best outcome by continuously learning from the environment. RL uses trial-and-error method to train itself when exposed to a certain environment. Markov’s decision process is a popular example of RL. Notably, there are vulnerabilities in ML models with respect to privacy and security.
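The train, evaluate, then deploy-or-retrain loop described above can be sketched in a few lines of Python. This is only an illustration: it uses a 1-nearest-neighbor classifier as the supervised model, and the toy data, the label names, and the `ACCEPTABLE_ACCURACY` threshold are all made up for the example.

```python
def nearest_neighbor_predict(train, point):
    """Return the label of the training example closest to `point` (1-NN)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda example: sq_dist(example[0], point))
    return label


def accuracy(train, test):
    """Fraction of held-out examples whose predicted label matches the true one."""
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return correct / len(test)


# Toy labeled data (made up): two well-separated groups of 2-D points.
train_set = [((0.0, 0.1), "low"), ((0.2, 0.0), "low"),
             ((1.0, 0.9), "high"), ((0.9, 1.1), "high")]
# A new dataset, never seen during training, used only for evaluation.
test_set = [((0.1, 0.2), "low"), ((1.1, 1.0), "high")]

ACCEPTABLE_ACCURACY = 0.9  # hypothetical deployment threshold
acc = accuracy(train_set, test_set)
decision = "deploy" if acc >= ACCEPTABLE_ACCURACY else "retrain"
print(acc, decision)  # 1.0 deploy
```

Keeping the test set separate from the training set is the point of the sketch: accuracy measured on data the model has already seen would say little about how it behaves on new inputs.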
Security attacks on ML, according to the authors
Security attack in ML mainly includes evasion and poisoning attacks. Evasion attacks disrupt the entire classification process using adversarial examples whereas the poisoning attack destroys the data while training phase, which can decrease model accuracy [51]. On other hand, the privacy attack on ML model comes from service providers and third-party entities. Clearly, the development of ML models empowers to launch new AI services including facial recognition and words suggestion. Nevertheless, the dataset provided to support these applications often includes sensitive and private information
5. BC + ML + IoT
This section follows the structure of the mind map below.
5.1 Blockchain for machine learning
Blockchain for ML can solve the problem of data acquisition
- With blockchain, the ML algorithm can be fed with highly reliable data and thus accurate and trusted results can be achieved. Also, training ML models with real-data will enhance the accuracy and efficiency of ML algorithms. The built-in consensus mechanism and fundamentals of blockchain ensure secure and tamper-proof sharing of IoT data.
- Moreover, the existing client-master type ML models rely on trusted central servers and consider only privacy issues in linear sharing and ignore privacy in non-linear learning models. In the client-master model, an enormous amount of data generated by IoT devices is collected and stored at one central location whereas, in the distributed multi-party model, data is generated by various parties and stored in a distributed manner. However, the decentralized model incurs high communication costs and raises security and privacy issues. The transparency feature supported by blockchain, assures ML users confidentiality and privacy of data. As discussed, more amount of data available for training improves the overall throughput and produces a more effective and reliable system. Clearly, blockchain in ML can result in much safer data and better ML models.
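- Why do the authors say "only privacy issues in linear sharing and ignore privacy in non-linear learning models"? I cannot answer this myself. For the difference between linear and non-linear models, see https://zhuanlan.zhihu.com/p/37866896
- The transparency feature supported by blockchain, assures ML users confidentiality and privacy of data.
As I understand it, blockchain transparency means that operations on the chain are public, which prevents data tampering. What I do not understand is how that ensures confidentiality and data privacy for ML users.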
5.1.1 Trustless machine learning contracts
Use blockchain smart contracts to build an incentive mechanism for machine learning, i.e., take full advantage of blockchain's trustlessness.
The proposal introduced by [56] implements the concept of trustless ML contract and it is defined in 3 phases. In the first phase, a dataset, an evaluation function, amount of reward, and request for best ML model is submitted by the reward giver/buyer. In the second phase, the provided dataset is downloaded by ML model providers/practitioners and each provider works independently in order to train the ML model. After training, the providers submit their model. In the last phase, the winner is selected. Moreover, such a proposal can be utilized for raising funds transparently for IoT applications such as medical research. In addition, it can achieve automated self-improvement for AI agents. Unfortunately, this proposal [56] does not require identity and reputation validation for creating a new transaction and hence raises security concerns. Also, this proposal works only for Ethereum blockchain. Fig. 8 represents an illustration of trustless ML contracts.
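The three phases can be mimicked off-chain in plain Python to see how the pieces fit together. This is a sketch of the workflow only, not the Ethereum contract of [56]: `MLBounty`, `mean_squared_error`, the provider names, and the toy dataset are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Model = Callable[[float], float]
Dataset = List[Tuple[float, float]]


@dataclass
class MLBounty:
    """Plain-Python stand-in for the contract state; not real on-chain code."""
    dataset: Dataset                              # phase 1: (input, target) pairs
    reward: int                                   # phase 1: reward amount
    evaluate: Callable[[Model, Dataset], float]   # phase 1: evaluation function
    submissions: Dict[str, Model] = field(default_factory=dict)

    def submit(self, provider: str, model: Model) -> None:
        # Phase 2: providers train independently and submit their models.
        self.submissions[provider] = model

    def select_winner(self) -> Tuple[str, int]:
        # Phase 3: the submission with the lowest evaluation error wins.
        winner = min(self.submissions,
                     key=lambda p: self.evaluate(self.submissions[p], self.dataset))
        return winner, self.reward


def mean_squared_error(model: Model, data: Dataset) -> float:
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


bounty = MLBounty(dataset=[(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],
                  reward=100, evaluate=mean_squared_error)
bounty.submit("alice", lambda x: 2.0 * x)   # fits the data exactly
bounty.submit("bob", lambda x: x + 1.0)     # fits the data poorly
print(bounty.select_winner())  # ('alice', 100)
```

Note that nothing in the sketch checks who "alice" or "bob" are, which mirrors the weakness noted above: without identity or reputation validation, anyone can submit.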
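5.1.2 Distributed trust in ML computations
This section stresses that blockchain can solve the centralization problem of traditional distributed machine learning, i.e., it relies on blockchain's decentralization.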
Another matter to be considered in the context of ML is that these algorithms lack trustability and automation.
- Notably, it is difficult to trust results from trained ML models having open source code and open data in an IoT environment.
- In fact, multi agent socio-technical systems (which work collaboratively on some tasks, share models and data for local computations) due to the involvement of independent agents face trust issues in computations from other agents.
Centralized systems face the threat of data tampering
As ML algorithm relies on data that is mutable, so it is difficult to trust the results from these algorithms. The system administrator can manipulate the data source that in return changes the result.
Most current ML models are operated by humans and lack automation, so how can a trusted, transparent platform for collaborative computation be built? With cryptographic techniques.
Also, existing ML models are mostly controlled by human beings so it is difficult to automate the ML algorithms. Hence, there is a need for developing an environment having trust and transparency in computations for collaborative operations. To solve this problem, zero-knowledge proof, Elliptic-curve cryptography (ECC), etc. are some cryptographic techniques that are effective in the verification and validation of computations [73,74].
- In this context, Raman et al. [57] proposed a model for verification and validation of computations in a permissioned blockchain network for multi-agent socio-technical system. Authors have demonstrated the usage of blockchain in developing trust for recording and validating audit at each step of computations.
- However, due to lack of scalability, large-scale computations for a multi-agent network prove expensive. For this, the authors have used a lossy compression technique that reduces the communication and storage cost of the blockchain network. (This is a model-compression paper; worth reading later.)
- Similarly, authors of [62] established a link between ML and blockchain technology in order to solve trustability and automation issues of ML by using association rule mining.
5.1.3 Verifiable open repository of ML models
Using ML training as the blockchain mining process (i.e., "verifiable" at the mining nodes)
For example, the training process replaces the blockchain consensus algorithm. However, this feels less like something blockchain does for ML and more like something ML does for blockchain. Does the paper mean using blockchain to build a chain of ML models? Is this passage really about using blockchain to construct an ML framework?
This section also covers the difficulties met when using blockchain (smart contracts) to do work for ML, along with the literature that addresses them.
Drawbacks of the PoW consensus algorithm?
Among all research work on consensus algorithms, Proof-of-Work (PoW) is the widely accepted technical consensus algorithm used to settle among all participating nodes. However, the PoW consensus algorithm proves costly and environmentally unfriendly due to the high computations involved in it. After PoW many other consensus algorithms such as Proof-of-Stake (PoS), Proof-of-Activity (PoA) were introduced in order to reduce computations while mining blocks.
- In this context, the authors of [58] introduced a cryptocurrency named "WekaCoin" that is based on Proof-of-Learning (PoL) consensus algorithm. PoL is inspired by open-source ML competitions (e.g. Kaggle and CodaLab). Among all network nodes, some nodes called trainers upload ML models on blockchain network for tasks that were submitted by other nodes called suppliers. (The model initiator may upload their model on an InterPlanetary File System (IPFS) and in return receives checksum hash.) The uploaded models are then tested for data that was not considered by trainers while training. The validator nodes which are selected randomly are then supposed to rank these models and add the information to the block. The trainer nodes having the best model are rewarded with WekaCoins by supplier nodes. This way blockchain can be used for generating verifiable ML models. The flowchart for the understanding of PoL algorithm is presented in Fig. 9. The main advantage of this protocol is that the computations involved in the validation process solve useful tasks as well as creates a validated open repository for ML models and datasets. However, the authors have not discussed the prevention of collusion among suppliers, trainers, and validators.
- In contrast to the permissionless blockchain, authors of [69] developed privacy preserving distributed ML model based on permissioned blockchain network. This is, however, a first attempt to propose a distributed ML model for a permissioned blockchain network. Decentralized ML allows machines to perform intelligent decision-making on data securely stored on the blockchain network without involving any TTP. The decentralized ML technique allows algorithms or ML models to run directly on connected mobile devices. This distributed technology is smart contract based marketplace that connects developers, clients, and data owners by facilitating all stakeholders in a way to create a middle-man free ML infrastructure. The authors demonstrated that the impact of proposed error based aggregation rule supports high resilience and mitigates collusion attack.
- However, latency and bandwidth are the major drawback of distributed ML [75]. To improve network condition, 5G technology can be adopted as it enables high availability. In this direction, to ensure byzantine resilience for distributive learning in 5G networks, authors in [70] have proposed a blockchain based secure computing framework. By using a sharding based blockchain, authors have prevented arbitrary attacks on learning convergence.
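One round of the Proof-of-Learning flow described for WekaCoin [58] can be sketched as follows. This is a loose illustration under strong assumptions, not the actual protocol: models are reduced to a single slope parameter, a SHA-256 digest stands in for the IPFS checksum, and `pol_round`, the node names, and the seeded random choice of validators are all invented for the example.

```python
import hashlib
import random


def checksum(model_params) -> str:
    # Stand-in for the content hash a trainer would receive from IPFS.
    return hashlib.sha256(repr(model_params).encode()).hexdigest()


def held_out_error(slope: float, held_out) -> float:
    # Toy assumption: each model is a one-parameter line y = slope * x.
    return sum(abs(slope * x - y) for x, y in held_out) / len(held_out)


def pol_round(submissions, held_out, nodes, num_validators=2, seed=0):
    rng = random.Random(seed)                       # seeded only for repeatability
    validators = rng.sample(nodes, num_validators)  # validators picked at random
    # Validators rank models on data the trainers did not train on.
    ranking = sorted(submissions,
                     key=lambda trainer: held_out_error(submissions[trainer], held_out))
    return {
        "validators": validators,
        "ranking": ranking,
        "model_hashes": {t: checksum(submissions[t]) for t in ranking},
        "reward_to": ranking[0],                    # best model earns the reward
    }


submissions = {"trainer_a": 1.9, "trainer_b": 3.0}  # hypothetical trained slopes
held_out = [(1.0, 2.0), (2.0, 4.0)]                 # supplier's unseen test data
block = pol_round(submissions, held_out, nodes=["n1", "n2", "n3", "n4"])
print(block["ranking"], block["reward_to"])  # ['trainer_a', 'trainer_b'] trainer_a
```

The block ends up holding the ranking and the model hashes, which is the sense in which the chain doubles as a validated repository of models. The sketch does nothing about collusion between suppliers, trainers, and validators, the gap noted above.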
Problems with smart contracts?
- Smart contracts cannot run heavy tasks
  However, authors of [76] pointed out that ML programs cannot be stored with blockchain because of the certain limitations of smart contracts. The authors pointed out that smart contracts cannot process high computational tasks.
  The idea in this passage is that the cost of computation affects mining (this matches the idea in my own survey, so it is worth citing).
  With the blockchain mining process, when output corresponding to any input is expected to be recorded via smart contracts, honest miners then execute the program to verify the correctness of results. In case of a computationally high process, adversarial nodes can skip and carry forward to verify the new block. This way adversarial nodes can get a chance of adding new blocks as honest participants are busy with the execution of smart contracts.
- In addition, smart contracts cannot perform randomized computations
  Moreover, the smart contract cannot carry randomized computations as with randomization honest nodes can have inconsistent output. Besides, as ML computations are costly and randomized, so ML tasks are difficult to execute with blockchain. To address this challenge, the authors of [76] have used a game theory approach that empowers randomized computations on the top of blockchain. Here, a simple incentive mechanism is designed in order to execute the program with crowdsourcing in a blockchain environment.
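The inconsistency caused by randomization is easy to reproduce. The Python sketch below illustrates the problem only, not the game-theoretic mechanism of [76]: the task, the node ids used as seeds, and the digest comparison are assumptions made for the example.

```python
import hashlib
import random


def randomized_task(rng: random.Random) -> float:
    # Toy randomized computation: the average of five random draws.
    return sum(rng.random() for _ in range(5)) / 5


def result_digest(value: float) -> str:
    # Nodes compare digests of their results, as they would compare state hashes.
    return hashlib.sha256(repr(value).encode()).hexdigest()


# Each honest node draws its own randomness (node ids serve as hypothetical seeds).
node_results = [result_digest(randomized_task(random.Random(node_id)))
                for node_id in (1, 2, 3)]
print(len(set(node_results)))  # more than one distinct result: no agreement

# With identical randomness, the same computation yields a single agreed result.
shared_results = [result_digest(randomized_task(random.Random(42)))
                  for _ in range(3)]
print(len(set(shared_results)))  # 1
```

Honest nodes that each run the randomized program correctly still disagree on the output, so a verifier cannot tell an honest result from a wrong one by re-execution alone.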
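5.1.4 Privacy preservation
Using blockchain to solve the privacy problems met in ML. The emphasis here seems to be on blockchain's cryptographic techniques and immutability, though I am not sure.
For example: protecting privacy during upload, and using blockchain to secure federated learning (although I feel this one is work that ML does for blockchain. Blockchain can protect federated learning too, but is that really privacy preservation?)
Why does ML run into privacy-preservation problems?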
Another matter to be considered in the context of ML is the privacy preservation of data. For example, ML healthcare predictive modeling has proved beneficial in national healthcare research and biomedical discoveries. However, data disclosure of patients to these third-party cloud services leads to privacy attacks. The available distributed privacy preserving predictive models are dependent on the central server to execute the modeling process [77].
The following sentences are probably not enough to support this paragraph's claim.
Institutional policies, single point of failure, mutable disseminate data, and trust issues are some associated risks with the existing client–server architecture. Moreover, any participating node cannot leave or join the network for a short period of time in order to avoid any recovery issues.
The authors' conclusion: The state-of-art research has adopted blockchain technology in order to deal with the above-mentioned risks. The characteristics of blockchain technology make it suitable to deal with centralized privacy preservation models.
Personally, though, I do not think the above shows that blockchain can protect data privacy.
Does ML suffer from single points of failure and similar problems?
Institutional policies, single point of failure, mutable disseminate data, and trust issues are some associated risks with the existing client–server architecture. Moreover, any participating node cannot leave or join the network for a short period of time in order to avoid any recovery issues.
A: Blockchain avoids a single point of failure, Byzantine General, and Sybil attack problem and preserves privacy while predicting the modeling process.
Examples the paper gives of blockchain protecting ML privacy
- In this context, Kuo et al. [59] have presented Modelchain, a private blockchain that enabled privacy preserving predictive modeling for the healthcare industry. Instead of relying on only PoW protocol, the authors have designed a new algorithm on the top of PoW named proof of information to increase the efficiency and accuracy of ML model. Unfortunately, the proof of information algorithm proves inefficient to deal with the scalability of the network. The result section demonstrated that Modelchain provides a secure and privacy preserving interoperability framework. Unfortunately, privacy preservation is provided but the authors of [59] did not consider the basic requirements for differential privacy as differential privacy based ML has to consider the fact that how many times a ML model can be trained without any privacy breach.
  I did not fully follow the authors' explanation of the problem with this work.
- Subsequently, Chen et al. [65] proposed another decentralized ML system called "Learningchain" that takes both linear and non-linear learning models in account without relying on the central server. Here, differential privacy based methods are also designed to preserve the privacy of data. Differential privacy or cryptographic solutions have proved to be efficient for preserving user’s data privacy [78–80]. This model is implemented on the Ethereum platform and a stochastic gradient descent algorithm is used to design a predictive model over blockchain.
  The proposal works in 3 phases. In the first phase, a P2P network is initialized. In the second phase, data holders calculate their local gradients as per predefined common loss function and predictive model using differential privacy methods. Next, computed gradients are broadcasted in the network using differential privacy scheme for learning models. After reaching a consensus, local gradients are aggregated by the authority holder using Learningchain. Three different datasets were used for training and testing purposes, i.e., synthetic dataset, Wisconsin breast cancer dataset, and Modified National Institute of Standards and Technology database (MNIST) dataset. It is concluded in results that there is a trade-off between privacy and accuracy as lowering the privacy budget increases test errors.
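The gradient-sharing phases of Learningchain can be approximated with a short Python sketch in which each data holder clips its local gradient and adds Laplace noise before sharing it. This is an illustration in the spirit of differential privacy, not the scheme of [65]: the one-parameter model, the data, the clipping bound, the noise scale `clip_bound / epsilon`, and the learning rate are all assumptions, and a real deployment would need a proper sensitivity analysis.

```python
import random


def local_gradient(w, data):
    # Gradient of the mean squared error for the 1-D model y = w * x.
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def clip(value, bound):
    # Bound each holder's contribution before noise is added.
    return max(-bound, min(bound, value))


def laplace_noise(scale, rng):
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_gradient(w, data, epsilon, clip_bound, rng):
    # Smaller epsilon (privacy budget) means larger noise on the shared gradient.
    g = clip(local_gradient(w, data), clip_bound)
    return g + laplace_noise(clip_bound / epsilon, rng)


def aggregate(gradients):
    # Stand-in for the aggregation step performed after consensus.
    return sum(gradients) / len(gradients)


rng = random.Random(0)
# Hypothetical private datasets of three data holders, all following y = 2 * x.
holders = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)], [(1.5, 3.0), (2.5, 5.0)]]
w = 0.0
for _ in range(200):
    noisy = [private_gradient(w, d, epsilon=5.0, clip_bound=1.0, rng=rng)
             for d in holders]
    w -= 0.05 * aggregate(noisy)
print(round(w, 2))  # a noisy estimate of the true slope 2.0
```

Rerunning with a smaller `epsilon` makes the estimate wander further from 2.0, which is the privacy versus accuracy trade-off reported in the results.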
Protecting privacy when uploading data
With the growing trend of DL models, many DL models are designed to be run on client devices such as — IoT devices or smart devices. Although this technique demands enough memory and disk space to run the models in real-time. Also, because of privacy concerns, it is not recommended to upload client data on a centralized machine for processing and executing ML algorithms.
Along the same line of thought, to preserve privacy while uploading client ML data, authors of [61] proposed another work. Singla et al. [61] proposed a blockchain-based system that stores client device profiles in a shared household to predict user activity. Here, the main aim is to enable automatic customization of each client using blockchain decentralized security and privacy. The personalization feature of each device is computed using rule mining. However, this proposal is based on the assumption that client preferences are not changing.
Solving the collaborative data-sharing problem
Similarly, to solve the challenge in collaborative data sharing among multiple parties in IoT applications, Lu et al. [66] proposed a privacy preserving data sharing model using differential privacy methods.
Introducing federated learning
However, rather than sharing raw data directly, the federated learning algorithm is utilized into permissioned blockchain network through which only data model is shared over decentralized multiple parties. In a centralized ML model, participants upload their data on central cloud server. The server performs all computational tasks for training on the data as shown in Fig. 10(a).
吹一波聯邦學習
This model involves high risks of privacy attacks. Also, communication overhead is created between participants and the cloud server. In contrast, federated learning enables ML models to be computed on distributed mobile devices. This technique helps ML models to be trained on the devices where data is produced. This way the privacy of data is ensured as data of a particular device does not leave its data production place. This technique is disrupting the centralized way of data training.
聯邦學習的程序
In federated learning, each device has its local training dataset that is never seen by the server and each device generates an update to the existing global model located at the server. Next, the server combines these models by aggregating them and the whole process is repeated until global model training is completed. The primary benefit of federated learning is the decoupling of the training phase from the requirement of direct access to raw training data. The process of federated learning based model is represented in Fig. 10(b).
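The round structure described above can be sketched with a one-parameter model `y = w * x`. This is a toy version of federated averaging under assumed names (`local_update`, `federated_average`) and an assumed learning rate. The raw `(x, y)` pairs stay inside `local_update`, and the server only sees each device's updated weight and sample count.

```python
def local_update(w, data, lr=0.05, epochs=5):
    """Run a few gradient steps on one device's private data."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(updates):
    """Combine (weight, sample_count) pairs into a new global weight."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


def train(devices, rounds=20):
    """Repeat: send global weight out, collect local updates, average."""
    w = 0.0
    for _ in range(rounds):
        updates = [(local_update(w, data), len(data)) for data in devices]
        w = federated_average(updates)
    return w


# Two hypothetical devices whose local data follows y = 2x.
devices = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
global_w = train(devices)
```

Weighting by sample count means devices with more data pull the global model harder, which is also the quantity a reward scheme could be tied to.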
Why combine federated learning with blockchain?
Therefore, it minimizes training and privacy risk. However, relying on a single central server creates a single point of failure. Moreover, there is no reward service for distributed devices. Notably, devices with more data samples should be rewarded, as they contribute more to global training. With blockchain, verified local updates and exchanges can be enabled, along with rewards proportional to the training sample size. Blockchain-based federated learning is illustrated in Fig. 11.
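A reward proportional to training sample size might be computed as in the sketch below. The reward pool and the device names are hypothetical, and the paper does not prescribe this exact formula.

```python
def split_rewards(pool, sample_counts):
    """Share `pool` among devices in proportion to their sample counts."""
    total = sum(sample_counts.values())
    return {device: pool * n / total for device, n in sample_counts.items()}


rewards = split_rewards(
    100.0, {"sensor-a": 600, "sensor-b": 300, "sensor-c": 100}
)
```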
The "pretending to hold data" attack on blockchain-based federated learning, and example solutions
Unfortunately, federated learning fails to provide security in the presence of Byzantine nodes. An attacker who pretends to be a real data holder and breaks the security of the system is called a Byzantine attacker.
- In another work, Zhu et al. [67] also presented a blockchain-based privacy-preserving method for securing updates and achieving consensus in federated learning. Here, blockchain technology is adopted to deal with Byzantine devices in the network. In particular, only updates are added to blockchain transaction records. Along with the digital signature of a node, other information such as hyper-parameters, weight differences, and public IDs is also broadcast. The other participants validate the broadcast transactions against their local datasets. If a majority of the participants confirm that the performance score of the updated model is greater than that of the existing model, the updates are added to the model.
- Similarly, Doku et al. [63] also integrated blockchain technology and federated learning to improve the quality of data. Here, the hash of mobile device data is stored on blockchain whereas data still remains on the user’s device, and only the locally analyzed results will be shared with ML practitioners via a secure network. In addition to this, incentives will be provided to data owners.
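The majority-approval step described for [67] can be sketched as follows. This is an assumed simplification: the model is a single weight `w`, the performance score is mean squared error on each validator's local data, and signatures and blockchain transactions are left out.

```python
def mse(w, data):
    """Mean squared error of the model y = w * x on local data."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def approves(current_w, candidate_w, local_data):
    """A validator approves only if the candidate beats the current model."""
    return mse(candidate_w, local_data) < mse(current_w, local_data)


def accept_update(current_w, candidate_w, validators):
    """Accept the candidate when a strict majority of validators approve."""
    votes = sum(approves(current_w, candidate_w, data) for data in validators)
    return 2 * votes > len(validators)


# Hypothetical local datasets, all roughly following y = 2x.
validators = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(1.0, 2.1), (4.0, 7.9)],
]
honest_ok = accept_update(1.0, 1.8, validators)      # plausible update
byzantine_ok = accept_update(1.0, -5.0, validators)  # poisoned update
```

Because every validator scores the candidate on data the sender never sees, a node that only pretends to hold data has no easy way to craft an update that a majority will approve.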
Using blockchain to strengthen the security of federated learning
- Additionally, to enhance the security of federated learning, the authors of [71] proposed a blockchain-based framework to verify and exchange local learning models. This scheme aims to enable on-device ML without involving any centralized server. A reward mechanism is also proposed for user and miner node participation. The authors also evaluated the end-to-end average learning completion latency.
- In a closely related work, authors of [72] proposed federated learning with multi-access edge computing and blockchain technology. Here, edge devices are employed to provide resources to mobile devices and also to act as blockchain nodes. Here, a separate channel is dedicated for learning of every global model in the blockchain network.
- Unfortunately, in this proposal user devices are dependent on the integrity of corresponding edge nodes for sending transactions to blockchain networks. Additionally, no reward mechanism for user and miner nodes is designed by authors.
Protecting data security in the ML process
-
Also, the authors of [81] leveraged a suite of ML techniques to support data exchange on the blockchain via smart contracts for a distributed data vending architecture. In particular, data embedding and distance metric learning are used to enable retrieval of smart contracts without affecting the integrity of private data. Here, the signature of a data entry is generated by a privacy-preserving data embedding procedure, and the signatures are then used to measure similarity among data entries.
-
In an alternative work, the authors of [64] proposed a blockchain-based model named "secureSVM" for privacy-preserving data sharing while training ML algorithms. Here, each IoT data generator encrypts its data on the local device with its private key, and the encrypted data is stored on the blockchain. The experimental results show that incorporating blockchain with an SVM classifier improves the accuracy of the system model.
I did not understand the two examples above!
5.1.5 Cryptographic security on ML data
Using blockchain to secure access to the data used by ML.
However, I get the feeling that the content of this subsection was already covered in the previous one.
-
Using a blockchain-based access control manager to securely access, in real time, data stored in different places
Can this be understood as secure access control over the data used by ML?
Classifying IoT data with a black-box approach raises questions about the type of data being collected. Hence, the system needs to attain confidentiality, integrity, anonymity, and secure access to data. The authors of [60] used blockchain in retraining a stacked denoising autoencoder (SDA) algorithm for arrhythmia classification. Retraining addresses the non-stationary nature of ECG data, because it enables the deep network to learn any new distribution at specific time intervals, whereas SDA can extract different relevant features from data samples. Here, patient data that is stored on external storage and collected by the retraining SDA algorithm is securely accessed in real time through a blockchain-based access control manager. A scenario of blockchain-based secure access control on ML data is shown in Fig. 12.
-
Is this studying the role of blockchain within the CNN architecture? I do not quite follow.
More recently, Goel et al. [68] experimentally investigated the role of blockchain in providing authenticity to each block of a Convolutional Neural Network (CNN) model. In a CNN, each convolution layer is referred to as a block, and the authors pinpointed the accountability of each block for correct output. To this end, the blocks of the CNN are kept in random order, and neighboring blocks hold the information about the next legitimate block. Hiding the architecture of the network from an attacker mitigates the threat of white-box adversarial attacks. This scheme also enhances transparency between blocks and the entire network. Unfortunately, the complexity of the system is quite high.
-
Using blockchain to provide anonymity for ML
Another potential application of blockchain for ML is in providing anonymity. As discussed earlier, if data is stored anonymously, it is hard to link it to the true identity of a person. The authors of [82] pointed out that the pseudo-anonymity provided by blockchain can encourage the use of ML on anonymous datasets. Researchers can now use massive datasets in their research to improve the prediction results of healthcare systems. However, along with anonymity, encrypting data could enhance the security of the system. To address this challenge, homomorphic encryption was introduced, which makes it possible to execute ML operations on encrypted data [83].
Summary of this section
Summary and insights Section 4.1 focuses on various applications of blockchain technology targeting ML areas for the IoT environment. Incorporating blockchain technology in ML provides reliable sharing of data for different ML tasks, including prediction, forecasting, and voice and speech recognition, to name a few. However, we have made several observations after reviewing and tabulating the literature. Clearly, with trustless ML contracts, trustless rewards can be provided to the best ML model. (trustless rewards) However, there are some risks in the proposal that need to be dealt with. For example, the organizer may refuse to reveal the testing dataset, which may discourage submitters from contributing their work, as no evaluation function would then be available. Moreover, based on the selection criteria, the reward money can be claimed by the first submitter who fulfills the evaluation criteria. Hence, the reward could be distributed more evenly in order to incentivize more participation. Also, it has been observed that nodes are still reluctant to host data on IPFS blockchain data storage. So, future work should consider these problems before designing revised trustless ML contracts.
Additionally, it has been observed that most of the proposals have leveraged public blockchain which makes data generation speed slow. Hence, a fast data stream situation in blockchain is another important topic of research. Moreover, it has been observed that researchers have not considered the confidentiality of ML data in the blockchain network.
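I have a question: if the confidentiality of ML data in the blockchain network has not been protected, then what were the data privacy protection methods described above? I do not get it. Also, I feel the authors' summary here is not very good.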
5.2 ML for blockchain
For blockchain, ML can solve issues of uncertain and complex features.
-
The authors' view: in a blockchain environment, IoT sensors generate data, and this data can be analyzed with ML algorithms.
In a blockchain environment, the data gathered from IoT sensors can be analyzed and monitored at multiple points by ML models for efficient decision-making [98].
-
Some literature proposes "blockchain thinking": the main aim is to utilize the framework of blockchain for initiating thinking machines.
Following an emerging trend of adopting ML in blockchain, Swan [99] introduced the term "blockchain thinking", which refers to accommodating thinking on a blockchain network. The main aim is to utilize the framework of blockchain for initiating thinking machines. In such a framework, the input involves sensor data. The input data is then processed at a specific location to generate output, which includes storing information to memory or taking a specific action. This process involves "personal thinking chains" that signify backups of full human mind files.
-
To implement blockchain thinking, IPFS can be used (but why is this placed in the first paragraph?)
To implement the blockchain thinking process, IPFS could be relevant, as it provides a P2P file serving system [100]. Notably, ML research is entirely data-driven. This data can be shared via a central resource or a distributed file system. A central repository becomes inefficient as the number of users grows. IPFS, on the other hand, is a distributed file system that stores data files in a decentralized manner. Each file in IPFS is assigned a unique fingerprint called a cryptographic hash. IPFS shares data files with a list of trusted nodes, and the data is made available to other users through content identifiers.
-
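Comparisons of related work on ML for blockchain are given in Tables 6, 7, 8, and 9:
Comparison of deep reinforcement learning with blockchain for resource management and computational offloading (transaction-based)
Comparison of related ML models (price prediction)
Comparison of blockchain and federated learning based price prediction (transaction-based)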
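Returning to the IPFS point above, the idea of addressing a file by its cryptographic hash can be sketched with an in-memory store. This is an assumption-laden toy: `ContentStore` is a hypothetical class, and the identifier is a bare SHA-256 hex digest, whereas real IPFS content identifiers use their own encoding and split large files into chunks.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is the hash of the value."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        """Store data and return its content identifier."""
        content_id = hashlib.sha256(data).hexdigest()
        self._blocks[content_id] = data
        return content_id

    def get(self, content_id: str) -> bytes:
        """Fetch data and verify that it still matches its identifier."""
        data = self._blocks[content_id]
        if hashlib.sha256(data).hexdigest() != content_id:
            raise ValueError("stored data does not match its identifier")
        return data


store = ContentStore()
cid = store.put(b"training-batch-001")  # hypothetical ML data file
```

Since the identifier is derived from the content, any node can check that what it received is what was requested, without trusting the node that served it.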
5.2.1 Resource management and computational offloading
Background for this subsection: IoT systems run into problems such as wasted resources, and to address this challenge some flexible resource management frameworks combine blockchain and ML.
Resource management is the process of scheduling and allocating resources in order to maximize the efficiency of the IoT system. Energy consumption, transparency, operational expenditure, request scheduling, latency, content caching, and security are some of the issues involved in realizing the resource management process [101]. To address these challenges, a few secure and flexible resource management frameworks have been developed in the literature by integrating blockchain and ML.
Introducing deep reinforcement learning
A blockchain-based platform can store all records of transactions related to resource management in distributed and transparent data structures. However, to increase the efficiency of the network, ML models can be combined with blockchain. In particular, deep reinforcement learning (DRL) has been extensively used with blockchain to achieve resource management tasks. DRL can handle the dynamic and high-dimensional features of IoT. The main concept behind DRL is that, similar to a biological agent, an artificial agent may learn from interaction with its surroundings to take further decisions. By interacting with the environment, the agent gathers experience to optimize objectives expressed as cumulative rewards.
-
For example, the authors in [86] used a DRL method to maximize the transactional throughput of the blockchain network. In particular, DRL selects block producers, block size, and block interval to adapt to the dynamic features of the Internet of Vehicles (IoV) scenario.
-
Also, in order to achieve resource management for tasks such as content caching, computation offloading, and spectrum sharing, the authors in [85] utilized DRL. Specifically, this scheme uses DRL for a Device-to-Device (D2D) caching scheme that matches caching supply and demand pairs to maximize the network utility of the consortium-blockchain-enabled framework. Notably, the DRL-based caching scheme optimizes bandwidth between caching requester and provider. The results demonstrate that cumulative average system utility is improved. However, this proposal does not discuss the mining procedure.
-
Meanwhile, when embedded with smart contracts, ML helps to minimize the energy expense in cloud data centers (DCs), as discussed by the authors of [84]. Here, the smart contract facility of blockchain migrates requests and virtual machines to the cloud DCs with minimum load, and RL-based request migration is used for energy cost minimization, as this method does not require any prior knowledge. Fig. 14 represents the blockchain and ML empowered resource management scenario for smart grid networks. Here, all computation-intensive tasks, including caching, billing, and demand-response management, are implemented at the edge layer of the network due to resource constraints. Notably, learning-capable ML agents deployed on edge devices are responsible for effective caching, computational offloading, scheduling, and real-time decisions on the edge devices. Moreover, the mobile base stations used to transfer data to edge devices also run ML models for scheduling computational or storage requests.
Another application of ML for blockchain is offloading in mobile blockchain networks (that is, mobile devices have limited computing power)
Another prospective application of ML for blockchain is in offloading approaches for mobile blockchain networks. With the introduction of mobile technology, the blockchain network can now be easily used with mobile devices, so that more flexible blockchain applications for IoT can be developed. However, with mobile systems, resource-constrained IoT devices face difficulty while mining blocks. In this context, mobile edge computing takes on computationally heavy tasks for mobile devices. However, there is a challenge in effectively allocating available edge computing resources to miners. Mobile devices can offload their computationally heavy tasks to the assigned mobile edge/cloud server. With the aim of enhancing system performance, the literature contains multiple offloading approaches.
-
For example, convex optimization models and game theory approaches that minimize task execution latency have been used by the authors of [119–123]. Nevertheless, these methods fail for highly complex online models, and they also demand prior knowledge about the system. To solve this issue, RL can be used, where a learning agent derives an optimal solution for computational offloading by trial and error. Moreover, this solution does not require prior knowledge of system statistics.
-
However, for high-dimensional computational offloading problems, the RL solution also fails because of the high dimensionality of the state and action spaces, as pointed out in [124,125]. To deal with high-dimensional data, the use of DRL is beneficial, and some work in the literature has demonstrated the scalability and offloading efficiency of DRL in blockchain-based edge computing applications. DRL can achieve an optimal offloading strategy based on past offloading experience. Both of the proposals in [87,88] were designed to preserve users' privacy and to achieve security as an optimization problem. Using the DRL method, performance metrics including computational latency, energy consumed, and privacy level were analyzed, showing the feasibility of the proposed scheme with reduced offloading latency and minimum energy consumption.
-
The examples above only cover computational offloading for the mining process
The above-discussed offloading approaches are designed only for mining tasks whereas data processing tasks are ignored. In contrast, the work in [89] has discussed computational offloading for both mining and data processing tasks combining DRL and genetic algorithms. Additionally, Markov decision process has been used to handle the dynamic environment.
-
However, to implement the DRL method for offloading decisions, the major challenge is to achieve convergence and accuracy of the deep NN. Also, there is a need to develop effective resource allocation on mobile blockchain. To address this challenge, the authors of [102] designed a multilayer NN supported auction mechanism for resource allocation in a mobile edge computing environment. The auction mechanism ensures that edge resources are allocated to those miners who value the resources the most. Simulation results demonstrated that the proposal converges quickly to a solution in which the profit of the service provider is higher than in the proposal by the authors of [126].
-
Recently, Asheralieva and Niyato [90] proposed a Bayesian RL and DL based approach to model interactions among miners in a blockchain network with mobile edge computing. In particular, a game-theory-based approach is used by each miner to offload its block operations to any of the base stations equipped with a mobile edge computing server.
-
In contrast, the authors in [103] used federated learning to deal with user equipment privacy issues, as edge node transactions are mostly based on a centralized approach. Federated learning builds ML models without centralizing the training data on a central server. Here, federated learning lets user equipment train on its data locally, without exposing the data, to optimize the system model. In addition, blockchain and smart contracts are used to secure transactions in cross-silo FL in the B5G network.
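The trial-and-error idea behind RL-based offloading, discussed earlier in this subsection, can be sketched with tabular Q-learning. Everything concrete here is an assumption for illustration: two channel states, two actions, made-up latencies, and a channel that changes at random. The cited works use far richer state spaces, which is why they move to DRL.

```python
import random

LOCAL, OFFLOAD = 0, 1
BAD, GOOD = 0, 1


def latency(state, action):
    """Hypothetical task latency in arbitrary units."""
    if action == LOCAL:
        return 5.0
    return 2.0 if state == GOOD else 9.0


def train(steps=20000, alpha=0.1, gamma=0.5, explore=0.1, seed=0):
    """Learn q[state][action] without any prior model of the latencies."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    state = rng.choice([BAD, GOOD])
    for _ in range(steps):
        if rng.random() < explore:
            action = rng.choice([LOCAL, OFFLOAD])  # try something new
        else:
            action = max((LOCAL, OFFLOAD), key=lambda a: q[state][a])
        reward = -latency(state, action)  # lower latency, higher reward
        next_state = rng.choice([BAD, GOOD])  # channel changes at random
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


q_table = train()
policy = {
    s: max((LOCAL, OFFLOAD), key=lambda a: q_table[s][a]) for s in (BAD, GOOD)
}
```

The agent never reads the `latency` table directly; it only observes rewards, which matches the claim that RL does not need prior system statistics.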
5.2.2 Predicting cryptocurrency price
Bitcoin's openness creates an opportunity for price prediction
Bitcoin [127], introduced by Satoshi Nakamoto, is the world's first and most popular cryptocurrency and is accepted by 111 countries worldwide. As a valuable cryptocurrency, Bitcoin provides an opportunity for price prediction because of its volatility and open nature [128].
Bitcoin's price swings have attracted researchers' interest
The price of Bitcoin was around $7202 in late 2019, compared to about $3468 in January 2019 [129]. Researchers and stakeholders of the financial sector are trying to figure out the reasons for changing trends in the cryptocurrency market. Similar to stock market prediction, Bitcoin price prediction can be modeled as a time series prediction problem.
Because Bitcoin lacks seasonality and its blockchain network is highly volatile, traditional time series models are unsuitable for Bitcoin price prediction.
However, conventional time series approaches are based on linear assumptions and are effective in the case of seasonal and noisy data [130]. The absence of seasonality and the highly volatile nature of the Bitcoin blockchain network make these traditional time series models unsuitable for Bitcoin price prediction. Nevertheless, for time-series prediction of uncertain data, some non-linear methods such as ANN, Bayesian Neural Network (BNN), and SVM have gathered interest from researchers. Generally, ML-based price prediction models are evaluated with error metrics such as RMSE and MAE, which the studies below report.
Relatively few studies have been conducted on estimating time-series of Bitcoin price using ML model. In this context, to deal with uncertain and non-linear data, DL has been proved to be an effective solution.
- For example, the authors of [96] were the first to use DL for cryptocurrency price prediction. Besides Bitcoin, DL techniques are applied to predict the price of Ethereum, Ripple, and digital cash cryptocurrency. For result analysis, the Long Short-Term Memory (LSTM) model is compared with the generalized regression neural network (GRNN) model. LSTM is a subtype of recurrent neural network (RNN) designed to deal with long-term dependency problems. LSTM follows a recurrent topology, whereas GRNN has a parallel, memory-based structure and learns quickly with a large sample size. However, the prediction results of LSTM are better than those of GRNN in terms of RMSE. Rather than just presenting a predictive model, the authors also conducted a chaotic time series analysis.
- Similarly, McNally et al. [91] predict the Bitcoin price using both LSTM and RNN methods, reporting the prediction accuracy of LSTM to be better than that of RNN. Here, both NN models, i.e., RNN and LSTM, are tested with two hidden layers of 20 nodes each. The dataset used for training covers Aug. 19, 2013 to July 19, 2016. The results show that RNN, LSTM, and Autoregressive Integrated Moving Average (ARIMA) all have almost similar accuracy, i.e., 50.25%, 52.78%, and 50.05% respectively. The ARIMA model, however, assumes time series data of a linear nature. As Bitcoin data is volatile, ARIMA cannot generate results as accurate as those of RNN and LSTM. Here, the DL models are trained on the Bitcoin price index only.
- Likewise, the authors of [131] demonstrated the impact of LSTM for Bitcoin price prediction by opting for 10 neurons in the hidden layer.
- In contrast, Jang and Lee [92] conducted a study on predicting the Bitcoin price using a BNN. A BNN applies Bayesian theory to neural networks. BNNs have applications in various fields such as pattern recognition, Natural Language Processing (NLP), image recognition, and traffic flow prediction [132]. Similar to a Multilayer Perceptron (MLP), a BNN consists of an input layer, an output layer, and one or more hidden layers. During training, the backpropagation method updates the weights of neurons at each layer using the current error, propagated backward from the output layer to the previous layers. In addition to backpropagation, the delta rule is used to minimize the sum of errors. By utilizing the backpropagation method, a BNN can handle exclusive OR (XOR). Also, the regularization term of a BNN prevents overfitting to the training data.
Earlier work focused on analyzing the Bitcoin price while ignoring its non-linear relationship with blockchain variables.
Notably, previous literature work has focused on analyzing Bitcoin prices **without taking into account its non-linear relation with blockchain variables.** Further, the authors of [92] have concluded that an ML model trained only with the Bitcoin price index results in poor predictive performance. (In everyday research, when you cannot find a reference to back a claim, or want a more convincing argument, consider demonstrating it with experimental results.)
-
Differently, Barro's Bitcoin pricing model [133] has been considered by the authors for empirical study. In this proposal, blockchain variables such as average block size, transactions per block, median confirmation time, hash rate, difficulty, miners' revenue, and the number of confirmed transactions are used to train a model that analyzes the Bitcoin price using BNNs, and the results are compared with those obtained using Support Vector Regression (SVR) and a linear regression model. It is observed that both the training and testing phases show poor performance with the SVR model. Notably, rather than training the model with only the Bitcoin price index, the BNN considers the non-linear effect of blockchain information and other macroeconomic factors affecting the price of Bitcoin, whereas the regression model can only handle linear relationships. As an advantage, however, the feature extraction procedure of the regression model removes incorrect values, which results in a better prediction model.
-
Similarly, Madan et al. [93] chose 26 features related to the Bitcoin network along with daily Bitcoin prices. These features include average confirmation time, block size, difficulty, estimated transaction volume, and number of transactions. To predict the Bitcoin price, the authors leveraged SVM, random forest, and binomial generalized linear model (GLM) algorithms and achieved a prediction accuracy of around 97% without cross-validation, which limits the generalizability of the results. The results demonstrate that the random forest algorithm performs best, as it is based on non-parametric decision trees. However, the precision of random forest is lower than that of the binomial GLM, as the latter can also solve linearization problems for the Bitcoin dataset.
-
In addition, Greaves and Au [94] developed another Bitcoin price prediction model by leveraging SVM and ANN and concluded that ANN gives the best accuracy, i.e., 55%. The authors used historical time deltas of 1 hour, 1 day, 1 week, and 1 month to develop features for supervised learning. Total Bitcoin passing through, net Bitcoin flow, number of transactions, and closeness centrality are the features collected for predicting the price. They also concluded that net Bitcoin flow and number of transactions are the most informative Bitcoin features.
-
Another effort to analyze features that strongly relate to Bitcoin price changes is carried out in [95] using linear regression, random forest, and gradient descent models. Here, the authors took features from the dataset such as the number of wallets, unspent transaction outputs, and block size, among others. The performance of the proposal was evaluated using RMSE and MAE.
-
Likewise, Velankar et al. [134] predicted Bitcoin price using Bayesian regression, and random forest method. Block size, total Bitcoins, day high, day low, number of transactions, and trade volume are the set of selected parameters to be fed to the predictive network.
-
Along the same line of thought, Mangal et al. [97] experimented with logistic regression, SVM, ARIMA, and RNN and concluded that RNN is the most accurate of all.
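Several of the studies above report RMSE and MAE. As a reminder of what those numbers measure, here is a small sketch with hypothetical prices; the values are made up and do not come from any of the cited papers.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalizes large misses more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: the average size of a miss."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [7200.0, 7150.0, 7300.0]     # hypothetical daily closing prices
predicted = [7100.0, 7150.0, 7500.0]  # hypothetical model output
price_rmse = rmse(actual, predicted)
price_mae = mae(actual, predicted)
```

RMSE is never smaller than MAE on the same data, and the gap widens when a few predictions are far off, which matters for a series as volatile as the Bitcoin price.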
The authors explain why cryptocurrency price prediction is discussed in this paper:
Notably, the existing literature on cryptocurrency price prediction is not designed for the IoT environment. However, IoT network applications include payment transactions between nodes. In a blockchain-based IoT network, payments are made with digital cryptocurrency, and hence the discussed studies on cryptocurrency price prediction could be applied to IoT networks as well.
5.2.3 ML for anomaly detection and attack prevention on blockchain
Blockchains may face 51% attacks, double-spending attacks, and so on
With the popularity of blockchain, the risk of security issues such as 51% attack (majority attack), double spending attack, etc. also increases as discussed in [135,136].
-
Explanation of the two attacks
Due to propagation delay in the blockchain network, a double spending attack might happen when a participant tries to spend the same cryptocoins in more than one transaction. On the other hand, a majority attack happens when more than 50% of the network's participants conspire to take control of the ledger.
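The core check that defeats a double spend on a single, consistent ledger can be sketched as below. The `Ledger` class and coin identifiers are hypothetical, and the sketch deliberately ignores propagation delay and competing forks, which are exactly what make the attack possible on a real network.

```python
class Ledger:
    """Toy ledger that refuses to spend the same coin twice."""

    def __init__(self):
        self.spent = set()
        self.accepted = []

    def apply(self, tx_id, inputs):
        """Accept a transaction only if none of its inputs is spent."""
        if any(coin in self.spent for coin in inputs):
            return False
        self.spent.update(inputs)
        self.accepted.append(tx_id)
        return True


ledger = Ledger()
first = ledger.apply("tx-1", ["coin-42"])
second = ledger.apply("tx-2", ["coin-42"])  # tries to reuse coin-42
```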
What can ML be used for in blockchain?
Moreover, the open nature and public design of the Bitcoin system allow any user to be a participant. ML models aim to extract insights, find outliers, classify, and detect patterns in large data repositories, so they can be used for blockchain attack detection.
Moreover, with blockchain technology, ML algorithms can train, learn, and make decisions on the local system in a decentralized network. Hence, processing data locally can prevent security and privacy issues to some extent. Various authors have used ML models for anomaly detection in blockchain networks. Both supervised and unsupervised ML algorithms have been employed to design intrusion detection and prevention systems. To detect and isolate malicious nodes in the network, the literature has used various ML models such as SVM and k-means clustering.
-
For example, Dey [110] has discussed the issue of majority attack in the blockchain network. Specifically, the majority attack is a concern in consortium blockchain (e.g., Hyperledger) as it involves business parties collaboration.
-
To solve the problem faced by majority attack, authors of [110] have proposed an approach based on supervised ML model and algorithmic game theory. Supervised ML algorithms are leveraged to classify whether the attack will take place or not. However, this work is still in progress, and simulation results or any proof have not been demonstrated by the authors.
-
In contrast to the supervised ML approach, another effort to detect anomalies in the Bitcoin network is made by Pham and Lee [112] using 3 unsupervised ML methods: k-means clustering, a Mahalanobis distance based method, and SVM (on two Bitcoin transaction graphs). The dataset used for training includes 6,336,769 users with 37,450,461 transactions, and 12 features (including in-degree, out-degree, average in-transactions, balance, etc.) are extracted.
-
On the other hand, the same authors in their research in [112] use laws of power degree and local outlier factors on the two graphs produced by Bitcoin network to detect anomalies.
-
In a closely related work, authors of [114] proposed an unsupervised statistical ML approach to detect anomalies on blockchain based sensor data belonging to condition management of the industrial asset.
-
Following a trend rendered by the adoption of unsupervised ML for anomaly detection, authors of [108] used trimmed k-means clustering for cybercrime detection in Bitcoin network. Compared to other approaches on fraud detection, k-means clustering provides better results in terms of detection rate.
-
Similarly, Scicchitano et al. [137] proposed an anomaly detection system using an unsupervised encoder decoder DL model which is trained with aggregated information extracted by analyzing blockchain network activities.
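Of the unsupervised methods listed above, the Mahalanobis distance approach is the easiest to show in a few lines. The sketch below uses two hypothetical features per user (for example in-degree and out-degree), made-up values, and an assumed threshold of 3.0. It is not the pipeline of [112], which works with 12 features.

```python
import math


def fit(points):
    """Return the mean and inverse 2x2 covariance of 2-feature points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Inverse of [[sxx, sxy], [sxy, syy]], stored as (a, b, c).
    inverse = (syy / det, -sxy / det, sxx / det)
    return (mx, my), inverse


def mahalanobis(point, mean, inverse):
    """Distance from the mean, scaled by the spread of normal data."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    a, b, c = inverse
    return math.sqrt(a * dx * dx + 2 * b * dx * dy + c * dy * dy)


normal_users = [(1.0, 10.0), (2.0, 12.0), (3.0, 9.0), (2.0, 11.0), (2.0, 8.0)]
mean, inverse = fit(normal_users)
THRESHOLD = 3.0  # assumed cut-off for flagging an anomaly
suspicious = mahalanobis((50.0, 300.0), mean, inverse) > THRESHOLD
```

Unlike plain Euclidean distance, this score accounts for how much each feature normally varies and how the two features move together.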
Preventing the use of Bitcoin for illegal transactions: human trafficking and drug sales
Besides, in order to prevent human trafficking and drug sales involving Bitcoin, Portnoff et al. [138] proposed another ML-based classifier that categorizes ads by the person who paid for them. The ML classifier utilizes stylometry: it takes two ads as input and determines whether they were published by the same user or by different users. The flowchart for ML-based anomaly detection in a blockchain network is presented in Fig. 15. First, the IoT data provider collects data from IoT sensors and sends it to the data preparation phase, which involves data preprocessing (transforming the dataset into a machine-readable format) and feature extraction. Next, the data analysis phase is carried out, which involves training on the data with the selected ML algorithms. Here, the weights and biases are adjusted in order to get more correct predictions. Finally, the trained model is tested against a previously unseen dataset for anomaly detection.
The sharding example below does not really fit as an example here
Differently, in the research [63], authors have leveraged the concept of sharding to solve scalability issues. While implementing the concept of sharding, the blockchain network is divided into interest groups and each group has its own ledger to verify transactions. Dividing the network improves network efficiency by empowering parallelism. Proof of Common Interest consensus algorithm is used to validate data that is directed to the relevant interest group. The proposal mitigates DDoS, MITM, and data leakage attacks.
Notably, an online ML security system that detects abnormal clients in the network appears to be an understudied topic. To this end, Bogner [113] proposed an online unsupervised ML method for fraud detection that is optimized for interoperability. Unlike other approaches, Bogner's research involves visualization techniques along with an interactive querying system meant for manual expert analysis. The proposal is evaluated on the public Ethereum blockchain network.
Smart contracts may contain vulnerabilities
On the other hand, authors in [115] focused on the security of Ethereum smart contracts. As smart contracts are open in nature, any vulnerability present in the contract is visible to anybody on the network. For example, the decentralized autonomous organization (DAO) is a smart contract and due to some vulnerabilities in its code, it was hacked losing $150 million [139].
- Here, in [115], the authors utilized a CNN model for automatic feature extraction, along with learning and detecting compiler bugs in smart contracts. They translated the Solidity bytecode into RGB color codes, which were then transformed into a fixed-size encoded image. Next, the encoded image is fed to the CNN to detect bugs.
- In a similar direction, Tann et al. [116] utilized an LSTM model to detect new attack trends for smart contracts. The LSTM performs a two-class classification and minimizes the detection loss function to maximize classification accuracy and to detect security threats in smart contracts. The authors leveraged the fact that smart contracts are sequential in nature, so they can easily be used to update the LSTM model for future contracts.
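To make the "bytecode to fixed-size RGB image" step of [115] more concrete, here is a minimal sketch. The notes do not give the exact encoding, so the mapping used here (three consecutive bytes per pixel, truncation or zero padding to reach a fixed size) and the function name `bytecode_to_rgb` are assumptions.

```python
def bytecode_to_rgb(bytecode_hex, width=4, height=4):
    """Map contract bytecode to a fixed-size grid of (R, G, B) pixels."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    raw = bytes.fromhex(bytecode_hex)

    n_bytes = width * height * 3
    # Truncate long contracts and zero-pad short ones so every image has the same size.
    raw = raw[:n_bytes].ljust(n_bytes, b"\x00")

    # Every three consecutive bytes become one pixel.
    pixels = [tuple(raw[i:i + 3]) for i in range(0, n_bytes, 3)]
    return [pixels[r * width:(r + 1) * width] for r in range(height)]

# A short bytecode prefix, encoded as a 2 x 2 image.
image = bytecode_to_rgb("0x6080604052", width=2, height=2)
```

The resulting grid could then be handed to any image classifier. The fixed size matters because a CNN expects inputs of a constant shape.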
5.2.4 Reducing the anonymity of the network
Another potential application of ML for blockchain is to reduce the anonymity of the network. Notably, a blockchain network is assumed to attain a high degree of anonymity, as each participant is referred to by its public key address. However, the authors of [140] claim that it is possible to cluster Bitcoin addresses and map them to real-world identities.
- In the same context, Harlev et al. [109] conducted a study to probe the true depth of participants' anonymity using a supervised ML approach. First, addresses are clustered by how they are controlled by a single entity, using behavioral intelligence-based clustering and co-spend clustering, with the aim of predicting the category of yet unidentified Bitcoin addresses. Next, the identified clusters are categorized into one of the predefined categories, i.e., exchange, gambling, hosted wallet, merchant services, mining pool, mixing, ransomware, scam, etc. The primary dataset used for simulation consists of transactional data with details about each transaction. Here, seven different ML algorithms are used to analyze the transactional data: k-nearest neighbor, random forests, extra trees, Adaboost, decision trees, gradient boosting, and bagging classifier. The results section concludes that the gradient boosting method performs best among all.
- In contrast, Jourdan et al. [107] experimentally obtained a lower F1-score with the gradient boosting method. Also, their methodology involves a complex hyper-parameter optimization step.
- In a closely related work, the authors of [141] presented a method to break Bitcoin anonymity via entity characterization. Here, a cascade of classifiers is used: the first step classifies entities using address and motif features, and its output serves as the input to the next classification step. Experiments are conducted and compared using Adaboost, random forest, and gradient boosting models. A disadvantage is that this approach cannot characterize entities with normal user behavior. Nevertheless, the proposal is able to detect six entity classes, i.e., Exchange, Gambling, Market, Mining Pool, Mixer, and Service.
The general procedure of entity characterization process of Bitcoin is represented in Fig. 16.
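The co-spend clustering mentioned for [109] can be illustrated with a small union-find sketch. It assumes the usual heuristic that all input addresses of one transaction are controlled by the same entity. The function name and the toy transactions are hypothetical, and the behavioral clustering and the later categorization step are not shown.

```python
def cospend_clusters(transactions):
    """Group addresses that appear together as inputs of one transaction."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps the trees shallow
            a = parent[a]
        return a

    def union(a, b):
        parent[find(a)] = find(b)

    for inputs in transactions:
        for addr in inputs:
            find(addr)                     # register every address, even lone ones
        for addr in inputs[1:]:
            union(inputs[0], addr)         # co-spent inputs share one owner

    clusters = {}
    for addr in list(parent):
        clusters.setdefault(find(addr), set()).add(addr)
    return sorted(sorted(c) for c in clusters.values())

# Each inner list holds the input addresses of one transaction.
txs = [["A", "B"], ["B", "C"], ["D"], ["E", "F"]]
clusters = cospend_clusters(txs)
```

Addresses `A`, `B`, and `C` end up in one cluster because `B` is co-spent with both of the others. Each such cluster would then be the unit that a supervised classifier assigns to a category.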
5.2.5 Classification of blockchain-related data
Classification of data is very important for decision-making tasks [142]. Popular classification algorithms include k-nearest neighbor based methods, decision tree methods, NN based networks, the multivariate discriminant analysis method, and the SVM method [143]. ML classification has been used with blockchain for data quality classification and transaction entity classification, as discussed below:
- Blockchain data quality classification: With the growth of IoT technology, the use of health-specific applications such as smart bands and smart watches has also increased. However, the presence of malicious nodes can sometimes lead to slow degradation of the system. Many researchers have secured this personal health data with a blockchain network [144]. Moreover, to check the validity of the continuous and dynamic data generated by sensors, the authors of [117] used ML techniques. Apart from the existing roles in the blockchain, a new role named data validator is introduced, which is responsible for validating and certifying the quality of the data generated by sensors. Here, the quality classifier for health data classifies the input data using predefined features and removes meaningless data and noise. As an example, take smart watch readings over 24 h. The data validation algorithm can differentiate sleep-related data from workout data. However, predefined rules chosen by the owner decide whether sleep-related data is classified as high quality or as noise. Fig. 17 illustrates the ML based blockchain data quality validation process in a healthcare network. The serving data is validated before it is piped to the blockchain network. The data analyzer is responsible for computing a predefined set of statistics sufficient to describe the data. The ML agent works in the data validator module and trains the system using schema and constraint APIs. The system can also classify the data into categories using ML classifiers. In another work, the authors of [145] utilized ML to analyze data from a blockchain based credit card scoring system. The blockchain based transaction data is sent to an ML agent that extracts features from the data and then applies a binary classification model to identify customers who would not be able to pay the requisite amount within the allotted time. The bank then uses these classification results to decide whether a credit request should be initiated for a particular customer. Another matter to consider in the context of centralization is trading IoT generated data through a TTP. To this end, the authors of [106] presented a data trading model utilizing blockchain, smart contracts, and similarity learning. Here, the arbitration institution responsible for maintaining the smart contract uses ML services, based on classification and clustering, to resolve any dispute over the availability of data for data purchasers. In particular, similarity learning (distance metric learning) is adopted to validate the distance between the features of the actual data and those of the declared data. Distance metric learning has been used extensively for classification and clustering problems.
- Classification of blockchain peers/transaction entities: Public blockchains are open and can be joined by anybody in the network. In such a case, some participants may misbehave for personal interest while the majority of participants behave legally. Clearly, it is hard to study the behavior of participants manually. To address this problem, the authors of [118] present an approach that classifies the behavior patterns of participants into predefined categories using an LSTM based DL approach. The transaction amount is extracted as the feature used to classify participants. Based on the transaction amount, participants are classified into three categories, i.e., stable transaction history, medium jitter history, and high jitter transaction history. With a similar motive, the authors of [104] classified transaction entities into four categories, i.e., exchange, service, gambling, and mining pool. Here, a gradient boosted decision tree algorithm with Gaussian process based optimization is used as the classification method. The results show that classification accuracy for the exchange, gambling, and service categories is high compared with the mining pool category. Additionally, the authors of [105] presented a supervised ML approach to classify transaction entities engaged in cybercriminal activity. To train the classification model, 854 categorical observations with 12 classes and 10000 non-categorical identifiers are considered. The results conclude that random forest, extremely randomized forest, bagging, and gradient boosting are the four best classifiers.
The authors' summary of this section
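The three-way split of participants in [118] (stable, medium jitter, high jitter) is learned by an LSTM in the paper. As a much simpler stand-in, the sketch below labels a participant by the coefficient of variation of its transaction amounts. The measure, the thresholds `low` and `high`, and the sample histories are all assumptions chosen only to show what "jitter" in a transaction history can mean.

```python
import statistics

def jitter_class(amounts, low=0.1, high=0.5):
    """Label a participant by the coefficient of variation of its transaction amounts."""
    mean = statistics.mean(amounts)
    # Coefficient of variation: spread of the amounts relative to their mean.
    cv = statistics.pstdev(amounts) / mean if mean else 0.0
    if cv < low:
        return "stable"
    if cv < high:
        return "medium jitter"
    return "high jitter"

# Hypothetical transaction-amount histories of three participants.
labels = {
    "p1": jitter_class([100, 101, 99, 100]),
    "p2": jitter_class([100, 130, 80, 110]),
    "p3": jitter_class([5, 400, 20, 900]),
}
```

A sequence model such as an LSTM can also use the order of the amounts, which this order-insensitive statistic ignores.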
Summary and insights: Section 4.2 reviews various ML applications for blockchain networks, such as optimizing resource allocation, improving cryptocurrency price prediction, detecting anomalies in the network, and classifying blockchain related data. The growing storage size of the blockchain demands more resources. With data sharding and pruning solutions, ML can help blockchain networks make better decisions about data storage. Also, ML can identify malicious activities in blockchain networks by developing and training ML models. However, it has been observed that learning based examination of blockchain systems has not been exploited much in the literature. Moreover, for protecting wallet privacy, the clustering techniques proposed for a broad range of blockchain applications have not utilized any string search mechanism such as a Bloom filter. By using string search mechanisms, the storage complexity and search time complexity of various validation and verification operations can be reduced significantly. Also, the clustering proposals can be extended to increase the relatively low sample size of clusters and to add more cluster categories, so that clusters can be differentiated more effectively.
6. Challenges of ML + BC + IoT
While the previous sections have presented a study on the integration of blockchain and ML, this section discusses challenges that need to be considered in future research.
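Since the summary suggests a Bloom filter as a string search mechanism, here is a minimal sketch of one. The bit-array size, the number of hash functions, the use of salted SHA-256, and the example addresses are arbitrary choices for illustration and are not taken from the paper.

```python
import hashlib

class BloomFilter:
    """Compact set membership: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

wallet_filter = BloomFilter()
for address in ["addr-alpha", "addr-beta"]:
    wallet_filter.add(address)
```

The filter stores only a fixed number of bits no matter how long the addresses are, and a lookup costs a constant number of hash computations. That is the storage and search-time saving the summary has in mind.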
- Confidentiality is still not fully preserved with blockchain, as any node can trace transactions, and only a few research studies have focused on blockchain's lack of confidentiality for ML data. Moreover, blockchain standards and regulations are yet to be finalized.
- Here, it is worth mentioning the problem of data storage, as the nodes of a blockchain network keep a copy of every transaction in the network. This growing database could be difficult to handle in the future. Hence, the scalability of blockchain platforms should be addressed in order to popularize blockchain applications for ML. As a solution, research into emerging mechanisms such as sidechains or childchains should be encouraged. Moreover, PoW computations can prove costly in terms of resource utilization and transaction time, so models should be developed that do not consume unnecessary computational power. Also, existing ML models demand the creation of custom datasets with specific variables, and they cannot satisfy the various service requirements of complex networks. Hence, it is challenging to scale the development of ML models with the ever-increasing IoT data.
- It is also observed from the tabulated comparison of the available literature that most of the research work is based on permissioned blockchains. However, a 51% attack is easy to launch in a permissioned blockchain, a vulnerability that none of the studies have considered. Also, the use of a permissioned blockchain limits access to the enormous amount of data that an ML system may require for accurate decision-making. To address this problem, blockchain platforms and IoT resources should be equipped with a Trusted Execution Environment (TEE) [146].
- Federated learning, adopted by many researchers, faces the issue of communication bandwidth. Undoubtedly, a mobile device has enough computing resources to implement federated learning. Unfortunately, the bandwidth of wireless communication is not adequate. So, the research focus has been shifting gradually from computational resources to wireless communication. Methods such as deep gradient compression should be used to reduce the required communication bandwidth [147].
- Also, in a public blockchain, data is publicly available and accessible to all readers, which is indeed a privacy concern. However, using a private blockchain can limit access to the large amount of data that an ML model needs for accurate decision-making. Along with privacy, security is another concern, as this technology suffers attacks at the application layer. Also, the consensus mechanisms can be compromised depending on the hashing power of the miner. ML algorithms can detect various attacks in blockchain networks, but there are still challenges in using them to detect malicious threats. For example, for a large dataset containing malicious data, a security solution for detecting malicious behavior has to deal with the high dimensionality of the data during pre-processing. In such a case, the ML model first has to perform a dimensionality reduction step. Moreover, it is impossible to train an ML model on a large dataset in real time, so it is challenging to detect online attacks in dynamic networks.
- 5G and B5G are examples of heterogeneous networks designed for a wide range of IoT devices. The enormous amount of data generated by these devices can put a heavy load on the ML model used for decision-making, leading to limited performance. In this context, blockchain can solve the security issues to some extent, but network performance will still be a problem.
7. The authors' conclusion
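For the federated learning bullet above, the core idea behind gradient compression can be sketched as top-k sparsification with local accumulation: each device sends only its largest-magnitude gradient entries and keeps the rest for later rounds. This is a simplified illustration that leaves out the further refinements of the deep gradient compression method. The function name, `k`, and the gradient values are made up.

```python
def sparsify(gradient, residual, k):
    """Return (sparse update, new residual), sending only the k largest entries."""
    # Add what was held back in earlier rounds to the fresh gradient.
    accumulated = [g + r for g, r in zip(gradient, residual)]

    # Indices of the k entries with the largest magnitude.
    top = sorted(range(len(accumulated)),
                 key=lambda i: abs(accumulated[i]), reverse=True)[:k]

    update = {i: accumulated[i] for i in top}   # the only data sent over the network
    new_residual = [0.0 if i in update else v for i, v in enumerate(accumulated)]
    return update, new_residual

residual = [0.0] * 5
update1, residual = sparsify([0.5, -2.0, 0.25, 1.5, -0.25], residual, k=2)
update2, residual = sparsify([0.5, 0.0, 0.5, 0.0, -0.25], residual, k=2)
```

In the second round, entries 0 and 2 are sent because their held-back values have built up locally, so small gradients are delayed rather than lost. Each round transmits `k` index/value pairs instead of the full vector.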
In this paper, we reviewed the current state of the art related to the collaboration of ML and blockchain. We presented an overview of blockchain technology and how this decentralized technology can solve the privacy issues related to ML. Moreover, we provided an overview of ML technology and discussed key applications and the applicability of blockchain features for ML. The literature review shows that applications combining blockchain and ML are still in their infancy and that many research challenges need to be addressed. However, the current research is a foundation for an interdisciplinary perspective. In the future, we will implement one of these techniques in IoT applications to check its performance with respect to other applications using various performance evaluation metrics.
Authors' declaration
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
8. My summary
No summary for now. Emmm, I haven't read the later parts of the paper carefully yet, so I'll tidy this up tomorrow.
References
Article information
- URL: https://www.sciencedirect.com/science/article/abs/pii/S0140366421002632
Cover image information
- URL: https://www.gracg.com/works/view/1495553
- Author: 劉翔ART http://gracg.com/user/user93912SfTIdM
