Introduction - Unsupervised Learning

2020-09-17 07:30:06 Other

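Abstract: This article is the original video transcript of Lesson 4, "Unsupervised Learning", from Chapter 1, "Introduction: A First Look at Machine Learning", of Andrew Ng's Machine Learning course. I transcribed it word by word while studying the videos so that I could look things up later, and I am sharing it here. If you find any errors, corrections are welcome, and thank you in advance. I hope it also helps with your own study.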

In this video (article), we'll talk about the second major type of machine learning problem, called Unsupervised Learning. In the last video, we talked about Supervised Learning. Back then, we got data sets that look like this, where each example was labeled either as a positive or negative example, whether it was a benign or a malignant tumor. So for each example in Supervised Learning, we were told explicitly what is the so-called right answer, whether it's benign or malignant.

In Unsupervised Learning, we're given data that looks different, data that looks like this, that doesn't have any labels, or that all has the same labels or really no labels. So we're given the data set and we're not told what to do with it, and we're not told what each data point is. Instead we're just told, here is a data set. Can you find some structure in the data? Given this data set, an Unsupervised Learning algorithm might decide that the data lives in two different clusters. And so there's one cluster, and there's a different cluster. And the unsupervised learning algorithm may break these data into these two separate clusters. So this is called a clustering algorithm. And this turns out to be used in many places.
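
As a rough illustration of what a clustering algorithm does with unlabeled points, here is a minimal k-means sketch in Python. k-means is only one clustering method and is an assumption made for this example; the lecture does not say which algorithm produced the two clusters in the picture. The sample points, the helper names `assign` and `kmeans`, and the choice of two clusters are all hypothetical.

```python
import math

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

def kmeans(points, k, iterations=20):
    # Initial centers: the first point, then repeatedly the point that is
    # farthest from every center chosen so far (a simple deterministic start).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iterations):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the center to the mean of the points assigned to it.
                new_centers.append(tuple(sum(c) / len(members)
                                         for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:  # nothing moved, so we have converged
            break
        centers = new_centers
    return assign(points, centers), centers

# Unlabeled data: no point says which group it belongs to.
points = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9), (1.1, 1.4),
          (5.0, 5.1), (5.3, 4.8), (4.7, 5.2), (5.1, 5.4)]
labels, centers = kmeans(points, k=2)
print(labels)
print(centers)
```

Note that the number of clusters is passed in by hand here; the data itself carries no labels, which is the point of the example.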

One example where clustering is used is in Google News, and if you have not seen this before, you can actually go to this URL, news.google.com, to take a look. What Google News does is every day it goes and looks at tens of thousands or hundreds of thousands of news stories on the web, and it groups them into cohesive news stories. For example, let's look here (red rectangle). The URLs here link to different news stories about the BP Oil Well story. So, let's click on one of these URLs. What we'll get to is a web page like this. Here's a Wall Street Journal article about the BP Oil Well Spill story, "BP Kills Macondo", which is a name of the spill. And if you click on a different URL from that group, then you might get a different story. Here's the CNN story about, again, the BP Oil Spill. And if you click on yet a third link, then you might get a different story. Here's the UK Guardian story about the BP Oil Spill. So what Google News has done is look at tens of thousands of news stories and automatically cluster them together. So, the news stories that are all about the same topic get displayed together.
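
To make "stories about the same topic get displayed together" a little more concrete, here is a toy sketch that groups headlines by how many words they share. How Google News actually clusters stories is not described in the lecture, so everything here is an assumption: the word-overlap (Jaccard) measure, the 0.3 threshold, the function names, and the made-up headlines.

```python
import re

def words(headline):
    """Lowercase a headline and return its set of words."""
    return set(re.findall(r"[a-z]+", headline.lower()))

def jaccard(a, b):
    """Share of words the two sets have in common (0 = none, 1 = identical)."""
    return len(a & b) / len(a | b)

def group_headlines(headlines, threshold=0.3):
    groups = []  # each group is a list of headline indices; the first is its "leader"
    for i, headline in enumerate(headlines):
        for group in groups:
            leader = headlines[group[0]]
            if jaccard(words(headline), words(leader)) >= threshold:
                group.append(i)  # similar enough to an existing group
                break
        else:
            groups.append([i])   # nothing similar yet, so start a new group
    return groups

# Hypothetical headlines; nobody tells the program what the topics are.
headlines = [
    "BP kills Macondo oil well",
    "BP oil well sealed",
    "BP declares Macondo oil well dead",
    "New galaxy survey maps distant clusters",
    "Astronomers release new galaxy survey maps",
]
groups = group_headlines(headlines)
for group in groups:
    print([headlines[i] for i in group])
```

A real system would need far more than word overlap, but the shape of the problem is the same: unlabeled articles go in, and groups come out.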

It turns out that clustering algorithms and Unsupervised Learning algorithms are used in many other problems as well. Here's one on understanding genomics. Here's an example of DNA microarray data. The idea is to have a group of different individuals, and for each of them, you measure how much they do or do not have a certain gene. Technically, you measure how much certain genes are expressed. So these colors, red, green, gray and so on, show the degree to which different individuals do or do not have a specific gene. And what you can do is then run a clustering algorithm to group individuals into different categories or into different types of people. So this is Unsupervised Learning, because we're not telling the algorithm in advance that these are type 1 people, those are type 2 persons, those are type 3 persons and so on. Instead, what we're saying is: here's a bunch of data, I don't know what's in this data, I don't know who is in what type, I don't even know what the different types of people are, but can you automatically find structure in the data for me? Can you automatically cluster the individuals into these types that I don't know in advance? Because we're not giving the algorithm the right answer for the examples in my data set, this is Unsupervised Learning.

Unsupervised Learning or clustering is used for a bunch of other applications. It's used to organize large computer clusters. I had some friends looking at large data centers, that is, large computer clusters, and trying to figure out which machines tend to work together. And if you can put those machines together, you can make your data center work more efficiently. The second application is social network analysis. So given the knowledge about which friends you email the most, or given your Facebook friends or your Google+ circles, can we automatically identify which are cohesive groups of friends, that is, which are groups of people that all know each other? Market segmentation. Many companies have huge databases of customer information. So, can you look at this customer data set and automatically discover market segments, and automatically group your customers into different market segments, so that you can automatically and more efficiently sell or market to your different market segments? Again, this is Unsupervised Learning, because we have all this customer data, but we don't know in advance what the market segments are, and for the customers in our data set, you know, we don't know in advance who is in market segment one, who is in market segment two, and so on. But we have to let the algorithm discover all this just from the data. Finally, it turns out that Unsupervised Learning is also used for, surprisingly, astronomical data analysis, and these clustering algorithms give surprisingly interesting theories of how galaxies are formed. All of these are examples of clustering, which is just one type of Unsupervised Learning. Let me tell you about another one.
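
For the social network example, one very simple way to find groups from "who emails whom" data is to treat people as nodes in a graph and collect everyone who is connected, directly or through others. This is a deliberately loose sketch: connected components are a much weaker notion than "people that all know each other", and real community detection is more involved. The names, the `friend_groups` function, and the email pairs are hypothetical.

```python
from collections import defaultdict

def friend_groups(email_pairs):
    """Group people who are linked, directly or indirectly, by email contact."""
    graph = defaultdict(set)
    for a, b in email_pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen, groups = set(), []
    for person in sorted(graph):
        if person in seen:
            continue
        # Walk outward from this person until no new contacts turn up.
        stack, group = [person], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(graph[current] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

# Hypothetical "who emails whom" records; no group labels are given.
pairs = [("ana", "ben"), ("ben", "cy"), ("ana", "cy"),
         ("dee", "eli"), ("eli", "fay")]
groups = friend_groups(pairs)
print(groups)
```

As with the other examples, the groups are discovered from the data alone; nobody supplies the "right answer" in advance.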

I'm gonna tell you about the cocktail party problem. So, you've been to cocktail parties before, right? Well, you can imagine there's a party, a room full of people, all sitting around, all talking at the same time. And there are all these overlapping voices, because everyone is talking at the same time, and it is almost hard to hear the person in front of you. So maybe at a cocktail party with two people, two people talking at the same time, and it's a somewhat small cocktail party. And we're going to put two microphones in the room, so there are two microphones, and because these microphones are at two different distances from the speakers, each microphone records a different combination of these two speakers' voices. Maybe speaker one is a little louder in microphone one, and maybe speaker two is a little bit louder in microphone two, because the two microphones are at different positions relative to the two speakers, but each microphone records an overlapping combination of both speakers' voices. So, here's an actual recording of two speakers recorded by a researcher. Let me play for you what the first microphone sounds like. One (uno), two (dos), three (tres), four (cuatro), five (cinco), six (seis), seven (siete), eight (ocho), nine (nueve), ten (y diez). All right, maybe not the most interesting cocktail party, there's two people counting from one to ten in two languages, but you know. What you just heard was the first microphone recording; here's the second recording. Uno (one), dos (two), tres (three), cuatro (four), cinco (five), seis (six), siete (seven), ocho (eight), nueve (nine), y diez (ten). So what we can do is take these two microphone recordings and give them to an Unsupervised Learning algorithm called the cocktail party algorithm, and tell the algorithm: find structure in this data for me.

And what the algorithm will do is listen to these audio recordings and say, you know, it sounds like there are two audio recordings that are being added together, or that are being summed together, to produce these recordings that we had. Moreover, what the cocktail party algorithm will do is separate out these two audio sources that were being added or being summed together to form our recordings. And in fact, here's the first output of the cocktail party algorithm. One, two, three, four, five, six, seven, eight, nine, ten. So, it separated out the English voice in one of the recordings. And here's the second output. Uno, dos, tres, cuatro, cinco, seis, siete, ocho, nueve, y diez. Not too bad. To give you one more example, here's another recording of another similar situation. Here's the first microphone [with background music]: one, two, three, four, five, six, seven, eight, nine, ten. OK, so the poor guy's gone home from the cocktail party, and he's now sitting in a room by himself talking to his radio. Here's the second microphone recording: one, two, three, four, five, six, seven, eight, nine, ten. When you give these two microphone recordings to the same algorithm, what it does is again say, you know, it sounds like there are two audio sources, and moreover, the algorithm says, here is the first of the audio sources I found. One, two, three, four, five, six, seven, eight, nine, ten. So that wasn't perfect: it got the voice, but it also got a little bit of the music in there. Then here's the second output of the algorithm: [the music]. Not too bad. In that second output it managed to get rid of the voice entirely, and just, you know, cleaned up the music, got rid of the counting from one to ten. So, you might look at an Unsupervised Learning algorithm like this and ask how complicated it is to implement this, right?

It seems like in order to build this application, to do this audio processing, you need to write a ton of code, or maybe link into a bunch of C++ or Java libraries that process audio. It seems like a really complicated program to do this audio, separate out audio and so on. It turns out the algorithm to do what you just heard can be done in one line of code, shown right here.
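
The Octave one-liner from the lecture is not what follows. As a separate, rough illustration of what "separating two sources that were summed together" can mean, here is a small Python sketch on synthetic signals rather than real audio. The approach (make the two recordings uncorrelated, then rotate them until each output looks as non-Gaussian as possible, in the spirit of independent component analysis) is an assumption chosen for this example, as are the two made-up "voices", the mixing weights, and all function names.

```python
import math

def center(xs):
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def mean_product(xs, ys):
    return sum(x * y for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    xs, ys = center(xs), center(ys)
    return mean_product(xs, ys) / math.sqrt(
        mean_product(xs, xs) * mean_product(ys, ys))

def whiten(x1, x2):
    """Return two zero-mean, unit-variance, mutually uncorrelated signals."""
    x1, x2 = center(x1), center(x2)
    scale1 = math.sqrt(mean_product(x1, x1))
    z1 = [v / scale1 for v in x1]
    overlap = mean_product(x2, z1)               # part of x2 explained by z1
    rest = [b - overlap * a for a, b in zip(z1, x2)]
    scale2 = math.sqrt(mean_product(rest, rest))
    z2 = [v / scale2 for v in rest]
    return z1, z2

def excess_kurtosis(ys):
    # Assumes ys already has zero mean and unit variance.
    return sum(y ** 4 for y in ys) / len(ys) - 3.0

def separate(mic1, mic2, steps=180):
    """Try rotations of the whitened recordings; keep the least Gaussian pair."""
    z1, z2 = whiten(mic1, mic2)
    best = None
    for k in range(steps):
        theta = (math.pi / 2) * k / steps
        c, s = math.cos(theta), math.sin(theta)
        y1 = [c * a + s * b for a, b in zip(z1, z2)]
        y2 = [-s * a + c * b for a, b in zip(z1, z2)]
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]

# Two synthetic "voices": a sine wave and a sawtooth wave.
n = 1000
voice_a = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
voice_b = [2 * ((4 * i) % n) / n - 1 for i in range(n)]

# Each microphone hears a different weighted sum of both voices.
mic1 = [0.6 * a + 0.4 * b for a, b in zip(voice_a, voice_b)]
mic2 = [0.4 * a + 0.6 * b for a, b in zip(voice_a, voice_b)]

out1, out2 = separate(mic1, mic2)
for name, out in (("out1", out1), ("out2", out2)):
    print(name, round(abs(correlation(out, voice_a)), 3),
          round(abs(correlation(out, voice_b)), 3))
```

The algorithm only ever sees `mic1` and `mic2`. The original voices are used at the end purely to check how well each output lines up with one of them; the outputs can come back in either order and with flipped sign.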

[Slide: the one line of Octave code for the cocktail party algorithm]

It did take researchers a long time to come up with this line of code. I'm not saying this is an easy problem. But it turns out that when you use the right programming environment, many learning algorithms can be really short programs. So, this is also why in this class we're going to use the Octave programming environment. Octave is free open source software, and using a tool like Octave or MATLAB, many learning algorithms become just a few lines of code to implement. Later in this class, I'll teach you a little bit about how to use Octave, and you'll be implementing some of these algorithms in Octave. Or if you have MATLAB, you can use that too. It turns out that in Silicon Valley, for a lot of machine learning algorithms, what we do is first prototype our software in Octave, because Octave makes it incredibly fast to implement these learning algorithms. Here, each of these functions, like for example the SVD function, which stands for singular value decomposition, turns out to be a linear algebra routine that is just built into Octave. If you were trying to do this in C++ or Java, this would be many lines of code, linking complex C++ or Java libraries. So, you can implement this stuff in C++ or Java or Python, it's just more complicated to do so in those languages. What I've seen after having taught machine learning for almost a decade now is that you learn much faster if you use Octave as your programming environment, and if you use Octave as your learning tool and as your prototyping tool, it'll let you learn and prototype learning algorithms much more quickly. And in fact what many people will do in the large Silicon Valley companies is use a tool like Octave to first prototype the learning algorithm, and only after you've gotten it to work do you migrate it to C++ or Java or whatever.

It turns out that by doing things this way, you can often get your algorithm to work much faster than if you were starting out in C++. So, I know that as an instructor, I get to say "trust me on this one" only a finite number of times, but for those of you who have never used this kind of Octave programming environment before, I'm going to ask you to trust me on this one, and say that your time, your development time, is I think one of the most valuable resources. And having seen lots of people do this, I think you, as a machine learning researcher or machine learning developer, will be much more productive if you learn to start prototyping in Octave, and only then move to some other language.

Finally, to wrap up this video, I have a quick review question for you. We talked about Unsupervised Learning, which is a learning setting where you give the algorithm a ton of data and just ask it to find structure in the data for us. Of the following four examples, which ones, which of these four do you think would be an Unsupervised Learning problem as opposed to a Supervised Learning problem? For each of the four check boxes on the left, check the ones for which you think an Unsupervised Learning algorithm would be appropriate, and then click the button on the lower right to check your answer. So, when the video pauses, please answer the question on the slide. So, hopefully, you've remembered the spam filter problem. If you have labeled data, you know, of spam and non-spam e-mail, we'd treat this as a Supervised Learning problem. The news story example, that's exactly the Google News example that we saw in this video. We saw how you can use a clustering algorithm to cluster these articles together, so that's Unsupervised Learning. The market segmentation example I talked about a little bit earlier, you do that as an Unsupervised Learning problem, because I am just gonna give my algorithm data and ask it to discover market segments automatically. And the final example, diabetes, well, that's actually just like our breast cancer example from the last video. Only instead of, you know, good and bad cancer tumors, or benign or malignant tumors, we instead have diabetes or not, and so we will solve that as a Supervised Learning problem, just like we did for the breast tumor data.

So, that is it for Unsupervised Learning, and in the next video (article), we'll delve more into specific learning algorithms, and start to talk about how these algorithms work, and how you can go about implementing them.

Please credit the source when reposting. Article link: https://www.uj5u.com/qita/63756.html
