bert.encoder.layer.0.attention.self.query.weight [768, 768]  nn.Linear
bert.encoder.layer.0.attention.self.query.bias [768]
bert.encoder.layer.0.attention.self.key.weight [768, 768]  nn.Linear
bert.encoder.layer.0.attention.self.key.bias [768]
bert.encoder.layer.0.attention.self.value.weight [768, 768]  nn.Linear
bert.encoder.layer.0.attention.self.value.bias [768]
bert.encoder.layer.0.attention.output.dense.weight [768, 768]  nn.Linear
bert.encoder.layer.0.attention.output.dense.bias [768]
bert.encoder.layer.0.attention.output.LayerNorm.weight [768]  nn.LayerNorm
bert.encoder.layer.0.attention.output.LayerNorm.bias [768]
bert.encoder.layer.0.intermediate.dense.weight [3072, 768]  nn.Linear
bert.encoder.layer.0.intermediate.dense.bias [3072]
bert.encoder.layer.0.output.dense.weight [768, 3072]  nn.Linear
bert.encoder.layer.0.output.dense.bias [768]
bert.encoder.layer.0.output.LayerNorm.weight [768]  nn.LayerNorm
bert.encoder.layer.0.output.LayerNorm.bias [768]
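These are the parameters of BERT's layer 0. I understand everything else, but what about this 3072? Every explanation I've read only says that it equals 4*H (H = 768), but why? Does it have anything to do with the attention heads?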
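To make the shapes above concrete, here is a minimal sketch that rebuilds them from three hyperparameters. It assumes the BERT-base values (hidden size 768, 12 attention heads, intermediate size 3072); `layer_shapes` is a hypothetical helper written for this post, not part of any library. It suggests that 3072 is set by its own hyperparameter, independent of the head count. As far as I know, the factor of 4 is a convention carried over from the original Transformer (d_model = 512, d_ff = 2048) rather than something derived from the heads.

```python
# Hypothetical helper: rebuilds the parameter shapes of one BERT encoder layer.
# The values passed in below assume the BERT-base configuration.

def layer_shapes(hidden_size, num_heads, intermediate_size):
    # Heads only split the hidden dimension into equal slices;
    # they do not change the shape of any weight matrix.
    assert hidden_size % num_heads == 0
    head_size = hidden_size // num_heads

    h, i = hidden_size, intermediate_size
    shapes = {}
    # nn.Linear stores its weight as [out_features, in_features].
    for name in ("attention.self.query", "attention.self.key",
                 "attention.self.value", "attention.output.dense"):
        shapes[name + ".weight"] = [h, h]
        shapes[name + ".bias"] = [h]
    shapes["attention.output.LayerNorm.weight"] = [h]
    shapes["attention.output.LayerNorm.bias"] = [h]
    # Feed-forward block: expand to the intermediate size, then project back.
    shapes["intermediate.dense.weight"] = [i, h]
    shapes["intermediate.dense.bias"] = [i]
    shapes["output.dense.weight"] = [h, i]
    shapes["output.dense.bias"] = [h]
    shapes["output.LayerNorm.weight"] = [h]
    shapes["output.LayerNorm.bias"] = [h]
    return head_size, shapes


head_size, shapes = layer_shapes(768, 12, 3072)
print(head_size)                            # 64
print(shapes["intermediate.dense.weight"])  # [3072, 768]

# Changing the number of heads leaves every shape untouched.
_, shapes_8_heads = layer_shapes(768, 8, 3072)
print(shapes == shapes_8_heads)             # True
```

The head count only determines how the 768-dimensional vector is sliced inside self-attention (12 slices of 64 here), while 3072 appears only in the two feed-forward matrices `intermediate.dense` and `output.dense`.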
