我在使用大型 Excel 表格時需要幫助。
描述
我從我們的 ERP 系統匯出的資料至少有 400K 行。在這份報告中,格式非常混亂,我想撰寫一個腳本來清理所有資料。
我開始寫小子只是為了洗掉空行和具有特殊行為的行。請看下面:
Sub Main()
OptimizeVBA (True)
DeleteLastRows
OptimizeVBA (False)
End Sub
Sub DeleteLastRows()
'Achtung, diese Funktion dauert sehr lange
Dim total
total = ActiveSheet.UsedRange.Rows.Count
Dim Tim1 As Single
Tim1 = Timer
For i = total To total - 100 Step -1
If ThatSpecialLine("0", i, 1, 9) Then
'DeleteRow (i)
Rows(i).EntireRow.Delete
ElseIf EmptyRow(i, 1, 13) Then
'DeleteRow (i)
Rows(i).EntireRow.Delete
End If
Next
Tim1 = Timer - Tim1
MsgBox ("Anzahl der Zeilen nach der Bearbeitung: " & ActiveSheet.UsedRange.Rows.Count & vbNewLine & "Dafür wurde: " & Tim1 & " gebraucht")
End Sub
Function EmptyRow(ByVal Row As Long, ByVal startc As Integer, ByVal EndC As Integer) As Boolean
EmptyRow = True
Dim temp As String
For i = startc To EndC
temp = Cells(Row, i).Value
temp = Trim(temp)
If temp <> "" Then
EmptyRow = False
Exit Function
End If
Next
End Function
Function ThatSpecialLine(ByVal val As String, ByVal Row As Long, ByVal startc As Integer, ByVal EndC As Integer) As Boolean
ThatSpecialLine = False
If EmptyRow(Row, startc, EndC) Then
If Cells(Row, EndC 1).Value = val Then
ThatSpecialLine = True
End If
End If
End Function
Sub OptimizeVBA(isOn As Boolean)
Application.Calculation = IIf(isOn, xlCalculationManual, xlCalculationAutomatic)
Application.EnableEvents = Not (isOn)
Application.ScreenUpdating = Not (isOn)
ActiveSheet.DisplayPageBreaks = Not (isOn)
End Sub
這段代碼需要大約 14 秒才能執行 100 行。我想知道為什么性能如此糟糕。我沒有優化應用程式性能的經驗,所以如果我的問題很愚蠢,請善待:)。
問題
- 將此 .xlsx 檔案匯出到 .txt 檔案并使用我在 Visual Studio 中使用 vb.net 或 C# 撰寫的程式進行處理會更好/更快嗎?這將是我的下一個想法。
- 如何改進我的 vba 代碼?
將此 .xlsx 檔案匯出到 .txt 檔案并使用我在 Visual Studio 中使用 vb.net 或 C# 撰寫的程式進行處理會更好/更快嗎?這將是我的下一個想法。
提前致謝
uj5u.com熱心網友回復:
您的代碼中有兩件事會使執行速度變慢。
第一件事與 Excel 與 VBA 有關。每次您的 VBA 代碼需要 Excel 中的某些內容時,它都必須呼叫內部介面,這相當慢。當您有一張包含幾行/幾列的作業表時,您無法測量這一點,但是在一張有 400k 行和(至少)13 列資料的作業表中,您有 500 萬個單元格,并且您的代碼讀取其中大部分 2 次. 這可以通過將大塊資料讀入陣列來加快速度。這只是一次讀取,對于這 500 萬個單元,可能只是一秒鐘的問題。
第二件事是純 Excel:即使您關閉重新計算和螢屏更新,從作業表中洗掉一行資料也非常緩慢。這意味著,您應該通過將要洗掉的行“收集”到 Range 變數中來減少洗掉次數,然后一次將它們全部洗掉。但是,收集的行數不應該太高。我嘗試了一下,1000似乎是合理的。
Sub DeleteLastRows()
Const DeleteChunkSize = 1000
Dim lastRow As Long
With ThisWorkbook.Sheets(1)
lastRow = .Cells(.Rows.Count, "A").End(xlUp).row
' Read all Data into Memory
Dim AllData As Variant
AllData = .Range(.Cells(1, 1), .Cells(lastRow, 13))
Debug.Print "data read"
Dim row As Long
For row = lastRow To 2 Step -1
If row Mod 100 = 0 Then DoEvents
Dim deleteRange As Range, deleteCount As Long
Dim toBeDeleted As Boolean
toBeDeleted = ThatSpecialLine(AllData, "0", row, 1, 9) Or EmptyRow(AllData, row, 1, 13)
If toBeDeleted Then
deleteCount = deleteCount 1
If deleteRange Is Nothing Then
Set deleteRange = .Cells(row, 1).EntireRow
Else
Set deleteRange = Union(deleteRange, .Cells(row, 1).EntireRow)
End If
' Delete only if a certain number of rows to be deleted is reached to speed up runtime
If deleteCount >= DeleteChunkSize Then
DoEvents
deleteRange.Delete xlUp
Set deleteRange = Nothing
deleteCount = 0
End If
End If
Next row
End With
' delete the last chunk of data if any
If Not deleteRange Is Nothing Then
deleteRange.Delete xlUp
End If
End Sub
我調整了您的輔助例程,以便它們處理作為引數傳遞的資料陣列:
Function EmptyRow(data As Variant, row As Long, startc As Long, EndC As Long) As Boolean
EmptyRow = True
Dim temp As String
Dim i As Long
For i = startc To EndC
temp = Trim(data(row, i))
If temp <> "" Then
EmptyRow = False
Exit Function
End If
Next
End Function
Function ThatSpecialLine(data As Variant, val As String, row As Long, startc As Long, EndC As Long) As Boolean
If Not EmptyRow(data, row, startc, EndC) Then Exit Function
ThatSpecialLine = (data(row, EndC 1) = val)
End Function
對于要洗掉的 1000 行,該代碼或多或少花費了 1 秒——我的示例表大約有 30% 的此類行。這將導致幾分鐘范圍內的運行時間。
但是有一個更快的嘗試,假設您只對資料感興趣,而不是格式。不要洗掉 Excel 作業表中的行,而是復制要保留在第二個陣列中的資料。完成后,洗掉作業表的所有資料并將復制的資料寫回 Excel。在我的示例表中,這可能需要 2 或 3 秒,行數 > 800k:
Sub CopyRelevantData()
Dim lastRow As Long
With ThisWorkbook.Sheets(1)
lastRow = .Cells(.Rows.Count, "A").End(xlUp).row
' Read all Data into Memory
Dim AllData As Variant, newData As Variant
AllData = .Range(.Cells(1, 1), .Cells(lastRow, 13))
' Create a second array where you copy the data you want to keep
ReDim newData(LBound(AllData, 1) To UBound(AllData, 1), LBound(AllData, 2) To UBound(AllData, 2))
Debug.Print "data read"
Dim row As Long, newRow As Long
For row = 1 To lastRow
Dim toBeDeleted As Boolean
toBeDeleted = ThatSpecialLine(AllData, "0", row, 1, 9) Or EmptyRow(AllData, row, 1, 13)
If Not toBeDeleted Then
' Copy this row of data
newRow = newRow 1
Dim col As Long
For col = LBound(AllData, 2) To UBound(AllData, 2)
newData(newRow, col) = AllData(row, col)
Next col
End If
If row Mod 100 = 0 Then DoEvents
Next row
.UsedRange.Clear
.Cells(1, 1).Resize(UBound(AllData, 1), UBound(AllData, 2)) = newData
End With
End Sub
uj5u.com熱心網友回復:
解決方案是在 VB.Net 中撰寫一個快速程式,只需讀取所需的行。我還對代碼做了一些改進。
以下代碼只需 1 秒即可讀取 List(of string()) 中的檔案并將其再次寫回 .csv
我認為我不會再將 vba 用于大資料。隨意改變我的想法。
Imports System.IO
Imports System.IO.File
Imports System.Text
Public Class Form1
Public Datas As New List(Of String())
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
FileToList()
DataToFile()
End Sub
Sub FileToList()
Using sr As StreamReader = New StreamReader("Bestand 31.10.2022.CSV", Encoding.Default)
Dim Time As DateTime = Now
Dim span As TimeSpan
Do Until sr.Peek() = -1
Dim s As String = sr.ReadLine()
Dim a() As String = s.Split(";")
If Not EmptyRow(a) Then
Datas.Add(a)
End If
Loop
span = Now - Time
Dim i As Long = Datas.Count
MessageBox.Show(String.Format("Es sind: {0} Zeilen vorhanden in der Liste" & vbCrLf &
"Dies ben?tigte: {1}s", i, span.TotalSeconds))
End Using
End Sub
Sub DataToFile()
Dim Time As DateTime = Now
Dim span As TimeSpan
Using fs As FileStream = New FileStream("Test_" & DateTime.Now.ToShortDateString & ".csv", FileMode.Create)
Using sw As StreamWriter = New StreamWriter(fs, Encoding.Default)
For i = 0 To Datas.Count - 1
sw.WriteLine(Join(Datas(i), ";"))
Next
End Using
End Using
span = Now - Time
MessageBox.Show(String.Format("Das Erstellen der neuen Datei hat: {0}s gedauert", span.TotalSeconds))
End Sub
Function EmptyRow(ByVal Array As String()) As Boolean
For i = 0 To Array.Count - 1
If Array(i) <> "" Then
Return False
End If
Return True
Next
End Function
End Class
轉載請註明出處,本文鏈接:https://www.uj5u.com/caozuo/530988.html
標籤:擅长vba表现
