I am exploring an option for comparing two files in Java and displaying the differences as HTML.
Here is the code I am using:
import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;
import org.apache.commons.text.diff.CommandVisitor;
import org.apache.commons.text.diff.StringsComparator;

public class FileDiff {
    public static void main(String[] args) throws IOException {
        // Read both files with line iterator.
        LineIterator file1 = FileUtils.lineIterator(new File("file-1.txt"), "utf-8");
        LineIterator file2 = FileUtils.lineIterator(new File("file-2.txt"), "utf-8");
        // Initialize visitor.
        FileCommandsVisitor fileCommandsVisitor = new FileCommandsVisitor();
        // Read file line by line so that comparison can be done line by line.
        while (file1.hasNext() || file2.hasNext()) {
            /*
             * In case both files have different number of lines, fill in with empty
             * strings. Also append newline char at end so next line comparison moves to
             * next line.
             */
            String left = (file1.hasNext() ? file1.nextLine() : "") + "\n";
            String right = (file2.hasNext() ? file2.nextLine() : "") + "\n";
            // Prepare diff comparator with lines from both files.
            StringsComparator comparator = new StringsComparator(left, right);
            if (comparator.getScript().getLCSLength() > (Integer.max(left.length(), right.length()) * 0.4)) {
                /*
                 * Only if both lines have more than 40% in common, compare them with each
                 * other so that they are aligned with each other in the final diff HTML.
                 */
                comparator.getScript().visit(fileCommandsVisitor);
            } else {
                /*
                 * If the lines do not have 40% in common, compare each with an empty line
                 * so that they are not aligned with each other in the final diff; instead
                 * they show up on separate lines.
                 */
                StringsComparator leftComparator = new StringsComparator(left, "\n");
                leftComparator.getScript().visit(fileCommandsVisitor);
                StringsComparator rightComparator = new StringsComparator("\n", right);
                rightComparator.getScript().visit(fileCommandsVisitor);
            }
        }
        fileCommandsVisitor.generateHTML();
    }
}

/*
 * Custom visitor for file comparison which stores the comparison and also
 * generates the HTML in the end.
 */
class FileCommandsVisitor implements CommandVisitor<Character> {
    // Spans with red & green highlights to put highlighted characters in HTML
    private static final String DELETION = "<span style=\"background-color: #FB504B\">${text}</span>";
    private static final String INSERTION = "<span style=\"background-color: #45EA85\">${text}</span>";
    private String left = "";
    private String right = "";

    @Override
    public void visitKeepCommand(Character c) {
        // For new line use <br/> so that in HTML also it shows on next line.
        String toAppend = "\n".equals("" + c) ? "<br/>" : "" + c;
        // KeepCommand means c is present in both left & right. So add this to both
        // without any highlight.
        left = left + toAppend;
        right = right + toAppend;
    }

    @Override
    public void visitInsertCommand(Character c) {
        // For new line use <br/> so that in HTML also it shows on next line.
        String toAppend = "\n".equals("" + c) ? "<br/>" : "" + c;
        // InsertCommand means character is present in right file but not in left. Show
        // with green highlight on right.
        right = right + INSERTION.replace("${text}", "" + toAppend);
    }

    @Override
    public void visitDeleteCommand(Character c) {
        // For new line use <br/> so that in HTML also it shows on next line.
        String toAppend = "\n".equals("" + c) ? "<br/>" : "" + c;
        // DeleteCommand means character is present in left file but not in right. Show
        // with red highlight on left.
        left = left + DELETION.replace("${text}", "" + toAppend);
    }

    public void generateHTML() throws IOException {
        // Get the template and replace the ${left} and ${right} placeholders with the
        // actual comparison.
        String template = FileUtils.readFileToString(new File("difftemplate.html"), "utf-8");
        String out1 = template.replace("${left}", left);
        String output = out1.replace("${right}", right);
        // Write file to disk.
        FileUtils.write(new File("finalDiff.html"), output, "utf-8");
        System.out.println("HTML diff generated.");
    }
}
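This works well for smaller files and gives good results on my laptop. However, when the files are larger (200 MB, with half a million lines), IntelliJ appears to hang. My laptop has 16 GB of RAM.
How can I improve this so that it can compare large files?
Thanks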
Answer from a uj5u.com user:
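The way you wrote FileCommandsVisitor may be preventing it from being optimized. What you are doing is concatenating strings for every character visited, for example: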
    left = left + toAppend;
    right = right + toAppend;
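Every such concatenation may create a new String instance, and toward the end each of those instances is close to 200 MB long. You get a new one for every character you visit, and the old one has to be garbage collected. If your class held StringBuilders instead, and you used the append() method, it could speed things up considerably. For more details, read "String concatenation: concat() vs "+" operator".
For clarity (since, judging by the comments, you have now missed the point twice):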
class FileCommandsVisitor implements CommandVisitor<Character> {
    // StringBuilders as fields instead of Strings
    private StringBuilder left = new StringBuilder();
    private StringBuilder right = new StringBuilder();

    @Override
    public void visitKeepCommand(Character c) {
        String toAppend = "\n".equals("" + c) ? "<br/>" : "" + c;
        // append to the StringBuilders where you would concat strings
        left.append(toAppend);
        right.append(toAppend);
    }

    // same as above for other methods
    ..

    public void generateHTML() throws IOException {
        String template = FileUtils.readFileToString(new File("difftemplate.html"), "utf-8");
        // turn StringBuilders into Strings only when you actually need a String.
        String out1 = template.replace("${left}", left.toString());
        String output = out1.replace("${right}", right.toString());
        FileUtils.write(new File("finalDiff.html"), output, "utf-8");
        System.out.println("HTML diff generated.");
    }
}
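If, however, this does not help because the concatenation was already being optimized at runtime, I see nothing else fundamentally wrong with how you do it. Comparing large files is not a cheap operation, and it cannot get faster than reading both files line by line from the hard drive. You are still taking a shortcut (one that increases speed rather than decreasing it): FileCommandsVisitor holds both diffs in memory instead of writing them out as it goes, which means your code can at most diff files whose size is about half of your available memory. I also note that you never mention how long it actually takes, so it is hard to say whether the timing you see is expected or exceptional.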
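If memory rather than speed turns out to be the limit, one option is to write each side of the diff out as it is produced instead of accumulating it. The following is only a minimal sketch of that idea, not a drop-in replacement: the class name StreamingDiffVisitor and the decision to take two Writer objects are assumptions made for illustration. It mirrors the three visitor methods from the question; to use it with commons-text you would declare it as "implements CommandVisitor<Character>", which is left out here so that the sketch compiles with the JDK alone.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical streaming variant of FileCommandsVisitor (name and design are
// assumptions). Each visited character is written to its output immediately,
// so memory use does not grow with the size of the compared files.
class StreamingDiffVisitor implements AutoCloseable {
    private static final String DELETION_OPEN = "<span style=\"background-color: #FB504B\">";
    private static final String INSERTION_OPEN = "<span style=\"background-color: #45EA85\">";
    private static final String SPAN_CLOSE = "</span>";

    private final Writer left;
    private final Writer right;

    StreamingDiffVisitor(Writer left, Writer right) {
        this.left = left;
        this.right = right;
    }

    // Character present on both sides: write it to both outputs, unhighlighted.
    public void visitKeepCommand(Character c) {
        String html = toHtml(c);
        write(left, html);
        write(right, html);
    }

    // Character only in the right file: green highlight on the right output.
    public void visitInsertCommand(Character c) {
        write(right, INSERTION_OPEN + toHtml(c) + SPAN_CLOSE);
    }

    // Character only in the left file: red highlight on the left output.
    public void visitDeleteCommand(Character c) {
        write(left, DELETION_OPEN + toHtml(c) + SPAN_CLOSE);
    }

    // Same newline handling as in the question: "\n" becomes <br/>.
    private static String toHtml(Character c) {
        return c == '\n' ? "<br/>" : c.toString();
    }

    // The visitor methods cannot throw checked exceptions, so wrap IOException.
    private static void write(Writer out, String text) {
        try {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Close both writers; the right one is closed even if closing the left fails.
    @Override
    public void close() throws IOException {
        try {
            left.close();
        } finally {
            right.close();
        }
    }
}
```

In the main loop this would be constructed once with two buffered file writers (for example from java.nio.file.Files.newBufferedWriter) and passed to getScript().visit(...) in place of fileCommandsVisitor. The final page would then have to be assembled by copying the two fragments into the template, which this sketch does not cover.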
