Maven: changing the encoding for certain files

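All my projects use Cp1252 encoding, except for a few files that I encoded as UTF-8 because they contain special characters.

When I run install, I get several errors in these files: unclosed character literal, illegal character: '\u00a8'. When I run install with the compiler plugin configured for UTF-8 encoding: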

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.3.2</version>
        <configuration>
            <source>1.8</source>
            <target>1.8</target>
            <encoding>UTF-8</encoding>
        </configuration>
    </plugin>

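the errors no longer show up in those files, but many other files now fail with: unmappable character for encoding UTF-8.

Can I specify UTF-8 encoding for only certain files?

One more thing: maven reports the errors like this: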

folder/file.java:[10,19] unclosed character literal
folder/file.java:[10,22] unclosed character literal
folder/file.java:[13,19] unclosed character literal

What do the numbers mean? They do not seem to be the line number where the error is.

Answer:

[10,19] means: line 10, 19th character.

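@VGR explained precisely why reading UTF-8 encoded source files as Cp1252 makes compilation fail: any non-ASCII character is encoded as at least 2 bytes in UTF-8. If you then incorrectly read those bytes as Cp1252, you get 2 or more gobbledygook characters. Given that a char literal may contain only 1 character, the code now contains compiler errors.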
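The effect can be reproduced in a few lines. This is a minimal sketch, not part of the original answer; the class and method names are made up for illustration, and the sample character is written as an escape so the file itself stays pure ASCII. It encodes one character as UTF-8 and decodes the bytes as Cp1252 (known to Java as windows-1252).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MisreadDemo {
    // Encode the text as UTF-8, then decode those bytes as if they were Cp1252.
    public static String utf8ReadAsCp1252(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        String original = "\u00E8"; // e with grave accent, a single char
        String misread = utf8ReadAsCp1252(original);
        System.out.println("chars before: " + original.length());
        System.out.println("chars after:  " + misread.length());
        // Print the code point of each char the compiler would now see.
        for (char c : misread.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

The single character turns into two, U+00C3 and U+00A8. The second one is the '\u00a8' from the question's error message, and two characters inside one pair of single quotes is what javac reports as an unclosed character literal.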

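There is no way to tell maven that some files are UTF-8 and some are Cp1252, unless you run separate compilation runs. That is hard to set up and would be messy and hard to maintain (so, a bad idea). It may not work at all unless you involve stubs, or you get "lucky" and one of the two batches is "self-contained" (contains absolutely no references to anything in the other batch).

So, let's discard that as a viable option. That leaves 2 choices: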

The right choice - UTF-8 all the way

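Treat all source files as UTF-8. This is easier than it sounds: all ASCII characters are encoded identically in UTF-8 and Cp1252, so only the non-ASCII characters need checking. Those are easy to find: they are all bytes with a value above 127. Many tools can find them. For example, there is an SO question with answers on how to do this on linux.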
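If no such tool is at hand, the check is short enough to write in Java. The following is a rough sketch under assumptions of my own: the class name, the default root directory `src`, and the `.java` filter are illustrative, not something the original answer prescribes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class NonAsciiFinder {
    // Returns the offset of the first byte outside the ASCII range, or -1 if there is none.
    public static int firstNonAsciiOffset(byte[] bytes) {
        for (int i = 0; i < bytes.length; i++) {
            // Java bytes are signed, so mask to get the unsigned value 0..255.
            if ((bytes[i] & 0xFF) > 127) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: scan the "src" directory unless a root is given.
        Path root = Paths.get(args.length > 0 ? args[0] : "src");
        List<Path> javaFiles;
        try (Stream<Path> walk = Files.walk(root)) {
            javaFiles = walk.filter(p -> p.toString().endsWith(".java"))
                            .collect(Collectors.toList());
        }
        for (Path file : javaFiles) {
            int offset = firstNonAsciiOffset(Files.readAllBytes(file));
            if (offset >= 0) {
                System.out.println(file + ": first non-ASCII byte at offset " + offset);
            }
        }
    }
}
```

Every file it lists needs a manual look; every file it does not list is already valid in both encodings.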

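Open those files in any editor that lets you state explicitly which encoding to use (most developer editors do), reload with different encodings until the characters look right, then re-save as UTF-8, and voila. Every file without special characters is valid UTF-8 and Cp1252 at the same time: you can simply compile those with UTF-8 encoding and they will just work.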
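For more than a handful of files, the re-save step can also be scripted. This is only a sketch with hypothetical names, and it assumes you have already confirmed that each file passed to it really is Cp1252: running it on a file that is already UTF-8 would garble that file.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReencodeToUtf8 {
    // Decodes the bytes as Cp1252 and encodes the resulting text as UTF-8.
    public static byte[] cp1252ToUtf8(byte[] cp1252Bytes) {
        String text = new String(cp1252Bytes, Charset.forName("windows-1252"));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Each argument is a path to a file known to be Cp1252; it is rewritten in place.
        for (String arg : args) {
            Path file = Paths.get(arg);
            Files.write(file, cp1252ToUtf8(Files.readAllBytes(file)));
        }
    }
}
```

Keep the files under version control or make a backup first, so a wrong guess about the original encoding can be undone.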

Now all your code is in UTF-8. Configure your IDE project accordingly, or just leave your maven pom at "it is UTF-8"; all maven-aware project tools will pick that up automatically.

The rather bad choice - backslash-u escapes

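If you cannot do that because some tool reads these source files and insists on parsing them as Cp1252 (not maven or javac; in practice hardly anything in the Java ecosystem, which understands UTF-8 very well), and you can do nothing about that, there is still a way to remove all non-ASCII from your source files: backslash-u escapes.

The construct \u0123 is legal anywhere in any java file, not just inside string literals. It means: the unicode character with that value (in hexadecimal). For example, this: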

class Test {
  public static void main(String[] args) {
    //This does nothing, right? \u000aSystem.out.println("Hello!");
  }
}

When you run it, it actually prints Hello!. Even though the sysout is in a comment... or is it?

\u000a is the newline symbol. So, the above file is parsed out as a comment on one line, then a newline, so, that System.out statement really is in there and isn't in a comment. Many tools don't know this (e.g. sublime text and co will render that sysout statement in commenty green), but javac and, in fact, the Java Lang Spec is crystal clear on this: The above code has a real print statement in there, not commented out.

Thus, you can hunt down all non-ASCII and replace it with u escapes, and now your code is hybridized: it parses identically regardless of which encoding you use, as long as it is an ASCII-compatible encoding. Almost all encodings are: only a few Japanese and other East Asian charsets, as well as UTF-16/UCS2/UCS4/UTF-32 style encodings, are not ASCII compatible. Cp1252, ISO-8859, UTF-8 itself, ASCII itself, Cp850, and many others are 'ASCII compatible', meaning that 100% ASCII text is encoded identically by all of them.

To turn things into u escapes, look up the hexadecimal value of the symbol on any unicode website and apply it. For example, é becomes \u00E9 and ☃ (the unicode snowman) becomes \u2603.
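Looking up each value by hand gets tedious, so the replacement can be automated. Here is a small sketch of one way to do it; the class and method names are invented for this example. It replaces every char above 127 with its four-digit hexadecimal escape and leaves ASCII untouched.

```java
public class UnicodeEscaper {
    // Replaces every char outside the ASCII range with a backslash-u escape.
    public static String escapeNonAscii(String source) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c > 127) {
                // Four uppercase hex digits, zero padded.
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input is itself written with escapes so this file stays pure ASCII.
        System.out.println(escapeNonAscii("public void m\u00EAl\u00E9eAttack() {}"));
    }
}
```

Running it prints `public void m\u00EAl\u00E9eAttack() {}`. Because Java source escapes work on UTF-16 units, a character outside the Basic Multilingual Plane comes out as two escapes (a surrogate pair), which is how javac expects it.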

Put these escapes wherever you see non-ASCII in a source file, even where it appears outside of string literals:

Legal Java:

public class Fighter {
  public void mêléeAttack() {}
}

But... this goes badly if the encoding setting in your editor and the one in maven do not match. However, this:

public class Fighter {
  public void m\u00EAl\u00E9eAttack() {}
}

means the same thing and works fine even if you mess up the encodings. It just looks really bad in your editor, which is why this is the rather bad choice.

