Splitting an input string when it contains countries with multi-word names

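I receive several countries as input and have to split them on spaces. If a country name consists of more than one word, it is enclosed in double quotes. For example: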

Chad Benin Angola Algeria Finland Romania "Democratic Republic of the Congo" Bolivia Uzbekistan Lesotho "United States of America"

At the moment I can only split the input word by word, so United States of America does not stay together as a single country.

    BufferedReader reader = new BufferedReader(
            new InputStreamReader(System.in));
    // Reading data using readLine
    String str = reader.readLine();
    ArrayList<String> sets = new ArrayList<String>();

    String[] newStr = str.split("[\\W]");
    boolean check = false;
    for (String s : newStr) {
        sets.add(s);
    }
    System.out.print(sets);

How can I split the countries so that multi-word countries are not broken apart?

uj5u.com user reply:

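Instead of matching what to split on, match the country names. You need to capture either a run of letters, or letters and spaces between quotes. Match one or more letters, [a-zA-Z]+, or (|) match letters and spaces between quotes, "[a-zA-Z\s]+".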

    String input = "Chad Benin Angola Algeria Finland Romania \"Democratic Republic of the Congo\" Bolivia Uzbekistan Lesotho \"United States of America\"";
    Pattern pattern = Pattern.compile("[a-zA-Z]+|\"[a-zA-Z\\s]+\"");
    Matcher matcher = pattern.matcher(input);
    while (matcher.find()) {
      String result = matcher.group();
      if (result.startsWith("\"")) {
        //quotes are matched, so remove them
        result = result.substring(1, result.length() - 1);
      }
      System.out.println(result);
    }

uj5u.com user reply:

Well, maybe I'm not clever enough, but I don't see a one-line solution. I can think of the following, though:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.Optional;
    import java.util.stream.Collectors;

    public class Main {
        public static void main(String[] args) {
            String inputString = "Chad Benin Angola Algeria Finland Romania \"Democratic Republic of the Congo\" Bolivia Uzbekistan Lesotho \"United States of America\"\n";

            List<String> resultCountriesList = new ArrayList<>();
            int currentIndex = 0;
            // Segments alternate between unquoted text and the contents of a quoted name.
            boolean processingMultiWordsCountry = false;
            for (int i = 0; i < inputString.length(); i++) {
                Optional<String> substringAsOptional = extractNextSubstring(inputString, currentIndex);
                if (substringAsOptional.isPresent()) {
                    String substring = substringAsOptional.get();
                    // Skip the segment and the quote that ended it.
                    currentIndex += substring.length() + 1;
                    if (processingMultiWordsCountry) {
                        resultCountriesList.add(substring);
                    } else {
                        // map (not peek) so that the trimmed value is actually used.
                        resultCountriesList.addAll(Arrays.stream(substring.split(" ")).map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toList()));
                    }
                    processingMultiWordsCountry = !processingMultiWordsCountry;
                }
            }

            System.out.println(resultCountriesList);
        }

        private static Optional<String> extractNextSubstring(String inputString, int currentIndex) {
            if (inputString.length() > currentIndex + 1) {
                int nextQuote = inputString.indexOf("\"", currentIndex + 1);
                // No further quote: take the rest of the input instead of failing on index -1.
                int end = nextQuote >= 0 ? nextQuote : inputString.length();
                return Optional.of(inputString.substring(currentIndex, end));
            }
            return Optional.empty();
        }
    }

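The resulting list of countries (as strings) ends up in resultCountriesList. The code simply iterates over the string, taking substrings of the original inputString from the end of the previous substring, currentIndex, up to the next occurrence of the \" symbol. If a substring is present, we go on to process it. In addition, the boolean flag processingMultiWordsCountry separates the countries enclosed in \" symbols from those outside them.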
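The alternation described above (text outside quotes, then text inside quotes, and so on) can also be sketched by splitting on the quote character itself. This is only an illustration of the same idea, not the answer author's code; the class name QuoteSegments and the helper splitCountries are hypothetical, and the sketch assumes the quotes in the input are balanced.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteSegments {
    // Hypothetical helper: splits on double quotes, then treats odd-numbered
    // segments as quoted names and even-numbered ones as space-separated words.
    static List<String> splitCountries(String input) {
        List<String> result = new ArrayList<>();
        String[] segments = input.split("\"");
        for (int i = 0; i < segments.length; i++) {
            if (i % 2 == 1) {
                // Inside quotes: keep the whole segment as one name.
                result.add(segments[i]);
            } else {
                // Outside quotes: split on whitespace and drop empty pieces.
                for (String word : segments[i].trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        result.add(word);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "Chad Benin \"Democratic Republic of the Congo\" Bolivia \"United States of America\"";
        // Prints [Chad, Benin, Democratic Republic of the Congo, Bolivia, United States of America]
        System.out.println(splitCountries(input));
    }
}
```

Because the segments strictly alternate, the index parity plays the role of the processingMultiWordsCountry flag. An unbalanced quote would shift that parity, so real input may need validation first.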

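So, at least for now, I can't come up with anything better. I also don't think this code is ideal; there are probably many possible improvements, so if you think of any, feel free to add a comment. Hope this helps, and have a nice day!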

uj5u.com user reply:

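A similar approach to the accepted answer, but with a shorter regular expression and without matching and then stripping the double quotes (which, in my opinion, is a rather expensive procedure):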

    String in = "Chad Benin Angola Algeria Finland Romania \"Democratic Republic of the Congo\" Bolivia Uzbekistan Lesotho \"United States of America\"";
    Pattern p = Pattern.compile("\"([^\"]*)\"|(\\w+)");
    Matcher m = p.matcher(in);
    ArrayList<String> sets = new ArrayList<>();
    while(m.find()) {
        String multiWordCountry = m.group(1);
        if (multiWordCountry != null) {
            sets.add(multiWordCountry);
        } else {
            sets.add(m.group(2));
        }
    }
    System.out.print(sets);

Result:

[Chad, Benin, Angola, Algeria, Finland, Romania, Democratic Republic of the Congo, Bolivia, Uzbekistan, Lesotho, United States of America]
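To tie this back to the question, which reads the line from standard input, here is a minimal self-contained sketch that wraps the same capture-group regex in a method. The class name CountryParser and the method parseCountries are hypothetical names chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CountryParser {
    // Group 1: text inside double quotes; group 2: a run of word characters.
    private static final Pattern COUNTRY = Pattern.compile("\"([^\"]*)\"|(\\w+)");

    // Hypothetical helper: returns the country names in input order.
    static List<String> parseCountries(String line) {
        List<String> countries = new ArrayList<>();
        Matcher m = COUNTRY.matcher(line);
        while (m.find()) {
            String quoted = m.group(1);
            // Only one of the two groups participates in any given match.
            countries.add(quoted != null ? quoted : m.group(2));
        }
        return countries;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
        String line = reader.readLine();
        // readLine returns null at end of input, so guard before parsing.
        if (line != null) {
            System.out.println(parseCountries(line));
        }
    }
}
```

Note that \w matches only ASCII letters, digits and underscore by default, so an unquoted name containing a hyphen or an accented letter would be split; such names would have to be quoted in the input.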

Please credit the source when reposting. Link to this article: https://www.uj5u.com/ruanti/422249.html
