How to linkify internal cross-references in text documents?

I'm building a web version of a collection of plain-text documents that contain text like this:

...as found in article 6, depending on...

I'm writing code to add relative URL anchors (linkify the references), so that it becomes:

...as found in <a href="article_6">article 6</a>, depending on...

I'm open to any programming language, and I currently have Ruby regex code that handles this simple case:

    with_single_article_links = html.gsub(/article \d+/i) do |match|
      # match is e.g. "article 6", so the href becomes "article_6"
      "<a href=\"#{match.gsub(' ', '_')}\">#{match}</a>"
    end

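But I'm looking for ideas on handling more complex cases like these, with multiple references:

...as found in article 6 or 7, depending on...
...as found in article 6, 7 or 8, depending on...
...as found in article 6, 7 or 8 bis, depending on...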

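If I stick with my current code, I would probably end up with two levels of regex: a first pass that matches article \d+, then a second pass that checks for one of the more complex cases.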
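A minimal sketch of that two-level idea, assuming numbers are only ever joined by `, ` or ` or ` and that `bis` is the only suffix. The first regex (`REFERENCE`) finds a whole reference, and a nested `gsub` with a second regex (`ITEM`) turns each number inside it into a link. The constant names and the `linkify_articles` helper are made up for illustration, and the `article_8_bis` href style is an assumption that extends the `gsub(' ', '_')` convention above.

```ruby
# Level 1: a whole reference such as "article 6, 7 or 8 bis".
REFERENCE = /article \d+(?: bis)?(?:(?:, | or )\d+(?: bis)?)*/i
# Level 2: each number inside that reference (the first one keeps its "article").
ITEM = /(?:article )?(\d+(?: bis)?)/i

# Hypothetical helper: replaces every reference in `text` with links.
def linkify_articles(text)
  text.gsub(REFERENCE) do |reference|
    reference.gsub(ITEM) do |item|
      number = Regexp.last_match(1) # e.g. "8 bis"
      %(<a href="article_#{number.tr(' ', '_')}">#{item}</a>)
    end
  end
end

puts linkify_articles("as found in article 6, 7 or 8 bis, depending on")
# prints: as found in <a href="article_6">article 6</a>, <a href="article_7">7</a> or <a href="article_8_bis">8 bis</a>, depending on
```

One caveat of a pure-regex approach: a phrase such as "article 6 or 7 days" also matches the list pattern, so the 7 would be linked as well.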

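But is there another approach I could take? I'm open to any programming language and technique. This is basically a reality check that I'm using a decent approach.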

Update: I extended the regex, and this works so far:

    article (\d+)((, \d+)* or (\d+))?

Live demo: https://regex101.com/r/WHtM5C/1

The second group then just needs some simple parsing of the comma-separated list.
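For example, with the `+` quantifiers in place, the capture groups of that regex can be turned into a list of article numbers like this. `cited_articles` is a hypothetical helper, and it only looks at the first reference in the string; handling every reference would mean using `scan` or `gsub` instead of `match`.

```ruby
# The regex from the update above.
PATTERN = /article (\d+)((, \d+)* or (\d+))?/

# Hypothetical helper: the article numbers cited by the first reference in `text`.
def cited_articles(text)
  match = PATTERN.match(text)
  return [] unless match

  # Group 2 is nil or a list such as ", 7 or 8"; each run of digits in it is one article.
  [match[1], *match[2].to_s.scan(/\d+/)]
end

p cited_articles("as found in article 6, 7 or 8, depending on")
# => ["6", "7", "8"]
```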

Answer:

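I'm adding a second answer because I didn't want to make any major changes to the first one after it had been voted on.

As you noticed, this is a kind of state machine: you can start "building" a number the first time you see digits, and finish it when you reach a token that signals the end of the number's definition. If building numbers gets complicated, you could even start a nested builder, e.g. a NumberBuilder, send it tokens until you reach the end of the number definition, and then ask that builder for the number.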

    input = "as found in article 6 or 7, depending on\nas found in article 6, 7 bis or 8, depending on\nas found in article 6, 7 or 8 bis, depending on"

    # Splits the text into tokens and hands each one to the builder.
    class TextReader
      attr_reader :builder, :text

      def initialize(text, builder)
        @text = text
        @builder = builder
      end

      def parse
        # Split before each whitespace character or comma; every token keeps its leading whitespace.
        stream = text.split(/(?=\s|,)/)
        stream.each do |token|
          case token
          when /^\s+$/
            builder.convert_space(token)
          when /^\s*,$/, /^\s+or$/
            builder.convert_joiner(token)
          when /^\s*\d+$/
            builder.convert_digits(token)
          when /^\s*as$/
            builder.convert_as(token)
          when /^\s*found$/
            builder.convert_found(token)
          when /^\s*in$/
            builder.convert_in(token)
          when /^\s*article$/
            builder.convert_article(token)
          when /^\s*bis$/
            builder.convert_bis(token)
          else
            builder.convert_other(token)
          end
        end
      end
    end

    # Emits HTML; the flags track how far we are into "as found in article N, ...".
    class HTMLBuilder
      attr_reader :html

      def initialize
        @html = ""
      end

      def convert_space(token)
        html << token
      end

      def convert_joiner(token)
        # Finish the pending number before marking that we are inside a list,
        # otherwise the first number would lose its "article" label.
        process_number if @number
        @joiner = true
        html << token
      end

      def convert_other(token)
        process_number if @number
        @as = @found = @in = @article = @joiner = @number = false
        html << token
      end

      # Hold the digits back until we know whether a " bis" follows.
      def convert_digits(token)
        @number = token
      end

      def convert_bis(token)
        if @number
          @number << token
          process_number
        else
          html << token
        end
      end

      def process_number
        token = @number
        @number = false
        unless @article
          html << token
          return
        end

        digits, suffix = token.match(/^\s*(\d+)(.*)/).captures
        id = "#{digits}#{suffix}" # e.g. "7 bis"
        # The " article" token was swallowed, so the first link re-emits the word.
        label = @joiner ? id : "article #{id}"
        html << %( <a href="article_#{id.tr(' ', '_')}">#{label}</a>)
      end

      def convert_as(token)
        @as = true
        html << token
      end

      def convert_found(token)
        @found = true if @as
        html << token
      end

      def convert_in(token)
        @in = true if @found
        html << token
      end

      def convert_article(token)
        if @in
          @article = true # emitted later as part of the first link's text
        else
          html << token
        end
      end
    end

    builder = HTMLBuilder.new
    reader = TextReader.new(input, builder)
    reader.parse
    puts "output:"
    puts builder.html

    =>
    output:
    as found in <a href="article_6">article 6</a> or <a href="article_7">7</a>, depending on
    as found in <a href="article_6">article 6</a>, <a href="article_7_bis">7 bis</a> or <a href="article_8">8</a>, depending on
    as found in <a href="article_6">article 6</a>, <a href="article_7">7</a> or <a href="article_8_bis">8 bis</a>, depending on
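The nested NumberBuilder is only described in this answer; the code above tracks the pending number with a plain `@number` variable instead. A rough sketch of the idea, where `NumberBuilder`, its `accept`/`started?`/`number` methods and the `numbers_in` driver are all hypothetical names, and `bis` is assumed to be the only suffix:

```ruby
# Hypothetical NumberBuilder: collects "7" or "7 bis" from a token stream.
class NumberBuilder
  SUFFIXES = %w[bis].freeze # assumption: "bis" is the only suffix in the texts

  def initialize
    @digits = nil
    @suffix = nil
  end

  # Offers a token; returns true if it was consumed as part of this number.
  def accept(token)
    word = token.strip
    if @digits.nil? && word.match?(/\A\d+\z/)
      @digits = word
    elsif @digits && @suffix.nil? && SUFFIXES.include?(word.downcase)
      @suffix = word
    else
      return false
    end
    true
  end

  def started?
    !@digits.nil?
  end

  # The finished number, e.g. "7" or "7 bis".
  def number
    [@digits, @suffix].compact.join(" ")
  end
end

# Hypothetical driver: feeds tokens to a NumberBuilder and, when one is
# rejected, asks the builder for its number and starts a fresh builder.
def numbers_in(tokens)
  numbers = []
  builder = NumberBuilder.new
  tokens.each do |token|
    next if builder.accept(token)

    numbers << builder.number if builder.started?
    builder = NumberBuilder.new
    builder.accept(token) # the rejected token may begin the next number
  end
  numbers << builder.number if builder.started?
  numbers
end

p numbers_in([" 6", ",", " 7", " bis", " or", " 8"])
# => ["6", "7 bis", "8"]
```

The `false` returned by `accept` is the "end of the number definition" signal: the outer code reacts by asking the finished builder for its number.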

Answer:

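I know this looks like overkill and is very verbose, but the first thing that came to mind was the builder pattern: split the input into tokens, then convert each token based on where you are in the stream.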

    input = "as found in article 6 or 7, depending on\nas found in article 6, 7 or 8, depending on\nas found in article 6, 7 or 8 bis, depending on"

    # Splits the text into tokens and hands each one to the builder.
    class TextReader
      attr_reader :builder, :text

      def initialize(text, builder)
        @text = text
        @builder = builder
      end

      def parse
        # Split before each whitespace character or comma; every token keeps its leading whitespace.
        stream = text.split(/(?=\s|,)/)
        stream.each do |token|
          case token
          when /^\s+$/
            builder.convert_space(token)
          when /^\s*,$/, /^\s+or$/
            builder.convert_joiner(token)
          when /^\s*\d+$/
            builder.convert_number(token)
          when /^\s*as$/
            builder.convert_as(token)
          when /^\s*found$/
            builder.convert_found(token)
          when /^\s*in$/
            builder.convert_in(token)
          when /^\s*article$/
            builder.convert_article(token)
          else
            builder.convert_other(token)
          end
        end
      end
    end

    # Emits HTML; the flags track how far we are into "as found in article N, ...".
    class HTMLBuilder
      attr_reader :html

      def initialize
        @html = ""
      end

      def convert_space(token)
        html << token
      end

      def convert_joiner(token)
        @joiner = true
        html << token
      end

      def convert_other(token)
        @as = @found = @in = @article = @joiner = false
        html << token
      end

      def convert_number(token)
        unless @article
          html << token
          return
        end

        digits = token[/\d+/]
        # The " article" token was swallowed, so the first link re-emits the word.
        label = @joiner ? digits : "article #{digits}"
        html << %( <a href="article_#{digits}">#{label}</a>)
      end

      def convert_as(token)
        @as = true
        html << token
      end

      def convert_found(token)
        @found = true if @as
        html << token
      end

      def convert_in(token)
        @in = true if @found
        html << token
      end

      def convert_article(token)
        if @in
          @article = true # emitted later as part of the first link's text
        else
          html << token
        end
      end
    end

    builder = HTMLBuilder.new
    reader = TextReader.new(input, builder)
    reader.parse
    puts "output:"
    puts builder.html

    =>
    output:
    as found in <a href="article_6">article 6</a> or <a href="article_7">7</a>, depending on
    as found in <a href="article_6">article 6</a>, <a href="article_7">7</a> or <a href="article_8">8</a>, depending on
    as found in <a href="article_6">article 6</a>, <a href="article_7">7</a> or <a href="article_8">8</a> bis, depending on
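Both answers rely on the same tokenizer, `text.split(/(?=\s|,)/)`. A quick standalone check of what it produces: the zero-width lookahead splits before each whitespace character or comma without consuming anything, so every token keeps its leading whitespace and joining the tokens gives back the original text.

```ruby
text = "as found in article 6, 7 or 8 bis, depending on"

# Split before each whitespace character or comma; nothing is consumed.
tokens = text.split(/(?=\s|,)/)

p tokens
# => ["as", " found", " in", " article", " 6", ",", " 7", " or", " 8", " bis", ",", " depending", " on"]
p tokens.join == text # => true
```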
