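I am producing a web version of a collection of documents containing plain text like this:
...as found in article 6, depending on...
I am writing code to add relative URL anchors (linkification):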
...as found in <a href="article_6">article 6</a>, depending on...
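I am open to any programming language, and currently have Ruby regex code that handles this simple case:
with_single_article_links = html.gsub(/article \d+/i) do |match|
  # Build the link target from the matched text, e.g. "article 6" -> "article_6"
  "<a href=\"#{match.tr(' ', '_')}\">#{match}</a>"
end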
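But I am looking for ideas on how to handle more complex cases like these, with multiple references:
- ...as found in article 6 or 7, depending on...
- ...as found in article 6, 7 or 8, depending on...
- ...as found in article 6, 7 or 8 bis, depending on...
If I continue with my current code, I will probably end up with two levels of regex: a first pass that matches article \d+, then a second pass that checks for one of the complex cases.
But is there another approach I could take? I am open to any programming language or technique. This is basically a reality check on whether I am using a decent approach.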
Update: with an extended regex, this is working so far:
article (\d+)((, \d+)* or (\d+))?
Live demo: https://regex101.com/r/WHtM5C/1
The second group then only needs some simple parsing of the comma-separated list.
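The extended regex above can be combined with a second, inner pass over the matched list. The following is a minimal sketch, not the asker's actual code: the names LABEL, REFERENCE and linkify_articles are hypothetical, the optional "bis" suffix is an added assumption based on the third example, and lists without a final "or" are not handled.

```ruby
# Hypothetical helper names; a sketch under the assumptions stated above.
LABEL = /\d+(?: bis\b)?/                                        # "6" or "8 bis"
REFERENCE = /\barticle #{LABEL}(?:(?:, #{LABEL})* or #{LABEL})?/i

def linkify_articles(text)
  text.gsub(REFERENCE) do |reference|
    # Separate the leading word from the list: "article", "6, 7 or 8 bis"
    word, list = reference.split(" ", 2)
    first = true
    # Inner pass: wrap each number in the list; commas and "or" stay as they are
    list.gsub(LABEL) do |label|
      link_text = first ? "#{word} #{label}" : label
      first = false
      "<a href=\"article_#{label.tr(' ', '_')}\">#{link_text}</a>"
    end
  end
end

puts linkify_articles("as found in article 6, 7 or 8 bis, depending on")
# as found in <a href="article_6">article 6</a>, <a href="article_7">7</a> or <a href="article_8_bis">8 bis</a>, depending on
```

This keeps the two levels the question describes: the outer regex decides where a reference starts and ends, and the inner one only has to recognise individual numbers.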
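Reply from a uj5u.com user:
I am adding a second answer because I did not want to make any major changes to the first one after it had been upvoted.
As you noted, this is a kind of state machine, so you can start "building" a number when you first see digits and then finish it when you reach a token that marks the end of the number definition. If building the number gets complex, you could even start a nested builder, i.e. a NumberBuilder, send tokens to that builder until you reach the end of the number definition, and then ask the builder for the number.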
input = "as found in article 6 or 7, depending on\nas found in article 6, 7 bis or 8, depending on\nas found in article 6, 7 or 8 bis, depending on"
class TextReader
  attr_reader :builder, :text

  def initialize(text, builder)
    @text = text
    @builder = builder
  end

  def parse
    # Split before every whitespace character or comma, so each token keeps its leading space
    stream = text.split(/(?=\s|,)/)
    stream.each do |token|
      case token
      when /^\s+$/
        builder.convert_space(token)
      when /^\s*,$/, /^\s+or$/
        builder.convert_joiner(token)
      when /^\s*\d+$/
        builder.convert_digits(token)
      when /^\s*as$/
        builder.convert_as(token)
      when /^\s*found$/
        builder.convert_found(token)
      when /^\s*in$/
        builder.convert_in(token)
      when /^\s*article$/
        builder.convert_article(token)
      when /^\s*bis$/
        builder.convert_bis(token)
      else
        builder.convert_other(token)
      end
    end
  end
end
class HTMLBuilder
  attr_reader :html

  def initialize
    @html = +""  # unfrozen buffer that the convert_* methods append to
  end

  def convert_space(token)
    html << token
  end

  def convert_joiner(token)
    # Flush the pending number before setting @joiner,
    # so the first link in a list keeps its "article" prefix
    process_number if @number
    @joiner = true
    html << token
  end

  def convert_other(token)
    process_number if @number
    @as = @found = @in = @article = @joiner = @number = false
    html << token
  end

  def convert_digits(token)
    # Hold the number back until we know whether a "bis" follows
    @number = token
  end

  def convert_bis(token)
    if @number
      @number << token
      process_number
    else
      html << token
    end
  end

  def process_number
    token = @number
    @number = false
    if @article
      label = token.strip                     # "7" or "7 bis"
      href = "article_#{label.tr(' ', '_')}"  # "article_7" or "article_7_bis"
      text = @joiner ? label : "article #{label}"
      html << " <a href=\"#{href}\">#{text}</a>"
    else
      html << token
    end
  end

  def convert_as(token)
    @as = true
    html << token
  end

  def convert_found(token)
    @found = true if @as
    html << token
  end

  def convert_in(token)
    @in = true if @found
    html << token
  end

  def convert_article(token)
    if @in
      @article = true  # the word itself is emitted later as part of the first link's text
    else
      html << token    # outside "as found in ...", pass the word through unchanged
    end
  end
end

builder = HTMLBuilder.new
reader = TextReader.new(input, builder)
reader.parse
puts "output:"
puts builder.html
=>
output:
as found in <a href="article_6">article 6</a> or <a href="article_7">7</a>, depending on
as found in <a href="article_6">article 6</a>, <a href="article_7_bis">7 bis</a> or <a href="article_8">8</a>, depending on
as found in <a href="article_6">article 6</a>, <a href="article_7">7</a> or <a href="article_8_bis">8 bis</a>, depending on
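The nested NumberBuilder mentioned in this answer is not part of the code above. As a rough sketch of what it might look like, assuming "bis" is the only suffix and that tokens arrive with their leading whitespace as produced by TextReader (the class layout and the method names accept, label and href are hypothetical):

```ruby
# Hypothetical nested builder: collects the tokens that make up one article number.
class NumberBuilder
  SUFFIXES = %w[bis].freeze # the only suffix that appears in the question

  def initialize
    @parts = []
  end

  # Returns true if the token was consumed, false if it ends the number.
  def accept(token)
    word = token.strip
    if @parts.empty? && word.match?(/\A\d+\z/)
      @parts << word       # the digits open the number
      true
    elsif @parts.size == 1 && SUFFIXES.include?(word)
      @parts << word       # at most one suffix directly after the digits
      true
    else
      false
    end
  end

  # Link text, e.g. "7 bis"
  def label
    @parts.join(" ")
  end

  # Link target, e.g. "article_7_bis"
  def href
    "article_#{@parts.join('_')}"
  end
end

number = NumberBuilder.new
consumed = [" 7", " bis", " or"].map { |token| number.accept(token) }
p consumed      # [true, true, false]
puts number.href # article_7_bis
```

The outer builder would create a NumberBuilder when it sees digits, forward tokens until accept returns false, and then ask it for label and href.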
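Reply from a uj5u.com user:
I know this looks like overkill and is very verbose, but the first thing that came to mind was the builder pattern: split the input into tokens, then convert each token depending on where you are in the stream.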
input = "as found in article 6 or 7, depending on\nas found in article 6, 7 or 8, depending on\nas found in article 6, 7 or 8 bis, depending on"
class TextReader
  attr_reader :builder, :text

  def initialize(text, builder)
    @text = text
    @builder = builder
  end

  def parse
    # Split before every whitespace character or comma, so each token keeps its leading space
    stream = text.split(/(?=\s|,)/)
    stream.each do |token|
      case token
      when /^\s+$/
        builder.convert_space(token)
      when /^\s*,$/, /^\s+or$/
        builder.convert_joiner(token)
      when /^\s*\d+$/
        builder.convert_number(token)
      when /^\s*as$/
        builder.convert_as(token)
      when /^\s*found$/
        builder.convert_found(token)
      when /^\s*in$/
        builder.convert_in(token)
      when /^\s*article$/
        builder.convert_article(token)
      else
        builder.convert_other(token)
      end
    end
  end
end
class HTMLBuilder
  attr_reader :html

  def initialize
    @html = +""  # unfrozen buffer that the convert_* methods append to
  end

  def convert_space(token)
    html << token
  end

  def convert_joiner(token)
    @joiner = true
    html << token
  end

  def convert_other(token)
    # Any unrecognised word ends the "as found in article ..." context
    @as = @found = @in = @article = @joiner = false
    html << token
  end

  def convert_number(token)
    if @article
      number = token.strip
      # The first number in a list carries the word "article" in its link text
      text = @joiner ? number : "article #{number}"
      html << " <a href=\"article_#{number}\">#{text}</a>"
    else
      html << token
    end
  end

  def convert_as(token)
    @as = true
    html << token
  end

  def convert_found(token)
    @found = true if @as
    html << token
  end

  def convert_in(token)
    @in = true if @found
    html << token
  end

  def convert_article(token)
    if @in
      @article = true  # the word itself is emitted later as part of the first link's text
    else
      html << token    # outside "as found in ...", pass the word through unchanged
    end
  end
end

builder = HTMLBuilder.new
reader = TextReader.new(input, builder)
reader.parse
puts "output:"
puts builder.html
=>
output:
as found in <a href="article_6">article 6</a> or <a href="article_7">7</a>, depending on
as found in <a href="article_6">article 6</a>, <a href="article_7">7</a> or <a href="article_8">8</a>, depending on
as found in <a href="article_6">article 6</a>, <a href="article_7">7</a> or <a href="article_8">8</a> bis, depending on
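To see what the reader is dispatching on, it can help to look at the token stream on its own. The snippet below is only an illustration: classify is a hypothetical, simplified stand-in for the case statement in TextReader (it lumps "as", "found" and "in" into :other), and the sample sentence is one of the inputs used above.

```ruby
# Split before every whitespace character or comma; tokens keep their leading space.
tokens = "as found in article 6, 7 or 8 bis, depending on".split(/(?=\s|,)/)
p tokens
# ["as", " found", " in", " article", " 6", ",", " 7", " or", " 8", " bis", ",", " depending", " on"]

# Hypothetical, simplified version of the dispatch in TextReader#parse.
def classify(token)
  case token
  when /\A\s+\z/                then :space
  when /\A\s*,\z/, /\A\s+or\z/  then :joiner
  when /\A\s*\d+\z/             then :digits
  when /\A\s*bis\z/             then :bis
  when /\A\s*article\z/         then :article
  else                               :other
  end
end

p tokens.map { |token| classify(token) }
# [:other, :other, :other, :article, :digits, :joiner, :digits, :joiner, :digits, :bis, :joiner, :other, :other]
```

Because the split uses a lookahead, nothing is discarded: joining the tokens back together reproduces the input exactly, which is why the builder can simply append tokens it does not want to change.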
