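I have a PHP script that uses the preg_match_all function to return all matches from a text file. However, I want the function to check only for matches in each line that start at position 3 and are 11 digits long (basically, ending at position 13). I don't want it to look for matches anywhere in the whole line, because that returns wrong results.
Script: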
<?php
$file = 'masterfile.out';
$searchfor = '02354098780';
// the following line prevents the browser from parsing this as HTML.
header('Content-Type: text/plain');
// get the file contents, assuming the file to be readable (and exist)
$contents = file_get_contents($file);
// escape special characters in the query
$pattern = preg_quote($searchfor, '/');
// finalise the regular expression, matching the whole line
$pattern = "/^.*$pattern.*\$/m";
// search, and store all matching occurrences in $matches
if(preg_match_all($pattern, $contents, $matches)){
echo "Found matches:\n";
echo substr(implode("\n", $matches[0]),2,11);
echo substr(implode("\n", $matches[0]),166,11);
}
else{
echo "No matches found";
}
?>
Sample data from the text file:
I0023540987805R01 ABC GHI OLirrt 000000000000000100EA 0812160070451700 1098833 1990041300000001086000000000108600000000000996000000000032100000000000000000000000000000000000000000000000000000000000000000000006589000000000000000 P0012B
0000002032902R01 DEF JKL KLijuI 000000000000000100EA 0812160070451700 1029132 1997010800000002396000000000239600000120002326000000000000000000000000000000000000000000000000000000000000004560000000000000000000000000987600000000 A203SD
Answer:
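For a small number of characters, you can anchor the regular expression to the start of the line:
'#^..([0-9]{11})#m'
This matches 11 digits after skipping the first two characters (the two dots) from the start of the line (^), so the match begins at the third character. The m modifier makes ^ match at the start of every line, not just the start of the file.
In this case: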
<?php
$file = 'masterfile.out';
// $pattern = '#^..([0-9]{11})#m'; // Any 11 digits
$pattern = '#^..(02354098780)#m'; // Exactly these 11
// the following line prevents the browser from parsing this
// as HTML.
header('Content-Type: text/plain');
// get the file contents, assuming the file to be readable (and exists)
$contents = file_get_contents($file);
if (preg_match_all($pattern, $contents, $matches, PREG_PATTERN_ORDER)){
echo "Found matches:\n";
echo implode("\n", $matches[1]);
echo "\n";
} else {
echo "No matches found\n";
}
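Update
I just noticed that your sequence starts at the third character, counting from 1. In some conventions (and in my earlier example) you count from 0. So if you count from 1, you only need two dots, not three. In other words, when you say "starting at position 3", you probably mean skipping the first two characters. As you can see from the other answers, almost everyone else assumed you wanted to skip three characters.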
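To make the difference concrete, here is a small sketch using the start of the first sample line. PHP's substr() uses 0-based offsets, so skipping two characters means offset 2, which corresponds to two dots after ^ in the pattern. The variable names are only for illustration.

```php
<?php
// Start of the first sample line from the question (rest of the line omitted).
$line = 'I0023540987805R01';

// substr() offsets are 0-based: offset 2 skips two characters,
// so the 11-character window begins at the third character.
$skipTwo   = substr($line, 2, 11);
$skipThree = substr($line, 3, 11);

// Regex equivalents: two dots vs. three dots after ^.
preg_match('/^..(\d{11})/', $line, $m2);
preg_match('/^...(\d{11})/', $line, $m3);

echo $skipTwo, ' ', $m2[1], PHP_EOL;   // 02354098780 02354098780
echo $skipThree, ' ', $m3[1], PHP_EOL; // 23540987805 23540987805
```

Only the first reading finds the search value 02354098780 from the question.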
Answer:
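If your example is close to your intended use, you are really looking for an exact match of a substring, yet you are using preg_match_all. Iterating over the lines should have a lower memory footprint. For exact equality, a strict substring comparison also costs less CPU than preg_match_all.
So I suggest doing that. You can read the lines with fgets or with stream_get_line; the latter may be slightly faster (though in most cases this won't matter).
It can be implemented as follows: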
$searchString = 'someFixedString';
$posOffset = 2;
$matchLength = mb_strlen($searchString);
$filePath = '/some/file.path';
$fileHandle = @fopen($filePath, 'r');
$checkedLines = 0;
$matches = [];
$foundMatches = false;
// Depending on what you wish to output
$capturePosOffset = 0;
$captureLength = $matchLength + $posOffset + 3;
// 0 means the default of 8192 bytes, which is enough if lines are no longer than that;
// otherwise set it to a value above the byte length of your lines
$maxBytesToReadPerLine = 0;
// Use this if the file's line terminator is the same as PHP's,
// otherwise set it to the file's line terminator
$lineTerminator = PHP_EOL;
if ($fileHandle) {
    while (!feof($fileHandle)) {
        // or just use fgets, which requires no further arguments
        $line = stream_get_line($fileHandle, $maxBytesToReadPerLine, $lineTerminator);
        if ($line === false) {
            break; // nothing left to read
        }
        $checkedLines++;
        if (mb_substr($line, $posOffset, $matchLength) === $searchString) {
            $foundMatches = true;
            // Capture the whole line...
            $matches[] = $line;
            // ...or, to capture a field with a fixed length instead, use this
            // line (and adjust the offset and length variables above):
            // $matches[] = mb_substr($line, $capturePosOffset, $captureLength);
        }
    }
    fclose($fileHandle);
}
if ($foundMatches) {
    echo "Found " . count($matches) . " matches among $checkedLines lines:" . PHP_EOL;
    foreach ($matches as $matchedValue) {
        // I'm not sure what you intend to do here.
        // In your example code, you implode the array, but then only
        // output 11 characters of the first line, starting at position 3.
        // If you want the whole line, capture it above and echo it here.
        // To output only a field instead, adjust $capturePosOffset and
        // $captureLength above and echo the value (and a newline).
        echo ' ' . $matchedValue . PHP_EOL;
    }
} else {
    echo "No matches found!" . PHP_EOL;
}
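We use mb_strlen and mb_substr in case the encoding allows multibyte characters. You can safely use strlen and substr only if you know that this will never be the case.
You shouldn't fall into the trap of premature optimization, but be aware that the best solution depends heavily on the file size and the match length.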
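If the data is known to be single-byte (the sample lines look like plain ASCII, but that is an assumption), the strict comparison can also be done without extracting a substring first. PHP's built-in substr_compare() does this. Below is a minimal sketch; the helper name fieldEquals is hypothetical.

```php
<?php
/**
 * Hypothetical helper: true if $line contains exactly $needle
 * starting at the 0-based byte offset $offset.
 */
function fieldEquals(string $line, string $needle, int $offset): bool
{
    // A line too short to hold the whole field can never match; this also
    // avoids the error substr_compare() raises when $offset is past the end.
    if (strlen($line) < $offset + strlen($needle)) {
        return false;
    }
    // Compare strlen($needle) bytes of $line, starting at $offset.
    return substr_compare($line, $needle, $offset, strlen($needle)) === 0;
}

var_dump(fieldEquals('I0023540987805R01', '02354098780', 2)); // bool(true)
var_dump(fieldEquals('0000002032902R01', '02354098780', 2));  // bool(false)
```

Inside the reading loop above, this would replace the mb_substr() comparison, but only under the single-byte assumption.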
Answer:
The regular expression below ignores the first 3 characters at the start of each line and captures the following 11 characters:
https://regex101.com/r/MEaB67/1
/^.{3}(.{11})/gm
Edit
Here is some sample PHP code to test the regular expression:
<pre>
<?php
$pattern = '/^.{3}(.{11})/m';
$subject = '
I0023540987805R01 ABC GHI OLirrt 000000000000000100EA 0812160070451700 1098833 1990041300000001086000000000108600000000000996000000000032100000000000000000000000000000000000000000000000000000000000000000000006589000000000000000 P0012B
0000002032902R01 DEF JKL KLijuI 000000000000000100EA 0812160070451700 1029132 1997010800000002396000000000239600000120002326000000000000000000000000000000000000000000000000000000000000004560000000000000000000000000987600000000 A203SD
';
$matches = null;
preg_match_all($pattern, $subject, $matches);
var_dump($matches);
?>
</pre>
Fabio
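If you are looking for one specific value, capturing the field is only part of the task: the captured group still has to be compared with that value. Below is a minimal sketch, assuming the whole line should be returned for each hit. It skips two characters (the 1-based reading of "position 3" discussed in the first answer's update); use .{3} if you mean a 0-based offset. The two shortened sample lines are made-up test data.

```php
<?php
$searchfor = '02354098780';
// Shortened, made-up sample data.
$contents = "I0023540987805R01 ABC\n0000002032902R01 DEF\n";

// Group 0 is the whole line, group 1 the 11 characters after the first two.
preg_match_all('/^.{2}(.{11}).*$/m', $contents, $sets, PREG_SET_ORDER);

$hits = [];
foreach ($sets as $set) {
    // Keep the line only if the captured field equals the search value exactly.
    if ($set[1] === $searchfor) {
        $hits[] = $set[0];
    }
}
print_r($hits);
```

PREG_SET_ORDER groups each line's full match and captured field together, which keeps the comparison loop simple.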
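Answer:
Here is a different approach from yours. Since we are looking for a string in a specific part of the line, we can cut away the rest and check whether the string occurs in that part.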
<?php
$text = "I0023540987805R01 ABC GHI OLirrt 000000000000000100EA 0812160070451700 1098833 1990041300000001086000000000108600000000000996000000000032100000000000000000000000000000000000000000000000000000000000000000000006589000000000000000 P0012B
0000002032902R01 DEF JKL KLijuI 000000000000000100EA 0812160070451700 1029132 1997010800000002396000000000239600000120002326000000000000000000000000000000000000000000000000000000000000004560000000000000000000000000987600000000 A203SD ";
$searchfor = '02354098780';
// Initialise the result arrays so print_r() works even when nothing matches.
$matches = [];
$matchesLine = [];
$txt = explode("\n", $text);
echo '<pre>';
print_r($txt);
foreach ($txt as $key => $line) {
    // Keep only the 11 characters starting at the third character.
    $subbedString = substr($line, 2, 11);
    if (strpos($subbedString, $searchfor) === 0) {
        $matches[$key] = $searchfor;
        $matchesLine[$key] = $line; /** Save the whole line when a match is found. */
        echo "Found in line : $key";
    }
}
echo '<pre>';
print_r($matches);
echo '<pre>';
print_r($matchesLine);
This outputs:
Array
(
[0] => I0023540987805R01 ABC GHI OLirrt 000000000000000100EA 0812160070451700 1098833 1990041300000001086000000000108600000000000996000000000032100000000000000000000000000000000000000000000000000000000000000000000006589000000000000000 P0012B
[1] => 0000002032902R01 DEF JKL KLijuI 000000000000000100EA 0812160070451700 1029132 1997010800000002396000000000239600000120002326000000000000000000000000000000000000000000000000000000000000004560000000000000000000000000987600000000 A203SD
)
Found in line : 0
Array
(
[0] => 02354098780
)
Array
(
[0] => I0023540987805R01 ABC GHI OLirrt 000000000000000100EA 0812160070451700 1098833 1990041300000001086000000000108600000000000996000000000032100000000000000000000000000000000000000000000000000000000000000000000006589000000000000000 P0012B
)
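The same idea can be written more compactly with array_filter(), which keeps the original keys, so they still serve as line numbers. This sketch assumes PHP 7.4 or later (for the arrow function). It splits on \R so that Windows (\r\n) line endings are handled as well. The sample text is shortened, made-up data.

```php
<?php
// Shortened, made-up sample data with mixed line endings.
$text = "I0023540987805R01 ABC\n0000002032902R01 DEF\r\nXX02354098780 GHI";
$searchfor = '02354098780';

// \R matches \n, \r\n or \r as a single line break.
$lines = preg_split('/\R/', $text);

// Keep only lines whose field at offset 2 equals $searchfor;
// array_filter() preserves keys, so they are 0-based line numbers.
$matchesLine = array_filter(
    $lines,
    fn (string $line): bool => substr($line, 2, strlen($searchfor)) === $searchfor
);
print_r($matchesLine);
```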
Answer:
You can match 3 characters, then use \K to forget what has been matched so far, and then match 11 digits.
^...\K\d{11}
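^ : start of the string (of each line, when the /m flag is used)
... : match any character except a newline, 3 times
\K : clear the current match buffer
\d{11} : match 11 digits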
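You can omit preg_quote, because there is nothing to escape in the current pattern.
Since the pattern uses the anchor ^, you must specify the multiline flag /m to get all results.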
$file = 'masterfile.out';
$contents = file_get_contents($file);
$pattern = "/^...\K\d{11}/m";
if (preg_match_all($pattern, $contents, $matches)) {
echo "Found matches:" . PHP_EOL;
foreach ($matches[0] as $m) {
echo $m . PHP_EOL;
}
} else {
echo "No matches found";
}
Output
Found matches:
23540987805
00002032902
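Because \K drops the skipped characters from the reported match, the same pattern shape can look for one specific value. Below is a sketch, assuming the value starts at the third character (two dots). Here preg_quote() is used because the value is interpolated into the pattern. PREG_OFFSET_CAPTURE shows that each match is reported from the field itself, not from the start of the line. The sample text is made-up.

```php
<?php
$searchfor = '02354098780';
// Made-up sample data: lines 1 and 3 contain the value at the third character.
$contents = "I0023540987805R01 ABC\n0000002032902R01 DEF\nXX02354098780 GHI\n";

// Escape the value in case it ever contains regex metacharacters.
$pattern = '/^..\K' . preg_quote($searchfor, '/') . '/m';

preg_match_all($pattern, $contents, $matches, PREG_OFFSET_CAPTURE);
foreach ($matches[0] as [$found, $offset]) {
    // $offset is the byte position of the field in $contents.
    echo "$found at byte offset $offset", PHP_EOL;
}
```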
