目的是從我們不知道它是否具有變音符號但必須保留剩余字串(如果有)的任何和所有變音符號的阿拉伯字串中找到并洗掉起始字串/字符/單詞。
在 StackOverflow 上從英文字串中洗掉第一個/起始字串/字符有很多答案,但是在 StackOverflow 上沒有找到解決此問題的現有解決方案,可以保持阿拉伯字串在其原始形式中的平衡。
如果原始字串在處理之前被規范化(去除變音符號、tanween 等),那么回傳的剩余字串將是規范化字串的余額,而不是原始字串的剩余部分。
例子。假設以下原始字串可以是以下任何一種形式(即相同的字串但不同的變音符號):
1. “?????? ????? ????? ????”
2. “??????? ?????? ?????? ????”
3. “???????? ??????? ??????? ????”
4. “?????????? ?????????? ?????????? ????”
現在假設我們要洗掉第一個/起始字符“??????”,僅當字串以此類字符開頭時(確實如此),并回傳“原始”字串的剩余部分及其原始變音符號。
當然,我們正在尋找沒有變音符號的字符“??????”,因為我們不知道原始字串是如何使用變音符號格式化的。
因此,在這種情況下,回傳的每個字串的剩余部分必須是:
1. "????? ????? ????"
2. "?????? ?????? ????"
3. "??????? ??????? ????"
4. "?????????? ?????????? ????"
以下代碼適用于英文字串(還有許多其他解決方案),但不適用于上面解釋的阿拉伯字串。
function removeStartWord(string,word) {
if (string.startsWith(word)) string=string.slice(word.length);
return string;
}
上述代碼采用了根據字符長度對從原始字串中找到的起始字符進行切片的原理;這適用于英文文本。
對于阿拉伯字串,我們不知道原始字串的變音符號的形式,因此我們在原始字串中尋找的字串/字符的長度將是不同的且未知的。
編輯:添加示例影像以獲得更好的說明。
以下影像表提供了更多示例:

uj5u.com熱心網友回復:
為了跟蹤討論,我正在添加一個新答案,請試試這個!
function removeStartWord(string, word) {
const alphabeticString = string.replace(/[^a-zA-Z?-?0-9/] /g, '');
if(!alphabeticString.startsWith(word)) return string;
const letters = [...word];
let cleanString = '';
string.split('').forEach((_letter) => {
if(letters.indexOf(_letter) > -1) {
delete letters[letters.indexOf(_letter)]
}else{
cleanString = _letter;
}
});
return cleanString.replace(/[^a-zA-Z?-?0-9/\s]*/i, '');
}
const sampleData = `?????????? ?????????? ?????????? ????`;
console.log('sampleData ...', sampleData);
console.log(
"removeStartWord(sampleData, '??????') ...",
removeStartWord(sampleData, '??????')
);
console.log(
"removeStartWord(sampleData, '???') ...",
removeStartWord(sampleData, '???')
);
console.log(
"removeStartWord(sampleData, '?????? ') ...",
removeStartWord(sampleData, '?????? ')
);
console.log(
"removeStartWord(sampleData, ' ??????') ...",
removeStartWord(sampleData, ' ??????')
);
.as-console-wrapper { min-height: 100%!important; top: 0; }
uj5u.com熱心網友回復:
我提出了以下可能的解決方案。
以下解決方案分為兩部分;首先,該函式startsWithAr()用于“部分”模仿 javascriptstartsWith()方法,但用于阿拉伯字串。
但是,它不會回傳'true'or 'false',而是回傳index after the characters we are looking for源字串開頭的 (即在源字串中找到的字串的長度,包括其 Tashkeel(變音符號),如果有的話),否則,-1如果指定字串的字符則回傳在字串的開頭找不到。
使用該startsWithAr()函式,然后我們創建(在第2 部分中)一個函式,該函式使用該slice()方法洗掉在源字串開頭找到的指定字串的字符;該removeStartString()功能。
這種方法不僅允許保留源字串其余部分的 Tashkeel(變音符號),還允許搜索和洗掉帶有 Tahmeez 的字串。
該函式忽略Tashkeel(變音符號)和Tahmeez在兩個源字串和查找搜索字串,并從一開始洗掉指定的字符盯著之后將回傳源字串,其原有的Tashkeel(變音符號)的完整的剩余部分源字串。
這樣我們就可以使用該函式來處理阿拉伯語腳本中的所有 Unicode,而不是將其限制在定義的范圍內,因為任何語言的任何其他字符都將被忽略。
我們還可以通過將“?”與“?”匹配來輕松改進它,這樣我們就可以洗掉字串“??????”,即使它被寫為“??????”,也可以通過添加.replace(/[?]/g,'?')兩.replace()行來洗掉。
我在下面包含了關于startsWithAr()函式和removeStartString()函式使用的單獨測驗用例。
如果需要,這兩個功能可以組合成一個功能。
請根據需要改進;任何建議表示贊賞。
第一部分:startsWithAr()
//=====================================================================
// startsWithAr() function
// Purpose:
// Determines whether an Arabic string (the "Source String") begins with the characters
// of a specified string (the "Look-For String").
// Return the position (index) after the Look-For String if found, else return -1 if not found.
// Ignores Tashkeel (diacritics) and Tahmeez in both the Source and Look-For Strings.
// The returned position index is zero based.
// By knowing the position (index) after the Look-For String, one can remove the
// starting string using the slice() method while maintaining the remainder of the Source String with
// its original tashkeel (diacritics) unchanged.
//
// Parameters:
// str : The Source String to search in.
// lookFor : The characters to be searched for at the start of this string.
//=====================================================================
function startsWithAr(str,lookFor) {
let indexLookFor=0, tshk=/[?-??-??-??-????]/, w=/[?]/g,hamz=/[???????]/g;
lookFor=lookFor.replace(hamz,'?').replace(w,'?').replace(/[?-??-??-??-????]/g,''); // normalize the lookFor string
for (let indexStr=0; indexStr<str.length;indexStr ) {
while(tshk.test(str[indexStr])&&indexStr<str.length) indexStr; // skip tashkeel & increase index
if (lookFor[indexLookFor]!==str[indexStr].replace(hamz,'?').replace(w,'?')) return-1; // no match, so exit -1
indexLookFor ; // match found so next char in lookFor String
if (indexLookFor>=lookFor.length) { // if end of Source String then WE FOUND IT
indexStr =1; // point after source char
while(tshk.test(str[indexStr])&&indexStr<str.length) indexStr; // skip tashkeel after Source String if any
return indexStr; // return index in Source String after lookFor string and after any tashkeel
}
}
return-1; // not found end of string reached
}
//=========================================
// test cases for startsWithAr() function
//=========================================
var r =0; // test tracking flag
r |= test("?????? ?????????? ????? ????","??????",6); // find the start letters '??????'
r |= test("?????????? ?????????? ????? ????","??????",10); // find the start letters '??????'
r |= test("?????????? ?????????? ?????????? ????","????????",10); // find the start letters '????????'
r |= test("?????????? ?????????? ?????????? ????","????????",10); // find the start letters '????????'
r |= test("?????? ?? ??????","??????",6); // find the start letters '??????'
r |= test("?????/???","?????",5); // find the start letters '?????'
r |= test("?????/???","?",-1); // find the start letters '?????'
r |= test(" ?????"," ",1); // find the start letter ' ' (space)
r |= test("????? ???","??",2); // find the start letters '??'
r |= test("????? ???","?",1); // find the start letter '?'
r |= test("????? ???","??",2); // find the start letters '??'
r |= test("????? ???","??",2); // find the start letters '??'
r |= test("????? ???","???",2); // find the start letters '???'
r |= test("??????? ????","???",3); // find the start letters '???'
r |= test("","?",-1); // empty Source String
r |= test("","",-1); // empty Source String and Look-For String
if (r==0) console.log("? All startsWithAr() test cases passed");
//-----------------------------------
function test(str,lookfor,should) {
let result= startsWithAr(str,lookfor);
if (result !== should) {console.log(`
${str} Output :${result}
${str} Should be:${should}
`);return 1;}
}
第二部分:removeStartString()
//=====================================================================
// removeStartString() function
// Purpose:
// Determines whether an Arabic string (the "Source String") begins with the characters
// of a specified string (the "Look-For String").
// If found the Look-For String is removed and the reminder of the Source String is returned
// with its original Tashkeel (diacritics);
// If no match then return original Source String.
//
// Ignores Tashkeel (diacritics) and Tahmeez in both the Source and Look-For Strings.
// The function uses the startsWithAr() function to determine the index after the matched
// starting string/characters.
//
// Parameters:
// str : The Source String to search in.
// toRemove: The characters to be searched for and removed if at the start of this string.
//=====================================================================
function removeStartString(str,toRemove) {
let index=startsWithAr(str,toRemove);
if (index>-1) str=str.slice(index);
return str;
}
//=========================================
// test cases for removeStartString() function
//=========================================
var r =0; // test tracking flag
r |= test2("?????? ?????????? ????? ????","??????"," ?????????? ????? ????"); // remove the start letters '??????'
r |= test2("?????????? ?????????? ????? ????","??????"," ?????????? ????? ????"); // remove the start letters '??????????'
r |= test2("?????? ?????????? ????? ????","??????????"," ?????????? ????? ????"); // remove the start letters '??????????'
r |= test2(" ?????? ?????????? ????? ????"," ??????????"," ?????????? ????? ????");// remove the start letters '?????????? '
r |= test2("?????? ?????????? ????? ????","??","???? ?????????? ????? ????"); // remove the start letters '??'
r |= test2("??????? ????????","?","????? ????????"); // remove the start letter '?' r |= test2("??????? ????????"," ","??????? ????????"); // remove the start letter ' '
r |= test2("??????? ????????","","??????? ????????"); // remove the start letter ''
r |= test2("??????? ????????","???","??????? ????????"); // remove the start letters '???'
if (r==0) console.log("? All removeStartString() test cases passed");
//-----------------------------------
function startsWithAr(str,lookFor) {
let indexLookFor=0, tshk=/[?-??-??-??-????]/, w=/[?]/g,hamz=/[???????]/g;
lookFor=lookFor.replace(hamz,'?').replace(w,'?').replace(/[?-??-??-??-????]/g,'');
for (let indexStr=0; indexStr<str.length;indexStr ) {
while(tshk.test(str[indexStr])&&indexStr<str.length) indexStr;
if (lookFor[indexLookFor]!==str[indexStr].replace(hamz,'?').replace(w,'?')) return-1;
indexLookFor ;
if (indexLookFor>=lookFor.length) {
indexStr =1;
while(tshk.test(str[indexStr])&&indexStr<str.length) indexStr;
return indexStr;
}
}
return-1;
}
//-----------------------------------
function test2(str,toRemove,should) {
let result= removeStartString(str,toRemove);
if (result !== should) {console.log(`
${str} Output :${result}
${str} Should be:${should}
`);return 1;}
}
uj5u.com熱心網友回復:
盡管JavaScript不支持. _\p{Arabic}
一個基于類別的模式已經完全符合 OP 的要求/^[\p{L}\p{M}] \p{Z} /gmu......replace
從具有變音符號的阿拉伯字串中查找并洗掉第一個起始詞
模式……^[\p{L}\p{M}] \p{Z} 讀起來是這樣的……
^...從新行的開頭開始...[ ... ]...在串列中查找指定字符類的第一個字符...\p{L}...L來自任何語言的任何型別的etter,\p{M}...或旨在與另一個字符組合的字符(例如重音符號、變音符號、封閉框等)
- ... 后跟
\p{Z}... 任何型別的空格或不可見分隔符中的至少一個。
console.log(`?????? ????? ????? ????
??????? ?????? ?????? ????
???????? ??????? ??????? ????
?????????? ?????????? ?????????? ????`.replace(/^[\p{L}\p{M}] \p{Z} /gmu, ''));
.as-console-wrapper { min-height: 100%!important; top: 0; }
編輯
因為它現在很清楚什么OP真正想要的,上述方法遺骸和剛剛被通過利用提升到一個新的水平replacer的功能與基于一個額外的比較邏輯Intl.Collator物件,需要阿拉伯語和基本字母比較進去。
通過提供(除了'ar'本地人)一個具有基本敏感性的選項,整理者被初始化為最不嚴格的。因此,在通過整理者的compare方法比較兩個相似(但不完全相等)的字串時,例如'??????','??????????'將被認為是相等的,盡管后者具有(很多)變音符號。
證明/例子...
const baseLetterCollator = new Intl.Collator('ar', { sensitivity: 'base' } );
console.log(
"('?????? ????? ????? ????' === '?????????? ?????????? ?????????? ????') ?..",
('?????? ????? ????? ????' === '?????????? ?????????? ?????????? ????')
);
console.log('\n');
console.log(`new Intl.Collator()
.compare('?????? ????? ????? ????' ,'?????????? ?????????? ?????????? ????') === 0
?..`,
new Intl.Collator()
.compare('?????? ????? ????? ????' ,'?????????? ?????????? ?????????? ????') === 0
);
console.log(`new Intl.Collator('ar', { sensitivity: 'base' } )
.compare('?????? ????? ????? ????' ,'?????????? ?????????? ?????????? ????') === 0
?..`,
new Intl.Collator('ar', { sensitivity: 'base' } )
.compare('?????? ????? ????? ????' ,'?????????? ?????????? ?????????? ????') === 0
);
.as-console-wrapper { min-height: 100%!important; top: 0; }
基于以上所述...最終解決方案...
function removeFirstMatchingWordFromEveryNewLine(search, multilineString) {
const baseLetterCollator
// - [ar]abic
// - base sensitivity
// ... only strings that differ in base letters compare as unequal.
= new Intl.Collator('ar', { sensitivity: 'base' } );
const replacer = word => {
return (baseLetterCollator.compare(search, word.trim()) === 0)
? '' // - remove the matching word (whitespace included).
: word; // - keep the word since there was no match.
}
const regXFirstLineWord = /^[\p{L}\p{M}] \p{Z} /gmu;
search = String(search).trim();
return String(multilineString).replace(regXFirstLineWord, replacer);
}
const sampleData = `?????? ????? ????? ????
??????? ?????? ?????? ????
???? ??????
???????? ??????? ??????? ????
?????????? ?????????? ?????????? ????`;
console.log('sampleData ...', sampleData);
console.log(
"removeFirstMatchingWordFromEveryNewLine('??????', sampleData) ...",
removeFirstMatchingWordFromEveryNewLine('??????', sampleData)
);
.as-console-wrapper { min-height: 100%!important; top: 0; }
uj5u.com熱心網友回復:
由于需求變化(d)和資訊逐片進入,...
“[...] 答案洗掉了第一個匹配的單詞,假設單詞后面有一個空格。但是我們正在尋找的字串可能不一定跟著空格(即不是一個獨立的單詞)。例如,洗掉句子“?????/???? ???????”中的字符“?????”,僅回傳“/???? ???????”。 – Mohsen Alyafei”
...我也將從一張白紙開始。
針對 Unicode 屬性的匹配結果進行Intl.Collator基于語言環境的組合方法轉義基于正則運算式,該正則運算式匹配任何阿拉伯語單詞,而不管重音、變音符號等組合字符如何,如果涉及查找/匹配任何型別的字串(這里是新行的開頭)。compare
但是,任何試圖天真地迭代字串以逐個字符地將兩個字串與另一個字串進行比較的方法都將失敗。
示例代碼比文字更能說明問題......讓我們看看......
console.log(`
... remember ...
new Intl.Collator('ar', { sensitivity: 'base' } )
.compare('??????????' ,'??????') === 0
?..`, new Intl.Collator('ar', { sensitivity: 'base' } )
.compare('??????????' ,'??????') === 0, `
... but ...
new Intl.Collator('ar')
.compare('??????????' ,'??????') === 0
?..`, new Intl.Collator('ar')
.compare('??????????' ,'??????') === 0
);
console.log('\n... explanation ...\n\n');
console.log("'??????'.length ...", '??????'.length);
console.log("'??????????'.length ...", '??????????'.length);
console.log("'??????'.split('') ...", '??????'.split(''));
console.log("'??????????'.split('') ...", '??????????'.split(''));
.as-console-wrapper { min-height: 100%!important; top: 0; }
幸運的是Intl,ECMAScript 的國際化 API 也可以在這里提供幫助。有Intl.Segmenter利于打破字串(S)為可比段。對于 OP 的用例,在默認granularity級別'grapheme'上執行它就足夠了,這似乎等于將區域設定為可比較字母的分段......
console.log(`[
...new Intl.Segmenter('ar', { granularity: 'grapheme' }).segment('??????')
]
.map(({ segment }) => segment) ...`, [
...new Intl.Segmenter('ar', { granularity: 'grapheme' }).segment('??????')
]
.map(({ segment }) => segment)
);
console.log(`[
...new Intl.Segmenter('ar').segment('??????????')
]
.map(({ segment }) => segment) ...`, [
...new Intl.Segmenter('ar').segment('??????????')
]
.map(({ segment }) => segment)
);
.as-console-wrapper { min-height: 100%!important; top: 0; }
因此,最后一步是通過將上面介紹的內容Intl.Segmenter與現在已經熟悉的Intl.Collator...
function removeEveryMatchingNewLineStart(search, multilineString) {
const letterSegmenter
// - [ar]abic
// - default grapheme granularity (locale comparable letters).
= new Intl.Segmenter('ar'/*, { granularity: 'grapheme' }*/);
const letterCollator
// - [ar]abic
// - base sensitivity
// ... Non-zero comparator result value for strings only
// that for a base letter comparison are considered unequal.
= new Intl.Collator('ar', { sensitivity: 'base' } );
const getLocaleComparableLetterList = str =>
[...letterSegmenter.segment(str)].map(({ segment }) => segment);
function replaceLineStartByBoundComparableLetters(line) {
const searchLetters = this;
let lineLetters = getLocaleComparableLetterList(line);
if (searchLetters.every((searchLetter, idx/*, arr*/) =>
(letterCollator.compare(searchLetter, lineLetters[idx]) === 0)
)) {
lineLetters = lineLetters.slice(searchLetters.length);
let leadingBlanks = '';
while (lineLetters[0] === ' ') {
leadingBlanks = leadingBlanks lineLetters.shift();
}
line = `${ lineLetters.join('') }${ leadingBlanks }`;
// // due to keeping/restoring leading witespace sequences ...
// // ... all the above additional computation instead of ...
// // ... a simple ...
// line = lineLetters.slice(searchLetters.length).join('')
}
return line;
}
return String(multilineString)
.split(/(\n)/)
.map(
replaceLineStartByBoundComparableLetters.bind(
getLocaleComparableLetterList(String(search))
)
)
.join('');
}
const sampleData = `?????? ????? ????? ????
??????? ?????? ?????? ????
???? ??????
???????? ??????? ??????? ????
?????????? ?????????? ?????????? ????`;
console.log('sampleData ...', sampleData);
console.log(
"removeEveryMatchingNewLineStart('??????', sampleData) ...",
removeEveryMatchingNewLineStart('??????', sampleData)
);
console.log(
"removeEveryMatchingNewLineStart('???', sampleData) ...",
removeEveryMatchingNewLineStart('???', sampleData)
);
console.log(
"removeEveryMatchingNewLineStart('?????? ', sampleData) ...",
removeEveryMatchingNewLineStart('?????? ', sampleData)
);
.as-console-wrapper { min-height: 100%!important; top: 0; }
uj5u.com熱心網友回復:
我看不出您的代碼有什么問題,但這是另一種方法:
function removeStartWord(string, word) {
return string.split(' ').filter((_word, index) => index !== 0 || _word.replace(/[^a-zA-Z?-?] /g, '') !== word).join(' ');
}
const sampleData = `?????????? ?????????? ?????????? ????`;
console.log('sampleData ...', sampleData);
console.log(
"removeStartWord(sampleData, '??????') ...",
removeStartWord(sampleData, '??????')
);
console.log(
"removeStartWord(sampleData, '???') ...",
removeStartWord(sampleData, '???')
);
console.log(
"removeStartWord(sampleData, '?????? ') ...",
removeStartWord(sampleData, '?????? ')
);
console.log(
"removeStartWord(sampleData, ' ??????') ...",
removeStartWord(sampleData, ' ??????')
);
.as-console-wrapper { min-height: 100%!important; top: 0; }
轉載請註明出處,本文鏈接:https://www.uj5u.com/qukuanlian/420688.html
標籤:
上一篇:除了.htaccess的特定檔案夾外,無權訪問所有檔案夾
下一篇:IntelliJ不會自動換行
