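Sorry if the title is a bit confusing; I wasn't sure how to put it in a few words.
I'm currently handling a case where a user uploads a .csv or Excel file, and the data has to be mapped correctly to prepare it for a bulk upload. It will make more sense once you read the code below!
Step one: the user uploads a .csv/Excel file, which is converted into an array of row arrays. Typically, the first array holds the headers.
The data will look like the following (headers included). It can be anywhere from 100 items up to roughly 100,000 items: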
const DUMMY_DATA = [
['First Name', 'Last Name', 'company', 'email', 'phone', 'Address', 'City', 'State', 'Zip Code'],
['Lambert', 'Beckhouse', 'StackOverflow', '[email protected]', '512-555-1738', '316 Arapahoe Way', 'Austin', 'TX', '78721'],
['Maryanna', 'Vassman', 'CDBABY', '[email protected]', '479-204-8976', '1126 Troy Way', 'Fort Smith', 'AR', '72916']
]
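After uploading, the user maps each field to the correct schema field. This can be every field or only a select few.
For example, say the user wants to exclude the address parts except for the zip code. We then get back an array of "mapped fields", renamed to the correct schema names (i.e. First Name => firstName):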
const MAPPED_FIELDS = [firstName, lastName, company, email, phone, <empty>, <empty>, <empty>, zipCode]
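I've made it so that the index of each mapped field always matches its header. So any header that isn't mapped will have an empty value at its index.
So in this scenario we know to upload only the data (from DUMMY_DATA) at indexes [0, 1, 2, 3, 4, 8].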
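The index list can be derived from the mapping itself. This is only a minimal sketch, assuming unmapped slots are stored as null (an empty slot or undefined would be skipped the same way); the name mappedIndexes is made up for illustration.

```javascript
// Assumption: unmapped columns are represented as null.
const MAPPED_FIELDS = ['firstName', 'lastName', 'company', 'email', 'phone', null, null, null, 'zipCode'];

// Collect the index of every column that has a schema name.
// forEach skips holes in sparse arrays, and `!= null` also rejects undefined.
const mappedIndexes = [];
MAPPED_FIELDS.forEach((field, index) => {
  if (field != null) mappedIndexes.push(index);
});

console.log(mappedIndexes); // [0, 1, 2, 3, 4, 8]
```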
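That brings us to the last part, where we want to upload the correct fields for all the data, so we pair the properly mapped schema names from MAPPED_FIELDS with the matching values from DUMMY_DATA...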
const firstObjectToBeUploaded = {
  firstName: 'Lambert',
  lastName: 'Beckhouse',
  company: 'StackOverflow',
  email: '[email protected]',
  phone: '512-555-1738',
  zipCode: '78721'
}
try {
  await uploadData(firstObjectToBeUploaded)
} catch (err) {
  console.log(err)
}
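All the data will be sent to an AWS Lambda function written in Node.js, which handles the upload/logic.
I'm struggling a bit with how to implement this efficiently, since the data can get quite large.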
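A uj5u.com user replied:
If you're looking for some performance gain at larger array sizes, you can apply the same logic as Nick's answer but implement it in a standard for loop.
for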
const DUMMY_DATA = [
['First Name', 'Last Name', 'company', 'email', 'phone', 'Address', 'City', 'State', 'Zip Code'],
['Lambert', 'Beckhouse', 'StackOverflow', '[email protected]', '512-555-1738', '316 Arapahoe Way', 'Austin', 'TX', '78721'],
['Maryanna', 'Vassman', 'CDBABY', '[email protected]', '479-204-8976', '1126 Troy Way', 'Fort Smith', 'AR', '72916']
];
const MAPPED_FIELDS = ['firstName', 'lastName', 'company', 'email', 'phone', null, null, null, 'zipCode'];
const fieldLength = MAPPED_FIELDS.length;
const dataLength = DUMMY_DATA.length;
const objectsToUpload = [];
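// Start at 1 to skip the header row.
for (let i = 1; i < dataLength; i++) {
  const obj = {};
  for (let j = 0; j < fieldLength; j++) {
    // Copy only the columns that were mapped to a schema name.
    if (MAPPED_FIELDS[j] !== null) {
      obj[MAPPED_FIELDS[j]] = DUMMY_DATA[i][j];
    }
  }
  objectsToUpload.push(obj);
}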
console.log(objectsToUpload);
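for...of
Here the entries() of the MAPPED_FIELDS array are materialized once, before the loop, to avoid regenerating the entries iterator for every row, and null keys are simply skipped rather than filtered out afterwards. The destructuring, and possibly the iterator creation/spreading, seem to put it behind Nick's version on small arrays, but it is faster on larger ones (tested in Chromium-based browsers).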
const DUMMY_DATA = [
['First Name', 'Last Name', 'company', 'email', 'phone', 'Address', 'City', 'State', 'Zip Code'],
['Lambert', 'Beckhouse', 'StackOverflow', '[email protected]', '512-555-1738', '316 Arapahoe Way', 'Austin', 'TX', '78721'],
['Maryanna', 'Vassman', 'CDBABY', '[email protected]', '479-204-8976', '1126 Troy Way', 'Fort Smith', 'AR', '72916']
];
const MAPPED_FIELDS = ['firstName', 'lastName', 'company', 'email', 'phone', null, null, null, 'zipCode'];
const MAPPED_FIELDS_ENTRIES = [...MAPPED_FIELDS.entries()];
const objectsToUpload = [];
for (const datum of DUMMY_DATA.slice(1)) {
  const obj = {};
  for (const [idx, key] of MAPPED_FIELDS_ENTRIES) {
    if (key !== null) {
      obj[key] = datum[idx];
    }
  }
  objectsToUpload.push(obj);
}
console.log(objectsToUpload);
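To make the precomputed entries concrete, here is a small sketch of what spreading entries() produces. The three-column mapping is a shortened, hypothetical stand-in for the full MAPPED_FIELDS array.

```javascript
// Shortened, illustrative mapping: the middle column is unmapped.
const MAPPED_FIELDS = ['firstName', null, 'zipCode'];

// entries() returns an iterator of [index, value] pairs; spreading it
// into an array lets the same pairs be reused for every row.
const MAPPED_FIELDS_ENTRIES = [...MAPPED_FIELDS.entries()];

console.log(MAPPED_FIELDS_ENTRIES);
// [[0, 'firstName'], [1, null], [2, 'zipCode']]
```

Because the pairs live in a plain array, the inner for...of loop does not need to create a fresh entries iterator per row.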
Results of the rough benchmark below, on my machine, were as follows.
for 1,000: 0.400ms
for...of 1,000: 2.900ms
entries 1,000: 1.700ms
for 10,000: 4.100ms
for...of 10,000: 11.700ms
entries 10,000: 13.900ms
for 100,000: 30.200ms
for...of 100,000: 56.500ms
entries 100,000: 100.200ms
const DUMMY_DATA = [
['First Name', 'Last Name', 'company', 'email', 'phone', 'Address', 'City', 'State', 'Zip Code'],
['Lambert', 'Beckhouse', 'StackOverflow', '[email protected]', '512-555-1738', '316 Arapahoe Way', 'Austin', 'TX', '78721'],
['Maryanna', 'Vassman', 'CDBABY', '[email protected]', '479-204-8976', '1126 Troy Way', 'Fort Smith', 'AR', '72916']
];
const MAPPED_FIELDS = ['firstName', 'lastName', 'company', 'email', 'phone', null, null, null, 'zipCode'];
function makeBigData(size) {
const [header, ...data] = DUMMY_DATA;
const r = [header];
for (let l = 0; l < size; l += 1) {
r.push([...data[Math.round(Math.random())]]);
}
return r;
}
let data = makeBigData(1000);
console.time('for 1,000');
let objectsToUpload = [];
let fieldLength = MAPPED_FIELDS.length, dataLength = data.length;
for (let i = 1; i < dataLength; i++) {
const obj = {};
for (let j = 0; j < fieldLength; j++) {
if (MAPPED_FIELDS[j] !== null) {
obj[MAPPED_FIELDS[j]] = data[i][j];
}
}
objectsToUpload.push(obj);
}
console.timeEnd('for 1,000');
data = makeBigData(1000);
console.time('for...of 1,000');
objectsToUpload = [];
let MAPPED_FIELDS_ENTRIES = [...MAPPED_FIELDS.entries()];
for (const datum of data.slice(1)) {
const obj = {};
for (const [i, key] of MAPPED_FIELDS_ENTRIES) {
if (key !== null) {
obj[key] = datum[i];
}
}
objectsToUpload.push(obj);
}
console.timeEnd('for...of 1,000');
data = makeBigData(1000);
console.time('entries 1,000');
objectsToUpload = data.slice(1).map(data =>
Object.fromEntries(MAPPED_FIELDS
.map((key, idx) => [key, data[idx]])
.filter(a => a[0])
)
)
console.timeEnd('entries 1,000');
console.log();
data = makeBigData(10000);
console.time('for 10,000');
objectsToUpload = [];
fieldLength = MAPPED_FIELDS.length, dataLength = data.length;
for (let i = 1; i < dataLength; i++) {
const obj = {};
for (let j = 0; j < fieldLength; j++) {
if (MAPPED_FIELDS[j] !== null) {
obj[MAPPED_FIELDS[j]] = data[i][j];
}
}
objectsToUpload.push(obj);
}
console.timeEnd('for 10,000');
data = makeBigData(10000);
console.time('for...of 10,000');
objectsToUpload = [];
MAPPED_FIELDS_ENTRIES = [...MAPPED_FIELDS.entries()];
for (const datum of data.slice(1)) {
const obj = {};
for (const [i, key] of MAPPED_FIELDS_ENTRIES) {
if (key !== null) {
obj[key] = datum[i];
}
}
objectsToUpload.push(obj);
}
console.timeEnd('for...of 10,000');
data = makeBigData(10000);
console.time('entries 10,000');
objectsToUpload = data.slice(1).map(data =>
Object.fromEntries(MAPPED_FIELDS
.map((key, idx) => [key, data[idx]])
.filter(a => a[0])
)
)
console.timeEnd('entries 10,000');
console.log();
data = makeBigData(100000);
console.time('for 100,000');
objectsToUpload = [];
fieldLength = MAPPED_FIELDS.length, dataLength = data.length;
for (let i = 1; i < dataLength; i++) {
const obj = {};
for (let j = 0; j < fieldLength; j++) {
if (MAPPED_FIELDS[j] !== null) {
obj[MAPPED_FIELDS[j]] = data[i][j];
}
}
objectsToUpload.push(obj);
}
console.timeEnd('for 100,000');
data = makeBigData(100000);
console.time('for...of 100,000');
objectsToUpload = [];
MAPPED_FIELDS_ENTRIES = [...MAPPED_FIELDS.entries()];
for (const datum of data.slice(1)) {
const obj = {};
for (const [i, key] of MAPPED_FIELDS_ENTRIES) {
if (key !== null) {
obj[key] = datum[i];
}
}
objectsToUpload.push(obj);
}
console.timeEnd('for...of 100,000');
data = makeBigData(100000);
console.time('entries 100,000');
objectsToUpload = data.slice(1).map(data =>
Object.fromEntries(MAPPED_FIELDS
.map((key, idx) => [key, data[idx]])
.filter(a => a[0])
)
)
console.timeEnd('entries 100,000');
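The benchmark above repeats the same three loops for each size. As a rough sketch only, the same comparison could be written with one function per approach and a shared timing loop. The names HEADER, SAMPLE_ROWS and strategies are made up here, the email values are placeholders, and rows are cycled deterministically instead of being picked with Math.random(), so timings will not match the figures above exactly.

```javascript
// Illustrative data: the email values are placeholders, not taken from the post.
const HEADER = ['First Name', 'Last Name', 'company', 'email', 'phone', 'Address', 'City', 'State', 'Zip Code'];
const SAMPLE_ROWS = [
  ['Lambert', 'Beckhouse', 'StackOverflow', 'lambert@example.com', '512-555-1738', '316 Arapahoe Way', 'Austin', 'TX', '78721'],
  ['Maryanna', 'Vassman', 'CDBABY', 'maryanna@example.com', '479-204-8976', '1126 Troy Way', 'Fort Smith', 'AR', '72916']
];
const MAPPED_FIELDS = ['firstName', 'lastName', 'company', 'email', 'phone', null, null, null, 'zipCode'];

// Build a header row followed by `size` copies of the sample rows.
function makeBigData(size) {
  const rows = [HEADER];
  for (let n = 0; n < size; n += 1) {
    rows.push([...SAMPLE_ROWS[n % SAMPLE_ROWS.length]]);
  }
  return rows;
}

// The three approaches, each taking the full data set (header included).
const strategies = {
  'for': (data) => {
    const out = [];
    for (let i = 1; i < data.length; i++) {
      const obj = {};
      for (let j = 0; j < MAPPED_FIELDS.length; j++) {
        if (MAPPED_FIELDS[j] !== null) obj[MAPPED_FIELDS[j]] = data[i][j];
      }
      out.push(obj);
    }
    return out;
  },
  'for...of': (data) => {
    const entries = [...MAPPED_FIELDS.entries()];
    const out = [];
    for (const datum of data.slice(1)) {
      const obj = {};
      for (const [idx, key] of entries) {
        if (key !== null) obj[key] = datum[idx];
      }
      out.push(obj);
    }
    return out;
  },
  'entries': (data) => data.slice(1).map(row =>
    Object.fromEntries(
      MAPPED_FIELDS.map((key, idx) => [key, row[idx]]).filter(pair => pair[0])
    )
  )
};

// Time every strategy at every size, each on freshly generated data.
for (const size of [1000, 10000, 100000]) {
  for (const [name, run] of Object.entries(strategies)) {
    const data = makeBigData(size);
    const label = `${name} ${size.toLocaleString('en-US')}`;
    console.time(label);
    run(data);
    console.timeEnd(label);
  }
}
```

Keeping each approach in its own function also makes it easy to check that all three produce identical objects before comparing their timings.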
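A uj5u.com user replied:
You can map the DUMMY_DATA array (minus the header) into a set of arrays, each holding pairs of
a key from MAPPED_FIELDS and the corresponding value from DUMMY_DATA at the same index.
You can then filter those arrays to remove the null keys and convert them into objects with Object.fromEntries: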
const DUMMY_DATA = [
['First Name', 'Last Name', 'company', 'email', 'phone', 'Address', 'City', 'State', 'Zip Code'],
['Lambert', 'Beckhouse', 'StackOverflow', '[email protected]', '512-555-1738', '316 Arapahoe Way', 'Austin', 'TX', '78721'],
['Maryanna', 'Vassman', 'CDBABY', '[email protected]', '479-204-8976', '1126 Troy Way', 'Fort Smith', 'AR', '72916']
]
const MAPPED_FIELDS = ['firstName', 'lastName', 'company', 'email', 'phone', null, null, null, 'zipCode']
const objectsToUpload = DUMMY_DATA.slice(1).map(data =>
Object.fromEntries(MAPPED_FIELDS
.map((key, idx) => [key, data[idx]])
.filter(a => a[0])
)
)
console.log(objectsToUpload)
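To show the intermediate steps for a single row, here is a small worked sketch. The three-column mapping and row are shortened, hypothetical stand-ins for the real data, and the filter uses an explicit null check instead of the truthiness test above.

```javascript
// Shortened, illustrative mapping and row: the middle column is unmapped.
const MAPPED_FIELDS = ['firstName', null, 'zipCode'];
const row = ['Lambert', 'Austin', '78721'];

// Step 1: pair every key with the value at the same index.
const pairs = MAPPED_FIELDS.map((key, idx) => [key, row[idx]]);
// [['firstName', 'Lambert'], [null, 'Austin'], ['zipCode', '78721']]

// Step 2: drop the pairs whose key is null.
const kept = pairs.filter(([key]) => key !== null);
// [['firstName', 'Lambert'], ['zipCode', '78721']]

// Step 3: turn the remaining [key, value] pairs into an object.
const obj = Object.fromEntries(kept);
console.log(obj); // { firstName: 'Lambert', zipCode: '78721' }
```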
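A uj5u.com user replied:
A slightly different version of what Nick wrote is to parse MAPPED_FIELDS first, turning it into [index, name] pairs and then removing those with null names. Then we can scan and convert the objects more efficiently. It might look like this: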
const mapData = (fields, locs = Object .entries (fields) .filter (([_, k]) => k !== null)) =>
([headers, ...rows]) => rows .map (
r => Object .assign (...locs .map (([i, n]) => ({[n]: r[i]})))
)
const DUMMY_DATA = [ ['First Name', 'Last Name', 'company', 'email', 'phone', 'Address', 'City', 'State', 'Zip Code'], ['Lambert', 'Beckhouse', 'StackOverflow', '[email protected]', '512-555-1738', '316 Arapahoe Way', 'Austin', 'TX', '78721'], ['Maryanna', 'Vassman', 'CDBABY', '[email protected]', '479-204-8976', '1126 Troy Way', 'Fort Smith', 'AR', '72916']]
const MAPPED_FIELDS = ['firstName', 'lastName', 'company', 'email', 'phone', null, null, null, 'zipCode'];
console .log (mapData (MAPPED_FIELDS) (DUMMY_DATA))
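As a small sketch of what the precomputed locs default parameter holds: Object.entries on an array yields [index, value] pairs with the indexes as strings, which still work for array lookups. The sample row and the empty-object target passed to Object.assign are additions for illustration, not part of the answer's code.

```javascript
const MAPPED_FIELDS = ['firstName', 'lastName', 'company', 'email', 'phone', null, null, null, 'zipCode'];

// Same expression as the default value of `locs`: keep only mapped columns.
const locs = Object.entries(MAPPED_FIELDS).filter(([_, k]) => k !== null);
console.log(locs);
// [['0', 'firstName'], ['1', 'lastName'], ['2', 'company'],
//  ['3', 'email'], ['4', 'phone'], ['8', 'zipCode']]

// Illustrative row; the email value is a placeholder.
const row = ['Lambert', 'Beckhouse', 'StackOverflow', 'user@example.com', '512-555-1738', '316 Arapahoe Way', 'Austin', 'TX', '78721'];

// Passing {} as the target keeps Object.assign from throwing
// when no field is mapped at all (locs would then be empty).
const obj = Object.assign({}, ...locs.map(([i, n]) => ({ [n]: row[i] })));
console.log(obj.zipCode); // '78721'
```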
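I'd expect this to land somewhere between pilchard's for-loop version and Nick's version in terms of performance. But all of them are linear, so I don't see any serious algorithmic problem. Unless you need to squeeze out every last bit of performance, I would go with whichever one is simplest.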
