How to get only the text values from a Markdown string in JavaScript

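I currently have some code that uses marked.js to convert one large Markdown string (read from a .md file) into HTML for display in the browser. 'md' is the Markdown string, and calling 'marked(md)' converts it to HTML.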

useEffect(() => {
    // Fetch the .md file, convert it to HTML, and store the result in state
    getContent(filePath)
        .then(response => {
            if (!response.ok) {
                return Promise.reject(response);
            }
            return response.text().then(md => setContent(marked(md)));
        })
        .catch(e => Dialog.error('Page failed to load!', e));
}, [filePath]);

How can I (with marked.js or some other solution) parse the Markdown/HTML to get only the text values? Some sample Markdown is below.

### HEADER TEXT

---

# Some Page Title

<a href="cafe" target="_blank">Go to Cafe Page</a>

    <Cafe host>/portos/cafe

## Links
- ##### [Tacos](#cafe_tacos)
- ##### [Burritos](#cafe_burritos)
- ##### [Bebidas](#cafe_bebidas)


## Overview
This is the overview text for the page. I really like tacos and burritos.

[![Taco Tabs](some/path/to/images/hello.png 'Tacos')](some/path/to/images/hello.png)

## Dining <a name="dining"></a>

Dining is foo bar burrito taco mulita. 

[![Cafe Overview](some/path/to/images/hello2.png 'Cafe Overview')](some/path/to/images/hello2.png)

The cafe has been open since 1661. It has lots of food.

It was declared the top 1 cafe of all time.

### How to order food

You can order food by ordering food.

<div class="alert-info">
    <strong> Note: </strong> TACOS ARE AMAZING.
</div>

A uj5u.com user replied:

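One way is to parse the HTML string with the DOMParser API, which turns the string into a Document object, and then walk that document with a TreeWalker object to get the textContent of each Text node in the HTML. The result should be an array of strings.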

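function parseTextFromMarkDown(mdString) {
  // Newer versions of Marked expose the same conversion as marked.parse(mdString)
  const htmlString = marked(mdString);
  const parser = new DOMParser();
  const doc = parser.parseFromString(htmlString, 'text/html');
  // Visit only the Text nodes inside the parsed body
  const walker = doc.createTreeWalker(doc.body, NodeFilter.SHOW_TEXT);

  const textList = [];
  // walker.currentNode starts at the root, which is not a Text node, so advance first
  let currentNode = walker.nextNode();

  while (currentNode) {
    textList.push(currentNode.textContent);
    currentNode = walker.nextNode();
  }

  return textList;
}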
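
The array returned above will typically contain whitespace-only entries, since the line breaks between block elements become Text nodes too. If a single plain-text string is wanted, the list can be cleaned up afterwards. The following is only a sketch: joinTextList is a hypothetical helper name, not part of Marked or the DOM, and the sample array is made up for illustration.

```javascript
// Hypothetical helper: collapse an array of text-node strings into one string.
function joinTextList(textList) {
  return textList
    .map((text) => text.replace(/\s+/g, ' ').trim()) // normalise runs of whitespace
    .filter((text) => text.length > 0)               // drop whitespace-only nodes
    .join('\n');                                     // one line per remaining node
}

// Made-up sample resembling what a TreeWalker over rendered Markdown might yield
const sample = ['Some Page Title', '\n', 'Go to Cafe Page', '\n', '  Note: ', ' TACOS ARE AMAZING.\n'];
console.log(joinTextList(sample));
```

Joining with a newline keeps each text node on its own line; joining with a space instead would give one continuous paragraph.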

A uj5u.com user replied:

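While I think Emiel has already given the best answer, another approach is to use the abstract syntax tree that a Markdown parser creates, here in mdast format. We can then walk the syntax tree, extract all the text, and combine it into reasonable output. One way to do that looks like this: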

// Each handler receives the already formatted children (ns) and the node itself
const astToText = ((types) => ({type, children = [], ...rest}) => 
  (types [type] || types .default) (children .map (astToText), {type, ...rest})
)(Object .fromEntries (Object .entries ({
  'default': (ns, {type}) => ` *** Missing type: ${type} *** `,
  'root': (ns) => ns .join ('\n'),
  'heading, paragraph': (ns) => ns .join ('') + '\n',
  'text, code': (ns, {value}) => value,
  // Document#textContent is null, so read the text from the parsed <body>
  'html': (ns, {value}) => 
      new DOMParser () .parseFromString (value, 'text/html') .body .textContent, 
  'listItem, link, emphasis': (ns) => ns .join (''),
  'list': (ns, {ordered}) => ordered 
      ? ns .map ((n, i) => `${i + 1} ${n}`) .join ('\n')
      : ns .map ((n) => `• ${n}`) .join ('\n'),
  'image': (ns, {title, url, alt}) => `Image "${title}" ("${alt}" - ${url})`,
  // ... probably many more
  // Keys such as 'heading, paragraph' are split into one entry per node type
}) .flatMap (([k, v]) => k .split (/,\s*/) .map (n => [n, v]))))

// import {fromMarkdown} from 'mdast-util-from-markdown'
// const ast = fromMarkdown (<your string>)

// dummy version
const ast = {type: "root", children: [{type: "heading", depth:1, children: [{type: "text", value: "Some Page Title", children: []}]}, {type: "paragraph", children: [{type: "html", value: '<a href="cafe" target="_blank">', children: []}, {type: "text", value: "Go to Cafe Page", children: []}, {type: "html", value: "</a>", children: []}]}, {type: "code", lang:null, meta:null, value: "<Cafe host>/portos/cafe", children: []}, {type: "heading", depth:2, children: [{type: "text", value: "Links", children: []}]}, {type: "list", ordered:!1, start:null, spread:!1, children: [{type: "listItem", spread:!1, checked:null, children: [{type: "heading", depth:5, children: [{type: "link", title:null, url: "#cafe_tacos", children: [{type: "text", value: "Tacos", children: []}]}]}]}, {type: "listItem", spread:!1, checked:null, children: [{type: "heading", depth:5, children: [{type: "link", title:null, url: "#cafe_burritos", children: [{type: "text", value: "Burritos", children: []}]}]}]}, {type: "listItem", spread:!1, checked:null, children: [{type: "heading", depth:5, children: [{type: "link", title:null, url: "#cafe_bebidas", children: [{type: "text", value: "Bebidas", children: []}]}]}]}]}, {type: "heading", depth:2, children: [{type: "text", value: "Overview", children: []}]}, {type: "paragraph", children: [{type: "text", value: "This is the overview text for the page. \nI really like tacos and burritos.", children: []}]}, {type: "paragraph", children: [{type: "link", title:null, url: "some/path/to/images/hello.png", children: [{type: "image", title: "Tacos", url: "some/path/to/images/hello.png", alt: "Taco Tabs", children: []}]}]}, {type: "heading", depth:2, children: [{type: "text", value: "Dining ", children: []}, {type: "html", value: '<a name="dining">', children: []}, {type: "html", value: "</a>", children: []}]}, {type: "paragraph", children: [{type: "text", value: "Dining is foo bar burrito taco mulita.", children: []}]}, {type: "paragraph", children: [{type: "link", title:null, url: "some/path/to/images/hello2.png", children: [{type: "image", title: "Cafe Overview", url: "some/path/to/images/hello2.png", alt: "Cafe Overview", children: []}]}]}, {type: "paragraph", children: [{type: "text", value: "The cafe has been open since 1661. It has lots of food.", children: []}]}, {type: "paragraph", children: [{type: "text", value: "It was declared the top 1 cafe of all time.", children: []}]}, {type: "heading", depth:3, children: [{type: "text", value: "How to order food", children: []}]}, {type: "paragraph", children: [{type: "text", value: "You can order food by ordering food.", children: []}]}, {type: "html", value: '<div >\n    <strong> Note: </strong> TACOS ARE AMAZING.\n</div>', children: []}]}

console .log (astToText (ast))
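
For comparison, when only the raw text is needed and no per-node formatting, a much smaller walker may be enough. This is a sketch under the assumption that nodes follow the mdast shape used above (type, optional value, optional children); collectText and tinyAst are hypothetical names introduced only for this illustration, and html nodes are simply skipped.

```javascript
// Hypothetical helper: gather the value of every text-like node, depth first.
function collectText(node, out = []) {
  if (node.type === 'text' || node.type === 'code' || node.type === 'inlineCode') {
    out.push(node.value);
  }
  // Leaf nodes may have no children array at all
  for (const child of node.children || []) {
    collectText(child, out);
  }
  return out;
}

// A small hand-written tree in the same shape as the dummy AST above
const tinyAst = {
  type: 'root',
  children: [
    {type: 'heading', depth: 1, children: [{type: 'text', value: 'Some Page Title'}]},
    {type: 'paragraph', children: [
      {type: 'html', value: '<a href="cafe">'},
      {type: 'text', value: 'Go to Cafe Page'},
      {type: 'html', value: '</a>'},
    ]},
  ],
};

console.log(collectText(tinyAst)); // [ 'Some Page Title', 'Go to Cafe Page' ]
```

The trade-off is that structure such as list bullets or image titles is lost, which is exactly what the handler table in astToText is there to preserve.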


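The advantage of this approach over the plain-HTML one is that we can decide how certain nodes are rendered as plain text. For example, here we chose to render this image markup: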

![Taco Tabs](some/path/to/images/hello.png 'Tacos')

as

Image "Tacos" ("Taco Tabs" - some/path/to/images/hello.png)

Of course, HTML nodes will still be problematic. Here I use DOMParser and .textContent, but you could instead add html to the 'text, code' entry to include the raw HTML text.

Each function passed to the configuration receives the list of already formatted child nodes together with the rest of the node.
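
The configuration's comma-separated keys are expanded into one entry per node type before they are used. That step, taken in isolation, can be sketched as follows. expandKeys and handlers are hypothetical names used only here, and the extra strong handler assumes mdast's strong node type; it is not part of the answer's configuration.

```javascript
// Hypothetical helper: turn {'a, b': fn} into {a: fn, b: fn}.
const expandKeys = (config) =>
  Object.fromEntries(
    Object.entries(config).flatMap(([key, fn]) =>
      key.split(/,\s*/).map((name) => [name, fn])
    )
  );

const handlers = expandKeys({
  // Each handler gets (formattedChildren, node)
  'heading, paragraph': (children) => children.join('') + '\n',
  'text, code': (children, {value}) => value,
  // Illustrative extra handler: render bold text in upper case
  'strong': (children) => children.join('').toUpperCase(),
});

console.log(Object.keys(handlers));
// [ 'heading', 'paragraph', 'text', 'code', 'strong' ]
console.log(handlers.strong(['note'], {}));
// NOTE
```

Grouping node types under one key keeps the table short when several types share the same rendering, at the cost of a small expansion step up front.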
