Converting an HTML Table to a Markdown Table (Python Script)


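If the table contains many special characters, the script may not handle them all correctly, and you will need to adjust it yourself.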
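One special character that commonly causes trouble is the pipe (`|`), which Markdown reads as a column separator; a line break inside a cell is another, because each table row must stay on one line. The helper below is a minimal sketch of one way to clean cell text before writing it out. The name `escape_cell` is hypothetical and is not part of the script in this post.

```python
import re


def escape_cell(text):
    """Make a string safe to place inside a Markdown table cell."""
    # Collapse newlines and runs of whitespace into single spaces,
    # because a table row has to fit on one line.
    text = re.sub(r"\s+", " ", text).strip()
    # Escape pipes so they are not treated as column separators.
    return text.replace("|", "\\|")


if __name__ == "__main__":
    print(escape_cell("a|b\nc"))
```

Calling such a helper on each cell string just before it is appended to the row would be one place to hook it in.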


in.txt (content obtained with the browser's "Copy element" command)


<table class="data-table"><tbody>
    <tr>
    <th>Name</th>
    <th>Description</th>
    <th>Type</th>
    <th>Default</th>
    <th>Valid Values</th>
    <th>Importance</th>
    </tr>
    <tr>
    <td>blacklist</td><td>Fields to exclude. This takes precedence over the whitelist.</td><td>list</td><td>""</td><td></td><td>medium</td></tr>
    <tr>
    <td>renames</td><td>Field rename mappings.</td><td>list</td><td>""</td><td>list of colon-delimited pairs, e.g. <code>foo:bar,abc:xyz</code></td><td>medium</td></tr>
    <tr>
    <td>whitelist</td><td>Fields to include. If specified, only these fields will be used.</td><td>list</td><td>""</td><td></td><td>medium</td></tr>
    </tbody></table>

Python script


# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup

# Read the HTML that was copied from the browser.
with open('in.txt', encoding='utf-8') as f:
    contents = f.read()

# The 'html5lib' parser requires the html5lib package to be installed.
soup = BeautifulSoup(contents, 'html5lib')

with open('out.md', 'w', encoding='utf-8') as f_out:
    for idx, tr in enumerate(soup.find_all('tr')):
        if idx != 0:
            # Data row: build one Markdown row from the <td> cells.
            row_str = "|"
            for td in tr.find_all('td'):
                td_content_list = []
                for content in td.contents:
                    # Convert each child node to a string.
                    str2 = str(content)
                    # Replace <code> and </code> with ```
                    str3 = str2.replace("<code>", "```").replace("</code>", "```")
                    td_content_list.append(str3)
                # Join the pieces of the cell into one string.
                td_content_str = ''.join(td_content_list)
                row_str = row_str + " " + td_content_str + " |"
            f_out.write(row_str + "\n")
        else:
            # Header row: write the column names, then the alignment row.
            row_str = "|"
            row_str2 = "|"
            for th in tr.find_all('th'):
                row_str = row_str + " " + th.text + " |"
                row_str2 = row_str2 + " :- |"
            f_out.write(row_str + "\n")
            f_out.write(row_str2 + "\n")

After conversion, the result is written to the file out.md


| Name | Description | Type | Default | Valid Values | Importance |
| :- | :- | :- | :- | :- | :- |
| blacklist | Fields to exclude. This takes precedence over the whitelist. | list | "" |  | medium |
| renames | Field rename mappings. | list | "" | list of colon-delimited pairs, e.g. ```foo:bar,abc:xyz``` | medium |
| whitelist | Fields to include. If specified, only these fields will be used. | list | "" |  | medium |
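If installing BeautifulSoup and html5lib is not an option, the same idea can be sketched with only the standard library's `html.parser`. This is an alternative approach, not the script from this post: the class `TableToMarkdown` and the function `html_table_to_markdown` are hypothetical names, the first row is assumed to be the header, and `<code>` is rendered with single backticks rather than triple backticks. Nested tables and cells that span several columns are not handled.

```python
from html.parser import HTMLParser


class TableToMarkdown(HTMLParser):
    """Collect table rows as lists of cell strings (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows
        self.row = None   # cells of the row currently being read
        self.cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self.cell = []
        elif tag == "code" and self.cell is not None:
            self.cell.append("`")

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            self.row.append("".join(self.cell).strip())
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None
        elif tag == "code" and self.cell is not None:
            self.cell.append("`")

    def handle_data(self, data):
        # Text outside a cell (indentation between tags) is ignored.
        if self.cell is not None:
            self.cell.append(data)


def html_table_to_markdown(html):
    """Return the table in `html` as a Markdown table string."""
    parser = TableToMarkdown()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return ""
    header, *body = parser.rows  # assumption: the first row is the header
    lines = ["| " + " | ".join(header) + " |",
             "|" + " :- |" * len(header)]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = ("<table><tr><th>Name</th><th>Type</th></tr>"
              "<tr><td>renames</td><td><code>list</code></td></tr></table>")
    print(html_table_to_markdown(sample), end="")
```

Returning a string instead of writing straight to a file keeps the conversion easy to test; the caller can write the result to out.md as before.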
