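If the table contains many special characters, the script may not handle them all correctly, and you will need to adjust it yourself.
in.txt (obtained with the browser's "Copy element" command)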
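As one illustration of the kind of adjustment this refers to: a literal `|` or a line break inside a cell would break a Markdown table row. The sketch below is not part of the original script; `escape_cell` is a hypothetical helper name, and it assumes the target renderer accepts `\|` as an escaped pipe (as GitHub Flavored Markdown does).

```python
def escape_cell(text):
    """Make a cell's text safe to place inside a Markdown table row.

    Hypothetical helper, not part of the original script.
    """
    # A literal pipe would otherwise be read as a column separator.
    text = text.replace("|", "\\|")
    # A table row must stay on one line, so collapse every run of
    # whitespace (including line breaks) into a single space.
    return " ".join(text.split())


if __name__ == "__main__":
    print(escape_cell("a|b"))
    print(escape_cell("first line\nsecond line"))
```

In the script, a helper like this could be applied to the text of each cell before the cell is joined into the row. Note that it also collapses repeated spaces, which may not be wanted inside code snippets.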
<table class="data-table"><tbody>
<tr>
<th>Name</th>
<th>Description</th>
<th>Type</th>
<th>Default</th>
<th>Valid Values</th>
<th>Importance</th>
</tr>
<tr>
<td>blacklist</td><td>Fields to exclude. This takes precedence over the whitelist.</td><td>list</td><td>""</td><td></td><td>medium</td></tr>
<tr>
<td>renames</td><td>Field rename mappings.</td><td>list</td><td>""</td><td>list of colon-delimited pairs, e.g. <code>foo:bar,abc:xyz</code></td><td>medium</td></tr>
<tr>
<td>whitelist</td><td>Fields to include. If specified, only these fields will be used.</td><td>list</td><td>""</td><td></td><td>medium</td></tr>
</tbody></table>
Python script
# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup

# Read the HTML that was copied from the browser.
with open('in.txt', encoding='utf-8') as f:
    contents = f.read()

# The 'html5lib' parser requires the html5lib package to be installed.
soup = BeautifulSoup(contents, 'html5lib')

with open('out.md', 'w', encoding='utf-8') as f_out:
    for idx, tr in enumerate(soup.find_all('tr')):
        if idx != 0:
            # Data row: one Markdown cell per <td>.
            row_str = "|"
            for td in tr.find_all('td'):
                td_content_list = []
                for content in td.contents:
                    # Force each child node to a string.
                    str2 = str(content)
                    # Replace <code> and </code> with ```.
                    str3 = str2.replace("<code>", "```").replace("</code>", "```")
                    td_content_list.append(str3)
                # Join the list back into a single string.
                td_content_str = ''.join(td_content_list)
                row_str = row_str + " " + td_content_str + " |"
            f_out.write(row_str + "\n")
        else:
            # Header row, followed by the alignment row.
            row_str = "|"
            row_str2 = "|"
            for th in tr.find_all('th'):
                row_str = row_str + " " + th.text + " |"
                row_str2 = row_str2 + " :- |"
            f_out.write(row_str + "\n")
            f_out.write(row_str2 + "\n")
After conversion, the result is written to the out.md file:
| Name | Description | Type | Default | Valid Values | Importance |
| :- | :- | :- | :- | :- | :- |
| blacklist | Fields to exclude. This takes precedence over the whitelist. | list | "" | | medium |
| renames | Field rename mappings. | list | "" | list of colon-delimited pairs, e.g. ```foo:bar,abc:xyz``` | medium |
| whitelist | Fields to include. If specified, only these fields will be used. | list | "" | | medium |
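A quick way to spot a row that special characters have broken is to check that every line of the generated table has the same number of cells. The following is only a sketch under assumptions: `column_count` and `rows_are_consistent` are hypothetical names, every row is assumed to start and end with `|` (as the script's output does), and `\|` is treated as an escaped pipe.

```python
import re


def column_count(row):
    """Count the cells in one Markdown table row (hypothetical helper)."""
    # Split on pipes that are not preceded by a backslash.
    parts = re.split(r"(?<!\\)\|", row.strip())
    # "| a | b |" splits into ['', ' a ', ' b ', ''], so drop both ends.
    return len(parts) - 2


def rows_are_consistent(lines):
    """Return True if all non-blank rows have the same number of cells."""
    counts = {column_count(line) for line in lines if line.strip()}
    return len(counts) <= 1


if __name__ == "__main__":
    with open("out.md", encoding="utf-8") as f:
        print(rows_are_consistent(f.readlines()))
```

If this reports an inconsistency, the usual cause is an unescaped `|` in one of the cells.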