HBase Data Write Test

Test environment

Test hardware: quad-core Intel i5 processor, 8 GB RAM, 1 TB hard disk, gigabit network

Test software: Ubuntu 12.10 64-bit, Hadoop 0.20.205, HBase 0.90.5

Test setup: one master (NameNode) and three RegionServers (DataNodes). 10 million records (about 15 KB each) are written to the HBase cluster.
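The "about 15 KB" figure follows from testHBase() in the test program, where record i carries `new byte[10000 + i / 1000]`, so sizes run from 10,000 to 19,999 bytes. This short check (the class and method names are illustrative) sums the payload over all 10 million records.

```java
// Reproduces the payload arithmetic of testHBase(): record i is
// new byte[10000 + i / 1000] bytes, for i = 0 .. 9,999,999.
public class PayloadSize {
    static final int RECORDS = 10_000_000;

    // Size in bytes of record i, same formula as the test program.
    static int recordSize(int i) {
        return 10000 + i / 1000;
    }

    // Sums all record sizes; a long is needed because the total exceeds 2^31.
    static long totalBytes() {
        long total = 0;
        for (int i = 0; i < RECORDS; i++) {
            total += recordSize(i);
        }
        return total;
    }

    public static void main(String[] args) {
        long total = totalBytes();
        double average = (double) total / RECORDS;
        System.out.printf("total = %d bytes (%.1f GB), average = %.1f bytes%n",
                total, total / 1e9, average);
        // prints: total = 149995000000 bytes (150.0 GB), average = 14999.5 bytes
    }
}
```

So the average record is 14,999.5 bytes and one run pushes roughly 150 GB of payload.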

Test results

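In the figure above, the first and the last column show the time to insert the same data into HBase and into HDFS, respectively. The gap is large: inserting into HBase takes about 10 times as long as writing to HDFS.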
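Since writing to HBase is so much slower than writing to HDFS, I looked into what makes HBase's write performance so poor. An insert roughly works as follows. The client first has to find out which region, hosted on which RegionServer, covers the row key. In HBase 0.90 it does this through the catalog tables (-ROOT- and .META.), whose location it learns from ZooKeeper, and it caches the answer; the master is not involved in ordinary writes. The client then sends the data directly to that RegionServer. The RegionServer places the data in the right region (and, within it, the store for the column family), and the data is eventually written out as HFiles, which are made up of data blocks and stored in HDFS (so the data is not necessarily on the RegionServer's local disk).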
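The lookup step can be pictured with a small model. It uses a plain TreeMap, not the HBase client API; RegionLocator, the server names and the region layout are hypothetical. Regions are sorted by start key, and a row belongs to the region with the greatest start key that is less than or equal to the row key.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified model (not the HBase client API) of mapping a row key to a region.
// Keys compare as Strings, which matches byte order for the ASCII row keys
// ("0000000" .. "9999999") used in the test program.
public class RegionLocator {
    // region start key -> hosting RegionServer; "" is the first region's start key
    private final TreeMap<String, String> regions = new TreeMap<>();

    void addRegion(String startKey, String server) {
        regions.put(startKey, server);
    }

    // The owning region is the one with the greatest start key <= rowKey.
    String locate(String rowKey) {
        Map.Entry<String, String> entry = regions.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RegionLocator locator = new RegionLocator();
        // Hypothetical layout: three regions on three hypothetical servers.
        locator.addRegion("", "rs1");
        locator.addRegion("3333333", "rs2");
        locator.addRegion("6666666", "rs3");
        System.out.println(locator.locate("0000042")); // rs1
        System.out.println(locator.locate("3333333")); // rs2
        System.out.println(locator.locate("9999999")); // rs3
    }
}
```

The real client performs this search against .META. and caches the result, which is why most puts go straight to the right RegionServer.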
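One factor in HBase write performance is the client-side write buffer used when inserting data with the Put class. By default, each put() is sent to the RegionServer in its own RPC. With 10 million records, that many round trips between processes inevitably costs time. HBase gives the client a write buffer: puts are collected there and only sent once the buffer is full, which reduces the number of write operations.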
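- First, turn off auto-flush: HTable.setAutoFlush(false).
- Then set the write buffer size (2 MB by default) with setWriteBufferSize(), or set the hbase.client.write.buffer property in hbase-site.xml.

The table above shows that raising the buffer to 20 MB still shortens the write time, but at 200 MB writing takes longer (why?).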
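The effect of turning off auto-flush can be estimated with a simplified model. WriteBufferModel is a hypothetical class, not HTable: each record is added to a buffer, and the buffer is sent as one batch either immediately (auto-flush on) or once its size exceeds the limit. The model assumes a flat 15,000 bytes per record and ignores per-Put object overhead, which the real client also counts against the buffer.

```java
// Simplified model of a client-side write buffer (not HTable itself).
public class WriteBufferModel {
    private final boolean autoFlush;
    private final long bufferLimit; // bytes, playing the role of hbase.client.write.buffer
    private long buffered = 0;      // bytes currently waiting in the buffer
    private long batchesSent = 0;   // how many times the buffer was sent

    WriteBufferModel(boolean autoFlush, long bufferLimit) {
        this.autoFlush = autoFlush;
        this.bufferLimit = bufferLimit;
    }

    // Queue one record; send right away with auto-flush, otherwise only
    // once the buffered size has grown past the limit.
    void put(int recordBytes) {
        buffered += recordBytes;
        if (autoFlush || buffered > bufferLimit) {
            flushCommits();
        }
    }

    // Send whatever is buffered as a single batch.
    void flushCommits() {
        if (buffered > 0) {
            batchesSent++;
            buffered = 0;
        }
    }

    // Batches needed to write `records` records of `recordBytes` bytes each.
    static long countBatches(boolean autoFlush, long bufferLimit, int records, int recordBytes) {
        WriteBufferModel model = new WriteBufferModel(autoFlush, bufferLimit);
        for (int i = 0; i < records; i++) {
            model.put(recordBytes);
        }
        model.flushCommits(); // final flush, as the test program does after its loop
        return model.batchesSent;
    }

    public static void main(String[] args) {
        long twoMb = 2L * 1024 * 1024;
        System.out.println(countBatches(true, twoMb, 10_000_000, 15_000));  // 10000000
        System.out.println(countBatches(false, twoMb, 10_000_000, 15_000)); // 71429
    }
}
```

Under these assumptions, auto-flush sends 10,000,000 batches, while a 2 MB buffer packs 140 records into each batch and needs 71,429. A real flushCommits() groups the buffered puts by RegionServer, so one batch in this model may correspond to several RPCs.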
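Another factor is the WAL (write-ahead log). Each region keeps a MemStore (one per column family) that holds incoming data in memory and sorts it before it is written out to an HFile; this reduces disk seeks and saves time. So that the in-memory data can be recovered after a crash, every edit is also recorded in the WAL. I turned the WAL off (setWriteToWAL(false)) and measured again: it helped a little, but not much, so the WAL is not the write bottleneck.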
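To make the WAL and MemStore interplay concrete, here is a toy model built from plain Java collections. It is not HBase code, and the class and method names are hypothetical; the real MemStore, WAL and HFile writer are far more involved. It shows why an edit written with the WAL disabled is fast but lost if the server dies before a flush.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of a region's write path: an edit is optionally appended to a
// write-ahead log, then stored in a sorted in-memory MemStore. A flush writes
// the MemStore out as one sorted, immutable "file".
public class MemStoreWalModel {
    final List<String[]> wal = new ArrayList<>();        // append-only log of {row, value}
    TreeMap<String, String> memstore = new TreeMap<>();  // kept sorted by row key
    final List<List<String>> hfiles = new ArrayList<>(); // row keys of each flushed file

    // writeToWal == false plays the role of setWriteToWAL(false) in the test program.
    void put(String row, String value, boolean writeToWal) {
        if (writeToWal) {
            wal.add(new String[] {row, value});
        }
        memstore.put(row, value);
    }

    // Write the sorted MemStore contents as a new file and start an empty MemStore.
    // Clearing the log is a simplification: HBase discards old log files once
    // all of their edits have been flushed.
    void flush() {
        hfiles.add(new ArrayList<>(memstore.keySet()));
        memstore = new TreeMap<>();
        wal.clear();
    }

    // Simulate a crash: memory is lost and only logged edits can be replayed.
    void crashAndRecover() {
        memstore = new TreeMap<>();
        for (String[] edit : wal) {
            memstore.put(edit[0], edit[1]);
        }
    }

    public static void main(String[] args) {
        MemStoreWalModel region = new MemStoreWalModel();
        region.put("0000002", "b", true);
        region.put("0000001", "a", true);
        region.flush();
        System.out.println(region.hfiles); // [[0000001, 0000002]]
        region.put("0000003", "c", true);
        region.put("0000004", "d", false); // not logged
        region.crashAndRecover();
        System.out.println(region.memstore.keySet()); // [0000003]
    }
}
```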

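Because HBase is built for convenient querying and fast reads, it has to keep data sorted as it is written, which is what its compaction and split mechanisms are for. To improve write performance, the HBase documentation recommends pre-splitting regions, which works like a pool: you create a set of regions up front and writes simply use them. These 10 million records would otherwise end up in about 900 regions, so I pre-created 150 regions (pre-creating 900 made table creation take too long and threw an exception that I have not resolved yet). The write time improved a lot, to roughly half of the original. If 900 regions could be pre-created, it should save even more time.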
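Pre-splitting amounts to choosing split keys up front. Below is a sketch of an even split of the test program's 7-digit key space into N regions; PreSplit and splitKeys are illustrative names. This is a numeric split on the decimal keys, not HBase's algorithm: createTable(htd, startKey, endKey, numRegions) derives its boundaries from the raw key bytes, so its exact keys differ. An array like this, converted with Bytes.toBytes, is the kind of input the commented-out admin.createTable(htd, splits) call in the test program takes.

```java
// Illustrative even split of the row-key space "0000000" .. "9999999"
// into numRegions regions, computed numerically on the decimal keys.
public class PreSplit {
    // Returns numRegions - 1 ascending split keys; each adjacent pair of
    // boundaries (plus the open ends) delimits one region.
    static String[] splitKeys(int numRegions, int keySpace) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions");
        }
        String[] splits = new String[numRegions - 1];
        for (int r = 1; r < numRegions; r++) {
            long boundary = (long) keySpace * r / numRegions;
            splits[r - 1] = String.format("%07d", boundary); // same padding as intToString()
        }
        return splits;
    }

    public static void main(String[] args) {
        String[] splits = splitKeys(150, 10_000_000);
        System.out.println(splits.length + " split keys, first " + splits[0]
                + ", last " + splits[splits.length - 1]);
        // prints: 149 split keys, first 0066666, last 9933333
    }
}
```

With 150 regions each one receives about 66,667 of the sequential row keys.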

PS: When writing small values (for example 8 bytes), the larger the write buffer, the shorter the write time.

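PPS: The write buffer should not be set too large; a size in the MB range is enough. I have since run more write experiments: besides pre-splitting regions, writing with multiple threads is also very effective. In my test, 40 threads were about 8 times faster than a single thread.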
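A minimal sketch of how such a multi-threaded loader can be organised, assuming one writer object per thread: in the 0.90 client an HTable instance is not thread-safe, so each thread should create its own. Writer and WriterFactory are hypothetical interfaces introduced only for this sketch, and the payload is a fixed 12 bytes, as in the small-record variant of the test program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLoad {
    // Hypothetical per-thread writer; in the real test it would wrap one HTable.
    public interface Writer {
        void put(String rowKey, byte[] value);
        void flush();
    }

    // Hypothetical factory so that every worker thread gets its own Writer.
    public interface WriterFactory {
        Writer create();
    }

    // Writes rows 0 .. records-1 with `threads` workers; each worker takes a
    // contiguous, non-overlapping slice of the key range.
    static void load(int records, int threads, WriterFactory factory) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int from = (int) ((long) records * t / threads);
                final int to = (int) ((long) records * (t + 1) / threads);
                futures.add(pool.submit(() -> {
                    Writer writer = factory.create(); // one writer per thread
                    for (int i = from; i < to; i++) {
                        writer.put(String.format("%07d", i), new byte[12]);
                    }
                    writer.flush(); // like flushCommits() after the loop
                    return null;
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for each worker and surface its failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

In the real test, WriterFactory.create() would open an HTable, call setAutoFlush(false), and the writer's put() and flush() would map to table.put() and table.flushCommits().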

Test program


```java
package GIS.Update;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class TestUpdate {

  private static final int RECORDS = 10000000;

  // Baseline: writes RECORDS payloads sequentially into one HDFS file.
  public static void testHDFS() throws IOException {
    String str = "hdfs://cloudgis4:9000/usr/tmp/";
    Path path = new Path(str);
    Configuration conf = new Configuration();
    conf.addResource(new Path("/usr/local/hadoop/conf/hdfs-site.xml"));
    FileSystem hdfs = path.getFileSystem(conf);
    // setReplication() on a directory does not change the replication of files
    // created in it, so the original hdfs.setReplication(path, (short) 4) call
    // is dropped; the new file uses dfs.replication from the configuration.
    FSDataOutputStream fsDataOut = hdfs.create(new Path(str + "zzz"));
    long begin = System.currentTimeMillis();
    for (int i = 0; i < RECORDS; i++) {
      // Same payload as testHBase() so the two timings are comparable;
      // use new byte[12] instead to test small records.
      byte[] kkk = new byte[10000 + i / 1000];
      fsDataOut.write(kkk);
    }
    fsDataOut.close(); // flushes the remaining data before the timer stops
    long end = System.currentTimeMillis();
    System.out.println("hdfs:" + (end - begin));
  }

  // Creates a pre-split table, then writes RECORDS rows through the client
  // write buffer with the WAL disabled.
  public static void testHBase() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.addResource(new Path("/usr/local/hbase/conf/hbase-site.xml"));
    HBaseAdmin admin = new HBaseAdmin(conf);
    String tableName = "qq";
    String familyName = "imageFamily";
    String columnName = "imageColumn";
    HTableDescriptor htd = new HTableDescriptor(tableName);
    HColumnDescriptor hdc = new HColumnDescriptor(familyName);
    htd.addFamily(hdc);

    // Pre-split the key range "0000000".."9999999" into 150 regions.
    long before = System.currentTimeMillis();
    // admin.createTable(htd, splits);
    admin.createTable(htd, Bytes.toBytes("0000000"), Bytes.toBytes("9999999"), 150);
    long after = System.currentTimeMillis();
    System.out.println("createTable:" + (after - before));

    HTable table = new HTable(conf, htd.getName());
    table.setAutoFlush(false); // buffer puts on the client instead of one RPC per put
    // table.setWriteBufferSize(209715200); // 200 MB; the default is 2 MB
    System.out.println(table.getWriteBufferSize());

    byte[] family = Bytes.toBytes(familyName);
    byte[] qualifier = Bytes.toBytes(columnName);
    long begin = System.currentTimeMillis();
    for (int i = 0; i < RECORDS; i++) {
      byte[] kkk = new byte[10000 + i / 1000]; // 10,000 to 19,999 bytes
      Put p1 = new Put(Bytes.toBytes(intToString(i)));
      p1.setWriteToWAL(false); // skip the write-ahead log for this put
      p1.add(family, qualifier, kkk);
      table.put(p1);
    }
    table.flushCommits(); // send the last buffered puts before stopping the timer
    long end = System.currentTimeMillis();
    table.close();
    System.out.println("HBase:" + (end - begin));
  }

  // Zero-pads x to 7 digits so that row-key (byte) order matches numeric order.
  public static String intToString(int x) {
    return String.format("%07d", x);
  }

  public static void main(String[] args) throws IOException {
    testHBase(); // call testHDFS() instead for the HDFS baseline
  }
}
```
