19.1. 时间分区

1.建表语法

partition 'by@time from@yyyyMMddHHmmss to@yyyy days@7'

注释：

1)by@time

表示按照时间分区

2)from@ yyyyMMddHHmmss

原始数据格式,目前支持yyyyMMddHHmmss yyyyMMddHHmm, yyyyMMddHH, yyyyMMdd, yyyyMM,yyyy

3)to@yyyy

分区的格式,yyyy为按年分区

4)days@7

与当前时间相比,7天内的数据才会进行分区操作,7天外的数据,会写入到默认分区

注意:

这项是针对现场实际环境而调优,按时间分区,现场的入库程序因为某种原因(如时钟不准确)经常会出现少量的时间异常,会出现任意天数的时间,如1970~9099年的任意个时间点，

这种情况几乎在我的每一个客户那里都遇到过,由于时间值太多,创建了海量的分区,会导致系统效率很差 ,因此通过days这个参数,用来限制那些异常分区,将非正常值存储在默认分区。

4)查询限制

目前单次查询,仅允许扫描128个分区,如果超过128个分区,则会进行抛错处理。

2.查询语法

分区列目前只支持 =,in,>,<,>=,<=,between

3.基本例子

CREATE FOREIGN TABLE lxdb_partition (
ukey text,
timestamp bigint OPTIONS(partition 'by@time from@yyyyMMddHHmmss to@yyyyMMdd days@7'),
sys_partition_name text,--隐藏列无需赋值,仅用来实现类似show partitions功能
i1 int,
i2 int,
txt1 text,
txt2 text
) SERVER lxdb options(store 'ids');

4.创建分片

select create_distributed_table('lxdb_partition', 'ukey','hash','default',10);

5.插入数据

--7天内会写入到2021分区里

INSERT INTO lxdb_partition(ukey,timestamp,i1, i2, txt1,txt2) VALUES ('1111',20211051013002,0, 2, 'txt2', 'txt2');

--7天外会写入到default分区

INSERT INTO lxdb_partition(ukey,timestamp,i1, i2, txt1,txt2) VALUES ('1111',20210531013001,0, 2, 'txt2', 'txt2');

6.临时导入历史数据到对应的分区

--通过days调整

ALTER FOREIGN TABLE lxdb_partition alter timestamp OPTIONS (set partition 'by@time from@yyyyMMddHHmmss to@yyyyMMdd days@3650');

--days调整为较大的范围,就会写入到对应的分区了

INSERT INTO lxdb_partition(ukey,timestamp,i1, i2, txt1,txt2) VALUES ('1111',20150531013001,0, 2, 'txt2', 'txt2');

--历史数据完成后,记得修改完成

ALTER FOREIGN TABLE lxdb_partition alter timestamp OPTIONS (set partition 'by@time from@yyyyMMddHHmmss to@yyyyMMdd days@7');

19.2. 字符模糊分区

1.建表写法

1) partition 'by@text' 

分区格式为字符串,利用like匹配实现分区检索

2) 查询限制

目前单次查询,仅允许扫描128个分区,如果超过128个分区,检索效率会降低,对此进行抛错处理.

2.查询语法

分区列支持字符串 in,=,>,>=,<,<=,like(~~)分区检索, ( * % )配合like进行模糊匹配分区检索

3.基本例子

CREATE FOREIGN TABLE lxdb_partition1 (
ukey text,
location text OPTIONS(partition 'by@text'),
sys_partition_name text,
i1 int,
i2 int,
txt1 text,
txt2 text
) SERVER lxdb options(store 'ids');

4.创建分片

select create_distributed_table('lxdb_partition1', 'ukey','hash','default',10);

5.插入数据

INSERT INTO lxdb_partition1(ukey,location,i1, i2, txt1,txt2) VALUES ('1111','江苏省南京市',0, 2, 'txt2', 'txt2');
INSERT INTO lxdb_partition1(ukey,location,i1, i2, txt1,txt2) VALUES ('1111','aaa',0, 2, 'txt2', 'txt2');
INSERT INTO lxdb_partition1(ukey,location,i1, i2, txt1,txt2) VALUES ('1111','上海迪士尼',0, 2, 'txt2', 'txt2');
INSERT INTO lxdb_partition1(ukey,location,i1, i2, txt1,txt2) VALUES ('2222','上海迪士尼',0, 2, 'txt2', 'txt2');

6.检索结果

目前存在:江苏省南京市、aaa、上海迪士尼三个分区
1) select  *  from lxdb_partition1 where i2 > 1 and location like '上*';
‘上*’ 匹配所有以’上’开头的分区, ’*尼’匹配所有以’尼’结尾的分区;  
本条sql 仅会从上海迪士尼分区进行分析检索,若不增加where筛选分区条件,将扫描所有分区;

2) 若不加 *(%) 进行模糊匹配,效果相当于等值查询
select  * from lxdb_partition1 where i2>1 and location like '上海迪士尼'; 
select  *  from lxdb_partition1 where i2>1 and location = ‘上海迪士尼’;
只会匹配 ’上海迪士尼’ 这个分区,取消模糊分区检索;

3) In 分区检索,只会从in中对应的分区进行检索数据;
select  *  from lxdb_partirion1 where i2>1 and location in (‘aaa’,’江苏省南京市’);

4) 范围分区检索(>,>=,<,<=)
select  *  from lxdb_partition1 where i2>1 and location >= ‘aaa’;

19.3. 自定义数字分区

1.建表写法

1) partition 'by@diy pcunt@ 50' 
分区列需为数字类型，且需规定pcunt，即单个分区内有多少值，例50，则0-50为1个分区，51-100为一个分区，所有小于0的数会被分至默认分区

2) 查询限制
目前单次查询,仅允许扫描128个分区,如果超过128个分区,检索效率会降低,对此进行抛错处理.

2.查询语法

分区列的数字 in,=,>,>=,<,<=,between and进行分区检索

3.基本例子

CREATE FOREIGN TABLE lxdb_partition2 (
ukey text,
nKey bigint OPTIONS(partition 'by@diy pcount@50'),
sys_partition_name text,
i1 int,
i2 int,
txt1 text,
txt2 text
) SERVER lxdb options(store 'ids');

4.创建分片

select create_distributed_table('lxdb_partition2', 'ukey','hash','default',10);

5.插入数据

以下写入自己所在的分区
INSERT INTO lxdb_partition2(ukey,nKey,i1, i2, txt1,txt2) VALUES ('1111',50,0, 2, 'txt2', 'txt2');
INSERT INTO lxdb_partition2(ukey,nKey,i1, i2, txt1,txt2) VALUES ('1111',0,0, 2, 'txt2', 'txt2');
INSERT INTO lxdb_partition2(ukey,nKey,i1, i2, txt1,txt2) VALUES ('2222',51,0, 2, 'txt2', 'txt2');

以下写入默认分区（分区名为'-1'）
INSERT INTO lxdb_partition2(ukey,nKey,i1, i2, txt1,txt2) VALUES ('1111',-2,0, 2, 'txt2', 'txt2');

6.检索结果

目前存在:  -1 , 0 , 1 三个分区
select * from lxdb_partition2 where nKey = 0 匹配分区为0~50所在的分区
select * from lxdb_partition2 where nKey < 0 匹配分区为默认分区（-1分区）

19.4. range分区

1.建表写法

1) partition 'by@range partitions@1~50|51~80|81~300' 
分区列需为数字类型，且需规定partitions，即分区点，若导入数据不存在与分区规则内，则写入默认分区，且增加分区规则后不存入匹配分区内

2) 分区规则限制:
分区规则可以有未分区范围，但分区顺序必须按从小到大顺序排列
例：partition 'by@range partitions@1~30|51~80|81~300'  成立   
partition 'by@range partitions@1~50|51~80|-100~-50|81~300'  不成立

3) 查询限制
目前单次查询,仅允许扫描128个分区,如果超过128个分区,检索效率会降低,对此进行抛错处理.

2.查询语法

分区列的数字 in,=,>,>=,<,<=,between and 进行分区检索

3.基本例子

CREATE FOREIGN TABLE lxdb_partition_range (
ukey text,
nKey bigint OPTIONS(partition 'by@range partitions@1~50|51~80|81~300'),
sys_partition_name text,
i1 int,
i2 int,
txt1 text,
txt2 text
) SERVER lxdb options(store 'ids');

4.创建分片

select create_distributed_table('lxdb_partition2', 'ukey','hash','default',10);

5.插入数据

以下写入自己所在的分区
INSERT INTO lxdb_partition_range(ukey,nKey) VALUES ('1111',50);
INSERT INTO lxdb_partition_range(ukey,nKey) VALUES ('1111',200);

以下写入默认分区（分区名为'-1'）
INSERT INTO lxdb_partition_range(ukey,nKey) VALUES ('1111',1000);

6.检索结果

目前存在:  默认，1~50，81~300 三个分区
select * from lxdb_partition_range 匹配分区为所有分区
select * from lxdb_partition2 where nKey >500 仅匹配默认分区

7.添加分区规则

添加分区后之前存在于默认分区的数据不会转移到匹配分区内，还是存在于默认分区中，且添加分区时不能修改原有的分区规则，只能在原有基础上进行添加
ALTER FOREIGN TABLE lxdb_partition_range alter nKey OPTIONS (set partition 'by@range partitions@1~50|51~80|81~300|301~1000');

19.5. 查看分区列表

1.show partitions

目前还不支持show partitions写法,请临时用如下方法替代
select sys_partition_name,count(*) from  lxdb_partition  group by sys_partition_name order by sys_partition_name;

注: sys_partition_name是lxdb的隐藏列,建表的时候类型为text类型,无需赋值.如果建表的时候没有添加,可以通过alter table添加
sys_partition_name这个列仅支持group by 单列,并只能与count(*)组合,不能用于筛选,也不能用于返回明细

可以根据筛选条件查看数据在分区内的分布情况
select sys_partition_name,count(*) from  lxdb_partition where timestamp='20211031013001' group by sys_partition_name;

19.6. 删分区

1.删除

1):目前还不支持drop partitions的写法,可以通过sys_partition_name ='drop'这个标识,将where条件匹配到的分区删除

2):默认分区是清理不掉的,数据可以通过where条件删掉,但分区还在,默认分区只能通过drop foreign table xxx的方式清理掉.

delete from lxdb_partition where sys_partition_name ='drop' and timestamp<'20211031000000';--会清理对应日期的分区,一般用来清理过期数据
delete from lxdb_partition where sys_partition_name ='drop' and timestamp between 20211021000000 and 20211029000000;
delete from lxdb_partition where sys_partition_name ='drop' and timestamp in (20211031000000, 20211031000000, 20211029000000)

十九、数据分区

19.1. 时间分区

19.2. 字符模糊分区

19.3. 自定义数字分区

19.4. range分区

19.5. 查看分区列表

19.6. 删分区

results matching ""

No results matching ""