8.1.3. SQL规范使用说明
1. SQL语句与表名、字段名
LSQL 中表名与字段名区分大小写,若表名和字段名为大写字母,则查询语句中表名、字段名必须为大写;如果业务对接元数据,需根据公司统一标准制定表名和表字段名。 LSQL 中表名及字段名可以由数字、字母、下划线组成,不能以数字及_或者特殊字符开头,只能以字母开头。
对于表名和字段名不能包含如:or、from、where、partition、order、desc、limit、join、on、and、between、like、sum、count、avg、min、to、with、as等关键词;在此不再一一列举,业务上使用时尽量避开所有的SQL关键字,包含大小写。
LSQL 表中分区标识 partition 与RamIndexFilter的筛选必须使用以下写法:
- 分区:partition 不能大写;
- 基于RamIndexFilter的筛选:RamIndexFilter@0000000001:10240:over_rand@ 不能大写。
2. SQL的数据类型
LSQL主要有三种类型 :
基本类型
分词类型
多值列
他们命名有共同特点:
i 表示 进行索引 INDEXED
s 表示 进行了行存储 STORED
d 表示 进行了列存储 DOCVALUES
m 表示多值列
p 表示存储了词的偏移量,可以通过LIKE进行词的顺序模糊匹配
3. 行存储与列存
LSQL 数据库支持行存、列存、行存加列存。根据不同的业务场景使用不同的存储方式能够提高业务效率。
行存:只做查询(只有 select),不做统计的情况下(count、avg、sum)推荐使用行存。
列存:做统计功能时,需要统计、分组、排序的字段必须为列存,行存无法使用。
当查询的字段数小于 8 个时,字段做行存查询走行存,字段做列存查询走列存,字段做行存加列存查询优先走列存。
当查询的字段数大于8 个时,字段做行存查询走行存,字段做列存查询走列存,字段做行存加列存查询优先走行存。(8 这个分界点可以设置)
就单独查询而言查询的字段小于8 个,列存类型效率高,查询的字段大于8 个,行存类型效率高。
4. SQL TABLE DDL
创建表时不支持 IF NOT EXISTS操作,注意不要重复创建,会导致表的元数据被覆盖。
CREATE TABLE TABLE_NAME
[(COL_NAME DATA_TYPE ,...)] ;
查看表结构
DESC TABLE_NAME;
查看表分区
SHOW PARTITIONS TABLE_NAME;
查看建表语句
SHOW CREATE TABLE TABLE_NAME;
5. SQL mapping DDL
创建映射表
CREATE mapping mapping_NAME (
s0 y_string_id,
s1 y_string_id physical@s1_y_string_id,
/*physical@s1_y_string_id表示在物理表中实际存储的字段为s1_y_string_id */
wc4_1 y_wildcard4_isp ,
wc8_1 y_wildcard8_isp ,
)
tableproperties
(
mapping_physical_table='mappping_example001_physical', /*映射到哪张表里去*/
);
删除映射表
DROP mapping mapping_NAME;
清空视图表
TRUNCATE mapping mapping_NAME partitions [PARTITION1,PARTITION2..];
查看分区
SHOW partitions mapping_NAME;
查看表结构
DESC mapping_NAME;
6. LSQL和Hive保留字
All Keywords | ||
---|---|---|
Version | Non-reserved Keywords | Reserved Keywords |
Hive 1.2.0 | ADD, ADMIN, AFTER, ANALYZE, ARCHIVE, ASC, BEFORE, BUCKET, BUCKETS, CASCADE, CHANGE, CLUSTER, CLUSTERED, CLUSTERSTATUS, COLLECTION, COLUMNS, COMMENT, COMPACT, COMPACTIONS, COMPUTE, CONCATENATE, CONTINUE, DATA, DATABASES, DATETIME, DAY, DBPROPERTIES, DEFERRED, DEFINED, DELIMITED, DEPENDENCY, DESC, DIRECTORIES, DIRECTORY, DISABLE, DISTRIBUTE, ELEM_TYPE, ENABLE, ESCAPED, EXCLUSIVE, EXPLAIN, EXPORT, FIELDS, FILE, FILEFORMAT, FIRST, FORMAT, FORMATTED, FUNCTIONS, HOLD_DDLTIME, HOUR, IDXPROPERTIES, IGNORE, INDEX, INDEXES, INPATH, INPUTDRIVER, INPUTFORMAT, ITEMS, JAR, KEYS, KEY_TYPE, LIMIT, LINES, LOAD, LOCATION, LOCK, LOCKS, LOGICAL, LONG, MAPJOIN, MATERIALIZED, METADATA, MINUS, MINUTE, MONTH, MSCK, NOSCAN, NO_DROP, OFFLINE, OPTION, OUTPUTDRIVER, OUTPUTFORMAT, OVERWRITE, OWNER, PARTITIONED, PARTITIONS, PLUS, PRETTY, PRINCIPALS, PROTECTION, PURGE, READ, READONLY, REBUILD, RECORDREADER, RECORDWRITER, REGEXP, RELOAD, RENAME, REPAIR, REPLACE, REPLICATION, RESTRICT, REWRITE, RLIKE, ROLE, ROLES, SCHEMA, SCHEMAS, SECOND, SEMI, SERDE, SERDEPROPERTIES, SERVER, SETS, SHARED, SHOW, SHOW_DATABASE, SKEWED, SORT, SORTED, SSL, STATISTICS, STORED, STREAMTABLE, STRING, STRUCT, TABLES, TBLPROPERTIES, TEMPORARY, TERMINATED, TINYINT, TOUCH, TRANSACTIONS, UNARCHIVE, UNDO, UNIONTYPE, UNLOCK, UNSET, UNSIGNED, URI, USE, UTC, UTCTIMESTAMP, VALUE_TYPE, VIEW, WHILE, YEAR | ALL, ALTER, AND, ARRAY, AS, AUTHORIZATION, BETWEEN, BIGINT, BINARY, BOOLEAN, BOTH, BY, CASE, CAST, CHAR, COLUMN, CONF, CREATE, CROSS, CUBE, CURRENT, CURRENT_DATE, CURRENT_TIMESTAMP, CURSOR, DATABASE, DATE, DECIMAL, DELETE, DESCRIBE, DISTINCT, DOUBLE, DROP, ELSE, END, EXCHANGE, EXISTS, EXTENDED, EXTERNAL, FALSE, FETCH, FLOAT, FOLLOWING, FOR, FROM, FULL, FUNCTION, GRANT, GROUP, GROUPING, HAVING, IF, IMPORT, IN, INNER, INSERT, INT, INTERSECT, INTERVAL, INTO, IS, JOIN, LATERAL, LEFT, LESS, LIKE, LOCAL, MACRO, MAP, MORE, NONE, NOT, NULL, OF, ON, OR, ORDER, OUT, OUTER, OVER, PARTIALSCAN, PARTITION, PERCENT, PRECEDING, PRESERVE, PROCEDURE, RANGE, READS, REDUCE, REVOKE, RIGHT, ROLLUP, ROW, ROWS, SELECT, SET, SMALLINT, TABLE, TABLESAMPLE, THEN, TIMESTAMP, TO, TRANSFORM, TRIGGER, TRUE, TRUNCATE, UNBOUNDED, UNION, UNIQUEJOIN, UPDATE, USER, USING, UTC_TMESTAMP, VALUES, VARCHAR, WHEN, WHERE, WINDOW, WITH |
Hive 2.0.0 | removed: REGEXP, RLIKE | added: COMMIT, ONLY, REGEXP, RLIKE, ROLLBACK, START |
added: AUTOCOMMIT, ISOLATION, LEVEL, OFFSET, SNAPSHOT, TRANSACTION, WORK, WRITE | ||
Hive 2.1.0 | added: ABORT, KEY, LAST, NORELY, NOVALIDATE, NULLS, RELY, VALIDATE | added: CACHE, CONSTRAINT, FOREIGN, PRIMARY, REFERENCES |
Hive 2.2.0 | added: DETAIL, DOW, EXPRESSION, OPERATOR, QUARTER, SUMMARY, VECTORIZATION, WEEK, YEARS, MONTHS, WEEKS, DAYS, HOURS, MINUTES, SECONDS | added: DAYOFWEEK, EXTRACT, FLOOR, INTEGER, PRECISION, VIEWS |
Hive 3.0.0 | added: TIMESTAMPTZ, ZONE | added: TIME, NUMERIC, SYNC |
LSQL | added: SYSKV, LUCENE_QUERY_S, LUCENE_ENCODE_QUERY_S, PARTITIONS, LIMIT, LONG, TEXT, TEXT_HANLP, WILDCARD2, WILDCARD4, WILDCARD8, GEOPOINT, MORTONGEO, MAX, MIN, SUM, AVG, COUNT, COLUMNFAMILY, COLUMNFAMILYS, MAPPING, DEFAULT, INDEX, STORE, COMPRESS, TRUE, FALSE, AT, CLEANDAY, RAMINDEXFILTER, OVER_RAND, OVER_ERROR, OVER_RANDOM, SPLIT, RAMINDEXQUERY, TERMLIKE, QUERY |