Yarn

View the logs of a YARN application: yarn logs -applicationId <applicationid>
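When the log command is called from a script, it helps to check the ID before shelling out. Below is a minimal sketch, assuming the usual application_<clusterTimestamp>_<sequence> ID shape; the helper names build_yarn_logs_cmd and save_yarn_logs are hypothetical, and the second one only works where the yarn client is on PATH.

```python
import re
import subprocess

# Assumed ID shape: application_<clusterTimestamp>_<sequence>
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")


def build_yarn_logs_cmd(app_id):
    """Return the argv list for `yarn logs -applicationId <app_id>`."""
    if not APP_ID_PATTERN.fullmatch(app_id):
        raise ValueError(f"not a YARN application ID: {app_id!r}")
    return ["yarn", "logs", "-applicationId", app_id]


def save_yarn_logs(app_id, out_path):
    """Run the command and write its stdout to out_path (needs `yarn` on PATH)."""
    with open(out_path, "w", encoding="utf-8") as out:
        subprocess.run(build_yarn_logs_cmd(app_id), stdout=out, check=True)
```

Aggregated logs can be large, so writing them to a file and then searching that file is usually easier than reading them in the terminal.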

HDFS

List an HDFS directory: hdfs dfs -ls <HDFSpath>
Upload a local file to an HDFS directory: hdfs dfs -put /opt/test.txt <HDFSpath>

Hive

Create a table at a given location: create table test(name string) location "<HDFSpath>";
View the table definition: desc formatted <tablename>;

Spark

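Create a database at a given HDFS location: create database test location "<HDFSpath>";
Create a table in the test database (after use test;). No path is needed; by default the table is created in a subdirectory of the database location: create table test1(a int,b int) using parquet;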
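The default-subdirectory rule can be sketched as a path join. This is only an illustration: the helper default_table_path is hypothetical, and lowercasing the table name is an assumption about how the metastore stores names. The authoritative value is the Location field printed by desc formatted.

```python
import posixpath


def default_table_path(db_location, table_name):
    """Expected path of a table created without an explicit location.

    Assumption: the directory is named after the lowercased table name.
    """
    return posixpath.join(db_location.rstrip("/"), table_name.lower())


# "/warehouse/test_db" is a made-up database location for illustration.
print(default_table_path("/warehouse/test_db", "test1"))
```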
Spark SQL date and time calculations

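-- Get the current date
select current_date;

-- Date subtraction: get the date 30 days before today. The result is a DATE, so it should not be compared directly with a String
select date_sub(current_date, 30);

-- When the column is a String, convert one side so that both sides have the same type (and vice versa)
select * from t_test where date > date_format(date_sub(current_date, 30), 'yyyy-MM-dd');
select * from t_test2 where date > to_date('2025-04-25', 'yyyy-MM-dd');

-- Time difference: assuming both columns are Strings in yyyy-MM-dd HH:mm:ss format, convert first and then subtract. Subtracting two UNIX_TIMESTAMP values yields the difference in seconds
select UNIX_TIMESTAMP(DATE_FORMAT(time2, 'yyyy-MM-dd HH:mm:ss')) - UNIX_TIMESTAMP(DATE_FORMAT(time1, 'yyyy-MM-dd HH:mm:ss')) from t_test;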
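The same arithmetic can be checked outside Spark. The sketch below uses only the Python standard library with made-up sample values. It mirrors the queries above but is not Spark code, and it ignores time zones and daylight saving, which UNIX_TIMESTAMP takes into account through the session time zone.

```python
from datetime import date, datetime, timedelta

# date_sub(current_date, 30): a date minus 30 days is still a date.
today = date(2025, 5, 25)  # fixed stand-in for current_date
cutoff = today - timedelta(days=30)

# date_format(..., 'yyyy-MM-dd'): render the date as a zero-padded string.
cutoff_str = cutoff.strftime("%Y-%m-%d")

# Zero-padded yyyy-MM-dd strings sort in the same order as the dates,
# which is why comparing String to String works in the first query.
rows = ["2025-04-20", "2025-04-26", "2025-05-01"]
recent = [d for d in rows if d > cutoff_str]

# Unpadded strings break that ordering: April 9 compares as "later".
unpadded_is_later = "2025-4-9" > "2025-04-25"


def seconds_between(time1, time2, fmt="%Y-%m-%d %H:%M:%S"):
    """Difference time2 - time1 in seconds; %H is the 24-hour field, like HH."""
    delta = datetime.strptime(time2, fmt) - datetime.strptime(time1, fmt)
    return int(delta.total_seconds())


print(cutoff_str, recent, unpadded_is_later)
print(seconds_between("2025-04-25 08:00:00", "2025-04-25 20:30:15"))
```

If the stored strings are not zero-padded, converting the column with to_date is the safer choice than comparing strings.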

Flink

Run a Flink job: flink run xxxx.jar -input <inputpath> -output <outputpath>
