
Hive on Spark EXPLAIN statement PDF Download


Time: 2020-03-28 18:11  Source: http://www.java1234.com  Author: 小锋

Download (compiled by this site):
Extraction code: l2nu
Main content:
Hive on Spark EXPLAIN statement

In Hive, the EXPLAIN command can be used to show the execution plan of a query; the language manual has lots of good information on it. For Hive on Spark, the command itself is unchanged and behaves the same as before: it still shows the dependency graph and the plan for each stage. However, if the query engine (hive.execution.engine) is set to "spark", it shows the execution plan for the Spark query engine instead of the default ("mr") MapReduce query engine.

Dependency Graph

The dependency graph shows the dependency relationships among stages. For Hive on Spark, there are Spark stages instead of MapReduce stages; other stages (for example, the Move stage, the StatsAggr stage, etc.) are unchanged. For most queries there is just one Spark stage, since many map and reduce works can be done in one Spark work. Therefore, for the same query, Hive on Spark may produce fewer stages. For some queries there are multiple Spark stages, for example queries with map joins, skew joins, etc.

One point worth noting: a "stage" here means a Hive stage, which is very different from the stage concept in Spark. A Hive stage can correspond to multiple stages in Spark. In Spark, a stage usually means a group of tasks that can be processed in one executor; in Hive, a stage contains a list of operations that can be processed in one job.

Spark Stage Plan

Besides the dependency graph, the EXPLAIN command shows the plan for each stage. For Hive on Spark, the Spark stage is new: it replaces the MapReduce stage used by Hive on MapReduce. The Spark stage shows the Spark work graph, which is a DAG (directed acyclic graph). It contains:

● DAG name: the name of the Spark work DAG;
● Edges: the dependency relationships among the works in this DAG;
● Vertices: the operator tree of each work.

For each individual operator tree, there is no change for Hive on Spark; the difference is the dependency graph.
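The engine switch and EXPLAIN usage described above can be sketched as follows. This is a minimal illustration, not an example from the original document; the table name src is hypothetical:

```sql
-- Switch the current session to the Spark execution engine
-- (the default engine is "mr", i.e. MapReduce).
SET hive.execution.engine=spark;

-- Show the execution plan (dependency graph plus per-stage
-- plans) without actually running the query.
EXPLAIN
SELECT key, count(*)
FROM src          -- hypothetical table
GROUP BY key;
```

With the engine set to "spark", the same EXPLAIN output structure is produced, but the stage plans contain Spark stages rather than MapReduce stages.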
For MapReduce, you cannot have a reducer without a mapper; for Spark, that is not a problem. Therefore, Hive on Spark can optimize the plan and get rid of mappers that are not needed. The edge information is new for Hive on Spark; there is no such information for MapReduce. Different edge types indicate different shuffle requirements. For example,
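The excerpt breaks off here. As a hedged illustration of the shape of such output, an abbreviated Spark stage from EXPLAIN might look like the following; the parenthesized annotation on the edge names the shuffle type, and the exact formatting, edge-type names, and work names vary by Hive version (the ones shown here are illustrative):

```
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Spark
      Edges:
        Reducer 2 <- Map 1 (GROUP, 2)
      Vertices:
        Map 1
            Map Operator Tree:
                ...
        Reducer 2
            Reduce Operator Tree:
                ...
```

Here "Reducer 2 <- Map 1" is an edge stating that the work named Reducer 2 depends on the work named Map 1, with the edge annotation indicating the kind of shuffle connecting them.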
