Hive on Spark EXPLAIN statement

In Hive, the EXPLAIN command shows the execution plan of a query. The Language Manual has lots of good information about it. For Hive on Spark, the command itself is unchanged and behaves as before: it still shows the dependency graph and the plan for each stage. However, if the query engine (hive.execution.engine) is set to "spark", it shows the execution plan for the Spark query engine instead of the default ("mr") MapReduce query engine.

Dependency Graph

The dependency graph shows the dependency relationships among stages. For Hive on Spark, there are Spark stages instead of MapReduce stages. Other stages, such as the Move stage and the StatsAggr stage, are unchanged. For most queries there is just one Spark stage, since many map and reduce works can be done in one Spark work; therefore, for the same query, Hive on Spark may produce fewer stages. Some queries produce multiple Spark stages, for example queries with map joins or skew joins.

One thing should be pointed out: a stage here means a Hive stage, which is very different from the stage concept in Spark. One Hive stage can correspond to multiple Spark stages. In Spark, a stage usually means a group of tasks that can be processed in one executor; in Hive, a stage contains a list of operations that can be processed in one job.

Spark Stage Plan

Besides the dependency graph, EXPLAIN shows the plan for each stage. For Hive on Spark, the Spark stage is new; it replaces the MapReduce stage used by Hive on MapReduce. The Spark stage shows the Spark work graph, which is a DAG (directed acyclic graph). It contains:

● DAG name: the name of the Spark work DAG;
● Edges: the dependency relationships among the works in this DAG;
● Vertices: the operator tree of each work.

For each individual operator tree, nothing changes for Hive on Spark; the difference is in the dependency graph. With MapReduce you cannot have a reducer without a mapper; with Spark that is not a problem, so Hive on Spark can optimize the plan and get rid of mappers that are not needed.

The edge information is new for Hive on Spark; there is no such information for MapReduce plans. Different edge types indicate different shuffle requirements, for example whether the shuffled rows only need to be grouped by key or must also be sorted.
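
As a minimal sketch of how this looks in practice, the following HiveQL session switches the engine to Spark and asks for the plan of a simple aggregation. The table src and its key column are hypothetical placeholders, not part of the original text; any query works here.

    -- Switch the query engine from the default ("mr") to Spark
    set hive.execution.engine=spark;

    -- Show the Spark execution plan for a simple group-by query;
    -- the group-by forces a shuffle, so the plan has a map work and a reduce work
    EXPLAIN
    SELECT key, count(*)
    FROM src
    GROUP BY key;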
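
The output below is an abbreviated, illustrative sketch of what such an EXPLAIN might print; the DAG name, stage numbers, and operator details are placeholders and vary by Hive version and query. Note the single Spark stage in the dependency graph, and the Edges line, whose type (here GROUP, with 2 reduce partitions) describes the shuffle requirement between Map 1 and Reducer 2.

    STAGE DEPENDENCIES:
      Stage-1 is a root stage
      Stage-0 depends on stages: Stage-1

    STAGE PLANS:
      Stage: Stage-1
        Spark
          Edges:
            Reducer 2 <- Map 1 (GROUP, 2)
          DagName: hive_20150101000000_0001:1
          Vertices:
            Map 1
                Map Operator Tree:
                    TableScan
                      alias: src
                      ...
            Reducer 2
                Reduce Operator Tree:
                  Group By Operator
                    ...

      Stage: Stage-0
        Fetch Operator
          limit: -1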