Optimizing Spark Application Performance: Accelerating Big Data Processing with a Low-Cost Tiered Approach - 俞育才 (Yu Yucai), PDF download
Main content:
Software Tuning – Partition
• The number of tasks is determined by the RDD's partition count.
• How do you choose a suitable number of partitions?
- If there are fewer partitions than available cores, the tasks cannot take
advantage of all CPUs.
- Fewer partitions mean more data per partition, which increases memory pressure,
especially in join, cogroup, *ByKey, etc.
- If the number is too large, there are more tasks, more iterations, and more time spent.
- Too many partitions also put more pressure on disk. During shuffle read, there are
more small segments to fetch, which is especially bad on HDDs.
- Start with a large number so the application runs successfully, then decrease it
gradually to reach the best performance point, paying attention to GC.
- Sometimes changing the partition number helps avoid data skew; check for this
in the Web UI.
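
The trade-offs above can be made concrete with a small back-of-the-envelope model. The following is only a sketch in plain Python, not Spark code: the helper names `partition_stats` and `skew_ratio` are hypothetical, and the cluster size (64 cores), input size (64 GB), and per-partition record counts are invented for illustration. It assumes data is spread evenly unless stated otherwise.

```python
import math

def partition_stats(num_partitions, total_cores, data_size_mb):
    """Rough per-stage numbers for a partition count (illustrative model only)."""
    # Rounds of task execution needed when each core runs one task at a time.
    waves = math.ceil(num_partitions / total_cores)
    # Cores that have any work at all; fewer partitions than cores leaves some idle.
    busy_cores = min(num_partitions, total_cores)
    # Average input per task, assuming no skew; larger means more memory pressure.
    mb_per_task = data_size_mb / num_partitions
    return {"waves": waves, "busy_cores": busy_cores, "mb_per_task": mb_per_task}

def skew_ratio(records_per_partition):
    """Largest partition divided by the mean; 1.0 means perfectly even."""
    mean = sum(records_per_partition) / len(records_per_partition)
    return max(records_per_partition) / mean

if __name__ == "__main__":
    # Assumed cluster: 64 cores, 64 GB of input. Start high, then step down.
    for n in (4096, 512, 64, 8):
        print(n, partition_stats(n, total_cores=64, data_size_mb=65536))
    # Per-partition record counts as they might be read off the Web UI (made up).
    print(skew_ratio([100, 100, 100, 700]))
```

With 8 partitions only 8 of the 64 cores have work and each task handles about 8 GB; with 4096 partitions every core is busy but the stage needs 64 rounds of small tasks. A skew ratio well above 1 suggests that one partition dominates and that a different partition count (or key distribution) is worth trying.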