HadoopExam Learning Resources


Apache Tez query performance issue

I am running a Hortonworks HDP 2.2 cluster with Apache Tez enabled, benchmarking query performance of Hive on Tez with ORC against Impala with Parquet. Even after applying the Tez settings below in the command shell, the query does not perform well. Any idea what else can be done to improve the performance?

The table I am using has more than 200 columns and 20 date partitions, and I am running a simple count(*) with a group by across all partitions:

select col1, col2, count(*) from orc_table_ext group by col1, col2;

In Impala the query takes 21.06 seconds, while on Tez it takes 100.9 seconds.

set hive.execution.engine=tez;
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled=true;
set hive.vectorized.groupby.maxentries=10240;
set hive.vectorized.groupby.flush.percent=0.1;
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.support.concurrency=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.compute.query.using.stats=true;
set hive.stats.autogather=true;
set hive.tez.auto.reducer.parallelism=true;
set hive.fetch.task.conversion=more;
set hive.compactor.initiator.on=true;
set hive.compactor.worker.threads=2;
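One thing that may be worth checking alongside these settings: the CBO and stats-related options only help if table, partition, and column statistics actually exist in the metastore, and for a table loaded outside of Hive they may not have been gathered. A rough sketch of how that could be checked, assuming the partition column is named dt (a hypothetical name, not given above); the exact ANALYZE syntax accepted can vary with the Hive version:

```sql
-- Gather basic table/partition statistics for all partitions.
-- "dt" is an assumed partition column name; substitute the real one.
ANALYZE TABLE orc_table_ext PARTITION (dt) COMPUTE STATISTICS;

-- Gather column statistics for the two grouping columns only.
ANALYZE TABLE orc_table_ext PARTITION (dt) COMPUTE STATISTICS FOR COLUMNS col1, col2;

-- Inspect the plan Tez will run (vertices, estimated row counts).
EXPLAIN
SELECT col1, col2, count(*)
FROM orc_table_ext
GROUP BY col1, col2;
```

Comparing the EXPLAIN output before and after gathering statistics should show whether the row estimates change; this is only a diagnostic step, not a guaranteed fix for the gap against Impala.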
