HadoopExam Learning Resources

Apache Tez query performance issue

I am running a Hortonworks HDP 2.2 cluster with Apache Tez enabled, to benchmark query performance of Hive on Tez with ORC against Impala with Parquet. Even after applying the Tez settings below in the command shell, the query performance is not optimal. Any idea what else can be done to improve it?

The table I am using has more than 200 columns and 20 date partitions, and I am running a simple grouped count(*) across all partitions:

select col1, col2, count(*) from orc_table_ext group by col1, col2;

In Impala the query takes 21.06 seconds, while on Tez it takes 100.9 seconds.

set hive.execution.engine=tez;
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled=true;
set hive.vectorized.groupby.maxentries=10240;
set hive.vectorized.groupby.flush.percent=0.1;
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.support.concurrency=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.compute.query.using.stats=true;
set hive.stats.autogather=true;
set hive.tez.auto.reducer.parallelism=true;
set hive.fetch.task.conversion=more;
set hive.compactor.initiator.on=true;
set hive.compactor.worker.threads=2;
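
Several of the settings above (hive.cbo.enable, hive.compute.query.using.stats, hive.stats.fetch.partition.stats) only help when statistics have actually been gathered for the table. A minimal sketch of how that could be checked from the Hive shell is shown here; the partition column name dt is an assumption made for illustration, since the question does not name the date partition column, and the output will depend on the cluster.

```sql
-- Assumption: `dt` is a hypothetical name for the date partition column;
-- substitute the real partition column of orc_table_ext.

-- Gather basic statistics (row counts, sizes) for every partition.
ANALYZE TABLE orc_table_ext PARTITION (dt) COMPUTE STATISTICS;

-- Print the plan Hive generates for the benchmark query, so the
-- Tez vertices and the operators in each of them can be inspected.
EXPLAIN
SELECT col1, col2, count(*)
FROM orc_table_ext
GROUP BY col1, col2;
```

Comparing the EXPLAIN output before and after gathering statistics may show whether the optimizer is using them for this query.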
