HadoopExam Blogs


Monitoring: Defining your application or Spark job and submitting it to the Spark cluster is not enough; you also need to know what is happening with the job once it is running. Does the cluster have enough resources to run it? Are the configuration parameters you chose for the job appropriate? Answering these questions requires real-time monitoring. There are various ways to monitor your application; let's discuss each of them from a performance perspective.

  1. Jobs, Stages and Tasks: Make sure you understand the concepts of jobs, stages and tasks. For any Spark job you submit, it is important to be able to estimate how many stages and tasks it will create, because this lets you gauge, at least conceptually, how the application will perform on the Spark cluster before you run it.
  2. Spark Web UI: The Spark framework ships with its own web UI, where you can monitor each individual Spark job. Once you submit a job, the web UI shows an entry specific to that job; by clicking on it you can drill into the individual application and, if you have defined any accumulators (counters) in the application, inspect their values as well.
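As a rough mental model for point 1 above (not Spark's actual DAG scheduler), you can estimate stages and tasks for a simple linear pipeline: Spark cuts a new stage at every wide (shuffle) transformation, and each stage runs one task per partition. The helper below is a hypothetical sketch of that rule of thumb; the function name `count_stages_and_tasks` is made up for illustration, and real query plans (changing partition counts, skipped stages) will differ.

```python
# Conceptual model only: Spark starts a new stage at each wide (shuffle)
# transformation; each stage runs one task per partition. This is a
# simplification -- the real scheduler can change partition counts across
# stages and skip stages whose output is already computed.

WIDE_TRANSFORMATIONS = {"reduceByKey", "groupByKey", "join", "repartition", "sortByKey"}

def count_stages_and_tasks(transformations, num_partitions):
    """Estimate (stages, tasks) for a linear pipeline of transformations,
    assuming the partition count stays constant across every stage."""
    stages = 1 + sum(1 for t in transformations if t in WIDE_TRANSFORMATIONS)
    tasks = stages * num_partitions
    return stages, tasks

# Classic word count: flatMap -> map -> reduceByKey over an RDD with 4 partitions.
stages, tasks = count_stages_and_tasks(["flatMap", "map", "reduceByKey"], num_partitions=4)
print(stages, tasks)  # 2 stages, 8 tasks
```

Comparing an estimate like this against what the Spark web UI actually reports for your job is a quick sanity check that the job is executing the way you expected.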

Performance Tuning: You may think you have written your Spark application well, but choosing the correct API is only part of the story; other factors matter just as much. Does the API you used provide enough parallelism? Does it minimize data shuffling, or will it spill data to disk? Have you cached RDDs in the right places? How does the current cluster configuration affect your application? Once you understand these factors, you should be able to tune individual configuration parameters for your application. Also learn the concept of data locality: running tasks close to their data reduces data shuffling across the network.
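One concrete shuffle-reduction technique the paragraph above alludes to is map-side pre-aggregation: `reduceByKey` combines values within each partition before shuffling, while `groupByKey` ships every record across the network. The sketch below simulates that difference in plain Python (no Spark installation required); the partition contents and the `shuffle_volume_*` helper names are invented for illustration.

```python
from collections import Counter

# Simulate two RDD partitions of (word, 1) pairs, as produced by the map
# step of a word count.
partitions = [
    [("spark", 1), ("spark", 1), ("hadoop", 1), ("spark", 1)],
    [("hadoop", 1), ("spark", 1), ("hadoop", 1)],
]

def shuffle_volume_groupByKey(parts):
    # groupByKey: every (key, value) record crosses the network unchanged.
    return sum(len(p) for p in parts)

def shuffle_volume_reduceByKey(parts):
    # reduceByKey: values are combined per key *within* each partition
    # first, so at most one record per distinct key leaves each partition.
    total = 0
    for p in parts:
        combined = Counter()
        for key, value in p:
            combined[key] += value
        total += len(combined)  # one pre-aggregated record per key
    return total

print(shuffle_volume_groupByKey(partitions))   # 7 records shuffled
print(shuffle_volume_reduceByKey(partitions))  # 4 records shuffled
```

On real data with many repeated keys the gap is far larger, which is why preferring `reduceByKey` (or `aggregateByKey`) over `groupByKey` is a standard Spark tuning recommendation.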


  1. Apache Spark Professional Training with Hands On Lab Sessions 
  2. Oreilly Databricks Apache Spark Developer Certification Simulator
  3. Hortonworks Spark Developer Certification 
  4. Cloudera CCA175 Hadoop and Spark Developer Certification 


MapR Certified Spark Developer Syllabus Part-5