HadoopExam Blogs

HadoopExam Learning Resources

Tips & Tricks for Certification: Based on our conversations with learners who have already taken the real exam, we have put together some tips and tricks that you should know before you take it.

  • Very few theoretical questions (around 20% of the questions are concept-based).
  • About 80% of the questions are based on code snippets and sample data.
  • No questions are asked on GraphX as of now.
  • No direct questions: you need to know the underlying concept to answer a question correctly.
  • The questions are quite complex.
  • Questions on code snippets that use the map and flatMap functions.
  • Difference between supervised and unsupervised learning, for example: which of the given options is an unsupervised learning algorithm?
    • Supervised learning
      • Understand the basics of regression
        • Linear regression
        • Logistic regression
      • Understand classification algorithms
        • Naive Bayes classifiers
        • SVM (support vector machine)
        • Random decision forests
    • Unsupervised learning
      • Dimensionality reduction
        • PCA
        • SVD
      • K-means clustering
  • Difference between classification and clustering
  • Most questions are on RDDs: around 17 questions
  • Spark SQL and DataFrames: around 14 questions
  • Spark Streaming: 7 questions
  • Machine learning: 7 questions
  • Pair RDDs, monitoring, stages, lineage: 10 questions
  • Broadcast variables and accumulators: 3-5 questions
  • Partitioning and repartitioning: 7 questions
  • Understand the reduceByKey, groupByKey and reduce functions (questions on these are certain).
  • Questions on configuration parameters used to improve performance, such as the memory requirement for an executor.
  • They might give a practical scenario with sample data and some cluster information, such as 10 nodes, 30 executors and an HDFS directory containing 100 files to be loaded by a Spark job. You may be asked which optimizations are possible, how much memory needs to be configured, what is wrong with the current configuration, etc.
  • RDD caching and persistence (2-4 questions).
  • The initial value of an accumulator will be given, and you need to work out its final value once the job completes.
  • In a Spark job, how many stages will be executed and how many will be skipped, based on RDD caching.
  • Find the possible number of partitions.
  • Output format of a Spark job.
  • You will be given a code snippet and need to select the correct output from the given options.
  • Understand the Spark Streaming API method reduceByKeyAndWindow.
  • Practice and understand pair RDD functions such as groupByKey, reduceByKey and combineByKey.
  • Understand RDD API functions such as fold and reduce.
  • Read a little about MLlib data types.
  • Understand LabeledPoint.
  • Streaming window operations are very important.
  • How fault tolerance is achieved in Spark Streaming.
  • How backpressure is achieved in streaming.
  • How to tune a Spark job.
  • Spark UI questions will be asked, but they are quite simple, so visit the Spark Web UI at least once.
  • Read up on the DataFrame API methods.
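
Several of the tips above come down to knowing what map, flatMap, groupByKey, reduceByKey, fold and reduce actually return. The following is a rough pure-Python sketch of those semantics, not Spark code. The helper names (`flat_map`, `group_by_key`, `reduce_by_key`, `fold_partitions`), the sample data and the two-partition split are all assumptions made up for illustration. The fold helper models the behaviour that usually catches people out: Spark's RDD fold applies the zero value once per partition and once more when merging the partition results.

```python
from collections import defaultdict
from functools import reduce


def flat_map(func, items):
    """Apply func to each item and flatten the results by one level (like flatMap)."""
    return [out for item in items for out in func(item)]


def group_by_key(pairs):
    """Collect every value for a key into a list (like groupByKey)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_by_key(func, pairs):
    """Combine the values of each key with func (like reduceByKey)."""
    result = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


def fold_partitions(zero, func, partitions):
    """Model of RDD fold: zero is used once per partition, then once in the merge."""
    per_partition = [reduce(func, part, zero) for part in partitions]
    return reduce(func, per_partition, zero)


def add(a, b):
    return a + b


# map keeps one output element per input element; flatMap flattens them.
lines = ["a b", "c"]                      # hypothetical sample data
mapped = list(map(str.split, lines))      # one list of words per line
flattened = flat_map(str.split, lines)    # a single flat list of words

# groupByKey keeps all values; reduceByKey collapses them to one per key.
pairs = [("x", 1), ("y", 2), ("x", 3)]    # hypothetical sample data
grouped = group_by_key(pairs)
reduced = reduce_by_key(add, pairs)

# reduce has no zero value; fold adds the zero value (partitions + 1) times.
partitions = [[1, 2], [3, 4]]             # assumed split into two partitions
plain_sum = reduce(add, [v for part in partitions for v in part])
folded = fold_partitions(10, add, partitions)

print(mapped, flattened)
print(grouped, reduced)
print(plain_sum, folded)
```

With these inputs the fold result is 40 rather than 20, because the zero value 10 is added three times (two partitions plus the final merge) on top of the plain sum of 10. Exam questions that give a non-neutral zero value and a partition count are typically testing exactly this.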


  1. Apache Spark Professional Training with Hands On Lab Sessions 
  2. Oreilly Databricks Apache Spark Developer Certification Simulator
  3. Hortonworks Spark Developer Certification 
  4. Cloudera CCA175 Hadoop and Spark Developer Certification 
