HadoopExam Blogs

HadoopExam Learning Resources

Load and Inspect Data: You need to do this whenever you work with a large volume of data. Before any processing can start, the data has to be loaded, and it may not be in the format you want, so you will wrangle it into the shape your later processing needs. Hence, you should know how to load data from HDFS, S3, an RDBMS, a NoSQL database and the local file system. You should also know how to convert it into the format you need. Let’s discuss each subtopic under this.

  1. Creating RDD: You should know exactly what an RDD is. How can you create an RDD from various data formats, e.g. JSON, CSV, Sequence File, Parquet and Avro files? How do you create an RDD from a Java collection?
  2. Transformations: This is one type of operation you apply to an RDD, and it creates another RDD. Remember that RDDs are immutable, so you always have to create a new RDD from an existing one if you want to format or transform your data. Various transformation APIs are available; the most commonly used are map(), flatMap() and reduceByKey().
  3. Actions on RDD: Transformations help you convert your RDD from one format to another or filter the data. But to get a result or do some calculation on the RDD, you need to apply an action. A very important concept comes into the picture here: transformations are lazy and are not evaluated until you call an action on the RDD.
  4. Caching and Persisting the RDD: Understand the difference between caching and persisting an RDD. Know which options are available and which API you will use for each. Also know when, and in which scenarios, it is useful and optimal to cache the RDD.
  5. Actions v/s Transformations: As already discussed, transformations are lazy and are only evaluated once an action is called on the RDD. You will not get a direct question asking for the difference between transformations and actions. Instead, the exam will tweak some coding questions, and you should be able to answer them by understanding the concepts of transformations and actions.
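
The lazy-evaluation and caching ideas in the list above can be sketched with a toy model in plain Python. This is not Spark and not its API: ToyRDD and every name in it are hypothetical, invented only for illustration, and everything runs on a single machine. The method names loosely mirror Spark's flatMap(), map(), reduceByKey(), cache(), collect() and count(). The sketch assumes a word-count style job and counts how often the splitting function actually runs.

```python
class ToyRDD:
    """Hypothetical single-machine stand-in for an RDD (not Spark's API)."""

    def __init__(self, compute):
        self._compute = compute        # zero-argument function returning an iterator
        self._cache_enabled = False
        self._cached = None            # filled by the first action after cache()

    def _iterate(self):
        if not self._cache_enabled:
            return self._compute()     # recompute from the parent every time
        if self._cached is None:
            self._cached = list(self._compute())
        return iter(self._cached)

    # Transformations: each returns a new ToyRDD and does no work yet.
    def map(self, fn):
        return ToyRDD(lambda: (fn(x) for x in self._iterate()))

    def flat_map(self, fn):
        return ToyRDD(lambda: (y for x in self._iterate() for y in fn(x)))

    def reduce_by_key(self, fn):
        def compute():
            totals = {}
            for key, value in self._iterate():
                totals[key] = fn(totals[key], value) if key in totals else value
            return iter(totals.items())
        return ToyRDD(compute)

    def cache(self):
        self._cache_enabled = True     # only marks the data set; still lazy
        return self

    # Actions: these trigger the actual computation.
    def collect(self):
        return list(self._iterate())

    def count(self):
        return sum(1 for _ in self._iterate())


calls = []                             # records every real call to split_words

def split_words(line):
    calls.append(line)
    return line.split()

lines = ToyRDD(lambda: iter(["a b", "b c", "a"]))
counts = (lines.flat_map(split_words)
               .map(lambda word: (word, 1))
               .reduce_by_key(lambda a, b: a + b))

calls_before_action = len(calls)       # 0: only transformations so far
result = sorted(counts.collect())      # first action runs the whole chain
calls_after_first = len(calls)         # 3: one call per input line
counts.count()                         # second action recomputes everything
calls_without_cache = len(calls)       # 6

counts.cache()
counts.collect()                       # computes once and keeps the result
counts.count()                         # served from the cached result
calls_with_cache = len(calls) - calls_without_cache   # 3, not 6

print(result)                          # [('a', 2), ('b', 2), ('c', 1)]
```

In this toy model, nothing runs until collect() or count() is called, and every action without cache() replays the full chain. That is the kind of behaviour a tweaked coding question is likely to probe.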

