HadoopExam Blogs


Spark Application: There are mainly two ways to execute your Spark program: through the interactive shell, or by creating Spark applications (bundled JARs in the case of Java and Scala). The interactive shell is good for building prototypes and checking the basic functionality of your code, but in production you need to run Spark applications. So you need to know how to create Spark applications and what they require. Let's discuss each subtopic in detail.

  1. MapReduce jobs on YARN: Yes, in this case you need to know the basics of the Hadoop framework and the MapReduce algorithm. As you may be aware, the Hadoop framework was created around two main concepts: MapReduce (compute) and HDFS (storage), both of them distributed. So most of what you can implement using the Spark framework API can be written in MapReduce as well. However, people rarely write raw MapReduce now, because it requires a lot of complex coding. Concept-wise, you should still be aware of how MapReduce works on the Hadoop framework. Another point here is that the MapReduce part of Hadoop has evolved into a new framework known as MapReduce 2.0 or YARN. YARN (Yet Another Resource Negotiator) supports not only the MapReduce algorithm but others as well, such as Spark jobs, for parallel processing. So in the exam you may be asked how a Spark job works when it is submitted to the YARN framework. I don't expect too many questions on this.
  2. SparkContext: Whenever you submit an application to Spark, you need details about the entire cluster and its environment, so you need access to a SparkContext. Spark provides the ability to create a SparkContext instance for your application, and you can use that instance during your application/job execution to get details about the Spark cluster. In the interactive shell, Spark provides a pre-created SparkContext object, which can be referred to as the object “sc”.
  3. Main Application: As I mentioned previously, you should be able to create Spark applications using a main method. If you can write Spark code in the interactive shell, this is not challenging. You should know the basic Scala concepts of how a class is created and how to define a main() method in it. This is more about structuring long source code into small units and then building a JAR using either Maven or the Scala Build Tool (SBT). I am not expecting too many questions on that, but you should have the basic concepts clear: how to create Spark applications using classes, objects, and a main method. The main point here is how you create the SparkContext for your application, because SparkContext is the main entry point for a Spark application in the Spark framework. As you know, in the Spark interactive shell a SparkContext object is already available, referred to by a value/variable “sc”. (Do you know the difference between a value and a variable in Scala?)
  4. Difference between interactive shell and application: No, you may not be asked the difference between these two directly. Rather, as I mentioned, besides “sc” there are other objects available in the Spark shell, such as HiveContext and SQLContext. But when you submit your own application, you must be able to create these objects yourself, so know how to create each of them. A very important point to remember is that when you create a HiveContext object, it is not necessary to have a Hive deployment in place: you can create a HiveContext without Hive installed, and you should even prefer HiveContext over SQLContext for writing SQL queries.
  5. Application Run Mode: There are various cluster managers on which Spark can run its applications. There are four main run modes: Local, Standalone, YARN, and Mesos. Among these four, YARN is the most popular. What exactly are the differences when you use each of them, and which one should you consider in which scenario? I would say Local and Standalone modes are more for testing and prototyping. You should be aware of how the YARN framework works and what happens when a Spark job is submitted to a YARN cluster: how the Spark executors, tasks, workers, etc. are coordinated.
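The application structure described in points 2 and 3 can be sketched roughly as below. This is a minimal, hypothetical example (the application name, master URL, and object name are placeholders, and it assumes spark-core 1.x is on the classpath), not the exam's reference code:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// A minimal Spark application: unlike the interactive shell,
// where "sc" is pre-created, here we must create the SparkContext
// ourselves inside main().
object WordCountApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("WordCountApp")  // placeholder application name
      .setMaster("local[*]")       // placeholder; usually supplied via spark-submit
    val sc = new SparkContext(conf)

    // A small job, just to show the SparkContext in use
    val counts = sc.parallelize(Seq("a", "b", "a"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collect()

    counts.foreach(println)
    sc.stop()  // release cluster resources when the application finishes
  }
}
```

Such an object would be compiled and bundled into a JAR with Maven or SBT, then launched with spark-submit.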
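On the value-versus-variable question raised in point 3, this is plain Scala, independent of Spark; a quick sketch:

```scala
// val declares an immutable value: it cannot be reassigned.
val answer = 42
// answer = 43        // would not compile: reassignment to val

// var declares a mutable variable: it can be reassigned.
var counter = 0
counter += 1
println(counter)      // prints 1

// In spark-shell, "sc" is bound as a val, so the reference itself
// cannot be reassigned to a different object.
```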
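For point 4, creating the shell's contexts yourself in an application (Spark 1.x API) might look like the sketch below; note that HiveContext does not require an existing Hive deployment, though this snippet assumes spark-hive is on the classpath and uses placeholder names:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf().setAppName("ContextDemo").setMaster("local[*]")
val sc = new SparkContext(conf)

// SQLContext: basic SQL support on top of the SparkContext
val sqlContext = new SQLContext(sc)

// HiveContext: a superset of SQLContext with a richer SQL dialect;
// it does NOT need a Hive installation, which is why it is generally
// preferred over SQLContext for writing SQL queries.
val hiveContext = new HiveContext(sc)
```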
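For point 5, the run mode is typically selected with the --master flag of spark-submit. A sketch, where the class name and JAR file are placeholders:

```shell
# Local mode: driver and executors in one JVM, good for testing
spark-submit --class com.example.WordCountApp --master "local[*]" app.jar

# Standalone cluster manager: point at the Spark master
spark-submit --class com.example.WordCountApp --master spark://host:7077 app.jar

# YARN: the most common choice on Hadoop clusters
spark-submit --class com.example.WordCountApp --master yarn --deploy-mode cluster app.jar
```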


  1. Apache Spark Professional Training with Hands On Lab Sessions 
  2. Oreilly Databricks Apache Spark Developer Certification Simulator
  3. Hortonworks Spark Developer Certification 
  4. Cloudera CCA175 Hadoop and Spark Developer Certification 

