NoClassDefFoundError for Hadoop's FileAlreadyExistsException when creating a new SparkContext
Even though I have run this Spark program successfully a few dozen times, after the latest sbt package the run now fails with a NoClassDefFoundError referencing org.apache.hadoop.mapred.FileAlreadyExistsException while starting the SparkContext:
Note: I had run sbt clean package assembly first.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/FileAlreadyExistsException
at org.apache.hadoop.mapreduce.util.ConfigUtil.addDeprecatedKeys(ConfigUtil.java:54)
at org.apache.hadoop.mapreduce.util.ConfigUtil.loadResources(ConfigUtil.java:42)
at org.apache.hadoop.mapred.JobConf.<clinit>(JobConf.java:119)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:91)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.security.Groups.<init>(Groups.java:64)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:257)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:234)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:749)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:734)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:607)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2154)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2154)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2154)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:301)
at mllib.perf.TestRunner$.main(TestRunner.scala:27)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapred.FileAlreadyExistsException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

Line 27 of TestRunner.scala, the frame where the trace enters user code, is:

    val sc = new SparkContext(new SparkConf().setAppName("TestRunner: " + testName))
Whatever Hadoop temp and input/output directories get created here are under Spark's control, not created by user code. So how can this be addressed?
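For context, the failure happens inside JobConf's static initializer (ConfigUtil.addDeprecatedKeys at the top of the trace), which runs as soon as Spark touches the Hadoop configuration. In Hadoop 2.x builds, org.apache.hadoop.mapred.FileAlreadyExistsException ships in hadoop-mapreduce-client-core, so this error usually means the assembly on the classpath is stale or was built against a partial or mismatched set of Hadoop jars. A minimal build.sbt sketch along those lines follows; the project name and versions are assumptions, not taken from the failing project:

    // build.sbt -- minimal sketch; name and versions are assumptions
    name := "test-runner"

    scalaVersion := "2.10.4"

    // Let spark-core pull in a consistent hadoop-client transitively.
    // Pinning a second, mismatched hadoop-client version here is a common
    // way to lose classes such as FileAlreadyExistsException at runtime.
    // Add % "provided" if the jar is launched via spark-submit, which
    // supplies the Spark and Hadoop classes itself.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1"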
Looking at the error, my guess is that you need a clean rebuild of the package/assembly, since the classloader cannot even find the FileAlreadyExistsException class.
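If a clean rebuild alone does not help, it may be worth confirming that the class actually made it into the fat jar before running again. A quick check, assuming a standard sbt-assembly output path (the jar name below is a placeholder, not from the question):

    sbt clean assembly
    # Verify the missing class is present in the assembly
    # (adjust the path/name to your project's actual output).
    jar tf target/scala-2.10/your-app-assembly.jar | grep 'mapred/FileAlreadyExistsException'

If grep prints nothing, the Hadoop mapreduce classes never made it into the jar, and the dependency setup, rather than a stale build, is the problem.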