Wednesday, 8 June 2016

Challenges working with the sbt build tool



sbt is a build tool (similar to Maven) written in Scala, and it is now widely used to build Scala applications. A common challenge is the error below, which shows up when you build a fat JAR with the sbt-assembly plugin. It occurs when two dependencies contain a file at the same path but with different contents. For such files the default merge strategy is "MergeStrategy.deduplicate", which fails the build when the contents differ.

[error] (*:assembly) deduplicate: different file contents found in the following:
[error] /home/cloudera/.ivy2/cache/org.eclipse.jetty.orbit/javax.servlet/orbits/javax.servlet-3.0.0.v201112011016.jar:javax/servlet/Filter.class
[error] /home/cloudera/.ivy2/cache/org.mortbay.jetty/servlet-api/jars/servlet-api-2.5-20081211.jar:javax/servlet/Filter.class


To resolve the conflicts, you have two options.

Exclude Jars
You can exclude the unwanted transitive dependencies that cause the conflicts and keep only one copy. Finding the right transitive dependencies to exclude can be difficult, so follow this approach only if you are sure about the dependencies. To see what a package depends on, you can look it up on the mvnrepository site and then exclude the transitive dependencies as shown below:


libraryDependencies ++= Seq(
    // Each exclude takes the organization and the module name of a transitive dependency.
    ("org.apache.spark" %% "spark-hive" % "1.3.1")
        .exclude("com.twitter", "parquet-hadoop-bundle")
        .exclude("org.apache.avro", "avro-ipc")
        .exclude("com.twitter", "parquet-format"),
    "org.apache.spark" %% "spark-streaming" % "1.3.1"
)
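
The same idea can also be written with excludeAll and ExclusionRule, which is handy when you want to drop everything from one organization. The sketch below is only an illustration: it assumes that the old servlet-api JAR from the error message above (organization "org.mortbay.jetty") arrives through spark-hive and that nothing in your application needs it at runtime. Check both assumptions against your own dependency tree before using it.

```scala
// build.sbt (sketch, not a verified fix for any particular project)
libraryDependencies += ("org.apache.spark" %% "spark-hive" % "1.3.1")
  // Assumption: this organization is the source of the duplicate servlet classes.
  .excludeAll(ExclusionRule(organization = "org.mortbay.jetty"))
```

Excluding a whole organization is coarser than naming individual modules with exclude, so it is easier to remove something you still need.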



Merge Strategy
You can choose a merge strategy for each conflict: discard the conflicting files, keep the first one, or keep the last one, as shown below.

Suppose you get the following conflict:

[error] (*:assembly) deduplicate: different file contents found in the following:
[error] /home/cloudera/.ivy2/cache/org.eclipse.jetty.orbit/javax.servlet/orbits/javax.servlet-3.0.0.v201112011016.jar:javax/servlet/Filter.class
[error] /home/cloudera/.ivy2/cache/org.mortbay.jetty/servlet-api/jars/servlet-api-2.5-20081211.jar:javax/servlet/Filter.class

To set the strategy for the conflicting file "javax/servlet/Filter.class", pass the path segments 'javax' and 'servlet' to PathList and select MergeStrategy.discard, MergeStrategy.first or MergeStrategy.last, depending on which copy you want to keep. If the path has more segments, pass as many of them to PathList as you need to identify the conflicting resources uniquely. You can also use "startsWith" on the path string to identify the conflicts.


mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    // Match on the beginning of the path string
    case x if x.startsWith("META-INF/maven/com.fasterxml.jackson.core") => MergeStrategy.last
    case x if x.startsWith("META-INF/maven/commons-logging") => MergeStrategy.last
    case x if x.startsWith("META-INF/ECLIPSEF.RSA") => MergeStrategy.last
    case x if x.startsWith("META-INF/mailcap") => MergeStrategy.last
    case x if x.startsWith("plugin.properties") => MergeStrategy.last
    // Match on path segments: everything under javax/servlet
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.discard
    // Anything else falls back to the previous (default) strategy
    case x => old(x)
  }
}
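
The snippet above uses the <<= operator, which was removed in sbt 1.0, and the older mergeStrategy key name. As a rough sketch, the same rules in the newer style might look like the following. It assumes sbt 1.x with a recent sbt-assembly release, where the key is called assemblyMergeStrategy; the exact scoping can differ between plugin versions, so check the README of the version you use.

```scala
// build.sbt (sketch; assumes sbt 1.x and a recent sbt-assembly plugin)
assembly / assemblyMergeStrategy := {
  case x if x.startsWith("META-INF/mailcap") => MergeStrategy.last
  case x if x.startsWith("plugin.properties") => MergeStrategy.last
  case PathList("javax", "servlet", _*) => MergeStrategy.discard
  case x =>
    // Fall back to the previously configured (default) strategy for all other paths.
    val oldStrategy = (assembly / assemblyMergeStrategy).value
    oldStrategy(x)
}
```

Cases are tried from top to bottom, so the more specific patterns need to come before the final catch-all.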

   
Reference : http://stackoverflow.com/questions/14791955/assembly-merge-strategy-issues-using-sbt-assembly
