Building spark-2.1.0 from Source
Prerequisites
jdk1.7.0_79
scala-2.11.8
apache-maven-3.3.9
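A quick sanity check that the prerequisites are installed and on the PATH before starting:
java -version    # expect 1.7.0_79
scala -version   # expect 2.11.8
mvn -version     # expect Apache Maven 3.3.9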
Steps
Method 1
Raise Maven's memory limits and pin the Scala version first, then run the build:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512M"
Specify the Scala version:
./dev/change-scala-version.sh 2.11
Then run the Maven build:
./build/mvn -Dhttps.protocols=TLSv1,TLSv1.1,TLSv1.2 -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.7.0 -DskipTests clean package
Method 2
./dev/make-distribution.sh --name 2.6.0-cdh5.7.0 --tgz -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver -Dhadoop.version=2.6.0-cdh5.7.0
Modify the script ./dev/make-distribution.sh:
1. Comment out the VERSION, SCALA_VERSION, SPARK_HADOOP_VERSION, and SPARK_HIVE detection block and hard-code your own values.
#VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null | grep -v "INFO" | tail -n 1)    # the Spark version, here 2.1.0
#SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\    # Scala 2.11
#    | grep -v "INFO"\
#    | tail -n 1)
#SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\    # hadoop.version=2.6.0-cdh5.7.0
#    | grep -v "INFO"\
#    | tail -n 1)
#SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\    # SPARK_HIVE=1 means Hive support is enabled
#    | grep -v "INFO"\
#    | fgrep --count "<id>hive</id>";\
#    # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
#    # because we use "set -o pipefail"
#    echo -n)
Paste the following values right after the commented-out block:
VERSION=2.1.0
SCALA_VERSION=2.11
SPARK_HADOOP_VERSION=2.6.0-cdh5.7.0
SPARK_HIVE=1
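With these values, make-distribution.sh packages the distribution as spark-$VERSION-bin-$NAME.tgz, so this build should end with a tarball named spark-2.1.0-bin-2.6.0-cdh5.7.0.tgz in the Spark source root.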
Pitfall 0
Problem 1: [ERROR] Failed to execute goal on project spark-launcher_2.11: Could not resolve dependencies for project org.apache.spark:spark-launcher_2.11:jar:2.2.0: Failure to find org.apache.hadoop:hadoop-client:jar:2.6.0-cdh5.7.0 in https://repo1.maven.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced -> [Help 1]
This happens because Maven resolves from the Apache repository by default, while our hadoop.version points at a CDH build, so the CDH repository has to be configured. Open the pom.xml in the Spark source directory and add the CDH repository.
vi /usr/local/spark-test/app/spark-2.2.0/pom.xml and add a repository entry like the one below.
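The repository entry is the standard Cloudera definition; it goes inside the <repositories> section of pom.xml:
<repository>
  <id>cloudera</id>
  <name>cloudera Repository</name>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>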
When adding it, watch the difference between spaces and tabs; if the formatting is wrong, mvn will fail during compilation.
Pitfall 1
Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile (scala-compile-first) on project spark-tags_2.11: wrap: java.io.IOException: Cannot run program "/home/c/hadoop/jdk1.7.0_79/jre/bin/java" (in directory "."): error=13, Permission denied -> [Help 1]
Fix: chmod -R 777 the entire jdk1.7.0_79 directory.
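Using the JDK path from the error message above (the error is a missing execute permission, so a more conservative chmod -R 755 would also work):
chmod -R 777 /home/c/hadoop/jdk1.7.0_79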
Pitfall 2
The Maven build fails with:
Received fatal alert: handshake_failure, along with similar SSL-related errors
Fix: in pom.xml, change https to http in the repository URLs, as illustrated below.
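An illustrative example, assuming the failing repository is the Maven Central entry in Spark's pom.xml; only the URL scheme changes:
before: <url>https://repo1.maven.org/maven2</url>
after:  <url>http://repo1.maven.org/maven2</url>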
References
https://blog.csdn.net/chen_1122/article/details/77935149
https://blog.csdn.net/jiaotangX/article/details/78635133
https://blog.csdn.net/suisongtiao1799/article/details/80223068
http://feitianbenyue.iteye.com/blog/2429045
Pitfall 4
Because the earlier build command did not compile in the Hive module, accessing Hive from spark-sql fails (with a message telling you to build Spark with hive and hive-thriftserver support), and spark-shell silently returns nothing (the tables clearly exist in Hive, yet spark.sql("show tables").show displays no data).
Fix: add the Hive dependencies and rebuild, using the command below.
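Mirroring the Method 2 command above, with the -Phive and -Phive-thriftserver profiles included:
./dev/make-distribution.sh --name 2.6.0-cdh5.7.0 --tgz -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver -Dhadoop.version=2.6.0-cdh5.7.0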
Possible issues during the rebuild:
Make sure the machine has network access for the whole build.
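Once the rebuild succeeds, a quick sanity check (assuming hive-site.xml has been copied into Spark's conf/ directory) is to list the Hive tables from spark-shell:
spark-shell
scala> spark.sql("show tables").show()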