How To Install Apache Spark On Ubuntu 20.04 LTS

Apache Spark is a free and open-source framework for distributed cluster computing and big-data workloads. It is an engine for large-scale data processing and provides high-level APIs in Java, Scala, and Python.

Install Apache Spark On Ubuntu

Update the system.

apt-get update

Install Java.

apt-get install openjdk-11-jdk

Check Java version.

java --version

Here is the command output.

openjdk 11.0.11
OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.20.04)
OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.20.04, mixed mode, sharing)
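Spark 3.x runs on Java 8 or 11, so a script may want to check the major version before continuing. A minimal sketch; the sample line below mirrors the output shown above, and on a live system you would feed in the real `java --version` output instead:

```shell
# Sketch: parse the major version from a `java --version`-style line.
# LINE is hard-coded here for illustration.
LINE="openjdk 11.0.11"
MAJOR=$(echo "$LINE" | awk '{split($2, v, "."); print v[1]}')
echo "Java major version: $MAJOR"
```

On the output shown above this prints "Java major version: 11".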

Install Scala

apt-get install scala

Check Scala version.

scala -version

Here is the command output.

Scala code runner version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL

Log in to the Scala shell.

scala
Here is the command output.

Welcome to Scala 2.11.12 (OpenJDK 64-Bit Server VM, Java 11.0.11).
Type in expressions for evaluation. Or try :help.


Run the command.

scala> println("Hello World")
Hello World

Install Apache Spark

Download the Spark release that matches the tarball used below.

curl -O https://archive.apache.org/dist/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz

Extract the downloaded file.

tar xvf spark-3.1.1-bin-hadoop3.2.tgz

Move the extracted directory to /opt/spark.

mv spark-3.1.1-bin-hadoop3.2/ /opt/spark 
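The extract-and-move step can be rehearsed safely first. A sketch that simulates it in a scratch directory so nothing touches the real /opt; the directory name stands in for the extracted tarball:

```shell
# Sketch: rehearse the move in a temporary directory (illustrative paths).
SCRATCH=$(mktemp -d)
mkdir -p "$SCRATCH/spark-3.1.1-bin-hadoop3.2"   # stands in for the extracted tarball
mv "$SCRATCH/spark-3.1.1-bin-hadoop3.2" "$SCRATCH/spark"
ls "$SCRATCH"                                   # the directory is now named: spark
rm -rf "$SCRATCH"
```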

Open bashrc configuration file.

vim ~/.bashrc

Add the following lines:

export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

Reload the bashrc file so the changes take effect in the current shell.

source ~/.bashrc
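After sourcing the file, it is worth confirming the environment is set the way the export lines intend. A sketch; SPARK_HOME is hard-coded here for illustration, whereas after `source ~/.bashrc` it comes from the file itself:

```shell
# Sketch: verify SPARK_HOME and PATH (values hard-coded for illustration).
export SPARK_HOME=/opt/spark
export PATH="$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin"
echo "SPARK_HOME=$SPARK_HOME"
case ":$PATH:" in
  *":$SPARK_HOME/bin:"*) echo "spark bin is on PATH" ;;
esac
```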

Start a master server.

start-master.sh

Here is the command output.

starting org.apache.spark.deploy.master.Master, logging to 

Open port number 8080 on ufw firewall.

ufw allow 8080/tcp

Access the Apache Spark web interface at http://server-ip:8080, replacing server-ip with your server's IP address.



Here is the output.

Fig 2: The Apache Spark web interface.


Start a worker process, pointing it at the master URL shown at the top of the web interface.

start-worker.sh spark://ubuntu:7077
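When scripting the worker startup, the master URL can be derived instead of typed by hand. A sketch; it assumes the master runs on the local host with Spark's default port 7077, so on the example system above it resolves to spark://ubuntu:7077:

```shell
# Sketch: build the master URL a worker connects to.
# Assumes the master is on this host, listening on the default port 7077.
MASTER_URL="spark://$(hostname):7077"
echo "$MASTER_URL"
```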

Use the Spark shell.

spark-shell

Use pyspark for Python.

pyspark


