Prerequisite
- IntelliJ IDEA Community Edition
Walk-through
In this article, I will walk through how to create and execute an Apache Spark application that creates your first RDD (Resilient Distributed Dataset) in IntelliJ IDEA Community Edition.

Step 1: Create an sbt-based Scala project for developing Apache Spark code using the Scala API.
Step 2: Create the following two files in the sbt-based Scala project created above, then execute the program to create your first RDD (Resilient Distributed Dataset).
build.sbt
name := "apachespark101"

version := "1.0"

scalaVersion := "2.12.8"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.4"
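A note on the dependency line: the `%%` operator tells sbt to append the project's Scala binary version to the artifact name, so with `scalaVersion := "2.12.8"` the line above resolves to the `spark-sql_2.12` artifact. An equivalent explicit form (shown only to illustrate the mechanism, not a recommended change) would be:

```scala
// Equivalent to `"org.apache.spark" %% "spark-sql" % "2.4.4"`
// when scalaVersion is 2.12.x: the Scala binary version is written out by hand.
libraryDependencies += "org.apache.spark" % "spark-sql_2.12" % "2.4.4"
```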
create_first_rdd_apachespark101_part_3.scala
package com.datamaking.apachespark101

import org.apache.spark.sql.SparkSession

object create_first_rdd_apachespark101_part_3 {
  def main(args: Array[String]): Unit = {
    println("Started ...")

    val spark = SparkSession
      .builder
      .appName("Apache Spark 101 Tutorial | Part 1")
      .master("local[*]")
      .getOrCreate()

    spark.sparkContext.setLogLevel("ERROR")

    // Create RDD of odd numbers
    val numbers_odd_list = List(1, 3, 5, 7, 9)
    val numbers_odd_rdd = spark.sparkContext.parallelize(numbers_odd_list, 2)
    println("Printing Odd Numbers: ")
    numbers_odd_rdd.collect().foreach(println)

    // Create RDD of 1 to 10 numbers
    val numbers_list = 1 to 10
    val numbers_rdd = spark.sparkContext.parallelize(numbers_list, 2)
    println("Printing 1 to 10 Numbers: ")
    numbers_rdd.collect().foreach(println)

    // Create RDD of 1 to 5 numbers (except number 5)
    val numbers_list_1 = List.range(1, 5)
    val numbers_rdd_1 = spark.sparkContext.parallelize(numbers_list_1, 2)
    println("Printing 1 to 5 Numbers (except number 5): ")
    numbers_rdd_1.collect().foreach(println)
    println(numbers_rdd_1.getClass.getSimpleName)

    // Create RDD of technology names across 3 partitions
    val tech_names_list = List("Spark", "Hadoop", "Scala", "Python", "IoT", "DataScience")
    val tech_names_rdd = spark.sparkContext.parallelize(tech_names_list, 3)
    println("Printing Technology Names: ")
    tech_names_rdd.collect().foreach(println)

    spark.stop()
    println("Completed.")
  }
}
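One detail worth highlighting from the code above: `List.range(start, end)` follows Scala's half-open convention and excludes the end value, while `1 to 10` is inclusive on both ends. That is why the third RDD prints only 1 through 4. You can verify this with plain Scala collections, no Spark required (the object name `RangeCheck` here is just for illustration):

```scala
object RangeCheck extends App {
  // List.range(start, end) excludes `end` — the same list the third RDD is built from
  val exclusive = List.range(1, 5)
  println(exclusive) // List(1, 2, 3, 4)

  // `1 to 10` is a Range that includes both endpoints — the second RDD's input
  val inclusive = (1 to 10).toList
  println(inclusive.head + " .. " + inclusive.last) // 1 .. 10
}
```

This is also why `parallelize(numbers_list_1, 2)` distributes only four elements across its two partitions.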
Summary
In this article, we successfully created and executed an Apache Spark application that creates a first RDD (Resilient Distributed Dataset). Please go through all the steps, share your feedback, and post any queries or doubts you have. Thank you. Happy Learning!