Prerequisite
- IntelliJ IDEA Community Edition
Walk-through
In this article, I am going to walk through how to use the map RDD (Resilient Distributed Dataset) transformation, with a hands-on example, in an Apache Spark application written with the Scala API in IntelliJ IDEA Community Edition.
Step 1: Create an sbt-based Scala project for developing Apache Spark code using the Scala API.
Step 2: Create the following two files in the sbt-based Scala project created above, then run the program to see the map RDD (Resilient Distributed Dataset) transformation in action.
build.sbt
name := "apachespark101"
version := "1.0"
scalaVersion := "2.12.8"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.4"
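A short note on the dependency line in build.sbt: sbt's `%%` operator appends the project's Scala binary version to the artifact name. As a sketch, assuming scalaVersion stays on a 2.12.x release, the same dependency could be written with a plain `%` and an explicit suffix:

```scala
// Equivalent to "org.apache.spark" %% "spark-sql" % "2.4.4" while scalaVersion is 2.12.x.
libraryDependencies += "org.apache.spark" % "spark-sql_2.12" % "2.4.4"
```

Keeping `%%` is usually the safer choice, because the suffix then follows scalaVersion automatically.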
map_rdd_transf_apachespark101_part_4.scala
package com.datamaking.apachespark101

import org.apache.spark.sql.SparkSession

object map_rdd_transf_apachespark101_part_4 {
  def main(args: Array[String]): Unit = {
    println("Started ...")

    // Create a local SparkSession that uses all available cores.
    val spark = SparkSession
      .builder
      .appName("Apache Spark 101 Tutorial | Part 1")
      .master("local[*]")
      .getOrCreate()

    spark.sparkContext.setLogLevel("ERROR")

    // Example 1: multiply every number by 5.
    val numbers_list = 1 to 10
    // println (not print) so the next message starts on its own line.
    println(numbers_list.getClass.getSimpleName)

    // Distribute the numbers across 3 partitions.
    val numbers_rdd = spark.sparkContext.parallelize(numbers_list, 3)
    val numbers_mul_by_5_rdd = numbers_rdd.map(e => e * 5)
    println("Printing Numbers which are multiplied by 5: ")
    numbers_mul_by_5_rdd.collect().foreach(println)

    // Example 2: split a sentence into words.
    val apache_spark_list = List("Apache Spark is in-memory distributed framework.")
    val apache_spark_rdd = spark.sparkContext.parallelize(apache_spark_list)
    // map returns one Array[String] per input sentence.
    val apache_spark_map_rdd = apache_spark_rdd.map(ele => ele.split(" "))
    println("Printing the result: ")
    apache_spark_map_rdd.collect().foreach(e => e.foreach(println))

    spark.stop()
    println("Completed.")
  }
}
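To make the behaviour of map more concrete, here is a minimal sketch, assuming the same build.sbt as above. The object name `MapRddSketch` and the variable names are hypothetical and chosen only for this illustration. It checks two properties of map: it is one-to-one (the element count and the number of partitions are unchanged), and splitting a sentence inside map produces one array per input element rather than a flat collection of words.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this illustration.
object MapRddSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("map RDD sketch")
      .master("local[*]")
      .getOrCreate()
    spark.sparkContext.setLogLevel("ERROR")

    // map is one-to-one: 10 input elements give 10 output elements,
    // and the 3 partitions requested in parallelize are preserved.
    val numbersRdd = spark.sparkContext.parallelize(1 to 10, 3)
    val timesFiveRdd = numbersRdd.map(_ * 5)
    assert(timesFiveRdd.count() == numbersRdd.count())
    assert(timesFiveRdd.getNumPartitions == 3)

    // Splitting inside map keeps one output element per input element,
    // so the result is an RDD[Array[String]] holding a single array.
    val sentenceRdd = spark.sparkContext.parallelize(
      List("Apache Spark is in-memory distributed framework."))
    val wordsRdd = sentenceRdd.map(_.split(" "))
    assert(wordsRdd.count() == 1)
    assert(wordsRdd.first().length == 6)

    spark.stop()
  }
}
```

Note that map is a lazy transformation: nothing is computed until an action such as `count()`, `first()`, or `collect()` is called on the resulting RDD.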
Summary
In this article, we created and ran an Apache Spark application and learned how to use the map RDD (Resilient Distributed Dataset) transformation. Please work through these steps, share your feedback, and post any questions you have. Thank you. Happy Learning!