Practical RDD transformation: map | Apache Spark 101 Tutorial | Scala | Part 4


Prerequisite

  • IntelliJ IDEA Community Edition

Walk-through

In this article, I walk through how to use the map RDD (Resilient Distributed Dataset) transformation, with a hands-on example in an Apache Spark application written with the Scala API in IntelliJ IDEA Community Edition.

Step 1: Create an sbt-based Scala project for developing Apache Spark code with the Scala API.

Step 2: Create the following two files in the sbt-based Scala project created above, then run the program to see the map RDD (Resilient Distributed Dataset) transformation in action.

build.sbt

name := "apachespark101"

version := "1.0"

scalaVersion := "2.12.8"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.4"


map_rdd_transf_apachespark101_part_4.scala

package com.datamaking.apachespark101

import org.apache.spark.sql.SparkSession

object map_rdd_transf_apachespark101_part_4 {
    def main(args: Array[String]): Unit = {
      println("Started ...")
      val spark = SparkSession
        .builder
        .appName("Apache Spark 101 Tutorial | Part 4")
        .master("local[*]")
        .getOrCreate()

      spark.sparkContext.setLogLevel("ERROR")

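      // A Scala Range holding the integers 1 through 10.
      val numbers_list = 1 to 10
      println(numbers_list.getClass.getSimpleName)
      // Distribute the range across 3 partitions.
      val numbers_rdd = spark.sparkContext.parallelize(numbers_list, 3)
      // map applies the function to every element; it is lazy until an action runs.
      val numbers_mul_by_5_rdd = numbers_rdd.map(e => e * 5)
      println("Printing Numbers which are multiplied by 5: ")
      // collect() is the action that brings the results back to the driver.
      numbers_mul_by_5_rdd.collect().foreach(println)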

      val apache_spark_list = List("Apache Spark is in-memory distributed framework.")
      val apache_spark_rdd = spark.sparkContext.parallelize(apache_spark_list)
      // map yields one output per input, so this is an RDD[Array[String]] with a single array.
      val apache_spark_map_rdd = apache_spark_rdd.map(ele => ele.split(" "))
      println("Printing the result: ")
      // Iterate over each array, then print each word in it.
      apache_spark_map_rdd.collect().foreach(e => e.foreach(println))

      spark.stop()
      println("Completed.")
    }
}
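
The program above relies on map being a one-to-one transformation: every input element produces exactly one output element. The following is a minimal sketch, not part of the original program, that makes this visible by comparing element and partition counts before and after map. The object name MapShapeSketch and the variable names are hypothetical and chosen only for illustration; it assumes the same build.sbt shown earlier.

```scala
package com.datamaking.apachespark101

import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this illustration.
object MapShapeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("map shape sketch")
      .master("local[*]")
      .getOrCreate()
    spark.sparkContext.setLogLevel("ERROR")

    val sc = spark.sparkContext

    // Same input as the first example: 10 numbers spread over 3 partitions.
    val numbersRdd = sc.parallelize(1 to 10, 3)
    val timesFiveRdd = numbersRdd.map(e => e * 5)

    // map emits one output per input, so the element count is unchanged,
    // and the resulting RDD keeps the partition count of its parent.
    println(s"count: ${numbersRdd.count()} -> ${timesFiveRdd.count()}")
    println(s"partitions: ${numbersRdd.getNumPartitions} -> ${timesFiveRdd.getNumPartitions}")

    // Same input as the second example: one sentence in, one element out.
    val sentenceRdd = sc.parallelize(List("Apache Spark is in-memory distributed framework."))
    val wordsRdd = sentenceRdd.map(ele => ele.split(" ")) // RDD[Array[String]]

    // Still a single element; that element is an array holding the words.
    println(s"elements: ${wordsRdd.count()}")
    println(s"words in first element: ${wordsRdd.first().length}")

    spark.stop()
  }
}
```

This is why the original program needs a nested foreach to print the words: collect() returns an array whose only element is itself an array of strings.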

Summary

In this article, we created and ran an Apache Spark application and learned how to use the map RDD (Resilient Distributed Dataset) transformation. Please work through these steps, share your feedback, and post any questions you have. Thank you.

Happy Learning !!!
