How to use cartesian RDD transformation in PySpark | PySpark 101

Prerequisites

Apache Spark
PyCharm Community Edition

Walk-through

In this article, I will walk you through how to use the cartesian RDD transformation in a PySpark application using PyCharm Community Edition.

cartesian: The cartesian RDD transformation computes the Cartesian product of two RDDs, that is, the set of all ordered pairs (a, b) in which the first element a belongs to the first RDD and the second element b belongs to the second RDD. For example, if A = {rat, cat} and B = {nuts, milk}, then A×B = {(rat, nuts), (rat, milk), (cat, nuts), (cat, milk)}. The result always contains (number of elements in A) × (number of elements in B) pairs.
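
Before moving to Spark, the set example above can be checked with a minimal plain-Python sketch. It uses itertools.product from the standard library as a stand-in for the RDD transformation; the variable names a, b and pairs are chosen here for illustration only and are not part of the PySpark API.

```python
from itertools import product

# The two sets from the example, written as lists to keep a fixed order.
a = ["rat", "cat"]
b = ["nuts", "milk"]

# Pair every element of a with every element of b.
pairs = list(product(a, b))

print(pairs)
# [('rat', 'nuts'), ('rat', 'milk'), ('cat', 'nuts'), ('cat', 'milk')]

# The product always has len(a) * len(b) elements.
print(len(pairs) == len(a) * len(b))
# True
```

The Spark version below works on the same idea, except that the pairs are produced across partitions, so the order of the collected result may differ from the order shown here.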

# Importing Spark Related Packages
from pyspark.sql import SparkSession

# Importing Python Related Packages
import time

if __name__ == "__main__":
    print("PySpark 101 Tutorial")
    print(time.strftime('%Y-%m-%d %H:%M:%S'))

    # cartesian - Return the Cartesian product of this RDD and another one, that is,
    # the RDD of all pairs of elements (a, b) where a is in self and b is in other.

    spark = SparkSession \
            .builder \
            .appName("Part 18 - How to use cartesian RDD transformation in PySpark | PySpark 101") \
            .master("local[*]") \
            .enableHiveSupport() \
            .getOrCreate()

    number_list_1 = [1, 2, 3]
    print("Printing number_list_1: ")
    print(number_list_1)

    number_list_2 = [4, 5, 6, 7]
    print("Printing number_list_2: ")
    print(number_list_2)

    number_rdd_1 = spark.sparkContext.parallelize(number_list_1)
    number_rdd_2 = spark.sparkContext.parallelize(number_list_2)

    cartesian_number_rdd = number_rdd_1.cartesian(number_rdd_2)
    print("Printing cartesian_number_rdd: ")
    print(cartesian_number_rdd.collect())

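    # Each pair (a, b) is treated as (key, value): the first element is the key,
    # and reduceByKey sums the second elements that share the same key.
    print("Aggregation Result: ")
    print(cartesian_number_rdd.reduceByKey(lambda a, b: a + b).collect())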

    print("Stopping the SparkSession object")
    spark.stop()
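
To see what values the program above should produce, here is a small plain-Python sketch that mirrors the two steps without needing Spark. It is only an approximation for checking the numbers: itertools.product stands in for cartesian, a dictionary stands in for reduceByKey, and the names cartesian_pairs and totals are introduced here for illustration. In Spark, the order of the collected elements depends on partitioning and may differ.

```python
from collections import defaultdict
from itertools import product

number_list_1 = [1, 2, 3]
number_list_2 = [4, 5, 6, 7]

# Stand-in for number_rdd_1.cartesian(number_rdd_2).collect()
cartesian_pairs = list(product(number_list_1, number_list_2))

# Stand-in for reduceByKey(lambda a, b: a + b):
# the first element of each pair is the key, the second is the value.
totals = defaultdict(int)
for key, value in cartesian_pairs:
    totals[key] += value

print(len(cartesian_pairs))
# 12
print(sorted(totals.items()))
# [(1, 22), (2, 22), (3, 22)]
```

Every key from number_list_1 is paired with all four values of number_list_2, so each key ends up with the same sum, 4 + 5 + 6 + 7 = 22.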

Summary

In this article, we used the cartesian RDD transformation in a PySpark application using PyCharm Community Edition. Please work through the steps above, share your feedback, and post any questions you have. Thank you.

Happy Learning !!!
