Prerequisites
- Apache Spark
- PyCharm Community Edition
Walk-through
In this article, I will walk you through how to use the aggregateByKey RDD transformation in a PySpark application using PyCharm Community Edition.
aggregateByKey: The aggregateByKey RDD transformation is typically used when you want to perform more than one aggregation over the values of each key in a key-value pair RDD. Here we perform two aggregations at once, a sum and a count, for every key.
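To make the intent concrete, here is what the same sum-and-count aggregation looks like in plain Python (no Spark), purely as a point of reference and not part of the original walk-through; the full PySpark program follows below.
# Plain-Python equivalent of the per-key (sum, count) aggregation, for reference only
pairs = [("CSK", 3), ("MI", 2), ("CSK", 2), ("MI", 3), ("RCB", 2), ("CSK", 3)]

totals = {}
for key, value in pairs:
    running_sum, running_count = totals.get(key, (0.0, 0))
    totals[key] = (running_sum + value, running_count + 1)

print(totals)   # {'CSK': (8.0, 3), 'MI': (5.0, 2), 'RCB': (2.0, 1)}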
# Importing Spark Related Packages
from pyspark.sql import SparkSession

# Importing Python Related Packages

if __name__ == "__main__":
    print("PySpark 101 Tutorial")

    spark = SparkSession \
        .builder \
        .appName("Part 15 - When to use aggregateByKey RDD transformation in PySpark | PySpark 101") \
        .master("local[*]") \
        .enableHiveSupport() \
        .getOrCreate()

    key_value_pair_list = [("CSK", 3), ("MI", 2), ("CSK", 2), ("MI", 3), ("RCB", 2), ("CSK", 3)]
    print("Printing key_value_pair_list: ")
    print(key_value_pair_list)

    key_value_pair_rdd = spark.sparkContext.parallelize(key_value_pair_list, 2)
    print("Get Partition Count: ")
    print(key_value_pair_rdd.getNumPartitions())

    print(key_value_pair_rdd.aggregateByKey(
        (0.0, 0),
        lambda x, y: (x[0] + y, x[1] + 1),
        lambda rdd1, rdd2: (rdd1[0] + rdd2[0], rdd1[1] + rdd2[1])
    ).collect())

    print("Stopping the SparkSession object")
    spark.stop()
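With the sample data above, the aggregateByKey call should print something like [('CSK', (8.0, 3)), ('MI', (5.0, 2)), ('RCB', (2.0, 1))], though the ordering of keys may vary across runs. As a small extension that is not part of the original walk-through, once aggregateByKey has produced a (sum, count) pair per key, you can derive the average per key with mapValues. Below is a minimal, self-contained sketch assuming the same input data; the application name used here is illustrative only.
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("aggregateByKey average sketch") \
    .master("local[*]") \
    .getOrCreate()

key_value_pair_list = [("CSK", 3), ("MI", 2), ("CSK", 2), ("MI", 3), ("RCB", 2), ("CSK", 3)]
key_value_pair_rdd = spark.sparkContext.parallelize(key_value_pair_list, 2)

# (sum, count) per key, exactly as in the walk-through above
sum_count_rdd = key_value_pair_rdd.aggregateByKey(
    (0.0, 0),
    lambda acc, value: (acc[0] + value, acc[1] + 1),           # seqFunc: fold one value into the per-partition accumulator
    lambda acc1, acc2: (acc1[0] + acc2[0], acc1[1] + acc2[1])  # combFunc: merge accumulators from different partitions
)

# Average per key = sum / count
average_rdd = sum_count_rdd.mapValues(lambda sum_count: sum_count[0] / sum_count[1])
print(average_rdd.collect())   # e.g. [('CSK', 2.666...), ('MI', 2.5), ('RCB', 2.0)]; ordering may vary

spark.stop()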
Summary
In this article, we have successfully used the aggregateByKey RDD transformation in a PySpark application using PyCharm Community Edition. Please go through all the steps, share your feedback, and post any queries or doubts you have. Thank you. Happy Learning!!!