PySpark 101 Tutorial

A place to get started with the Apache Spark ecosystem components through a 101 hands-on tutorial that helps you understand the concepts behind each component in detail. Note: the tutorial is developed using PySpark, the Python API for Apache Spark.

Apache Spark APIs are available in Scala, Python, Java, and R. Programmers working in any of these languages can develop data pipelines and machine learning models using the Apache Spark APIs.

Apache Spark

Apache Spark is a unified analytics engine for large-scale data processing.
The Apache Spark ecosystem components/libraries are:
  • Spark Core API (RDD)
  • Spark SQL (SQL, DataFrame)
  • Spark Streaming, Spark Structured Streaming
  • MLlib/Spark ML (Machine Learning)
  • GraphX


Happy Learning!!!