Training: Data science with Spark

Fast and high-performance: Optimize your analyses with Spark

Do you work with complex machine learning models and large amounts of data? Then Apache Spark, a cluster computing engine with in-memory computation, can give you a performance boost. Spark lets you query data even in big data environments, and its machine learning libraries and numerous interfaces make it one of the leading analytics technologies.

Whether you are consolidating different data sources, running interactive analyses, or working with real-time data: Spark processes large amounts of data quickly and in parallel, and so supports even complex machine learning algorithms.

In our training course "Data science with Spark", we teach you the basics of working with Spark and focus on how Spark interacts with the data science languages Python and R. We recommend this course for experienced Python and R users and beginners alike.

Data science with Spark: Course contents

  • Reading and repartitioning of data on a Spark cluster
  • Introduction to data management
  • Exchange between local R/Python sessions and cluster operations
  • Introduction to machine learning with Spark

Key facts

Programming language

You have the choice: We offer this training with a focus on Python (PySpark) or R (sparklyr).


The recommended course length is one day. We would also be happy to put together a customized training for you.


This course is offered on your preferred date, either in-house or as remote training. The course can also be held in English.

Let us prepare an offer for you.
Your contact person: Meltem Hekim
