Distributed Computing with Spark SQL
This course is all about big data and distributed computing using Apache Spark. It is designed for students with SQL experience who want to take the next step on their data journey. Through four modules, students will gain a thorough understanding of the Spark architecture, queries within Spark, common ways to optimize Spark SQL, and how to build reliable data pipelines. They will also learn about storage vs. compute, caching, partitions, and troubleshooting performance issues via the Spark UI. Additionally, students will explore new features in Apache Spark 3.x such as Adaptive Query Execution, connecting to databases, schemas and data types, file formats, and writing reliable data. Finally, they will learn about data lakes, data warehouses, and lakehouses, and build production grade data pipelines by combining Spark with the open-source project Delta Lake. By the end of this course, students will have honed their SQL and distributed computing skills to become more adept at advanced analysis. ▼
ADVERTISEMENT
Course Feature
Cost:
Free
Provider:
Coursera
Certificate:
Paid Certification
Language:
English
Course Overview
❗The content presented here is sourced directly from Coursera platform. For comprehensive course details, including enrollment information, simply click on the 'Go to class' link on our website.
Updated in [May 30th, 2023]
Introducing Distributed Computing with Spark SQL:
Are you looking to take your data journey to the next level? Distributed Computing with Spark SQL is the perfect course for you! This course is designed to help students with SQL experience gain a thorough understanding of Apache Spark, an open-source standard for working with large datasets. Through four modules, you will learn the fundamentals of data analysis using SQL on Spark, setting the foundation for how to combine data with advanced analytics at scale and in production environments. You will also gain an understanding of the Spark architecture, queries within Spark, common ways to optimize Spark SQL, and how to build reliable data pipelines.
By taking this course, you will be able to hone your SQL and distributed computing skills to become more adept at advanced analysis and to set the stage for transitioning to more advanced analytics as Data Scientists. This course is also a great way to explore possible development paths in your career or education, as well as related learning suggestions. So, if you’re ready to take your data journey to the next level, Distributed Computing with Spark SQL is the perfect course for you!
Course Provider
Provider Coursera's Stats at AZClass
Discussion and Reviews
0.0 (Based on 0 reviews)
Start your review of Distributed Computing with Spark SQL