Packt Publishing – Real Time Streaming using Apache Spark Streaming

English | Size: 208.13 MB
Category: CBTs

Spark is a technology that lets us perform big data processing in the MapReduce paradigm very rapidly, because it processes data in memory instead of relying on extensive disk I/O operations.
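To illustrate the MapReduce paradigm mentioned above, here is a minimal word-count sketch in plain Python. It is not Spark code and does not use the Spark API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical. It only shows the three steps (map, shuffle, reduce) with every intermediate result held in memory, which is the property credited for Spark's speed.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the line
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group values by key in an in-memory dict (no disk I/O)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: sum the values collected for each key
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    # Chain the mapped pairs of all lines, then shuffle and reduce them
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["spark streams events", "Spark processes events in memory"]))
```

In Spark itself, the same job would typically be expressed with transformations such as `flatMap` and `reduceByKey`, with the work distributed across a cluster.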

Recently, the streaming approach to processing events in near real time has become more widely adopted and more necessary. In this course, you will learn how to handle large volumes of unbounded data streams. You will analyze data and draw conclusions from it. Furthermore, we will look at common problems when processing event streams: sorting, watermarks, deduplication, and keeping state (for example, user sessions). You will also implement stream processing using Spark Streaming and analyze traffic on a web page in real time.
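Deduplication, watermarks, and state are closely linked. A minimal sketch in plain Python (not the Spark API; the `Event` and `WatermarkDeduplicator` names and the `allowed_lateness` parameter are hypothetical) keeps the IDs of events it has already seen as state, drops redelivered events, and uses a watermark (the highest event time seen minus an assumed allowed lateness) to evict old IDs so the state stays bounded on an infinite stream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    event_time: int  # hypothetical: event time in seconds, set by the producer


class WatermarkDeduplicator:
    """Drops repeated event IDs and forgets IDs once they fall behind the watermark."""

    def __init__(self, allowed_lateness: int):
        self.allowed_lateness = allowed_lateness  # assumed bound on event lateness
        self.max_event_time = None  # highest event time seen so far
        self.seen = {}  # state kept between events: event_id -> event_time

    def watermark(self):
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

    def process(self, event):
        """Return the event if it is new and not too late, otherwise None."""
        wm = self.watermark()
        if wm is not None and event.event_time < wm:
            return None  # too late: its ID may already be evicted, so drop it
        if event.event_id in self.seen:
            return None  # duplicate redelivery, e.g. under at-least-once delivery
        self.seen[event.event_id] = event.event_time
        if self.max_event_time is None or event.event_time > self.max_event_time:
            self.max_event_time = event.event_time
        # Evict IDs older than the (possibly advanced) watermark to bound memory
        wm = self.watermark()
        self.seen = {eid: t for eid, t in self.seen.items() if t >= wm}
        return event
```

The design trade-off is that a larger `allowed_lateness` catches more late duplicates but keeps more state. Spark Structured Streaming offers a built-in counterpart of this pattern by combining `withWatermark` with `dropDuplicates`.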

What You Will Learn
• Implement stream processing using Apache Spark Streaming.
• Consume events from a source (for instance, Kafka), apply logic to them, and send them to a data sink.
• Understand how to deduplicate events when your system guarantees at-least-once delivery.
• Learn to tackle common stream processing problems.
• Create a job to analyze data in real time using the Apache Spark Streaming API.
• Master the difference between event time and processing time.
• Compare single-event processing with the micro-batch approach to processing events.
• Learn to sort infinite event streams.
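Sorting an infinite stream ties together event time, processing time, and micro-batches: events arrive in processing-time order but must be emitted in event-time order, and an unbounded stream can never be fully sorted. A common approach, sketched here in plain Python (not the Spark API; `WatermarkSorter` and `max_delay` are hypothetical names), buffers events in a min-heap and, after each micro-batch, emits only those at or below a watermark, assuming no event arrives more than `max_delay` late.

```python
import heapq
from typing import Iterable, List, Optional, Tuple

# (event_time, payload): event_time is when the event happened;
# the order in which batches are passed in models processing time.
Event = Tuple[int, str]


class WatermarkSorter:
    """Re-orders an unbounded stream by event time, assuming bounded lateness."""

    def __init__(self, max_delay: int):
        self.max_delay = max_delay  # assumed upper bound on how late events arrive
        self.buffer: List[Event] = []  # min-heap keyed by event time
        self.max_event_time: Optional[int] = None

    def process_batch(self, batch: Iterable[Event]) -> List[Event]:
        """Add one micro-batch and return the events now safe to emit, in order."""
        for event in batch:
            heapq.heappush(self.buffer, event)
            if self.max_event_time is None or event[0] > self.max_event_time:
                self.max_event_time = event[0]
        if self.max_event_time is None:
            return []
        watermark = self.max_event_time - self.max_delay
        ready = []
        # Everything at or below the watermark can be emitted: under the
        # lateness assumption, no earlier event is expected any more.
        while self.buffer and self.buffer[0][0] <= watermark:
            ready.append(heapq.heappop(self.buffer))
        return ready
```

In this sketch, an event arriving more than `max_delay` late is still emitted, but out of order relative to events already released; a real job would typically drop it or handle it separately.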
