Big Data - Apache Oozie


Lesson - #747 Apache Oozie Bundle


What's Apache Oozie Bundle?

An Apache Oozie Bundle is a collection of Oozie coordinator applications, together with a directive that says when each coordinator should start. Users can start, stop, suspend, resume, and rerun jobs at the bundle level, which gives simpler and more convenient operational control. A bundle is defined in an XML-based language called the Bundle Specification Language. It is a useful level of abstraction in many large enterprises.
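To make the definition above more concrete, here is a minimal sketch in Python that generates a bundle definition with the standard library. The element names (`bundle-app`, `controls`, `kick-off-time`, `coordinator`, `app-path`) follow the Bundle Specification Language; the bundle name, the coordinator names, the HDFS paths, the kick-off time, and the helper `build_bundle_xml` are hypothetical values chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the bundle schema; the version used here is an assumption.
BUNDLE_NS = "uri:oozie:bundle:0.2"


def build_bundle_xml(name, kick_off_time, coordinators):
    """Return a bundle definition as an XML string.

    `coordinators` maps a coordinator name to the HDFS path of its
    coordinator application (hypothetical paths in this sketch).
    """
    bundle = ET.Element("bundle-app", {"name": name, "xmlns": BUNDLE_NS})

    # The kick-off time tells Oozie when the bundle should start.
    controls = ET.SubElement(bundle, "controls")
    ET.SubElement(controls, "kick-off-time").text = kick_off_time

    # One <coordinator> element per coordinator application in the bundle.
    for coord_name, app_path in coordinators.items():
        coord = ET.SubElement(bundle, "coordinator", {"name": coord_name})
        ET.SubElement(coord, "app-path").text = app_path

    return ET.tostring(bundle, encoding="unicode")


bundle_xml = build_bundle_xml(
    name="example-bundle",                 # hypothetical bundle name
    kick_off_time="2024-01-01T00:00Z",     # hypothetical start time (UTC)
    coordinators={
        "coord-a": "${nameNode}/apps/coord-a",  # hypothetical paths
        "coord-b": "${nameNode}/apps/coord-b",
    },
)
print(bundle_xml)
```

The resulting XML would typically be saved as the bundle definition file and submitted to Oozie; in practice many teams write this file by hand rather than generating it.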

Why use Apache Oozie Bundles?

Consider the use case of an Internet company that makes its revenue through advertising and ad clicks.

There is a workflow that counts ad clicks, calculates the cost charged to each advertiser account ID, and publishes the generated revenue feed. It runs every 15 minutes and is called the revenue workflow (Revenue WF).

A Targeting WF looks at the user IDs corresponding to the ad clicks and processes them into segments for behavioral ad targeting. This workflow is managed by a different team and has different business requirements from the revenue workflow. It also triggers every 15 minutes.

An hourly workflow called the Ad-UI WF rolls up the 15-minute revenue feeds generated by the revenue WF and pushes a feed to an operational database that backs an advertiser user interface. Advertisers use this interface to check ad payments.

A Reporting WF runs daily in the morning to aggregate much of the previous day's data and generate daily canned reports for the company's executives.

Finally, the advertiser billing logic and the SOX (Sarbanes-Oxley) compliance checks run monthly, because that is when the larger advertisers are billed and expected to pay. They don't pay daily or hourly. Together these make up the Billing WF, which involves monthly aggregations and rollups.

Apache Oozie Bundle State Transitions



At any given time, an Oozie bundle job is in one of the following statuses: PREP, RUNNING, RUNNINGWITHERROR, SUSPENDED, PREPSUSPENDED, SUSPENDEDWITHERROR, PAUSED, PAUSEDWITHERROR, PREPPAUSED, SUCCEEDED, DONEWITHERROR, KILLED, FAILED. The bundle state names are self-explanatory and very similar to those of workflows and coordinators.
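The status list above can be captured in code, for example when a monitoring script needs to decide whether a bundle job is still active. The following is a minimal sketch; the `BundleStatus` enum and the `is_finished` helper are hypothetical names, and grouping SUCCEEDED, DONEWITHERROR, KILLED, and FAILED as "finished" is an assumption of this sketch rather than something Oozie exposes under that name.

```python
from enum import Enum


class BundleStatus(Enum):
    """The bundle job statuses listed above."""
    PREP = "PREP"
    RUNNING = "RUNNING"
    RUNNINGWITHERROR = "RUNNINGWITHERROR"
    SUSPENDED = "SUSPENDED"
    PREPSUSPENDED = "PREPSUSPENDED"
    SUSPENDEDWITHERROR = "SUSPENDEDWITHERROR"
    PAUSED = "PAUSED"
    PAUSEDWITHERROR = "PAUSEDWITHERROR"
    PREPPAUSED = "PREPPAUSED"
    SUCCEEDED = "SUCCEEDED"
    DONEWITHERROR = "DONEWITHERROR"
    KILLED = "KILLED"
    FAILED = "FAILED"


# Assumption for this sketch: these statuses mean the job is no longer active.
FINISHED = frozenset({
    BundleStatus.SUCCEEDED,
    BundleStatus.DONEWITHERROR,
    BundleStatus.KILLED,
    BundleStatus.FAILED,
})


def is_finished(status_name: str) -> bool:
    """Return True if the named status is one of the finished statuses."""
    return BundleStatus(status_name) in FINISHED


print(is_finished("RUNNING"))
print(is_finished("KILLED"))
```

Passing a name that is not in the list raises a `ValueError`, which helps surface typos in status strings early.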