While traditional methods are adequate for collecting and analyzing uniform data, combining multiple structured and unstructured external data sources can be challenging. In this presentation at Strata + Hadoop World 2016, Joe Caserta explained how one of the largest membership interest groups in the country makes sense of the influx of information from streaming external data sources. The challenge is compelling because, in addition to collecting data from its ~40 million members, the group needs to monitor digital and traditional interactions cohesively to predict and optimize a member’s path to purchase.
Path-to-purchase analytics is at the core of the solution: it is used to segment and individualize potential member interactions, both online and offline, and to increase loyalty among high-value members. Joe outlined the architecture of the ingestion, data lake, data science, and data warehouse components built on AWS and Spark. He discussed how his team designed and implemented a data lake in S3, ETL in Spark, member matching with GraphFrames, and a data warehouse in Redshift to help transform the way this membership interest group uses its data and become an analytics-driven company. Attendees learned how to organize data within the lake to encourage data science experimentation and how to create models that build lasting engagement with members.
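The abstract does not say how member matching was implemented, so the following is only a minimal sketch of one common approach: treat each interaction record as a node, link records that share an identifier, and take the connected components as matched members. GraphFrames offers a `connectedComponents()` operation for this at scale; the pure-Python union-find below stands in for it so the idea can be run without a Spark cluster. The record ids, identifier types, and the `match_members` function are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def match_members(records):
    """Group record ids that share any identifier (connected components).

    `records` is a list of (record_id, set_of_identifiers) pairs.
    This is an illustrative stand-in, not the talk's actual pipeline.
    """
    # Each record starts as its own component.
    parent = {rid: rid for rid, _ in records}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Attach the larger root under the smaller one.
            parent[max(ra, rb)] = min(ra, rb)

    # Link every record to the first record seen with the same identifier.
    first_seen = {}
    for rid, identifiers in records:
        for key in identifiers:
            if key in first_seen:
                union(rid, first_seen[key])
            else:
                first_seen[key] = rid

    # Collect records by component root.
    groups = defaultdict(list)
    for rid, _ in records:
        groups[find(rid)].append(rid)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical records from three channels; two people in total.
records = [
    ("web-1", {("email", "ann@example.com")}),
    ("store-7", {("email", "ann@example.com"), ("phone", "555-0100")}),
    ("call-3", {("phone", "555-0100")}),
    ("web-2", {("email", "bob@example.com")}),
]
matched = match_members(records)
print(matched)
```

Here the web, store, and call-center records end up in one group because the store record bridges the shared email and the shared phone number, which is the kind of transitive link a graph-based matcher is meant to capture.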