Certificate Program on Big Data Engineering
Leveraging Cloud for Big Data Analytics
3 Weeks of Live Sessions | Campus Immersion on the 4th Weekend
Big Data Engineering
Certification Bootcamp Focus
Learning Outcomes
Outline the basic concepts of Big Data and related technologies, and apply them to analyze both general use cases and those specific to your own organization
Compare and select Big Data infrastructure services from the major cloud service providers for enterprise data management and analysis
Describe the main properties of SQL and NoSQL databases, and select the appropriate database type for the data and the analysis required
Outline the major components and processes of the Enterprise Data Governance Architecture and corresponding organizational roles; develop the company’s Data Management Plan (DMP) and a corresponding implementation plan
Select, assess, and deploy a Hadoop or Spark cluster on a cloud platform (Azure HDInsight, Amazon EMR, Google Cloud Platform, or others); become acquainted with the functionality and programming model of the main Hadoop ecosystem components (MapReduce, Spark, HBase, Hive, Pig, Kafka, and others); program simple tasks in a scripting or programming language (e.g. HiveQL, Pig Latin, Python, Java)
Outline the main security and privacy challenges in using Big Data technologies; apply industry best practices and existing applications to protect company data and customers’ personal data.
Why This Bootcamp
Work on Real-Life Data Science Problems
Take your career head-on by working on projects within a competency-based learning paradigm. The quality of the time spent, and the outcome, matter far more than the quantity.
Work 1:1 with a Mentor
We pair you with a mentor who has extensive professional and academic knowledge of the field. You’ll have one-on-one conversations with your mentor and receive useful feedback on how to improve your work.
We Will Keep You Engaged
Our mentors are here to keep you motivated, answer questions, provide feedback, and help deepen your understanding of essential tools and techniques. Learn through live online classes and face-to-face sessions. Learning works best when you can ask questions and clear up your doubts with the faculty.
What You Will Learn
Unit 1: Cloud Computing Foundation
■ Cloud Service Models and Operation, Cloud Resources, Multitenancy
■ Virtual Hybrid/Dynamic Cloud Datacenters, and Outsourcing Enterprise IT Infrastructure to the Cloud
■ Cloud Use Cases and Scenarios for the Enterprise
■ Cloud Economics and Pricing Model
Unit 2: Cloud and Big Data, Big Data Infrastructure and Components
■ Overview of Major Cloud-Based Big Data Platforms: AWS, Microsoft Azure, Google Cloud Platform (GCP); Introduction to MapReduce/Hadoop
■ Hadoop Ecosystem and Components
■ HDFS and Cloud-Based File Systems
■ HBase, Hive and Pig; YARN; MapReduce/Hadoop Programming and Tools
Unit 3: SQL and NoSQL Databases
■ SQL Basics (a refresher from the Database and SQL course)
■ NoSQL Database Types and Overview
■ Column-Based Databases and Their Use (e.g. HBase)
■ Modern Large-Scale Databases: Amazon Aurora, Azure Cosmos DB, Google Spanner
Unit 4: Data Streams and Streaming Analytics
■ Data Streams and Stream Analytics
■ Spark Architecture and Components
■ Popular Spark Platforms (e.g. Databricks); Spark Programming and Tools
Unit 5: Big Data Management and Security
■ Enterprise Big Data Architecture and Large Scale Data Management
■ Data Structures, Data Warehouses, Distributed Systems
■ CAP Theorem, ACID and BASE Properties
■ Cloud Based Services, Data Lakes
■ Big Data Security Challenges, Data Protection
■ Access Control and Identity Management
Sample Projects
1. Run MapReduce tasks (e.g. word count), a ranking algorithm, and a Pregel graph algorithm (shortest path).
2. For a given enterprise profile, select and recommend the Big Data infrastructure, services, and components; also create a Data Management Plan (DMP), a cost assessment, and a deployment plan.
Projects and Skillathons
Capstone Skillathon: group project on enterprise Big Data infrastructure, covering a Data Management Plan (DMP), cost assessment and deployment plan, security and compliance issues, and data protection.
Learn to work with the Amazon Web Services (AWS) cloud: an overview of cloud services (EC2, S3), VM instance deployment, and access.
Individual assignment: run MapReduce tasks (e.g. word count), a simple ranking algorithm, and a Pregel graph algorithm (shortest path).
Work with Big Data analytics services: deploy and run an HDInsight Hadoop cluster, test HBase and Hive queries, and run simple data analysis tasks.
The ability to learn hands-on with real industry data and to deliver insights to an industry jury is the best part of the program. Data Science and its application to Decision Science, taught by practitioner faculty, is the biggest highlight of the program. I strongly recommend it.
Vinod Tiwari
Senior Analyst, TCS
Facilitators
Prof. Yuri Demchenko
University of Amsterdam
Prof. Yuri Demchenko of the University of Amsterdam, the Netherlands, is an internationally recognized expert in Big Data, Cloud Computing, and Application Security, and has published in various international data science journals, both as an educator and as an industry practitioner. He is a member of the NIST Big Data Working Group and project leader of the prestigious Project EDISON (H2020). As a coach and faculty member at the Institute of Product Leadership, he teaches Big Data courses in the MBA in Applied Data Science program.
Is this program right for you? Get advice from a Senior Counselor.