Big Data Analytics and Information Management
In this course, participants will learn about Big Data and how data processing technologies have evolved beyond data warehousing in the Big Data age. Participants will also learn how Big Data and data warehousing are integrated in an analytical ecosystem.
What you can learn:
- Describe big data and big data technologies
- Explain the integration of big data and data warehousing in an analytical ecosystem
- Describe various data processing technologies and languages, and the appropriate use of each in an efficient analytical ecosystem
- Identify different workload management techniques and their role in workload distribution
- Illustrate the information architecture of an analytical ecosystem
- Compare and contrast analytics in Hadoop, database, and hybrid platforms
About this course:
This course introduces the student to the infrastructure and technologies that support Big Data analytics across an analytic ecosystem. The first half of the course explores the history of data processing and analytics and the infrastructure technologies used to build scalable distributed systems in-house and in the cloud. The student will explore how open source software, open source projects, and cloud services have evolved to enable new analytic solutions, machine learning, and artificial intelligence in the enterprise. Analytic usage styles and approaches to bridging relational data with non-relational data will be reviewed to develop solutions that affect business outcomes. Multiple programming languages for analytics will be examined and compared, including Python, R, Scala, and SQL.
The second half of the course explores information management through data pipelines and evolutionary data architectures. The student will examine patterns for ingestion of data, transformation of data into schemas, merging data with other sources, and presentation of data for consumption. The management of data pipelines, data quality, data lineage, security, publishing, and governance will also be covered. The student will experience working in a collaborative development environment to distribute and promote re-use of analytic algorithms. Contributing to open source communities and projects will also be explored.
The following courses (or equivalent) are recommended prerequisites for Big Data Analytics and Information Management: COM SCI X 414.51 Relational Database Management, COM SCI X 450.1 Introduction to Data Science, COM SCI X 450.3 Hadoop and Managing Big Data, or consent of instructor.