Course description
New data points are being generated at ever-increasing rates. Traditional techniques based on relational databases for ingesting, storing, indexing, and analyzing data are no longer sufficient to deal with the volume, variety, and velocity of new data points, which create bottlenecks at every stage of the processing chain. This course presents Big Data architecture for building distributed software systems. At the outer endpoint of the distributed system, incoming data must be validated quickly to maintain data quality. When storing the data, write latency must not exceed tens of milliseconds for any real-world application with a healthy user base. When indexing the data, indexer throughput must be high enough to keep up with the increasing velocity of incoming data. The indexing technique must support logical operators, wildcards, geolocation, join, and aggregate queries. Once the data is stored and indexed, further challenges arise around near-real-time predictive analytics: how quickly can new data points be used to answer a question after they are stored in the system? This requires keeping the duration of the workflow that ingests, stores, indexes, and analyzes the data to a minimum. Beyond all of these requirements, there is one more: the system must be schemaless. That is, it must support extending its existing data models and adding new data models without any new programming.
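As a rough illustration of the validation and schemaless requirements described above, the following Python sketch checks incoming records against a model definition that is itself stored as data. All names here (`TYPE_CHECKS`, `MODELS`, `register_model`, `validate`) and the small type vocabulary are hypothetical, chosen only for illustration; they are not taken from the course material.

```python
from typing import Any

# Hypothetical type vocabulary: maps a type name used in a model definition
# to the Python type(s) an incoming value is allowed to have.
TYPE_CHECKS: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "number": (int, float),
    "boolean": (bool,),
    "object": (dict,),
    "array": (list,),
}

# Model definitions are plain data (assumption: in a real system they would
# live in a data store rather than in this in-memory dict).
MODELS: dict[str, dict[str, dict[str, Any]]] = {}


def register_model(name: str, fields: dict[str, dict[str, Any]]) -> None:
    """Add or replace a data model at runtime, without new validation code."""
    MODELS[name] = fields


def validate(name: str, record: dict[str, Any]) -> list[str]:
    """Return a list of validation errors; an empty list means the record is valid."""
    model = MODELS.get(name)
    if model is None:
        return [f"unknown model: {name}"]

    errors: list[str] = []
    for field, spec in model.items():
        if field not in record:
            if spec.get("required", False):
                errors.append(f"missing required field: {field}")
            continue
        value = record[field]
        type_name = spec["type"]
        # bool is a subclass of int in Python, so reject it explicitly
        # for every declared type other than "boolean".
        if isinstance(value, bool) and type_name != "boolean":
            errors.append(f"field {field}: expected {type_name}")
        elif not isinstance(value, TYPE_CHECKS[type_name]):
            errors.append(f"field {field}: expected {type_name}")
    # Fields not mentioned in the model are accepted as-is, which keeps
    # existing models extensible.
    return errors


register_model("plan", {
    "id": {"type": "string", "required": True},
    "deductible": {"type": "number", "required": False},
})
```

Under these assumptions, introducing a new data model is a single `register_model` call with a dictionary, and `validate("plan", {...})` can be run at the system's outer endpoint before the record is stored.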
Schedule
Week 1: Jan 11, 2025
Introduction to Big Data architecture
Week 2
Strongly typed data protocols
Week 3
Key-value stores & REST APIs
Week 4
JSON as Graph
Week 5: Feb 8, 2025
Prototype demo 1
Week 6
Securing REST APIs
Week 7
OAuth 2.0
Week 8
Search part 1
Week 9: March 8, 2025
Spring Break
Week 10: March 15, 2025
Prototype demo 2
Week 11
Search part 2, Queuing
Week 12
Introduction to GraphQL
Week 13
Selected Topics
Week 14: April 12, 2025
Final Prototype demo
Week 15: April 19, 2025