WSO2 organized its first virtual hackathon on 25th September 2015. The target audience was big data enthusiasts around the world, and each team could consist of an architect and a developer. From the registered teams, only 10 were selected.
The hackathon was based on the DEBS 2014 Grand Challenge, whose data were taken from a real-world scenario; the teams were asked to implement their solutions on top of the WSO2 data analytics platform.
Two queries had to be solved using WSO2 Complex Event Processor (CEP) and WSO2 Data Analytics Server (DAS). Both servers and their supporting servers ran as Docker containers, and the cluster management was done using Kubernetes. We deployed both CEP and DAS as applications in WSO2 Private PaaS.
Following are the two queries that the teams had to solve during the 24-hour virtual hackathon.
Query 01 - Load Prediction
The goal of this query is to make load forecasts based on current load measurements and those of recorded historical data. Such forecasts can be used to proactively influence load and adapt it to the supply situation, e.g. current production of renewable energy sources.
You must use the following algorithm for prediction and implement it using WSO2 CEP.
The query should forecast the load for each house. Starting from September 1, 2013, we group each 24 hours into 12×24 = 288 five-minute slices. For example, the current time c (the current event time) belongs to the slice floor(time_of_day(c)/300), where floor is the math:floor() function and time_of_day(timestamp) returns the number of seconds elapsed in the given day for the given timestamp. We predict the load for the slice starting at t + 10 minutes, based on the current time c, which falls in the time slice starting at t, where t is a multiple of 5 minutes.
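To make the slice arithmetic concrete, here is a minimal Python sketch, assuming UNIX timestamps in seconds and UTC day boundaries. Since the task is based on the DEBS 2014 Grand Challenge, the forecast formula shown (the average of the current slice's average load and the median of the same slice's average loads on previous days) follows that challenge's definition; the empty-history fallback is our own assumption.

```python
from math import floor
from statistics import median

SLICE_SECONDS = 300  # 5-minute slices; 12 * 24 = 288 slices per day


def time_of_day(ts):
    # Seconds elapsed in the given day (UTC assumed) for a UNIX timestamp.
    return ts % 86400


def slice_index(ts):
    # Index of the 5-minute slice that the timestamp falls into.
    return floor(time_of_day(ts) / SLICE_SECONDS)


def predicted_slice_start(c):
    # The prediction targets the slice starting 10 minutes (two slices)
    # after the start t of the slice that contains the current time c.
    t = c - time_of_day(c) % SLICE_SECONDS
    return t + 2 * SLICE_SECONDS


def predict_load(avg_current, past_avgs):
    # DEBS 2014-style forecast: mean of the current slice's average load
    # and the median of the same slice's averages on previous days.
    # Returning the current average when there is no history yet is an
    # assumption for illustration.
    if not past_avgs:
        return avg_current
    return (avg_current + median(past_avgs)) / 2
```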
The output streams for house-based prediction values should contain the following information:
- ts – timestamp of the starting time of the slice that the prediction is made for
- house_id – id of the house for which the prediction is made
- predicted_load – the predicted load for the time slice starting at ts
The output streams for plug-based prediction values should contain the following information (both record shapes are sketched in code after this list):
- ts – timestamp of the starting time of the slice that the prediction is made for
- house_id – id of the house where the plug is located
- household_id – the id of the household where the plug is located
- plug_id – the id of the plug for which the prediction is made
- predicted_load – the predicted load for the time slice starting at ts
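Purely for reference, the two output record shapes can be written down as plain Python structures. This mirrors the field lists above and is only an illustration; an actual solution would declare these as Siddhi output streams in WSO2 CEP.

```python
from collections import namedtuple

# Illustrative shapes of the two prediction output streams; a real
# solution would define these as Siddhi output streams in WSO2 CEP.
HousePrediction = namedtuple(
    "HousePrediction", ["ts", "house_id", "predicted_load"])
PlugPrediction = namedtuple(
    "PlugPrediction",
    ["ts", "house_id", "household_id", "plug_id", "predicted_load"])
```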
The output streams should be updated every 30 seconds, as measured by the input event timestamps. The purpose of the update is to reflect the latest value of avgLoad(s_i) for the given slice.
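Because avgLoad(s_i) keeps changing while events for the current slice are still arriving, a solution needs a running average per house (or plug) and slice that can be re-read every 30 seconds. A minimal in-memory sketch of that bookkeeping, standing in for whatever state the CEP query would actually keep:

```python
class RunningAverage:
    """Incremental average of the load readings seen so far for one
    (house, slice) combination; re-emitted every 30 seconds so that the
    output stream reflects the latest avgLoad(s_i)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, value):
        self.total += value
        self.count += 1

    def value(self):
        return self.total / self.count if self.count else 0.0
```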
Query 02 - Outlier
The goal of this query is to find devices that have anomalously high (outlying) readings. The calculation is done every 15 minutes: given a time t (a multiple of 15 minutes), an outlier is a device whose power consumption is more than Mean(D[t-15m, t]) + 2 * StdDev(D[t-15m, t]), where D[t-15m, t] is the data collected between time t-15m and t across all devices in the system.

You must use the WSO2 DAS Spark SQL support to implement this query. For every 15-minute interval, the output should be written to a database table in the following format (a sketch of the threshold logic follows the list):
- timestamp – timestamp of the time the output was generated
- house_id – house id
- household_id – the id of the household where the plug is located
- plug_id – device ID
- value – value of the reading
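The hackathon required this query to be expressed in Spark SQL on DAS; as a language-neutral illustration of the threshold itself, here is a small Python sketch. The tuple layout mirrors the output table above and is an assumption, as is the use of the population standard deviation.

```python
from statistics import mean, pstdev


def find_outliers(readings, t, window=15 * 60):
    """readings: iterable of (ts, house_id, household_id, plug_id, value)
    tuples with UNIX timestamps in seconds. Returns the rows in the
    window (t - 15m, t] whose value exceeds mean + 2 * stddev computed
    over all devices in that window."""
    in_window = [r for r in readings if t - window < r[0] <= t]
    if not in_window:
        return []
    values = [r[4] for r in in_window]
    # Population standard deviation (pstdev) is assumed here.
    threshold = mean(values) + 2 * pstdev(values)
    return [r for r in in_window if r[4] > threshold]
```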