This week, I was presented with an interesting task: Determine concurrent user metrics for one of the applications I support. While this sounds like a simple metric to gather, the actual implementation is pretty interesting.
Our user session history is stored in a database table that looks like:
start_date                end_date
18-oct-2013 09:18:32.132  18-oct-2013 09:36:12.045
18-oct-2013 09:25:34.267  18-oct-2013 09:45:10.988
18-oct-2013 09:37:36.002  18-oct-2013 10:22:07.324
18-oct-2013 10:12:45.723  18-oct-2013 10:20:55.229
...
There are other columns as well, but they're not really relevant to this exercise.
So what we have is events with defined start and end times in different fields, and we want to use that to determine how these records overlap in time.
Step 1: Identify the Sample Set
Let's say we want to get average and peak concurrency for a specific time frame. For this example, let's use Oct 18th, 2013. This gives us a 24-hour span to work with.
Now, to filter down to the appropriate records for this timeframe, it's tempting to simply window it with:
SELECT * FROM session_history WHERE start_date BETWEEN '18-oct-2013' AND '19-oct-2013';
However, this isn't actually the window we want. Why? Well, I think it's easier to visualize:
The orange bars represent active user sessions over time. The green area is the timeframe we care about. What happens if we use the filter above to get our initial dataset? We end up throwing out any sessions that started before our sample range but extended into it. The same would happen if you filtered on the end_date instead of the start_date. This means our concurrency numbers can report lower than they actually are.
So to properly select our sample set, we need to pull records that fulfill the following conditions:
- The session ends after the beginning of our sample range
- The session begins before the end of our sample range
The resultant SQL looks like:
SELECT * FROM session_history WHERE start_date < '19-oct-2013' AND end_date > '18-oct-2013';
This gives us any session that was active during the sample range. Now we start the transform:
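The difference between the two filters is easy to demonstrate outside the database. Here's a quick Python sketch (the session data and variable names are illustrative, not from the real table) showing how filtering on start_date alone silently drops a session that began before the window but was still active inside it:

```python
from datetime import datetime

# Illustrative sessions as (start, end) pairs; the first one begins
# before our sample window but is still active inside it.
sessions = [
    (datetime(2013, 10, 17, 23, 50), datetime(2013, 10, 18, 0, 10)),
    (datetime(2013, 10, 18, 9, 18), datetime(2013, 10, 18, 9, 36)),
]

window_start = datetime(2013, 10, 18)
window_end = datetime(2013, 10, 19)

# Naive filter: only sessions that *started* inside the window.
naive = [s for s in sessions if window_start <= s[0] < window_end]

# Correct filter: any session that overlaps the window at all.
overlap = [s for s in sessions
           if s[0] < window_end and s[1] > window_start]

print(len(naive), len(overlap))  # the naive filter drops a session
```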
Step 2: Create Separate Start/End Events
The next thing we want to do is split each session into two separate start and end events. We're going to do this by taking our sample set, and unioning it to itself:
WITH sessions AS (
  SELECT *
    FROM session_history
   WHERE start_date < '19-oct-2013'
     AND end_date > '18-oct-2013'
)
SELECT 1 AS event, start_date AS event_date FROM sessions begin
UNION ALL
SELECT -1 AS event, end_date AS event_date FROM sessions end;
When run against the sample set above, you end up with:
event  event_date
    1  18-oct-2013 09:18:32.132
    1  18-oct-2013 09:25:34.267
   -1  18-oct-2013 09:36:12.045
    1  18-oct-2013 09:37:36.002
   -1  18-oct-2013 09:45:10.988
    1  18-oct-2013 10:12:45.723
   -1  18-oct-2013 10:20:55.229
   -1  18-oct-2013 10:22:07.324
Now, you might be seeing where this is going. We have separate, timestamped events for every session start and end, and a column which has '1' for a session start, and '-1' for a session end.
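In plain Python, with the four sample sessions from the table at the top, this split-and-sort step looks roughly like the following (the variable names are mine, not from the schema; milliseconds are dropped for brevity):

```python
from datetime import datetime

# The four sample sessions from the table above (milliseconds dropped).
sessions = [
    (datetime(2013, 10, 18, 9, 18, 32), datetime(2013, 10, 18, 9, 36, 12)),
    (datetime(2013, 10, 18, 9, 25, 34), datetime(2013, 10, 18, 9, 45, 10)),
    (datetime(2013, 10, 18, 9, 37, 36), datetime(2013, 10, 18, 10, 22, 7)),
    (datetime(2013, 10, 18, 10, 12, 45), datetime(2013, 10, 18, 10, 20, 55)),
]

# One +1 event per session start, one -1 event per session end,
# ordered by timestamp -- the Python analogue of the UNION ALL.
events = sorted(
    [(start, 1) for start, end in sessions] +
    [(end, -1) for start, end in sessions]
)

for when, delta in events:
    print(when, delta)
```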
Step 3: Add an Aggregate
So all the magic is going to happen using that event column we just created. Think of it like a counter: when a session starts, we add 1 to the active session total; when a session ends, we subtract 1 from the total. So now we simply have to keep count as we move through the events in order. We can do this with an aggregate function:
WITH sessions AS (
  SELECT *
    FROM session_history
   WHERE start_date < '19-oct-2013'
     AND end_date > '18-oct-2013'
)
SELECT q.*,
       SUM( event ) OVER ( ORDER BY event_date ASC ) AS concurrent
  FROM (SELECT 1 AS event, start_date AS event_date FROM sessions begin
        UNION ALL
        SELECT -1 AS event, end_date AS event_date FROM sessions end) q;
I should mention that all SQL in this article is written for Oracle. Function syntax may vary depending on your flavor of SQL.
Okay, what the heck is this thing doing? We're taking the result set of the last step, which has separate start/end events indicated with 1 and -1 respectively. Then, instead of using the typical aggregate form of SUM(), we're adding an analytic clause. This keeps the record set intact and appends a rolling sum as a new column. The ORDER BY clause tells it how to order the set before doing the summation.
When run against our sample events, the output is as follows:
event  event_date                concurrent
    1  18-oct-2013 09:18:32.132           1
    1  18-oct-2013 09:25:34.267           2
   -1  18-oct-2013 09:36:12.045           1
    1  18-oct-2013 09:37:36.002           2
   -1  18-oct-2013 09:45:10.988           1
    1  18-oct-2013 10:12:45.723           2
   -1  18-oct-2013 10:20:55.229           1
   -1  18-oct-2013 10:22:07.324           0
Pretty cool, eh? We've generated very accurate concurrency numbers that report as often as our users log in and out. Now, it's just a matter of determining the aggregates we were initially looking for.
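The rolling-sum step translates directly to a running total in Python. Continuing the hypothetical sketch from the earlier steps (names are mine), itertools.accumulate plays the role of SUM(event) OVER (ORDER BY event_date):

```python
from datetime import datetime
from itertools import accumulate

# The four sample sessions from the table at the top.
sessions = [
    (datetime(2013, 10, 18, 9, 18, 32), datetime(2013, 10, 18, 9, 36, 12)),
    (datetime(2013, 10, 18, 9, 25, 34), datetime(2013, 10, 18, 9, 45, 10)),
    (datetime(2013, 10, 18, 9, 37, 36), datetime(2013, 10, 18, 10, 22, 7)),
    (datetime(2013, 10, 18, 10, 12, 45), datetime(2013, 10, 18, 10, 20, 55)),
]

# +1 at each start, -1 at each end, ordered by timestamp.
events = sorted(
    [(start, 1) for start, end in sessions] +
    [(end, -1) for start, end in sessions]
)

# Rolling sum of the +1/-1 deltas: active session count after each event.
concurrent = list(accumulate(delta for _, delta in events))

print(concurrent)  # peaks at 2, returns to 0 when the last session ends
```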
Step 4: Re-Filter and Calculate Final Metrics
We're on the home stretch. We've got our concurrency numbers, and now we just need an AVG() and a MAX(). Almost. Remember how, way back at the beginning, we made sure to include events that extended outside of our sample range? Well, now that we've got our concurrency numbers, we don't need them anymore. So when applying our aggregates, we also want to filter out any events that fall outside our timeframe. (This is necessary because those rows aren't accurate on their own and would skew our final calculations.)
So our final SQL is going to look like this:
WITH sessions AS (
  SELECT *
    FROM session_history
   WHERE start_date < '19-oct-2013'
     AND end_date > '18-oct-2013'
)
SELECT AVG( concurrent ) AS avg_concurrent,
       MAX( concurrent ) AS max_concurrent
  FROM (SELECT q.*,
               SUM( event ) OVER ( ORDER BY event_date ASC ) AS concurrent
          FROM (SELECT 1 AS event, start_date AS event_date FROM sessions begin
                UNION ALL
                SELECT -1 AS event, end_date AS event_date FROM sessions end) q)
 WHERE event_date BETWEEN '18-oct-2013' AND '19-oct-2013';
Which when applied to our sample dataset results in:
avg_concurrent  max_concurrent
          1.25               2
We could also apply a grouping to our aggregate (say, by hour) and see an hourly trend of concurrent user sessions. It's relatively easy to modify this query to answer quite a lot of questions: you really only need to adjust the date filters and the kind of aggregate you want at the end. The rest is essentially a template.
This query is also quite reusable. You can apply it to just about anything transactional that you might want to trend. As long as you have a start/end time (or just a start time and duration), you can use this to determine concurrency.
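As a rough Python sketch of that template (my own function and names, not library code), all four steps fit in one short routine. Note that, like the SQL above, the average here is taken over event rows rather than weighted by time:

```python
from datetime import datetime
from itertools import accumulate

def concurrency_stats(sessions, window_start, window_end):
    """Row-average and peak concurrency for sessions overlapping a window.

    sessions is an iterable of (start, end) pairs; mirrors the four
    SQL steps described above.
    """
    # Step 1: keep anything active at some point during the window.
    sample = [(s, e) for s, e in sessions
              if s < window_end and e > window_start]
    # Step 2: split into +1/-1 events, ordered by timestamp.
    events = sorted([(s, 1) for s, e in sample] +
                    [(e, -1) for s, e in sample])
    # Step 3: rolling sum = concurrency after each event.
    tagged = list(zip((t for t, _ in events),
                      accumulate(d for _, d in events)))
    # Step 4: drop events outside the window before aggregating
    # (row-average, as in the SQL AVG -- not time-weighted).
    inside = [c for t, c in tagged if window_start <= t <= window_end]
    return sum(inside) / len(inside), max(inside)

# The four sample sessions from the table at the top.
sessions = [
    (datetime(2013, 10, 18, 9, 18, 32), datetime(2013, 10, 18, 9, 36, 12)),
    (datetime(2013, 10, 18, 9, 25, 34), datetime(2013, 10, 18, 9, 45, 10)),
    (datetime(2013, 10, 18, 9, 37, 36), datetime(2013, 10, 18, 10, 22, 7)),
    (datetime(2013, 10, 18, 10, 12, 45), datetime(2013, 10, 18, 10, 20, 55)),
]

avg_c, max_c = concurrency_stats(
    sessions, datetime(2013, 10, 18), datetime(2013, 10, 19))
print(avg_c, max_c)  # matches the SQL results: 1.25 and 2
```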