Checking for maximum length of consecutive days which satisfy specific condition

Go To StackoverFlow.com

5

I have a MySQL table with the structure:

beverages_log(id, users_id, beverages_id, timestamp)

I'm trying to compute the maximum streak of consecutive days during which a user (with id 1) logs a beverage (with id 1) at least 5 times each day. I'm pretty sure that this can be done using views as follows:

CREATE or REPLACE VIEW daycounts AS
SELECT count(*) AS n, DATE(timestamp) AS d FROM beverages_log
WHERE users_id = '1' AND beverages_id = 1 GROUP BY d;

CREATE or REPLACE VIEW t AS SELECT * FROM daycounts WHERE n >= 5;

SELECT MAX(streak) AS current FROM ( SELECT DATEDIFF(MIN(c.d), a.d)+1 AS streak
FROM t AS a LEFT JOIN t AS b ON a.d = ADDDATE(b.d,1)
LEFT JOIN t AS c ON a.d <= c.d
LEFT JOIN t AS d ON c.d = ADDDATE(d.d,-1)
WHERE b.d IS NULL AND c.d IS NOT NULL AND d.d IS NULL GROUP BY a.d) allstreaks;

However, repeatedly creating views for different users every time I run this check seems pretty inefficient. Is there a way in MySQL to perform this computation in a single query, without creating views or repeatedly calling the same subqueries a bunch of times?

2012-04-04 23:09
by Chris Garson


6

This solution seems to perform quite well as long as there is a composite index on users_id and beverages_id -

SELECT *
FROM (
    SELECT t.*, IF(@prev + INTERVAL 1 DAY = t.d, @c := @c + 1, @c := 1) AS streak, @prev := t.d
    FROM (
        SELECT DATE(timestamp) AS d, COUNT(*) AS n
        FROM beverages_log
        WHERE users_id = 1
        AND beverages_id = 1
        GROUP BY DATE(timestamp)
        HAVING COUNT(*) >= 5
    ) AS t
    INNER JOIN (SELECT @prev := NULL, @c := 1) AS vars
) AS t
ORDER BY streak DESC LIMIT 1;
2012-04-05 00:38
by nnichols
Yep, this works well - thanks! I'm not sure I fully understand the "INNER JOIN (SELECT @prev := NULL, @c := 1) AS vars" - is this just to clear the variables - Chris Garson 2012-04-05 16:42
It is just to make sure they don't have any values from a previous run of the query - nnichols 2012-04-05 16:51
Gotcha. One more question - how is it that @prev is actually referring to the date for the previous row? It seems that if it's defined as "@prev := t.d", it'll just give the date for the current row - Chris Garson 2012-04-05 20:12
It works because streak is evaluated before @prev is re-assigned. There is a warning in the Reference Manual that this order of evaluation should not be relied upon but it is widely used and unlikely to change - nnichols 2012-04-05 21:55
Ah - cool trick. Thanks again - Chris Garson 2012-04-06 15:09


0

Why not include user_id in they daycounts view and group by user_id and date.

Also include user_id in view t.

Then when you are queering against t add the user_id to the where clause.

Then you don't have to recreate your views for every single user you just need to remember to include in your where clause.

2012-04-04 23:36
by Dan P


0

That's a little tricky. I'd start with a view to summarize events by day:

CREATE VIEW BView AS
    SELECT UserID, BevID, CAST(EventDateTime AS DATE) AS EventDate, COUNT(*) AS NumEvents
    FROM beverages_log
    GROUP BY UserID, BevID, CAST(EventDateTime AS DATE)

I'd then use a Dates table (just a table with one row per day; very handy to have) to examine all possible date ranges and throw out any with a gap. This will probably be slow as hell, but it's a start:

SELECT
    UserID, BevID, MAX(StreakLength) AS StreakLength
FROM
    (
    SELECT
        B1.UserID, B1.BevID, B1.EventDate AS StreakStart, DATEDIFF(DD, StartDate.Date, EndDate.Date) AS StreakLength
    FROM
        BView AS B1
        INNER JOIN Dates AS StartDate ON B1.EventDate = StartDate.Date
        INNER JOIN Dates AS EndDate ON EndDate.Date > StartDate.Date
    WHERE
            B1.NumEvents >= 5
        -- Exclude this potential streak if there's a day with no activity
        AND NOT EXISTS (SELECT * FROM Dates AS MissedDay WHERE MissedDay.Date > StartDate.Date AND MissedDay.Date <= EndDate.Date AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND MissedDay.Date = B2.EventDate))
        -- Exclude this potential streak if there's a day with less than five events
        AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND B2.EventDate > StartDate.Date AND B2.EventDate <= EndDate.Date AND B2.NumEvents < 5)
    ) AS X
GROUP BY
    UserID, BevID
2012-04-04 23:36
by NoName
As written, this counts 04/02 to 04/04 as a two-day streak, because the difference of the dates is two. If you think of this as three days, just add one. Actually using COUNT(*) rather than DATEDIFF() should give that result, too - NoName 2012-04-04 23:38
Ads