# Excluding weekends from time differences

Mar 27, 2014

Recently, the Looker Analytics team was posed with a unique challenge by a client: come up with a way to exclude weekends from an analysis that calculates the duration between two times.

The data in question captures the rate at which jobs move from various stages in production. Each row of the table of interest (hereinafter velocity) represents a single job — which, along with other attributes, has an approved and a completed time.

velocity table

job_id created_at approved_at completed_at ... location_id partner_id
1 2013-07-31 04:00:00 2013-08-01 11:00:00 2013-08-04 13:00:00 ... 1 1
2 2013-03-07 07:00:00 2013-03-07 10:00:00 2013-03-12 14:00:00 ... 4 1
... ... ... ... ... ... ...
n 2013-05-07 09:00:00 2013-05-08 02:00:00 2013-05-16 12:00:00 ... 3 2

The first pass at addressing approved-to-complete velocity relied on a simple date difference:

```    SELECT

job_id

,

approved_at

,

published_at

,

EXTRACT

(

'EPOCH'

FROM

(

completed_at

-

approved_at

))

/

3600

AS

velocity_in_hours

FROM

velocity

```

The problem with this approach is that some jobs start in one week, extend over one or more weekends, and are completed in some subsequent week; however, work is not done on weekends, and the velocity for jobs that extend over one or more weekends is penalized in a sense. This is why Looker was asked to come up with a solution to exclude weekends from the analysis.

## Solution

A few approaches were posed, but the final solution relied on a look-up table containing three columns: a date_time field, a day_of_week field, and an integer, hour_seq, indicating the number of hours since the first hour in this table (hereinafter sequence). The table would be used to get a corresponding integer (again, the number of hours since the beginning of the sequence table) for the approved_at and completed_at for each job and then subtract the two, yielding the number of hours between the two times.

The key to the solution was that the integers in the sequence table would not increment on weekend hours. The resulting table would look like this:

sequence table

date_time day_of_week hour_seq
2008-01-01 00:00:00 2 1
2008-01-01 01:00:00 2 2
2008-01-01 02:00:00 2 3
... ... ...
2008-01-04 21:00:00 5 93
2008-01-04 22:00:00 5 94
2008-01-04 23:00:00 5 95
2008-01-05 00:00:00 6 95
2008-01-05 01:00:00 6 95
2008-01-05 02:00:00 6 95
... ... ...
2008-01-06 23:00:00 0 95
2008-01-07 00:00:00 1 96
2008-01-07 01:00:00 1 97

To achieve this, we needed to start with a table of hours from the beginning of time (or 2008, when the business started) going well into the future. We relied on recursive common table expressions (CTEs) in PostgreSQL to generate a sequence of numbers and then add hours accordingly, but this could achieved in a number of ways for dialects without recursive CTEs.

```    WITH

RECURSIVE

seq

(

n

)

AS

(

SELECT

1

UNION

ALL

SELECT

n

+

1

FROM

seq

)

--generate a sequence of numbers

,

hours

AS

(

SELECT

TIMESTAMP

'2008-01-01 00:00:00'

+

n

*

INTERVAL

'1 hour'

AS

date_time

FROM

seq

LIMIT

87600

)

--generate ten years of hours from 2008-01-01 00:00:00 into the future

SELECT

date_time

,

EXTRACT

(

dow

FROM

date_time

)

AS

day_of_week

,

SUM

(

CASE

WHEN

EXTRACT

(

dow

FROM

date_time

)

NOT

IN

(

6

,

0

)

THEN

1

ELSE

NULL

END

)

OVER

(

ORDER

BY

date_time

)

AS

hour_seq

FROM

hours

```

The penultimate step was to twice join the sequence table into velocity and map each created_at and completed_at entry to its corresponding hour_seq. The final velocity table looked something like this:

velocity table

job_id created_at approved_at approved_int completed_at completed_int ... location_id partner_id
1 2013-07-31 04:00:00 2013-08-01
11:00:00
34979 2013-08-04
13:00:00
35015 ... 1 1
2 2013-03-07
07:00:00
2013-03-07
10:00:00
32458 2013-03-12
14:00:00
32534 ... 4 1
... ... ... ... ... ... ... ... ...
n 2013-05-07
09:00:00
2013-05-08
02:00:00
33506 2013-05-16
12:00:00
33660 ... 3 2

With a velocity table that had corresponding hour sequence numbers for each approved and completed time, we could simply difference these two values to see the time spent in production, in hours. This could be easily extended to capture any timeframe.

```      SELECT

approved_at

,

completed_at

,

completed_int

-

approved_int

AS

duration_in_hours

FROM

velocity

```

— fin —