Please have a look at the first part, which covers date, time and time zone basics in detail.
DjaoDjin caters to micro-SaaS products. These are specialized and local products. As an example, A1Ceus targets compliance with New York State professional certification requirements. Usually the teams behind those websites are small (1-3 people) with a pulse on the daily business numbers. Reporting graphs in UTC did not cut it. Questions kept pilling up in the customer support inbox about discrepancies between what those micro-SaaS entrepreneurs were experiencing and what the reports charts were saying. Reports had to be presented midnight to midnight local time. Here is the journey of what that meant technically to show revenue reports in local time.
A brief overview of the stack
Our core tech is
djaodjin-saas, it is a pluggable Django app. It adds a multitude of API endpoints, among which there are a bunch of endpoints that return transaction data. The data is stored in PostgreSQL and API endpoints are powered by Django Rest Framework. On the client side, this transaction data is used to display graphs. Client side is built on top of Angular.js.
On page load Angular.js makes a GET request like this:
ends_at parameter is expected to be an ISO 8601 formatted timestamp in UTC. All DjaoDjin APIs are using ISO 8601. Stripe decided to use UNIX timestamp in their APIs instead, but we found it complicated to debug date/time related issues.
This request fetches transactions created during a 12 month period, counting backwards from today’s date, which is 2018-04-18, till 2017-04-01. In this case it is not a full 12 month period, because the current month hasn’t ended yet. However, a user then can pick from previous months to have a complete 12 month period, for example a period from 2018-03-01 till 2017-03-01.
However, this URL format is deprecated, it is now required to pass an additional
timezone parameter besides the
ends_at parameter. The reason behind this will become clear by the end of this post. Meanwhile, here’s how we get the time zone of the client:
Initially, we construct a
Date object (which is then passed as
ends_at parameter). This object has only a UTC offset of the browser. However, a browser doesn’t know the exact time zone name. In modern browsers there are APIs which standardize the process of getting the time zone name. We use Moment.js with Moment Timezone, which leverages these new APIs to guess the client’s time zone name:
moment.tz.guess() // "Europe/Kiev"
We then pass this string as a
timezone parameter and a constructed
Date object as an
ends_at parameter when making an API call. When Angular.js makes an actual AJAX request the
toISOString method of
Date object is called, which produces an ISO 8601 timestamp.
Django Rest Framework side
When DRF receives a request, we parse the timestamp with
django.utils.dateparse.parse_datetime function, the return value of which is an aware
datetime object. Because ISO 8601 permits only UTC or UTC offset to be specified, the
tzinfo attribute of this object is either a UTC (
pytz.UTC) or a UTC offset (
Once we have an end date, we call a
saas.managers.metrics.month_periods to generate a list of 12
datetime objects, that will be used to query the transactions later. Here is the original source of the function which handles UTC-only dates:
def month_periods(nb_months=12, from_date=None, step_months=1): dates =  from_date = datetime_or_now(from_date) dates.append(from_date) last = datetime( day=from_date.day, month=from_date.month, year=from_date.year, tzinfo=utc) if last.day != 1: last = datetime(day=1, month=last.month, year=last.year, tzinfo=utc) dates.append(last) nb_months = nb_months - 1 for _ in range(0, nb_months, step_months): year = last.year month = last.month - step_months if month < 1: # integer division year = last.year + month // 12 assert isinstance(year, six.integer_types) if month % 12 == 0: year -= 1 month = 12 else: month = month % 12 last = datetime(day=1, month=month, year=year, tzinfo=utc) dates.append(last) dates.reverse() return dates
The problem with the current implementation is that when constructing new
datetime objects, the time zone is set to UTC:
last = datetime(day=1, month=month, year=year, tzinfo=utc)
This is a bug, because if a HTTP request is made with a timestamp in UTC offset, the first
datetime object will have a UTC offset time zone, while the rest of the
datetime objects will still be in UTC. So, we have a potential 24 hour window for transactions with the wrong date to creep in. A solution to this problem is to use the time zone of the first
datetime object when constructing the rest of the
datetime objects. In our situation, it doesn’t make sense to convert the first object to UTC too, because when we construct
datetime objects in original time zone, we don’t care about time, when constructing subsequent
datetime objects, that’s because each period begins at 00:00. However, if we do convert to UTC, the time is no longer 00:00 and the day is potentially also different — this complicates the construction of periods. So, it is much easier to stick with whatever the time zone offset was passed with the request.
orig_tz = from_date.tzinfo # later in loop last = datetime(day=1, month=month, year=year, tzinfo=orig_tz)
The other problem is that the user who made the request might be located in a time zone which has DST. If the time zone has DST, its UTC offset will not be constant during the year. When generating
datetime objects we use the same UTC offset for each of them, as a consequence for dates that fall onto DST period the actual local time will be off by one hour. To fix this we need the time zone name, which we’ll use to get the DST rules for this specific time zone.
There is no way to get the correct offset for particular date with DST without a time zone name and we can’t get time zone name based on offset, due to its changing nature. That means we need to pass the time zone from the client somehow. Unfortunately, ISO 8601 doesn’t specify a way to include a time zone name with the timestamp, only an offset. As a solution, we’ll pass a separate
timezone parameter from the client. In case where a client doesn’t pass a time zone string or passes a wrong string, we’ll have to fallback to using UTC offsets, ignoring DST. We might actually require the
timezone parameter in future to prevent potential date and time errors. Let’s modify the
month_periods function to accept a time zone string from the view:
def month_periods(nb_months=12, from_date=None, step_months=1, tz=None): ...
OK, now that we have a time zone string, we can solve the last issue with this function. When constructing
datetime objects, we need to modify the offset based on the DST, so that the local time is always equal to 00:00. To do this, we will use a
pytz package, which is an implementation of IANA database in Python. Let’s create a helper function which parses a time zone string into a
from pytz import timezone, UnknownTimeZoneError def parse_tz(tz): if tz: try: return timezone(tz) except UnknownTimeZoneError: pass
month_periods function, we need to parse the time zone string first. If what we’ve got is an actual time zone, convert the
tzinfo object to the new time zone object, otherwise just leave it with the original UTC offset object. Here’s the code that converts the first period’s
tzinfo object to the newly parsed time zone object:
tz_ob = parse_tz(tz) if tz_ob: from_date = from_date.astimezone(tz_ob)
When dealing with the rest of the periods, we can construct a naive
datetime (an object without a
tzinfo attribute) first. Then if we have the time zone object, call a
tz_ob.localize method which adds a time zone to the date (making it aware) and applies a correct UTC offset based on DST rules. If we don’t have a time zone object, we’ll fallback to using UTC offset (this ignores DST). Let’s wrap this into a helper function too:
def _handle_tz(dt, tz_ob, orig_tz): if tz_ob: # adding timezone info # this also accounts for DST loc = tz_ob.localize(dt) else: # adding UTC offset only loc = last.replace(tzinfo=orig_tz) return loc
month_periods function will look like this:
def month_periods(nb_months=12, from_date=None, step_months=1, tz=None): dates =  from_date = datetime_or_now(from_date) orig_tz = from_date.tzinfo tz_ob = parse_tz(tz) if tz_ob: from_date = from_date.astimezone(tz_ob) dates.append(from_date) last = datetime(day=from_date.day, month=from_date.month, year=from_date.year) last = _handle_tz(last, tz_ob, orig_tz) if last.day != 1: last = datetime(day=1, month=last.month, year=last.year) last = _handle_tz(last, tz_ob, orig_tz) dates.append(last) nb_months = nb_months - 1 for _ in range(0, nb_months, step_months): year = last.year month = last.month - step_months if month < 1: # integer division year = last.year + month // 12 assert isinstance(year, six.integer_types) if month % 12 == 0: year -= 1 month = 12 else: month = month % 12 last = datetime(day=1, month=month, year=year) last = _handle_tz(last, tz_ob, orig_tz) dates.append(last) dates.reverse()
So, when we call this function we’ll have the following result:
>>> from django.utils.dateparse import parse_datetime >>> from pprint import pprint # pretty print >>> from_date = parse_datetime('2018-04-01T00:00:00+03:00') >>> tz = 'Europe/Kiev' >>> dates = month_periods(from_date=from_date, tz=tz) >>> pprint(dates) [datetime.datetime(2017, 4, 1, 0, 0, tzinfo=<DstTzInfo 'Europe/Kiev' EEST+3:00:00 DST>), datetime.datetime(2017, 5, 1, 0, 0, tzinfo=<DstTzInfo 'Europe/Kiev' EEST+3:00:00 DST>), datetime.datetime(2017, 6, 1, 0, 0, tzinfo=<DstTzInfo 'Europe/Kiev' EEST+3:00:00 DST>), datetime.datetime(2017, 7, 1, 0, 0, tzinfo=<DstTzInfo 'Europe/Kiev' EEST+3:00:00 DST>), datetime.datetime(2017, 8, 1, 0, 0, tzinfo=<DstTzInfo 'Europe/Kiev' EEST+3:00:00 DST>), datetime.datetime(2017, 9, 1, 0, 0, tzinfo=<DstTzInfo 'Europe/Kiev' EEST+3:00:00 DST>), datetime.datetime(2017, 10, 1, 0, 0, tzinfo=<DstTzInfo 'Europe/Kiev' EEST+3:00:00 DST>), datetime.datetime(2017, 11, 1, 0, 0, tzinfo=<DstTzInfo 'Europe/Kiev' EET+2:00:00 STD>), datetime.datetime(2017, 12, 1, 0, 0, tzinfo=<DstTzInfo 'Europe/Kiev' EET+2:00:00 STD>), datetime.datetime(2018, 1, 1, 0, 0, tzinfo=<DstTzInfo 'Europe/Kiev' EET+2:00:00 STD>), datetime.datetime(2018, 2, 1, 0, 0, tzinfo=<DstTzInfo 'Europe/Kiev' EET+2:00:00 STD>), datetime.datetime(2018, 3, 1, 0, 0, tzinfo=<DstTzInfo 'Europe/Kiev' EET+2:00:00 STD>), datetime.datetime(2018, 4, 1, 0, 0, tzinfo=<DstTzInfo 'Europe/Kiev' EEST+3:00:00 DST>)]
Awesome. Have you noticed how the time zone offset changes during the year, while time stays the same? That’s what we were trying to achieve with
Now that we have the list of correct periods, it is then used to make database queries. Once we have the transactions in our views, they are passed to DRF serializers, which encode them to produce a JSON response. DRF serializers don’t modify
datetime objects and their time zone info, so we don’t need to worry about that.
Let’s have a look at how Django ORM and PostgreSQL handles time zones in detail.
Django ORM time zone handling
Internally, Django uses
pytz package. Django time zone behavior depends on the
USE_TZ setting. If your application is serious about time you should always have time zone support enabled and work with aware
datetime objects only, otherwise the errors will inevitably creep in.
Django ORM will not modify aware
datetime objects when constructing an SQL query, so it is pretty safe to pass them as arguments to the queries. When an ORM compiles a query, it basically calls the
__str__() method for each of the
datetime fields, which produces an ISO 8601 timestamp. This applies to both the queries produced by ORM and to raw SQL queries crafted by hand. So, the final SQL query that will be sent to PostgreSQL will have ISO 8601 timestamps with whatever UTC offset was in the
tzinfo attribute, here’s an example:
SELECT COUNT(DISTINCT(prev.dest_organization_id)), SUM(prev.dest_amount) FROM saas_transaction prev LEFT OUTER JOIN ( SELECT distinct(dest_organization_id) FROM saas_transaction WHERE created_at >= '2017-05-01 00:00:00+03:00' AND created_at < '2017-06-01 00:00:00+03:00' AND orig_organization_id = '2' AND orig_account = 'Receivable' ) curr ON prev.dest_organization_id = curr.dest_organization_id WHERE prev.created_at >= '2017-04-01 00:00:00+03:00' AND prev.created_at < '2017-05-01 00:00:00+03:00' AND prev.orig_organization_id = '2' AND prev.orig_account = 'Receivable' AND curr.dest_organization_id IS NULL
When Django establishes a connection with PostgreSQL it specifies a
timezone parameter, it is used by PostgreSQL to determine which time zone the timestamps should be returned in. If
USE_TZ is set to
True, the value of the parameter will be set to UTC by default. This means that ORM will receive the response from PostgreSQL with timestamps in UTC and that the response will be used to build a model with
datetime fields in UTC. To construct model datetime fields a
parse_datetime method is used, which parses a UTC timestamp from the database and constructs an aware
datetime object in UTC. So, we don’t have the same aware
datetime object that we stored or used to query initially, because
datetime data is stored and retrieved in UTC and neither ORM nor PostgreSQL have any knowledge of time zones. Actually, Django converts those objects to a local time zone when they are used in Django Forms or templates, but this is not relevant to our situation.
At this point, we have a list of transaction rows with
datetime fields in UTC. We can either convert the dates back to the original time zone first or pass them back to DRF.
Time zones in PostgreSQL
In PostgreSQL all date and time data is stored in UTC internally.
On input PostgreSQL accepts an ISO 8601 timestamp with a UTC offset or even with a time zone name, even though it is not a standard ISO 8601 string. In our case, Django and Python will produce a timestamp with UTC or UTC offset. When no time zone info is specified in a timestamp, the time zone will be taken from the
timezone parameter (in our case it is configured by Django upon connection and set to UTC by default). Every timestamp with time zone or UTC offset used in SQL queries is converted to UTC by PostgreSQL internally.
Before outputting the data, PostgreSQL converts the timestamps from UTC to the time zone set by the
timezone parameter (in our case it is set to UTC, so no conversion is required). Here’s an example:
djaodjin=> SHOW TimeZone; TimeZone ---------- UTC (1 row) djaodjin=> INSERT INTO transactions (created_at) VALUES (TIMESTAMP WITH TIME ZONE '2018-04-19 03:00:00 Europe/Kiev'); INSERT 0 1 djaodjin=> SELECT created_at FROM transactions; created_at ------------------------ 2018-04-19 00:00:00+00 (1 row) djaodjin=> SET TimeZone TO 'Europe/Kiev'; SET djaodjin=> SELECT created_at FROM transactions; created_at ------------------------ 2018-04-19 03:00:00+03 (1 row)
In this post we showed how to present monthly SaaS reports midnight-to-midnight, in local time and through DST. Finally, we described in detail how each piece of software in the stack handles the time zones during this process. Clone djaodjin-saas on GitHub, it is free! If you need help with your SaaS product, I am available for hire.
Don't miss the new posts.
I could send you updates on the new stuff.