Movement maps using telco data
These data show the number of daily trips between Danish municipalities, estimated from aggregated mobile phone data provided by four large telecommunication companies. It is not possible to infer individual mobility from this dataset. The data cover the period from 1 February 2020 onward.
Origin of data
The data set was delivered by four major Danish telcos to Statistics Denmark (DST) to help the Danish State Serum Institute (SSI) - the Danish equivalent to the US’s CDC - to model the spread of COVID-19 and understand population behavior in response to various lock-down/mitigation measures. The dataset was officially requested by SSI and the legality of its use was ensured by the Ministry of Industry, Business, and Financial Affairs. To release the data from DST, we have removed a small amount of data so that the origin of the data (which operator) remains confidential; see below. We are currently working on releasing the full data set in downloadable form.
The data includes the daily number of trips between pairs of municipalities. Each entry consists of:
- Unix timestamp
- Origin: the KOMKODE of the Danish municipality where trips start
- Destination: the KOMKODE of the Danish municipality where trips end
- Counts: estimated number of trips between origin and destination
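As a minimal sketch, one entry could be represented as follows. The file layout is not specified here, so the `TripEntry` class, its field names, and the example values are hypothetical, and the timestamp is assumed to be in seconds and interpreted in UTC.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass(frozen=True)
class TripEntry:
    """One row of the dataset (hypothetical field names)."""
    timestamp: int     # Unix timestamp, assumed to be in seconds
    origin: int        # KOMKODE of the municipality where trips start
    destination: int   # KOMKODE of the municipality where trips end
    counts: float      # estimated number of trips

    def day(self) -> date:
        # Assumption: the timestamp is interpreted in UTC.
        return datetime.fromtimestamp(self.timestamp, tz=timezone.utc).date()


# Illustrative values only, not taken from the dataset.
entry = TripEntry(timestamp=1580515200, origin=101, destination=147, counts=1234.0)
print(entry.day())
```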
Statistical processing
The number of trips is estimated by combining aggregated location data provided by four major Danish telecommunication companies. First, we pre-processed the data to make the four datasets comparable. Second, we combined the four datasets. Finally, we filtered the data to remove potentially sensitive information.
Original data obtained from mobile-phone providers:
Company 1 (c1): Origin-destination trips aggregated by zipcode and by day. A user was considered static when located in the same cell-tower for more than 15 minutes.
Company 2 (c2): Transitions aggregated by zipcode every 6 hours. Transitions within the same zipcode are excluded.
Company 3 (c3): Origin-destination trips aggregated by municipality and by day.
Company 4 (c4): Origin-destination trips aggregated by zipcode and by day.
Pre-processing:
Coarse-grain space: For c1, c2, and c4, we summed the trips over all origin zipcodes belonging to the same municipality and all destination zipcodes belonging to the same municipality, which yields municipality-to-municipality counts.
Coarse-grain time: For c2, we summed the number of trips between a given origin and destination occurring on the same day.
Adjust shares: We renormalized the number of trips so that the fraction of trips measured by each company is approximately equal to the company’s share of customers. We proceeded as follows:
- We computed the total number of trips \( T_c \) measured by each company \( c \) in March.
- We took company c1 as a reference and computed the ratio \( \frac{T_c}{T_{c1}}. \)
- We computed the customer ratio \( \frac{N_c}{N_{c1}} \), where \( N_c \) is the number of customers of company \( c \), obtained from publicly available data.
- For each company, we computed the constant \( k_c \) such that \( \frac{N_c}{N_{c1}}=k_c\frac{T_c}{T_{c1}}. \)
- We multiplied the number of trips measured by company \( c \) by \( k_c \).
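The pre-processing steps above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the `zip_to_muni` lookup (assumed to map each zipcode to exactly one municipality), and all numbers are hypothetical and not taken from the actual pipeline.

```python
from collections import defaultdict


def coarse_grain(trips, zip_to_muni):
    """Sum zipcode-level trips into daily municipality-level counts.

    trips: iterable of (day, origin_zip, dest_zip, count). Several
           entries for the same day (e.g. 6-hour bins) are summed.
    zip_to_muni: dict zipcode -> municipality code (assumed given).
    """
    out = defaultdict(float)
    for day, o_zip, d_zip, count in trips:
        out[(day, zip_to_muni[o_zip], zip_to_muni[d_zip])] += count
    return dict(out)


def share_constants(total_trips, customers, ref="c1"):
    """Return k_c such that N_c / N_ref = k_c * T_c / T_ref."""
    return {
        c: (customers[c] / customers[ref]) / (total_trips[c] / total_trips[ref])
        for c in total_trips
    }


# Made-up inputs for illustration.
zip_to_muni = {2100: 101, 2200: 101, 2000: 147}
trips = [("2020-03-01", 2100, 2000, 3.0), ("2020-03-01", 2200, 2000, 2.0)]
coarse = coarse_grain(trips, zip_to_muni)

total_trips = {"c1": 1000.0, "c2": 250.0}   # T_c in March
customers = {"c1": 100.0, "c2": 50.0}       # N_c, arbitrary units
k = share_constants(total_trips, customers)

# Rescale one company's counts by its constant k_c.
rescaled_c2 = {key: k["c2"] * n for key, n in coarse.items()}
```

In this toy case c2 has half as many customers as c1 but reports only a quarter of the trips, so its counts are scaled up by a factor of 2.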
The total number of trips \( AB(t) \) between any pair of municipalities \( A \) and \( B \) on day \( t \) is estimated as the sum of the trips computed by each company:
$$AB(t) = \sum_c AB_c(t),$$
where \( AB_c(t) \) is the number of trips between \( A \) and \( B \) on day \( t \) using data from company \( c \).
On days when data from all four companies are available, we compute \( AB(t) \) using the equation above. For some days \( t* \), data from a certain company \( c* \) may be missing. In this case, we estimate \( AB(t*) \) as follows:
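We compute the average share of trips between \( A \) and \( B \) attributable to company \( c* \), averaged over the set \( D \) of days on which data from all four companies are available:
$$\overline{AB_{c*}}=\frac{1}{|D|}\sum_{t\in D}\frac{AB_{c*}(t)}{AB(t)}.$$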
We compute the partial number of trips between \( A \) and \( B \) on day \( t* \) using data from the available companies:
$$AB_{partial}(t*) = \sum_{c\neq c*} AB_{c}(t*).$$
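Since company \( c* \) accounts on average for a fraction \( \overline{AB_{c*}} \) of the total, the partial count covers roughly the remaining fraction \( 1-\overline{AB_{c*}} \). We therefore estimate the number of trips between \( A \) and \( B \) on day \( t* \) by company \( c* \) as:
$$AB_{c*}(t*) \sim \overline{AB_{c*}}\,\frac{AB_{partial}(t*)}{1-\overline{AB_{c*}}}.$$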
We finally compute \( AB(t*) \) as:
$$AB(t*) \sim AB_{partial}(t*)+AB_{c*}(t*).$$
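The estimate for a day with a missing company can be sketched as below. This is a minimal illustration, not the actual pipeline: the function names and numbers are hypothetical, and the sketch assumes the missing company contributes its average share of the total, so that the total is the partial count divided by one minus that share.

```python
def average_share(history, missing):
    """Average share of company `missing` for one municipality pair.

    history: list of dicts {company: trips}, one dict per day on which
             all companies reported.
    """
    shares = []
    for day in history:
        total = sum(day.values())
        if total > 0:  # skip days without trips to avoid dividing by zero
            shares.append(day[missing] / total)
    return sum(shares) / len(shares)


def estimate_total(available, share):
    """Estimate AB(t*) from the companies that reported on day t*.

    available: dict {company: trips} for the available companies.
    share: average share of the missing company (must be below 1).
    """
    partial = sum(available.values())
    # Assumption: the missing company's trips are `share` times the total.
    missing_estimate = share * partial / (1.0 - share)
    return partial + missing_estimate


# Made-up counts for one pair (A, B).
history = [
    {"c1": 40.0, "c2": 30.0, "c3": 20.0, "c4": 10.0},
    {"c1": 80.0, "c2": 60.0, "c3": 40.0, "c4": 20.0},
]
share_c4 = average_share(history, "c4")
total_estimate = estimate_total({"c1": 45.0, "c2": 27.0, "c3": 18.0}, share_c4)
```

Here c4 carries 10% of the trips on both complete days, so a partial count of 90 from the other three companies is scaled up to an estimated total of about 100.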
We apply two filters:
- We remove entries for which the number of trips between \( A \) and \( B \) at time \( t \) is 5 or fewer. Thus, we impose \( AB(t)>5 \) .
- We remove all entries with origin \( A \) at time \( t \) if any single company accounts for \( 80\% \) or more of the total number of outgoing trips from \( A \) . Thus we impose: $$\frac{\sum_B AB_c(t)}{\sum_B AB(t)}<0.8,$$ for all companies \( c \) .
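A minimal sketch of the two filters for a single day is given below. The function name, the input layout, and the numbers are hypothetical, and the sketch assumes that outgoing totals are computed before the small-count filter is applied; the order actually used is not stated here.

```python
from collections import defaultdict


def apply_filters(per_company, min_trips=5, max_share=0.8):
    """Apply both filters to one day of data.

    per_company: dict {company: {(origin, dest): trips}}.
    Returns {(origin, dest): total trips} for the entries that are kept.
    """
    total = defaultdict(float)   # (origin, dest) -> trips, all companies
    out_c = defaultdict(float)   # (company, origin) -> outgoing trips
    for company, od in per_company.items():
        for (a, b), n in od.items():
            total[(a, b)] += n
            out_c[(company, a)] += n

    out_total = defaultdict(float)  # origin -> outgoing trips, all companies
    for (a, _b), n in total.items():
        out_total[a] += n

    # Origins where one company reaches the maximum allowed share.
    dominated = {
        a
        for (_company, a), n in out_c.items()
        if out_total[a] > 0 and n / out_total[a] >= max_share
    }
    return {
        (a, b): n
        for (a, b), n in total.items()
        if n > min_trips and a not in dominated
    }


# Made-up counts: D is dominated by c1 (90%), and A -> C has too few trips.
per_company = {
    "c1": {("A", "B"): 10, ("A", "C"): 2, ("D", "B"): 90},
    "c2": {("A", "B"): 8, ("A", "C"): 1, ("D", "B"): 10},
}
kept = apply_filters(per_company)
```

In this toy example only the pair from A to B survives: A to C has 3 trips in total, and every entry with origin D is dropped because c1 accounts for 90% of its outgoing trips.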