To compare the outbreak detection ability of separate and
combined data streams from ambulatory care and emergency
department encounters at Harvard Pilgrim Health Care.
A variety of electronic health event data sources have
been proposed and used for the early detection of
disease outbreaks. While there is some information
available about the utility of these data sources [1,2],
few formal comparisons have been made among
them. These data sources can also be combined to generate a common sequence of
outbreak signals. However, different sources of data
can be correlated, since individuals may seek care in a
variety of settings, including hospital emergency departments and ambulatory clinics, resulting in multiple reports for the same individual case in different
data sources. Combining these data
sources properly is therefore difficult: the assumption of independent sources is invalid in most cases,
so a combined analysis can produce new signals or miss signals when compared to
separate analyses of each data stream.
Using historical Harvard Pilgrim Health Care (HPHC)
episode data from 2003 to 2006, we mimic a
daily prospective surveillance system. The following
syndromes are evaluated: respiratory (RESP), influenza-like illness (ILI), and upper and lower gastrointestinal illness (UGI, LGI). The residential zip code is
used as the spatial component. To ensure an appropriate comparison, we use the same detection algorithm throughout, the space-time permutation scan statistic [3,4],
which automatically adjusts for any purely temporal
or purely spatial variation. For each syndrome, three
data streams are evaluated: ambulatory care (AC),
emergency (ER), and the combined data streams (AC&ER).
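
As an illustration of how the space-time permutation scan statistic adjusts for purely spatial and purely temporal variation, the sketch below computes expected counts from the zip code and day margins of the case table and the Poisson-type log likelihood ratio for one candidate cylinder (a set of zip codes over a set of days). It is a minimal sketch of the statistic as usually described for this method, not the SaTScan implementation; the function names and the small count table are hypothetical, and a full analysis would scan many cylinders and assess significance by Monte Carlo permutation.

```python
import math

def expected_counts(counts):
    """counts[z][d] = observed cases in zip code z on day d (hypothetical layout).

    Expected count for each cell is (zip total * day total) / grand total,
    so only the space-time interaction can make a cell look unusual.
    Assumes the table contains at least one case.
    """
    zip_totals = [sum(row) for row in counts]
    day_totals = [sum(col) for col in zip(*counts)]
    total = sum(zip_totals)
    return [[z * d / total for d in day_totals] for z in zip_totals]

def log_likelihood_ratio(counts, zips, days):
    """Log likelihood ratio for the cylinder made of `zips` over `days`.

    Returns 0.0 when the cylinder has no excess of cases.
    """
    mu = expected_counts(counts)
    total = sum(sum(row) for row in counts)
    observed = sum(counts[z][d] for z in zips for d in days)
    expected = sum(mu[z][d] for z in zips for d in days)
    if observed <= expected:
        return 0.0
    llr = observed * math.log(observed / expected)
    if total > observed:
        llr += (total - observed) * math.log((total - observed) / (total - expected))
    return llr

# Hypothetical table: 3 zip codes x 3 days, with an excess in zip 0 on day 2.
counts = [[1, 1, 5],
          [1, 1, 1],
          [1, 1, 1]]
excess_llr = log_likelihood_ratio(counts, zips=[0], days=[2])
quiet_llr = log_likelihood_ratio(counts, zips=[0], days=[0])
```

Because the expected counts are built from the margins of the observed data, a zip code that always has many cases, or a day on which every zip code has many cases, does not by itself produce a large ratio.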

We are currently evaluating detected signals with a
recurrence interval (RI) greater than 365 days for
each data stream. The RI is the expected time
between signals with an equal or higher
likelihood ratio, assuming that the null hypothesis of
no cluster is true. Combined data streams (AC&ER)
provided signals with both increased and reduced RI
when compared to ER or AC alone, as shown in Figure 1. This might be associated with the merging procedure, which does not count repeat encounters
within 42 days of a previous one. Unique signals
were also detected for separate and combined data streams.
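
The 42-day rule for repeat encounters can be sketched as follows. This is an illustration under stated assumptions, not the HPHC procedure: the record layout (patient, syndrome, date, setting), the function name, and the choices to apply the rule per patient and syndrome, to measure the gap from the immediately preceding encounter, and to treat a gap of exactly 42 days as a repeat are all hypothetical.

```python
from datetime import date

def drop_repeat_encounters(encounters, window_days=42):
    """Keep an encounter only if the same patient had no encounter for the
    same syndrome in the preceding `window_days` days (assumed rule).

    encounters: iterable of (patient_id, syndrome, visit_date, setting).
    """
    last_seen = {}
    kept = []
    for enc in sorted(encounters, key=lambda e: (e[0], e[1], e[2])):
        key = (enc[0], enc[1])
        previous = last_seen.get(key)
        if previous is None or (enc[2] - previous).days > window_days:
            kept.append(enc)
        # The gap is measured from the previous encounter, whether kept or not.
        last_seen[key] = enc[2]
    return kept

# Hypothetical records: patient p1 visits AC, then ER 19 days later.
records = [
    ("p1", "RESP", date(2005, 1, 1), "AC"),
    ("p1", "RESP", date(2005, 1, 20), "ER"),
    ("p1", "RESP", date(2005, 4, 1), "ER"),
    ("p2", "RESP", date(2005, 1, 5), "ER"),
]
combined = drop_repeat_encounters(records)
er_only = drop_repeat_encounters([r for r in records if r[3] == "ER"])
```

In this example the ER visit on 20 January is kept in the ER stream but dropped from the combined stream, because the AC visit 19 days earlier already counts as the start of the episode. Differences of this kind are one way the combined stream could yield a different RI from either separate stream.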
Figure 1. Detected signals with recurrence interval greater than
365 days (reference line) for the AC, ER, and AC&ER (combined) data
streams, plotted from December 2004 to November 2006. The x-coordinate is the day of the signal and the y-coordinate is the associated recurrence interval.
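
For a daily analysis, the recurrence interval is conventionally taken as the reciprocal of the p-value, so the 365-day threshold corresponds to p < 1/365. The sketch below shows that conversion together with a Monte Carlo p-value; it assumes one analysis per day, and the function names and the number of replicates (999) are hypothetical rather than taken from this study.

```python
def monte_carlo_p_value(observed_llr, simulated_llrs):
    """Rank of the observed statistic among the simulated ones, divided by
    the number of replicates plus one."""
    rank = 1 + sum(1 for s in simulated_llrs if s >= observed_llr)
    return rank / (len(simulated_llrs) + 1)

def recurrence_interval_days(p_value, analyses_per_day=1):
    """Expected number of days between signals at least this extreme under
    the null hypothesis (assumes `analyses_per_day` analyses each day)."""
    return 1.0 / (p_value * analyses_per_day)

# Hypothetical null replicates: 999 simulated log likelihood ratios.
simulated = list(range(999))
strong = recurrence_interval_days(monte_carlo_p_value(1000, simulated))
weaker = recurrence_interval_days(monte_carlo_p_value(997, simulated))
```

With 999 replicates the smallest attainable p-value is 0.001, so only an observed statistic ranked first or second among the replicates gives an RI above 365 days; longer intervals need more replicates to be resolved.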
The optimal way to use separate and combined AC
and ER data streams is being investigated. Further
analysis will include the multivariate permutation
scan statistic, available in the SaTScan software. The results so far suggest that the three data streams
contain independently useful information for disease
outbreak detection.