IEEE ICHI Data Analytics Challenge on Missing data Imputation (DACMI)

Shared task Organizers

Yuan Luo, PhD (Assistant Professor, Northwestern University)

Announcing

Shared task for the 2019 ICHI Data Analytics Challenge on Missing data Imputation (DACMI) for longitudinal ICU laboratory test data. Fast facts and details are available at https://www.ieee-ichi.org/challenge.html. Selected papers will be published in the Journal of Healthcare Informatics Research after due peer review. We welcome participants from all disciplines. Please forward this announcement and help us spread the word!

Fast Facts

The problem: A key challenge in clinical data mining is that most clinical datasets contain missing data. Since many commonly used machine learning algorithms require complete datasets (no missing data), clinical analytic approaches often entail an imputation procedure to "fill in" missing data. However, although most clinical datasets contain a temporal component, most commonly used imputation methods do not directly accommodate longitudinal, time-series-based clinical data. We have developed a shared-task challenge dataset that can be used to benchmark the accuracy of imputation algorithms on missing clinical time series data.
Task: The challenge centers on a single task: to impute missing data in a clinical dataset of longitudinal multivariable laboratory test results. The dataset consists of test results for 13 commonly measured analytes (clinical laboratory tests). To evaluate imputation method performance, we randomly masked selected test results at varying time points. Participants are asked to impute these masked results alongside results missing from the original data. Performance is benchmarked by comparing predicted results to measured results for the masked data points.
Dataset: Our dataset is derived from the MIMIC database (https://mimic.physionet.org/), a large real-world ICU database. We have deposited the derived dataset on the MIMIC site at https://bit.ly/2H98PVD. To gain access officially, you need to sign the MIMIC data use agreement by following these two steps:

Complete CITI training: Please refer to your own institution's CITI site.
Complete the MIMIC Data Use Agreement (DUA):

Follow the instructions at https://mimic.physionet.org/gettingstarted/access/. For the test dataset, we will provide a secure download link, and we will deposit it on the MIMIC site after the challenge is complete.
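The masking-and-scoring setup described under Task can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names (mask_random, impute_linear, rmse_on_mask) are hypothetical, the toy array stands in for one patient's results (rows are time points, columns are analytes), linear interpolation over time is only a simple baseline, and root-mean-square error is used as a stand-in because the exact challenge metric is not specified here.

```python
import numpy as np

def mask_random(values, frac, rng):
    """Hide a random fraction of the observed entries.

    Returns the masked copy and a boolean array marking the hidden cells.
    """
    observed_idx = np.flatnonzero(~np.isnan(values))
    n_mask = max(1, int(round(frac * observed_idx.size)))
    chosen = rng.choice(observed_idx, size=n_mask, replace=False)
    mask = np.zeros(values.shape, dtype=bool)
    mask.flat[chosen] = True
    masked = values.copy()
    masked[mask] = np.nan
    return masked, mask

def impute_linear(times, values):
    """Baseline: per-analyte linear interpolation over time.

    Points outside the observed time range take the nearest observed value.
    A column with no observed values is left as NaN.
    """
    filled = values.copy()
    for j in range(values.shape[1]):
        col = values[:, j]
        known = ~np.isnan(col)
        if known.any():
            filled[:, j] = np.interp(times, times[known], col[known])
    return filled

def rmse_on_mask(truth, imputed, mask):
    """Score only the cells that were deliberately masked (assumed metric)."""
    diff = imputed[mask] - truth[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy data: 6 time points, 2 hypothetical analytes.
rng = np.random.default_rng(0)
times = np.arange(6, dtype=float)
truth = np.column_stack([2.0 * times + 1.0, 10.0 - times])

masked, mask = mask_random(truth, frac=0.25, rng=rng)
imputed = impute_linear(times, masked)
score = rmse_on_mask(truth, imputed, mask)
print(f"masked cells: {mask.sum()}, RMSE on masked cells: {score:.3f}")
```

Scoring only the masked cells mirrors the idea that natively missing results have no ground truth to compare against, while masked results do.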

Publication: Selected papers will be published in the Journal of Healthcare Informatics Research after due peer review. The page limit is 7 pages per paper in JHIR typesetting. For papers over the limit, a fee may be charged for each additional page.

To receive updates as more information becomes available, please register for the DACMI challenge organizers Google Group by completing the participant signup form (https://goo.gl/forms/vSwoUlQL8d4ZK3f82).

Timeline

Last date to submit participant signup form https://goo.gl/forms/vSwoUlQL8d4ZK3f82: April 30

Test dataset downloadable. Output due 4 days from your download timestamp: May 1-2

Short shared task papers due (no late submissions accepted): May 10

Short shared task paper reviews/feedback provided to authors: May 20

JHIR paper submission due for selected teams: June 30

Format & principles

The following is a preview of the Rules of Conduct that you will be required to sign. The format of the 2019 DACMI shared task and the principles which bind the participants of this shared task are as follows:

In order to support the shared task, DACMI will provide the participants with data generated from MIMIC.
All members of all teams are required to sign MIMIC's data use agreement online.
DACMI will first release annotated training data. Teams can use this data to develop their systems. The systems are expected to be fully automatic, i.e., no human intervention in their output.
Evaluation is to be run on held-out test data. Teams are not allowed to train on test data. All development and training must stop before teams download the test data. Teams are expected to generate fully automatic system outputs on the test data. Manual interventions with the test data predictions are not acceptable.
Gaining access to any portion of the 2019 DACMI test data commits the teams to participate in the evaluation that will be run by DACMI. Teams cannot withdraw from the evaluation after gaining access to the data.
Gaining access to any portion of the 2019 DACMI test data commits the teams to submit a paper describing their developed system to DACMI.
Gaining access to any portion of the 2019 DACMI test data commits the teams to not use any other MIMIC or MIMIC-derived data to assist generating their answers. This will be enforced together with item 9.
Gaining access to any portion of the 2019 DACMI test data commits the teams to open source their code.
Gaining access to any portion of the 2019 DACMI test data commits the teams to present their work in the follow-up workshop to be organized by DACMI (should their submitted paper be accepted for presentation) at IEEE ICHI 2019.
Gaining access to any portion of the 2019 DACMI test data commits the teams to not share the data with non-team members.

Evaluation results are available at https://doi.org/10.1093/bib/bbab489.