Architecture

Vision

The vision of SCANNER is to provide a scalable and flexible distributed infrastructure for collaborative comparative effectiveness research. SCANNER can be used to conduct multiple studies within the same computer network. Each study might focus on a different clinical domain, have its own data needs, data models, and data sharing policies, and involve a different combination of participating sites. The network must scale to accommodate the volume and heterogeneity of studies, data, users, and policies.

 

Conceptual Architecture


[Figure: SCANNER conceptual architecture (revised_architecture_v2)]

 


The figure above shows the conceptual architecture of SCANNER. The network consists of four main service components: the Portal, the Registry, the Master node, and one or more Worker nodes. A user first authenticates to the Portal, which on success presents a web-based graphical user interface. The interface presents a set of selection controls, populated from information stored in the Registry, for the datasets, computational algorithms, and remote nodes that the user can execute a query against. The Portal sends the query request to the Master node, which simultaneously issues the query to the piece of the computational plug-in residing on each of the remote Worker nodes.

Each Worker node exposes the resources provided by its site to the network. Each participating institution hosts its own SCANNER Worker node (a virtual machine provided to each institution), which includes resources such as databases and data sets as well as computational services. The computational services currently available are data analysis services. For analysis, SCANNER researchers are developing a statistical analysis toolkit called the Observational Cohort Event Analysis and Notification System (OCEANS). In addition, SCANNER researchers are developing Grid binary LOgistic Regression (GLORE), a tool that builds a global predictive logistic regression model without sharing data.
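The query flow described above, in which the Master node issues one query to every Worker node at the same time, can be sketched as follows. This is a minimal illustration and not SCANNER's implementation: the names `Query`, `make_worker`, and `master_dispatch`, the dataset and algorithm labels, and the use of in-process callables in place of remote Worker services are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Query:
    dataset: str    # hypothetical dataset name selected in the Portal
    algorithm: str  # hypothetical name of the computational plug-in


# A worker is modeled as a local callable; a real Worker node is a remote service.
Worker = Callable[[Query], dict]


def make_worker(site: str, row_count: int) -> Worker:
    """Build a stand-in worker that holds `row_count` local rows."""
    def run(query: Query) -> dict:
        # Only an aggregate leaves the site, never row-level data.
        return {"site": site, "algorithm": query.algorithm, "n": row_count}
    return run


def master_dispatch(query: Query, workers: Dict[str, Worker]) -> List[dict]:
    """Issue the same query to every worker concurrently and collect the results."""
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = {site: pool.submit(worker, query) for site, worker in workers.items()}
        # Return results in a stable order, sorted by site name.
        return [futures[site].result() for site in sorted(futures)]


workers = {
    "site_a": make_worker("site_a", 120),
    "site_b": make_worker("site_b", 80),
}
results = master_dispatch(Query("anticoagulant_cohort", "count"), workers)
for result in results:
    print(result)
```

In this sketch the Master waits for every worker before returning. A real deployment would also need timeouts and handling for sites that fail to respond, which the source does not describe.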

 

While the architecture described above does not require a specific data model, many of the resources and tools implemented in SCANNER use version 4.0 of the Observational Medical Outcomes Partnership (OMOP) common data model. SCANNER local nodes contain a data transformation service that can convert OMOP-format data to the formats needed by analytical services.

 

Comparative effectiveness researchers access the resources at study sites through the SCANNER portal. The portal contains applications that request resources from the sites participating in a study and integrate them for presentation to the researcher. For example, one application allows the researcher to compose a query for de-identified data from each site, integrates the results into one data set, and displays them.

 

 

Data Governance

Access control for data in SCANNER is delegated to the site that provides the data. Sites implement their data access policies using the policy enforcement modules. Access policies are specified as credential expressions, which specify the roles of the users accessing a resource and how those users are authenticated. For example, a credential expression could state that access to a data set is limited to study investigators as specified in an Institutional Review Board (IRB) approvals database, and that the user must belong to one of the study sites and be authenticated by that site’s enterprise authentication services. Beyond access control, sites can also implement other trust services for each resource, e.g., records retention.
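The credential expression in the example above combines three conditions: a required role, membership in a study site, and authentication by that site. A minimal sketch of how such a check might be evaluated is shown below. The source does not give the syntax of SCANNER's credential expressions, so the `User` and `CredentialExpression` classes, their fields, and the role and site names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class User:
    name: str
    site: str                 # institution the user belongs to
    roles: FrozenSet[str]     # e.g. roles derived from an IRB approvals database
    authenticated_by: str     # identity provider that verified the user


@dataclass(frozen=True)
class CredentialExpression:
    """Hypothetical policy: role AND study-site membership AND home-site login."""
    required_role: str
    study_sites: FrozenSet[str]

    def permits(self, user: User) -> bool:
        return (
            self.required_role in user.roles
            and user.site in self.study_sites
            # The user must be authenticated by their own site's service.
            and user.authenticated_by == user.site
        )


policy = CredentialExpression(
    required_role="study_investigator",
    study_sites=frozenset({"site_a", "site_b", "site_c"}),
)

alice = User("alice", "site_a", frozenset({"study_investigator"}), "site_a")
bob = User("bob", "site_b", frozenset({"analyst"}), "site_b")
mallory = User("mallory", "site_a", frozenset({"study_investigator"}), "other_idp")

for user in (alice, bob, mallory):
    print(user.name, policy.permits(user))
```

Because each site owns its policies, a check of this kind would run at the site hosting the data rather than at the Portal.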

 

 

An example: Medication Surveillance

The Medication Surveillance comparative effectiveness research (CER) study in SCANNER provides an unusual mechanism for a collaborative study. The three participating sites will host the following resources on their local nodes: (a) data sets related to anticoagulant and antiplatelet therapy, and (b) statistical process control data analysis services. The policy enforcement layer enforces the policy that raw data are not shared. Instead, each node allows the data analysis services to be executed on the specified data sets. Thus, a CER expert would send a request to the Medication Surveillance application (from the SCANNER portal) to execute the same analysis at all three sites on their respective data sets. When all the results have been received, the Medication Surveillance application integrates them and notifies the user of their availability.
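The pattern described above, running the same analysis at each site and integrating only the results, can be illustrated with a small sketch. The source does not detail the statistical process control analysis, so a simple event count stands in for it here. The functions `local_analysis` and `integrate`, the `bleeding_event` field, and the sample records are all invented for illustration.

```python
from typing import Dict, List


def local_analysis(records: List[dict]) -> Dict[str, int]:
    """Runs at a site: reduce row-level records to aggregate counts."""
    return {
        "patients": len(records),
        "events": sum(1 for r in records if r["bleeding_event"]),
    }


def integrate(site_results: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Runs in the application: pool the per-site aggregates."""
    patients = sum(r["patients"] for r in site_results.values())
    events = sum(r["events"] for r in site_results.values())
    rate = events / patients if patients else 0.0
    return {"patients": patients, "events": events, "event_rate": rate}


# Invented row-level data; in practice it never leaves each site.
site_data = {
    "site_a": [{"bleeding_event": e} for e in (True, False, False, False)],
    "site_b": [{"bleeding_event": e} for e in (False, True, False)],
    "site_c": [{"bleeding_event": e} for e in (False, False, False)],
}

# Each site returns only its aggregate result.
site_results = {site: local_analysis(rows) for site, rows in site_data.items()}
pooled = integrate(site_results)
print(site_results)
print(pooled)
```

Only the `patients` and `events` counts cross site boundaries in this sketch, which mirrors the policy that raw data are not shared.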