Data warehousing in healthcare


When we talk about Data warehousing, a terminology by Ralph Kimball, defines data warehouse as “a copy of transaction data specifically structured for query and analysis”. Data warehouse of an enterprise consolidates data from heterogeneous sources to support enterprise wide decision-making, reporting, and analysing.

Narrowing down the plethora of discussion to healthcare data warehousing in particular, this presents its own set of unique challenges. The industry is rife with medical standards and coding schemes, many of which are incompatible and require careful translations. Healthcare data comes from many sources and is delivered in many forms, including published books and individual spreadsheets. Results derived from a healthcare data warehouse must be delivered in accessible form to diverse stake-holders, including healthcare regulators, physicians, hospital administrators and consumers. The industry’s widely decentralized and largely autonomous data collection efforts make data quality a significant challenge. Finally, the sensitivity of healthcare data makes privacy and security issues paramount.


The healthcare data warehouse performs several important activities, including support of decision-making and creation of an infrastructure for exploration of very large data collections. Decision makers should be able to pursue many of their investigations using browsing tools, without relying on database programmers to construct queries.

The healthcare data warehouse design includes three levels of data granularity, from coarse-grained data used in generic report production to detailed event-level data, such as hospital discharges. The highly aggregated data produce the core reports. These reports provide fast interactive response times for access via data browsing tools and can provide the foundation for simple, community-wide access. The middle level  aggregated data is supported by families of star schemas that provide true dimensional data warehouse capabilities, such as interactive roll-up and drill-down operations where facts are grouped at different levels for analysis. These schemas have carefully designed dimensions that more sophisticated data browsing tools can use to support online analytical processing (OLAP) techniques. The very fine grained or even event-level data includes hospital discharge data. This data is retained at the individual transaction level because of the rich set of facts and dimensions available for analysis and reporting. 


Using a data warehouse enhances report making in many ways, including the addition of community-level data. Client agencies can now define communities based on smaller geographic areas, such as zip codes, and they can use the finer-grained information to gain

a better understanding of critical healthcare issues. This information helps in deploying resources and designing interventions to meet specific community health-status challenges.

In addition to generating comprehensive reports, the data warehouse provides new avenues for healthcare research. Integrating each new information source into the data warehouse provides a rich resource that gives both researchers and community stakeholders a thorough overview of the data and lets them quickly investigate the details through OLAP technologies.


A data warehouse provides a digital representation of the “real world” as it exists at discrete points in time. The fitness for use or quality of the data warehouse, then, is a direct function of the fidelity of these observed behaviour patterns to the actual behaviours in the real world. The mission of healthcare decision support is to provide the information on which sound decisions are made. Poor data quality can lead to poor decisions, threatening this mission. 

In general, errors that threaten data warehouse quality have five points of entry:

  • Design or conformation errors—such as incompatible units, mixed data granularity, precision, scope, depth, coherency, and other factors—constitute fundamental flaws. Although most design errors are identified during initial loading and testing, the highly evolutionary nature of many data warehouse implementations makes these projects particularly vulnerable to design errors because structural changes are frequently made with less rigorous review and testing.
  • Collection errors cause particular trouble because data warehouses often rely on other sources to initially capture and store data. Incorrectly recorded values, mixed records, dropped data points or fields, incorrect units, inconsistent standards, or untimely collection practices can contaminate the data with scant evidence of error.
  • Staging errors can be introduced into clean source data through mishandling of the loading process. Invalid or incomplete extraction queries, improper transformation processes, and truncated loading operations can be error sources.
  • Data integration, a fundamental data warehouse characteristic, can be the source of errors when loading procedures make key linkages between facts and dimensions. Data warehouses differ sharply from conventional databases in that they do not record reality, they assemble it.
  • Query errors are introduced through improper query formation or misplaced expectations. The query interface is where users form perceptions regarding data quality, even though errors can be introduced in all previous steps. The query interface influences perceptions of usefulness, believability, and constituent factors. 

Thus. Healthcare data warehousing will make important contributions to the health status of individuals and communities by making rigorous, quantitative information available to healthcare decision makers making itself as one of the most in-demand technologies of the world.

Leave a Reply

Your email address will not be published. Required fields are marked *

Start a project

If you want to get a free consultation without any obligations, fill in the form below and we'll get in touch with you.