Data Warehouse Definition
A Data Warehouse is a centralised system designed to store large volumes of structured data from multiple sources in a format optimised for analysis and reporting, not for day-to-day operations. It brings together historical data from across an organisation so that analysts and decision-makers can query it, identify trends, and generate reports without affecting the performance of the systems that run the business.
How is it different from a regular database?
An operational database (like the one behind an e-commerce checkout or an ERP system) is built for speed in reading and writing individual records: processing orders, updating stock levels, recording transactions. A data warehouse is built for a different job: running complex queries across millions of records to answer questions like "what were total sales by region over the last three years?"
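The two workloads can be contrasted on a single small table. This is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, the sample rows and the fixed 2022 to 2024 date range are all hypothetical stand-ins for "the last three years". An operational system touches one row at a time by its key. An analytical query scans and aggregates many rows.

```python
import sqlite3

# Hypothetical "orders" table standing in for an operational system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, "
    "order_year INTEGER, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "North", 2022, 120.0, "shipped"),
        (2, "South", 2022, 80.0, "shipped"),
        (3, "North", 2023, 200.0, "pending"),
        (4, "South", 2024, 50.0, "shipped"),
    ],
)

# Operational workload: update a single record located by its primary key.
conn.execute("UPDATE orders SET status = 'shipped' WHERE order_id = ?", (3,))

# Analytical workload: scan a range of records and aggregate them.
sales_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders "
    "WHERE order_year BETWEEN 2022 AND 2024 "
    "GROUP BY region ORDER BY region"
).fetchall()
print(sales_by_region)  # [('North', 320.0), ('South', 130.0)]
```

In practice the analytical query would usually run against a separate copy of the data in the warehouse, so that it does not compete with the operational updates.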
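The structural difference reflects this. Operational databases are typically normalised: data is split across many related tables to reduce duplication. Data warehouses often use a dimensional model instead, most commonly a star schema. In a star schema, a central fact table of measurements (such as individual sales) links to a small number of denormalised dimension tables (such as date, product and store). This flatter structure makes analytical queries faster and simpler to write. A snowflake schema is a variant that normalises some dimensions into further sub-tables. It trades a little query simplicity for less duplication.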
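A star schema can be illustrated with a minimal sketch in SQLite via Python's `sqlite3` module. The table names (`fact_sales`, `dim_store`, `dim_date`), their columns and the sample rows are hypothetical. Each fact row holds a measure plus keys pointing at the dimensions, so "sales by region and year" needs only one join per dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes, one row per store or date.
    CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, store_name TEXT, region TEXT);
    CREATE TABLE dim_date  (date_key INTEGER PRIMARY KEY, full_date TEXT, year INTEGER);

    -- Fact table: one row per sale, holding the measure plus foreign keys.
    CREATE TABLE fact_sales (
        date_key  INTEGER REFERENCES dim_date(date_key),
        store_key INTEGER REFERENCES dim_store(store_key),
        amount    REAL
    );

    INSERT INTO dim_store VALUES (1, 'Leeds', 'North'), (2, 'Bristol', 'South');
    INSERT INTO dim_date  VALUES (20230115, '2023-01-15', 2023), (20240610, '2024-06-10', 2024);
    INSERT INTO fact_sales VALUES
        (20230115, 1, 100.0), (20230115, 2, 40.0),
        (20240610, 1, 60.0),  (20240610, 2, 90.0);
""")

# Total sales by region and year: join the fact table to each dimension once.
rows = conn.execute("""
    SELECT s.region, d.year, SUM(f.amount) AS total
    FROM fact_sales f
    JOIN dim_store s ON f.store_key = s.store_key
    JOIN dim_date  d ON f.date_key  = d.date_key
    GROUP BY s.region, d.year
    ORDER BY s.region, d.year
""").fetchall()
print(rows)
# [('North', 2023, 100.0), ('North', 2024, 60.0), ('South', 2023, 40.0), ('South', 2024, 90.0)]
```

In a snowflake variant, `region` might move out of `dim_store` into its own table, which would add one more join to the same query.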
What data goes into a data warehouse?
Data is typically loaded from operational systems (sales platforms, ERPs, CRM tools, logistics software) using ETL (extract, transform, load) or ELT (extract, load, transform) processes. Once inside, it is usually kept historically. When something changes, the warehouse does not overwrite the existing record. Instead, it adds a new version alongside the old one, a pattern commonly called a Type 2 slowly changing dimension. This lets analysts see how things changed over time.
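The "add a new version instead of overwriting" idea can be sketched in plain Python. This minimal example assumes each version carries `valid_from` and `valid_to` dates, with `valid_to = None` marking the current row. The `CustomerVersion` record and the `apply_change` and `city_as_of` helpers are hypothetical names for illustration. A real warehouse would do the same thing with SQL against a dimension table.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class CustomerVersion:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version

def apply_change(history: List[CustomerVersion], customer_id: int,
                 new_city: str, change_date: date) -> None:
    """Close the current version and append a new one instead of overwriting it."""
    for row in history:
        if row.customer_id == customer_id and row.valid_to is None:
            if row.city == new_city:
                return  # no real change, so history is left as-is
            row.valid_to = change_date
            break
    history.append(CustomerVersion(customer_id, new_city, change_date))

def city_as_of(history: List[CustomerVersion], customer_id: int,
               as_of: date) -> Optional[str]:
    """Return the city that was valid for the customer on the given date."""
    for row in history:
        if (row.customer_id == customer_id and row.valid_from <= as_of
                and (row.valid_to is None or as_of < row.valid_to)):
            return row.city
    return None

history = [CustomerVersion(42, "Leeds", date(2021, 3, 1))]
apply_change(history, 42, "Bristol", date(2023, 7, 1))

print(len(history))                               # 2
print(city_as_of(history, 42, date(2022, 1, 1)))  # Leeds
print(city_as_of(history, 42, date(2024, 1, 1)))  # Bristol
```

Because the old row is closed rather than replaced, a 2022 sale can still be attributed to Leeds even after the customer has moved.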
How does it relate to a Data Lake?
A Data Lake stores raw, unprocessed data in its original format: structured, semi-structured, and unstructured alike (logs, images, documents). A data warehouse stores data that has already been cleaned, structured, and modelled for analysis. The two are often used together: raw data lands in the lake, is processed, and the refined output is loaded into the warehouse for reporting. Some modern platforms blur this distinction under the term data lakehouse.
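The lake-to-warehouse flow can be sketched with the standard library. This minimal, ETL-style example assumes the raw data is JSON lines. The sample records, the `clean` helper and the `sales` table are hypothetical. Each raw record is parsed and normalised, unusable records are dropped, and only the refined rows are loaded into a structured table. In an ELT setup, the raw rows would be loaded first and the same cleaning would happen inside the warehouse.

```python
import json
import sqlite3

# Hypothetical raw records as they might land in a lake: inconsistent and partly malformed.
raw_lines = [
    '{"order_id": 1, "region": "north", "amount": "120.50"}',
    '{"order_id": 2, "region": "South", "amount": 80}',
    'not valid json',
    '{"order_id": 3, "region": "North"}',
]

def clean(line):
    """Parse one raw record into a tidy tuple, or return None if it is unusable."""
    try:
        rec = json.loads(line)
        return (int(rec["order_id"]), rec["region"].strip().title(), float(rec["amount"]))
    except (json.JSONDecodeError, KeyError, TypeError, ValueError, AttributeError):
        return None

# Transform: keep only records that parsed cleanly.
refined = [row for row in map(clean, raw_lines) if row is not None]

# Load: write the refined rows into a structured warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL)"
)
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", refined)

totals = warehouse.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('North', 120.5), ('South', 80.0)]
```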