I’ve tried to explain the difference between OLTP systems and a Data Warehouse to my managers many times, as I’ve worked at a hospital as a Data Warehouse Manager/data analyst for many years. Why was the list that came from the operational applications different than the one that came from the Data Warehouse? Why couldn’t I just get a list of patients that were laying in the hospital right now from the Data Warehouse? So I explained, and explained again, and explained to another manager and another. You get the picture.
In this article, I will explain this very same thing to you. So you know how to explain this to your manager. Or, if you are a manager, you might understand what your data analyst can and cannot give you.
OLTP
OLTP stands for On Line Transactional Processing. In other words: getting your data directly from the operational systems to make reports. An operational system is a system that is used for the day to day processes.
For example: When a patient checks in, his or her information gets entered into a Patient Information System. The doctor put scheduled tests, a diagnosis, and a treatment plan in there as well. Doctors, nurses, and other people working with patients use this system daily to enter and get detailed information on their patients.
The data is stored within operational systems so the data can be used efficiently by the people working directly on the product or with the patient in this case.
Data Warehousing
A Data Warehouse is a big database that fills itself with data from operational systems. It is used solely for reporting and analytical purposes. No one uses this data for day-to-day operations. The beauty of a Data Warehouse is, among others, that you can combine the data from the different operational systems. You can actually combine the number of patients in a department with the number of nurses for example. You can see how far a doctor is behind schedule and find the cause of that by looking at the patients. Does he run late with elderly patients? Is there a particular diagnosis that take more time? Or does he just oversleep a lot? You can use this information to look at the past, see trends, so you can plan for the future.
The difference between OLTP and Data Warehousing
This is how a Data Warehouse works:
The data gets entered into the operational systems. Then the ETL processes Extract this data from these systems, Transform the data so it will fit neatly into the Data Warehouse, and then Loads it into the Data Warehouse. After that, reports are formed with a reporting tool, from the data that lies in the Data Warehouse.
This is how OLTP works:
Reports are directly made from the data inside the database of the operational systems. Some operational systems come with their own reporting tool, but you can always use a standalone reporting tool to make reports from the operational databases.
Pros and Cons
Data Warehousing
Pros:
- There is no strain on the operational systems during business hours
- You won’t disturb the operational processes as you can schedule the ETL processes to run during the hours the least amount of people are using the operational system. And when you need to run a large query, the operational systems won’t be affected, as you are working directly on the Data Warehouse database.
- Data from different systems can be combined
- It is possible to combine finance and productivity data for example. As the ETL process transforms the data so it can be combined.
- Data is optimized for making queries and reports
- You use different data in reports than you use on a day-to-day base. A Data Warehouse is built for this. For instance: most Data Warehouses have a separate date table where the weekday, day, month, and year are saved. You can make a query to derive the weekday from a date, but that takes processing time. By using a separate table like this you’ll save time and decrease the strain on the database.
- Data is saved longer than in the source systems
- The source systems need to delete their old records when they are no longer used in day-to-day operations. So they get deleted to gain performance.
Cons:
- You always look at the past
- A Data Warehouse is updated once a night or just once a week. That means that you never have the latest data. Staying with the hospital example: you never knew how many patients there are right now. Or what surgeon didn’t show up on time this morning.
- You don’t have all the data
- A Data Warehouse is built for discovering trends, showing the big picture. The little details, the ones not used in trends, get discarded during the ETL process.
- Data isn’t the same as the data in the source systems
- Because the data is older than those of the source systems it will always be a little different. But also because of the Transformation step in the ETL process, data will be a little different. It doesn’t mean one or the other is wrong. It’s just a different way of looking at the data. For example the hospital’s Data Warehouse excluded all transactions marked as canceled. If you try to get the same reports from both systems, and don’t exclude the canceled transactions in the source system, you’ll get different results.
online transactional processing (OLTP)
Pro’s
- You get real time data
- If someone is entering a new record now, you’ll see it right away in your report. No delays.
- You’ve got all the details
- You can access all the details the employees have entered into the system. No grouping, no skipping records, just all the raw data that’s available.
Con’s
- You are putting strain on an application during business hours.
- When you are making a large query, you can take processing space that would otherwise be available to the people that need to work with this system for their day-to-day operations. And if you make an error, for instance, forgetting to put a date filter on your query, you could even bring the system down so no one can use it anymore.
- You can’t compare the data with data from other sources.
- Even when the systems are similar. Like an HR system and a payroll system that use each other to work. Data will always be different because it is granulated on a different level, or not all data is relevant for both systems.
- You don’t have access to old data
- To keep the applications at peak performance, old data irrelevant to day-to-day operations is deleted.
- Data is optimized to suit day-to-day operations
- And not for report-making. You’ll have to get creative with your queries to get the needed data.
So what method should you use?
That all depends on what you need at that moment. If you need detailed information about things happening now, you should use OLTP.
If you are looking for trends or insights on a higher level, you should use a Data Warehouse.