The market segments that a BI vendor can go after are governed by its market offering: the BI platforms, services, and applications that it develops. ETL tools are a key component of BI platforms, and their market is currently worth several billion US dollars. ETL tools come in many colors, shapes, and sizes; with the right ETL tool, a data warehouse can be created to satisfy almost any business need. The ETL tool market has not yet matured, compared to, say, the market for databases. A number of ETL tools, supporting a variety of ETL architectures, are currently available in the BI market. This diversity gives customers a number of options in how they can design, run, and manage their BI applications and infrastructure, from both a technical and a business standpoint.
For an introduction to ETL, please see a previous post.
In this post, I will write about the various ETL architectures and discuss their advantages and disadvantages.
The following ETL architectures have evolved in the market so far:
1. ETL- The traditional approach
ETL: "Batch" programs are run at periodic intervals, e.g. on a daily, weekly, or monthly basis. To capture the incremental data, each program extracts only the data that was entered in the source system after the previous ETL batch ran. E (extract) happens on the source system, T (transform) happens either in the target database or on a separate data integration server, and L (load) happens on the target database.
Examples of technologies that support this mode of ETL are:
1. Oracle Warehouse Builder
2. Oracle GoldenGate.
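The batch-with-watermark idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not any vendor's implementation; the `orders` table, the `modified` column, and the use of SQLite as both source and target are all made up for the example:

```python
import sqlite3

# Hypothetical source system: an "orders" table stamped with a modified date.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, amount REAL, modified TEXT)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 10.0, "2024-01-01"), (2, 20.0, "2024-01-02")])

# Hypothetical target data warehouse.
tgt = sqlite3.connect(":memory:")
tgt.execute("CREATE TABLE dw_orders (id INTEGER, amount_cents INTEGER)")

last_run = "2024-01-01"  # watermark saved when the previous batch ran

# E: extract only rows entered after the previous ETL batch ran
rows = src.execute(
    "SELECT id, amount FROM orders WHERE modified > ?", (last_run,)
).fetchall()

# T: apply a (trivial) transformation, here converting amounts to cents
transformed = [(rid, int(amount * 100)) for rid, amount in rows]

# L: load the transformed increment into the target warehouse table
tgt.executemany("INSERT INTO dw_orders VALUES (?, ?)", transformed)
print(tgt.execute("SELECT * FROM dw_orders").fetchall())  # [(2, 2000)]
```

In a real batch the watermark itself would be persisted and advanced after each successful load.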
2. ELT- Extract and load first, then transform.
This approach is similar to ETL except that (a) the load happens before the transform, and (b) there is no dedicated machine for the transform; it happens on the target database.
This architecture is supported by Sunopsis / Oracle Data Integrator.
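A minimal sketch of the ELT pattern, again using SQLite and invented table names: the raw rows are bulk-loaded into a staging table first, and the transform then runs as set-based SQL inside the target database itself, with no separate integration server.

```python
import sqlite3

tgt = sqlite3.connect(":memory:")  # stands in for the target database

# E + L: raw source rows are loaded into a staging table as-is
tgt.execute("CREATE TABLE stg_orders (id INTEGER, amount REAL, status TEXT)")
tgt.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)",
                [(1, 10.0, "new"), (2, 20.0, "cancelled"), (3, 5.0, "new")])

# T: the transform is pushed down to the target database as a SQL statement,
# filtering cancelled orders and converting amounts to cents
tgt.execute("CREATE TABLE dw_orders AS "
            "SELECT id, amount * 100 AS amount_cents "
            "FROM stg_orders WHERE status != 'cancelled'")

print(tgt.execute("SELECT * FROM dw_orders ORDER BY id").fetchall())
```

The design choice is where the transform's CPU work lands: on the warehouse database (ELT) instead of a dedicated data integration server (ETL).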
3. ETL happens in real time.
In traditional ETL, extraction happens via a pull mechanism. In real-time ETL, it happens via either a push or a combined push/pull mechanism.
A common push mechanism uses database triggers, which generate a "trickle feed" of data from the source system as transactions happen in it. The incremental data is pushed through a micro-ETL process implemented in plain SQL or in any ETL / ELT tool.
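The trigger-based trickle feed can be sketched end to end in SQLite, which supports triggers. The change table and the micro-ETL step here are illustrative stand-ins, not a specific product's mechanism:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
db.execute("CREATE TABLE orders_changes (id INTEGER, amount REAL)")

# Push: the trigger writes every new transaction into a change table,
# producing the "trickle feed" as transactions happen
db.execute("""CREATE TRIGGER trg_orders AFTER INSERT ON orders
              BEGIN
                INSERT INTO orders_changes VALUES (NEW.id, NEW.amount);
              END""")

db.execute("INSERT INTO orders VALUES (1, 10.0)")
db.execute("INSERT INTO orders VALUES (2, 20.0)")

# Micro-ETL: drain the change table and apply it to the warehouse table
db.execute("CREATE TABLE dw_orders (id INTEGER, amount_cents INTEGER)")
changes = db.execute("SELECT id, amount FROM orders_changes").fetchall()
db.executemany("INSERT INTO dw_orders VALUES (?, ?)",
               [(rid, int(a * 100)) for rid, a in changes])
db.execute("DELETE FROM orders_changes")  # the feed has been consumed
print(db.execute("SELECT * FROM dw_orders ORDER BY id").fetchall())
```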
The push/pull combination can be thought of as a publish/subscribe model, or a message-oriented data integration mechanism. In this model, when a transaction happens in the OLTP system, the system generates a business event and publishes it (push). A listener grabs the event and runs an incremental ETL on its data. Integration can be done in the middleware or in the database.
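The publish/subscribe flow can be sketched with an in-process queue standing in for the middleware; the event shape and the `publish_order` / `listener` names are invented for the example:

```python
import queue

events = queue.Queue()  # stands in for a message bus / middleware topic

def publish_order(order):
    # The OLTP system publishes a business event as the transaction commits
    events.put({"type": "order_created", "payload": order})

warehouse = []  # stands in for the target warehouse table

def listener():
    # The subscriber grabs each event and runs an incremental ETL step on it
    while not events.empty():
        event = events.get()
        if event["type"] == "order_created":
            order = event["payload"]
            warehouse.append({"id": order["id"],
                              "amount_cents": int(order["amount"] * 100)})

publish_order({"id": 1, "amount": 10.0})
publish_order({"id": 2, "amount": 20.0})
listener()
print(warehouse)
```

In a real deployment the queue would be a durable message broker and the listener a long-running subscriber, but the division of labor is the same.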
4. Virtual ETL
A virtual data warehouse is created on the OLTP system by using "high-performance views". E.g.: QlikView.
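The core idea, that the "warehouse" is only a view layered over the live OLTP tables with no data copied, can be shown with an ordinary database view. This illustrates the general concept only, not how QlikView itself works internally:

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stands in for the OLTP database
db.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
               [(1, "EU", 10.0), (2, "US", 20.0), (3, "EU", 5.0)])

# The "warehouse" is just a view over the operational tables; every report
# query is rewritten against the live source data at read time, so there
# is no ETL copy step at all
db.execute("""CREATE VIEW dw_sales_by_region AS
              SELECT region, SUM(amount) AS total
              FROM orders GROUP BY region""")

print(db.execute("SELECT * FROM dw_sales_by_region ORDER BY region").fetchall())
```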
5. There is no NEED for ETL
The source data volume is low, and system memory is big enough that all the data is stored in memory rather than on disk. Any queries needed for reporting purposes are blazing fast. E.g.: Oracle TimesTen.
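A tiny sketch of the idea, using SQLite's `:memory:` mode as a stand-in for an in-memory DBMS like TimesTen: reports query the operational data directly in RAM, so there is no separate warehouse and nothing to extract, transform, or load.

```python
import sqlite3

# ":memory:" keeps the entire database in RAM, never touching disk
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

# Reporting queries run straight against the operational data in memory
total = db.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(total)  # 30.0
```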
6. ETL is fully managed by OLTP Database
OLTP and the data warehouse are implemented on the same database; they are one and the same. Materialized view (MV) technology is used to implement incremental ETL. E.g.: DBI.
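The effect of an incrementally refreshed materialized view can be simulated in SQLite, which lacks real MVs: a trigger stands in for the database-maintained refresh, keeping a summary table in sync inside the same database as the OLTP tables. The table and trigger names are invented for the example:

```python
import sqlite3

db = sqlite3.connect(":memory:")  # OLTP and DW live in the same database
db.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
db.execute("CREATE TABLE mv_sales (region TEXT PRIMARY KEY, total REAL)")

# Stand-in for an incrementally maintained materialized view: each OLTP
# insert updates the running total for its region, or creates the row
db.execute("""CREATE TRIGGER trg_mv AFTER INSERT ON orders
              BEGIN
                UPDATE mv_sales SET total = total + NEW.amount
                 WHERE region = NEW.region;
                INSERT INTO mv_sales
                SELECT NEW.region, NEW.amount
                 WHERE NOT EXISTS
                   (SELECT 1 FROM mv_sales WHERE region = NEW.region);
              END""")

db.execute("INSERT INTO orders VALUES (1, 'EU', 10.0)")
db.execute("INSERT INTO orders VALUES (2, 'EU', 5.0)")
db.execute("INSERT INTO orders VALUES (3, 'US', 20.0)")
print(sorted(db.execute("SELECT * FROM mv_sales").fetchall()))
```

With a real database's MV feature, the refresh logic is declared once and maintained by the database itself rather than hand-written as above.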
1. There might be other innovative approaches to ETL in the market; I will write about them as I discover them.
2. While there are pros and cons to each approach, each architecture is best suited to a specific business need. I will write about this in a future post.
3. Sometimes a hybrid approach to designing ETL works best for some business scenarios. For example, I have seen some IT vendors combine approaches 1 and 3 above to develop BI products.
Links of interest
1. Post on ETL vs ELT by Vincent McBurney