One can configure the load types for each table on DataSelf ETL+ main page.
Load All is the default setting and is useful when the data volume is small, or there are no ways to perform delta loads.
Load All ETL+ Process
Loads the data to a temporary data warehouse table named TableName_Temp.
When the temporary table load completes:
If there are no table transformations: ETL+ deletes the data warehouse TableName, and renames TableName_Temp → TableName.
If there are table transformations: ETL+ deletes the data warehouse TableName, and does a bulk insert from TableName_Temp into TableName applying transformations.
Replace is a delta loading option that dramatically speeds up the load after the initial one.
Replace adds new rows and replaces recently added ones to a table based on a filter criteria. For example, all data is loaded in the initial InvoiceHeader table extraction, and after it only adds/replaces rows where the control column values >= last extracted InvoiceDate.
When to Use
When there’s a single column identifying the order of the table’s row creation. Ex.: Document Date, or an always ascending Document Number.
To re-load all records for a recent period that might have records modified. Examples of expressions for reloading recent periods:
Last 7 days: DATEADD(DAY, - 7, GETDATE())
One calendar month ago: DATEADD(MONTH, - 1, GETDATE())
Last day of of the prior calendar month: EOMONTH(GETDATE(), - 1)
This calendar month: DATEADD(DAY, 1, EOMONTH(GETDATE(), - 1))
Last two calendar months: DATEADD(DAY, 1, EOMONTH(GETDATE(), - 2))
This calendar year: DATEADD(YY, DATEDIFF(YY, 0, GETDATE()), 0)
Last two calendar years: DATEADD(YY, DATEDIFF(YY, 0, GETDATE()) - 1, 0)
Do NOT use columns that can have their content updated such as LastModifiedDateTime columns. For tables with such columns, use Upsert below instead.
Select a column in the
Fielddropdown to function as the Replace filter. This column should identify the order of the table’s row creation.
Filter Valueif needed. By default, the filter value will be
Max(ColunmName). You may also customize the
Filter Valuewith values such as:
Constants. Ex.: ‘1/1/2022’
MS SQL expressions. Ex.: getdate()-30 to Replace all records for the past 30 days.
- Availability of a column to control the delta load subset. Ex.: a date or a sequentially growing value.
- The source table allows filtered data extraction.
- After the initial load, delta loads need columns and formats compatible with the data warehouse table. Addition of new columns and/or increase in column formats require a Load All load to sync up.
- Rows prior to the max control column value don’t change. Use Forced Load All (below) to capture modified or deleted records on a less frequent schedule.
Replace ETL+ Process
If applicable, reads the Replace filter from the data warehouse table. In the screenshot above, this would be the maximum value of Invoice Date.
Delete all data warehouse rows where Filter Column >= Filter Value.
Load (append) all source table rows where Filter Column >= Filter Value onto the data warehouse table.
Upsert is a delta loading option that dramatically speeds up the load after the initial one.
Upsert re-loads modified rows and inserts new ones from the source table. For example, all data is loaded in the initial SalesOrderHeader table extraction, and after it only updates rows that have changed since the last load and inserts new ones.
When to Use
When there are existing records that can be updated overtime and there is a primary key and a last-modified column flag. Many modern source systems provide these features and this is the most efficient delta refresh process.
Column with Last Modified Value: This is usually a date/time field, but might also use other data types. It’s not unusual for users to get this configuration incorrect.
- There’s a column flagging modified rows (ex.: Last Modified Date and Time).
- The table has a primary key.
- The source table allows filtered and indexed data extraction.
- After the initial load, delta loads need columns and formats compatible with the data warehouse table. Addition of new columns and/or increase in column formats require one Load All load to sync up.
- Already loaded records are not deleted. Use Forced Load All (below) to capture deleted records on a less frequent schedule.
Upsert ETL+ Process
Read the max value of the Last Modified Value from the data warehouse table.
Load all source table rows where Last Modified Value column >= max value into a temporary table.
Delete all rows from the data warehouse table where temporary table primary key = data warehouse table primary key.
Load (append) temporary table into the data warehouse table.
Use this setting when you want to append the source onto the data warehouse table.
Append ETL+ Process
Append source table rows into the data warehouse table.
Refresh Batch → Forced Load All
Forced Load All on a Refresh Batch guarantees that its tables will have their data completely re-loaded, ignoring preconfigured delta refreshes. See below for an example where running Batch #2 will re-load its two tables.
One can use this feature manually to guarantee the reload of all data at will, or set up scheduled refreshes in conjunction with delta refreshes. For instance, run Replace and/or Upsert on a frequent schedule (ex.: hourly), and Forced Load All in a less frequent schedule (ex.: nightly).
For instance, run Replace and/or Upsert loading every hour, and Forced Replace All once every night. With this setting, already extracted rows that are modified (Replace misses modified records) or deleted on a day (Replace and Upsert miss deleted records) will be captured overnight.
How to configure this:
- Set Replace and/or Upsert for the desired table(s) - see ETL+ Main Page.
- Set a Refresh Batch to run the delta load on a frequent schedule.
- Create a new Refresh Batch to run Forced Load All: select the table(s) that use Replace and/or Upsert on the Refresh Batch Steps panel, set this batch to “Forced Load All” (see pic above), schedule this batch to run on a less frequent schedule.