One can configure the load types for each table on the ETL+ Extract, Transform and Load (ETL) Page .
Based on the table’s ETL SQL Statement, reads all data in the source table or file and loads it to the data warehouse table (the destination). In effect, this is a source table/file data replication/mirroring.
Load All is the default setting and is useful when the data volume is small, or there are no ways to perform delta loads.
Load All Process
Loads the data to a temporary data warehouse table named TableName_Temp. If the temp table already existed before the process started, Load All will delete the temp table first to be sure is a clean new load.
When the temporary table load completes:
If there are no table transformations: ETL+ renames TableName_Temp → TableName.
If there are table transformations: ETL+ deletes the data warehouse TableName, does a bulk insert from TableName_Temp into TableName applying transformations, and deletes the temp table.
Replace is a delta loading option that dramatically speeds up the load after the initial one.
Replace deletes all target table’s rows that meet the filter criteria and re-loads rows that meet that criteria from the source table. For example, all data is loaded in the initial InvoiceHeader table extraction, and after it only adds/replaces rows where the control column values >= last extracted InvoiceDate.
When to Use
When there’s a single column identifying the order of the table’s row creation. Ex.: Document Date, or an always ascending Document Number.
To re-load all records for a recent period that might have records modified and/or deleted. Examples of expressions for reloading recent periods:
Last 7 days: DATEADD(DAY, - 7, GETDATE())
One calendar month ago: DATEADD(MONTH, - 1, GETDATE())
Last day of of the prior calendar month: EOMONTH(GETDATE(), - 1)
This calendar month: DATEADD(DAY, 1, EOMONTH(GETDATE(), - 1))
Last two calendar months: DATEADD(DAY, 1, EOMONTH(GETDATE(), - 2))
This calendar year: DATEADD(YY, DATEDIFF(YY, 0, GETDATE()), 0)
Last two calendar years: DATEADD(YY, DATEDIFF(YY, 0, GETDATE()) - 1, 0)
Do NOT use columns that can have their content updated such as LastModifiedDateTime columns. For tables with such columns, use Upsert below instead.
Select a column in the
Fielddropdown to function as the Replace filter. This column should identify the order of the table’s row creation.
Filter Valueif needed. By default, the filter value will be
Max(ColunmName). You may also customize the
Filter Valuewith values such as:
Constants. Ex.: ‘1/1/2022’
MS SQL expressions. Ex.: getdate()-30 to Replace all records for the past 30 days.
- Availability of a column to control the delta load subset. Example: a date or a sequentially growing value.
- The source table allows filtered data extraction.
- After the initial load, delta loads need columns and formats compatible with the data warehouse table. Addition of new columns and/or increase in column formats require a Load All load to sync up.
- Rows prior to the max control column value don’t change in the data warehouse. If that can happen, use Forced Load All (below) to capture modified or deleted records on a less frequent schedule.
If applicable, reads the Replace filter from the data warehouse table. In the screenshot above, this would be the maximum value of Invoice Date.
Delete all data warehouse rows where Filter Column >= Filter Value.
Load (append) all source table rows where Filter Column >= Filter Value onto the data warehouse table.
The load is done on a temporary table similar to Load All. When the load is completed, it’s appended to the target table.
Upsert is a delta loading option that dramatically speeds up the subsequent loads after the initial load. Upsert is a portmanteau – a combination of the words “update” and “insert.”
If a table doesn’t have data in the data warehouse yet, the first Upsert load will load all rows. Upsert can be added to a table at any time, before or after an initial load.
Upsert will update existing rows when their
Column with Last Modified Value have changed since the last Upsert. For instance, a row was loaded a month ago, and was changed yesterday.
Upsert will insert new rows from the source. This is defined by
Column with Last Modified Value is greater than the last value in the data warehouse plus these rows PKs are not found in the target.
This function requires the table’s Primary Key and a
Column with Last Modified Value.
Upsert does not delete rows with
Column with Last Modified Value < the last value imported into the data warehouse. For instance, on a daily upsert refresh, a deleted record from a week ago will not be deleted from the data warehouse.
Example. All data is loaded in the initial SalesOrderHeader table extraction, and after it only updates rows that have changed since the last load and inserts new ones.
When to Use. When there are existing records that can be updated overtime and there is a primary key and a last-modified column flag. Many modern source systems provide these features and this is the most efficient delta refresh process.
Column with Last Modified Value. Usually names a date-time column/field, but might also use other data types. Users often configure this incorrectly.
There’s a column flagging modified rows (ex.: Last Modified Date and Time).
The table has a primary key.
The source table allows filtered and indexed data extraction.
After the initial load, delta loads need columns and formats compatible with the data warehouse table.
Addition of new columns and/or increase in column formats require one Load All load to sync up.
Already loaded records are not deleted. Use Forced Load All (below) to capture deleted records on a less frequent schedule.
Read the max value of the Last Modified Value from the data warehouse table.
Load all source table rows where Last Modified Value column >= max value into a temporary table.
Delete all rows from the data warehouse table where temporary table primary key = data warehouse table primary key.
Load (append) the temporary table into the data warehouse table.
Use this setting when you want to append the source to the data warehouse table.
If you want to add a DateTimeStamp column, either use a field from the source, or apply a transformation in the SQL Statement (if supported), or use the Design page to set up that transformation (getdate()).
Append source table rows into the data warehouse table.
To add a DateTime stamp on each append, and the source table not having such column, create a dummy column and use the ETL+ Design page to assign it GETDATE() value.
Job → Forced Load All
Forced Load All on a Refresh Batch guarantees that its tables will have their data completely re-loaded, ignoring preconfigured delta refreshes.
One can use this feature manually to guarantee the reload of all data at will, or set up scheduled refreshes in conjunction with delta refreshes. For instance, run Replace and/or Upsert on a frequent schedule (ex.: hourly), and Forced Load All in a less frequent schedule (ex.: nightly).
For instance, run Replace and/or Upsert loading every hour, and Forced Replace All once every night. With this setting, already extracted rows that are modified (Replace misses modified records) or deleted on a day (Replace and Upsert miss deleted records) will be captured overnight.
How to configure this:
Set Replace and/or Upsert for the desired table(s) - see ETL+ Extract, Transform and Load (ETL) Page
Make a job on the ETL+ Job Page to run the default load (delta) on a frequent schedule.
Create a new job on the ETL+ Job Page to run with Forced Load All:
select the table(s) that use Load type options Replace and/or Upsert on the Job Steps panel of the ETL+ Job Page.
Set this job Forced Load All
schedule this job to run on a less frequent schedule.