In the last post, Validate Data Model by Studio Analysis, I finished creating the ETL programs in Data Integration. But they are still a test product, because these programs are executed manually by an operator. In this post, I would like to share the final piece of Data Integration: how to run ETL programs automatically after the production release.
There are several ways to automate ETL. For example, in my production system I use the scheduler in Machine Learning Workbench to trigger a Jupyter Notebook, which then operates Data Integration via PyCelonis (the Python API for Celonis EMS).
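To make the scheduled notebook idea a bit more concrete, here is a minimal sketch of the part that runs ETL steps in order and stops at the first failure. It deliberately does not import PyCelonis: the `Executable` interface, `run_in_order`, and `FakeJob` are hypothetical names for illustration, and whether the data job objects in your PyCelonis version expose a matching `execute()` method is an assumption to check against its documentation.

```python
from typing import Iterable, List, Protocol


class Executable(Protocol):
    """Hypothetical interface: anything with a name and an execute() method."""

    name: str

    def execute(self) -> None: ...


def run_in_order(jobs: Iterable[Executable]) -> List[str]:
    """Run jobs one by one and stop at the first failure."""
    finished: List[str] = []
    for job in jobs:
        try:
            job.execute()
        except Exception as exc:
            # Report which job failed and which ones had already completed.
            raise RuntimeError(f"{job.name} failed after {finished}") from exc
        finished.append(job.name)
    return finished


class FakeJob:
    """Stand-in used only to exercise the function without a Celonis connection."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail

    def execute(self) -> None:
        if self.fail:
            raise ValueError("simulated failure")
```

In a scheduled notebook, the list passed to `run_in_order` would hold the real job objects instead of `FakeJob` instances, so a failed extraction prevents later transformations from running on stale data.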
[Read More]
Pay attention to Extract SAP Tables
Until the last post, I explained general topics of extraction tasks that apply to all kinds of source systems. Today I would like to focus on SAP ECC or S/4HANA as the source system and cover the SAP-specific issues.
The first issue concerns the source system itself. We want to guarantee the source system’s availability even when Celonis EMS is connected to it. So we may choose a test environment that is a snapshot of the production system as the source system connected to Celonis EMS.
[Read More]
Use Pseudonymized Column as Grouping Key
One of the biggest headaches for a data engineer like me is how to ensure data security when extracting data. Personal information in particular must be handled carefully; otherwise I may be penalized under each region’s law (e.g. GDPR).
When I operate Celonis EMS, I try not to extract sensitive information in the first place; for example, I do not extract the customer address table (the ADRC table in SAP, etc.). But this information is sometimes useful as a grouping key, for example when counting cases.
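As a rough illustration of why a pseudonymized column can still serve as a grouping key, the sketch below replaces a sensitive value with a keyed hash and then counts cases per token. This is not how Celonis EMS implements pseudonymization; the HMAC-SHA-256 approach, the `SECRET_KEY` value, and the sample rows are all assumptions made up for this example.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret; in practice it would be stored outside the notebook.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest: the same input always maps to the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Made-up sample rows: (case id, customer city)
rows = [
    ("C001", "Berlin"),
    ("C002", "Munich"),
    ("C003", "Berlin"),
]

# Replace the sensitive column with its token.
pseudonymized = [(case_id, pseudonymize(city)) for case_id, city in rows]

# Count cases per token; grouping works without seeing the original value.
cases_per_group = Counter(token for _, token in pseudonymized)
```

Because equal inputs produce equal tokens, the two "Berlin" cases land in the same group, while the original city name never appears in the extracted data.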
[Read More]
Understand Delta Load Configuration Difference in Adding Column Scenario
Last time I showed what happens when I add a new record and then extract it by Delta Load (Verify Cloning Table Contents via Delta Load). Delta Load is an effective way to minimize extraction effort, but it cannot always be applied. Today, continuing from the previous post, I would like to add a column to the cloned table and observe the behavior of the extraction task.
After a system that includes a database goes into operation, its requirements normally change, and its functions, database, etc. are extended.
[Read More]
Verify Cloning Table Contents via Delta Load
Following last week’s Minimize Extraction Time by Delta Load Option, today I would like to insert a new record into a Postgres table and then try Delta Load again to extract it. To do this, I will start by operating pgAdmin, which is already available on my local machine after docker-compose.
The first step is to enter localhost:5050 in my browser, then on the login screen enter pgadmin@celonis.cloud as the email and pgadmin as the password, and click the login button.
[Read More]
Minimize Extraction Time by Delta Load Option
Last week I extracted a Postgres table and looked at the log to understand the mechanism of data transfer. At that time I used the Full Load option to extract data, which replaces all table contents and the schema with the latest version. That is the easiest way to synchronize tables between the source system (Postgres) and Celonis, but it takes a lot of time to complete. So I should also use the second option, Delta Load, to minimize extraction time.
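The difference between the two options can be sketched in a few lines of plain Python. This is only a conceptual model, not the Celonis extractor: the column names `id` and `updated_at`, the in-memory tables, and the timestamp filter are assumptions chosen for illustration.

```python
from datetime import datetime


def full_load(source_rows):
    """Full Load: discard the target and copy every source row."""
    return {row["id"]: dict(row) for row in source_rows}


def delta_load(target, source_rows, last_run):
    """Delta Load: copy only rows changed after last_run and upsert them by key."""
    changed = [row for row in source_rows if row["updated_at"] > last_run]
    for row in changed:
        target[row["id"]] = dict(row)
    return len(changed)


# Made-up source table with a change timestamp per row.
source = [
    {"id": 1, "name": "a", "updated_at": datetime(2022, 1, 1)},
    {"id": 2, "name": "b", "updated_at": datetime(2022, 1, 2)},
]

# The first synchronization copies everything.
target = full_load(source)
last_run = datetime(2022, 1, 3)

# A new record arrives in the source after the first run.
source.append({"id": 3, "name": "c", "updated_at": datetime(2022, 1, 4)})

# The second synchronization transfers only the new row.
transferred = delta_load(target, source, last_run)
```

In this model the second run moves one row instead of three, which is where the time saving comes from. Note that a filter like this cannot see rows deleted in the source, and it knows nothing about schema changes.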
[Read More]
Look at Data Transfer Process by Data Job Log
Last week I posted Connect to Celonis and Bring Back Instruction to look at how the Extractor works to connect Celonis and Postgres. This week I would like to extract data from Postgres and look at the data transfer process in the data job log.
In Data Integration, I create a new Data Job with the Data Connection I created last week, then create a new extraction task. On the next screen I add a new table public.
[Read More]