Advantages of an ETL Tool in Content Migration

24 / Feb / 2017 by Prabhdeep Puri

Recently, I was introduced to a technology that was new to me: ETL (Extract, Transform and Load). I can use it to simplify my daily tasks as an AEM migration developer and to make them more productive. There are various ETL tools available; some of them are listed here. I chose Talend because it is built on top of the Eclipse IDE and supports custom Java code.

Outlined below are some of the advantages almost all ETL tools have over writing custom migration scripts from scratch:

  1. Visual flow: Most ETL tools provide a visual flow of the system’s logic, which makes the migration process far more interactive than writing scripts in a plain IDE. Generally, no manual coding is required to create a basic flow.
    Visual Flow Example
  2. Choice of programming language: Whenever code does need to be written, it can be written in any language the ETL tool supports (Talend, for instance, supports custom Java code). Working in a familiar language makes the tool more comfortable to use and shortens the learning curve.
  3. Modularity: An ETL job is built from independent modules that behave like black boxes: each is concerned only with the input it receives and the output it produces. You can plug a module in by dragging and dropping it, or remove it by deleting it, without disturbing the other modules.
  4. Performance: The modules are already optimized and performance tested, which helps complete a task quickly without hand-tuning.
  5. Easy understanding and maintenance: The click-and-drag programming snippets let even someone who doesn’t know any programming language accomplish basic tasks, such as appending or replacing text in strings. The visual aspect makes the logic easier to understand and the scripts easier to maintain.
    Here are a few examples:

    • A ‘split’ module will visually show multiple workflows.
      Example: Selecting one of the two different templates based on an input value is shown below. You can easily see the workflow for a Page Template.
      Talend Split Module
    • Mapping the same input to two different output formats at the same time.
      Example: The same input row generates an output XML file and is also logged in an Excel file, as shown below.
      Talend Multiple Outputs
    • Selecting output format based on an input or custom value.
      Example: Adding a selective condition to the above example logs only those page URLs where a property named “Abstract” is either absent or empty in the input, as shown below.
      Talend Selective Outputs
  6. Productivity: Content management and migration are extremely challenging. Less boilerplate and minimal hand-written code increase the productivity of the migration process. A basic end-to-end ETL process can be built very quickly by adding just the out-of-the-box modules with basic configuration. For example, no file handling or parsing code is needed to read or write common file formats such as text, CSV, XML and Excel. The image below shows some of the many modules Talend provides by default for taking input from a file.
  7. Easy refactoring: Refactoring is a matter of replacing or adding a new input source box and a little configuration.
    Example: As shown in the image below, changing the input source from XML to SQL only requires replacing the XML input module with an SQL one and a little configuration, provided the schema is the same.
    Talend Change Input
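
To make the modularity and selective-output points above a little more concrete, here is a rough plain-Java sketch of the same idea: an input module, a filter and an output module that know nothing about each other. This is only an illustration and not what Talend generates. The class name, method names, sample URLs and the "url" and "title" columns are all hypothetical; only the "Abstract" property comes from the example above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class MigrationFlowSketch {

    // Hypothetical input module: in-memory rows standing in for an XML, CSV or SQL source.
    static List<Map<String, String>> sampleRows() {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row("/content/site/home", "Home", "Landing page"));
        rows.add(row("/content/site/about", "About", ""));   // empty Abstract
        rows.add(row("/content/site/news", "News", null));   // Abstract absent
        return rows;
    }

    // Builds one input row; a null abstract means the property is absent.
    static Map<String, String> row(String url, String title, String abstractText) {
        Map<String, String> r = new LinkedHashMap<>();
        r.put("url", url);
        r.put("title", title);
        if (abstractText != null) {
            r.put("Abstract", abstractText);
        }
        return r;
    }

    // Filter module: true when "Abstract" is either absent or empty.
    static boolean missingAbstract(Map<String, String> r) {
        String value = r.get("Abstract");
        return value == null || value.trim().isEmpty();
    }

    // Output module: renders a row as a small XML fragment (minimal escaping only).
    static String toXml(Map<String, String> r) {
        StringBuilder sb = new StringBuilder("<page>");
        for (Map.Entry<String, String> e : r.entrySet()) {
            sb.append('<').append(e.getKey()).append('>')
              .append(escape(e.getValue()))
              .append("</").append(e.getKey()).append('>');
        }
        return sb.append("</page>").toString();
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // The "flow": wires any input, filter and output together without them knowing each other.
    static <T> List<T> run(Supplier<List<Map<String, String>>> input,
                           Predicate<Map<String, String>> filter,
                           Function<Map<String, String>, T> output) {
        List<T> results = new ArrayList<>();
        for (Map<String, String> r : input.get()) {
            if (filter.test(r)) {
                results.add(output.apply(r));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Same input, two outputs: XML for every row, a log entry only for rows missing "Abstract".
        List<String> xml = run(MigrationFlowSketch::sampleRows, r -> true, MigrationFlowSketch::toXml);
        List<String> log = run(MigrationFlowSketch::sampleRows,
                MigrationFlowSketch::missingAbstract, r -> r.get("url"));
        xml.forEach(System.out::println);
        log.forEach(url -> System.out.println("Missing Abstract: " + url));
    }
}
```

Swapping the input source here means passing a different supplier to run, which loosely mirrors replacing the input box in the visual flow. Even this toy version needs wiring code that the ETL tool would otherwise provide out of the box.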

With the advancement of digital technologies and the content-centric approach to marketing, it is now mission critical for companies to use the right web content management system, such as AEM or Drupal. Talend and many other ETL tools can be extremely useful in migrating content, and we hope this blog was helpful to you in understanding ETL and Talend.

