Drupal Data Migration with Migrate API

29 / Mar / 2023 by Amrit Pal Singh

The Migrate API provides services for migrating data from a source system to Drupal 9.

Migration is an Extract, Transform, Load (ETL) process. In the Drupal Migrate API:

  • Extract phase is called the source
  • Transform phase is called the process
  • Load phase is called the destination

It’s critical to realize that although in a typical Drupal context the term “load” refers to loading data from storage, in ETL the term “load” refers to putting data into the store.
A set of data, referred to as a row, is retrieved from the data source during the source phase. The data can be read from a database, loaded from a file (such as CSV, JSON, or XML), or fetched from a web service (for example, RSS or REST). The row is forwarded to the process phase, where it is either modified as necessary or flagged for skipping. After processing, the changed row is passed to the destination phase, where it is loaded (saved) into the target Drupal site.

We will start with a basic example: creating nodes of the Article content type using the Migrate API. In the next blog post, I will show how to migrate data from a CSV file.

  • The example below assumes that the Article content type has a field field_image which accepts PNG files.
  • This example demonstrates how to define the destination directory where the image files will be downloaded.
  • The example uses the embedded_data source plugin for simplicity.

Writing the migration definition file

First, create a custom module called migration_example.

migration_example.info.yml file

The modules listed in the dependencies should be installed first: the core module migrate and the contrib modules migrate_plus, migrate_file, and migrate_tools.
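As a minimal sketch, an info file for this module could look like the following. The name, description, package, and core_version_requirement values are assumptions made for illustration; only the module machine names in the dependencies come from the list above.

```yaml
# migration_example.info.yml
# name, description, package and core_version_requirement are illustrative assumptions.
name: 'Migration Example'
type: module
description: 'Example migration that creates Article nodes.'
package: Custom
core_version_requirement: ^9
dependencies:
  # Core module, referenced as project:module.
  - drupal:migrate
  # Contrib modules.
  - migrate_plus:migrate_plus
  - migrate_file:migrate_file
  - migrate_tools:migrate_tools
```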

Now, our article_migration.yml file will go into the module’s migrations folder.


This file has three main parts source, process, and destination.

It is recommended that the value of the ‘id’ key match the file name (without the .yml extension). This key serves as an internal identifier that Drupal and the Migrate API use to execute and keep track of the migration. ‘label’ should be a human-readable string used to name the migration in various interfaces.
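For illustration, the top of the definition file might start like this. The id matches the file name used in this post; the label text is an assumption.

```yaml
# migrations/article_migration.yml
# id matches the file name without the .yml extension.
id: article_migration
# Human-readable name; this wording is only an example.
label: 'Article migration'
# The source, process and destination sections follow as top-level keys.
```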

In the source, we have the `id` value, which should consist of alphanumeric characters. To keep the example simple, we are using the embedded_data plugin. It allows you to define the data to migrate right inside the definition file. To configure it, you define a `data_rows` key whose value is an array of all the elements you want to migrate.
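A source section along these lines is one possible sketch. The column names article_title and article_body, the row values, and the example.com URLs are hypothetical; the `ids` key is required by embedded_data to tell the Migrate API which column uniquely identifies a row.

```yaml
source:
  plugin: embedded_data
  # Each list item is one row. Column names and values are hypothetical.
  data_rows:
    - id: 1
      article_title: 'First article'
      article_body: 'Body text of the first article.'
      file: 'https://example.com/images/first.png'
    - id: 2
      article_title: 'Second article'
      article_body: 'Body text of the second article.'
      file: 'https://example.com/images/second.png'
    - id: 3
      article_title: 'Third article'
      article_body: 'Body text of the third article.'
      file: 'https://example.com/images/third.png'
  # Declares the column that uniquely identifies each row, and its type.
  ids:
    id:
      type: integer
```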

For key ‘constants’

You write a `constants` key whose value is a list of name-value pairs. Even though they are defined in the source section, they are independent of the particular source plugin in use. You can set as many constants as you need. The value can be set to anything you need to use later.
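As a sketch, a constant holding the directory for downloaded images could be declared as follows. The constant name file_destination and the directory are assumptions chosen for this example.

```yaml
source:
  plugin: embedded_data
  # data_rows and ids are left out of this fragment for brevity.
  constants:
    # Hypothetical name and path; the trailing slash marks it as a directory.
    file_destination: 'public://migrated-images/'
```

In the process section, a constant is referenced by its path, here constants/file_destination.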

You assign source columns to node fields and properties in the process section. The names of entity properties or the fields’ machine names serve as the keys. In this instance, we are setting values for the node’s “title” property and “body” field. The content type configuration page, located at “/admin/structure/types/manage/article/fields”, lists the field machine names. Values can be copied straight from the source during the migration or changed using process plugins. In this illustration, the values are copied exactly from the source to the destination. The destination property or field name does not need to match the name of the source column.
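A minimal sketch of such a mapping, assuming hypothetical source columns named article_title and article_body:

```yaml
process:
  # destination property or field: source column
  title: article_title
  body: article_body
```

The keys on the left are Drupal names; the values on the right are source column names, so the two sides are free to differ.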

To migrate the image, the source provides a full URL (in the file key) for the file to be downloaded, but the value could also be a path such as /path/to/foo.png or a URI such as public://bar.jpg if the file is already present in your file system.

The image_import process plugin extends the file_import plugin. In addition to the configuration keys inherited from the file_import process plugin, image_import has the following optional configuration keys.

  • alt: The alt attribute for the image.
  • title: The title attribute for the image.
  • width: The width of the image.
  • height: The height of the image.
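Put together, the image mapping might be sketched as below. It assumes the source column is called file and that a constant named file_destination (a hypothetical name) holds the target directory; the optional keys listed above are left out.

```yaml
process:
  field_image:
    plugin: image_import
    # Source column holding the URL or path of the image.
    source: file
    # Where the file is saved; refers to an assumed source constant.
    destination: constants/file_destination
```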

We are using the ‘entity:node’ plugin for the destination, which lets you create nodes of any content type. The ‘default_bundle’ key indicates that all newly created nodes will, by default, be of type ‘Article’. Note that the value of the ‘default_bundle’ key is the machine name of the content type. You can find it at “/admin/structure/types/manage/article”.
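For the Article content type, whose machine name is article, the destination section can be sketched as:

```yaml
destination:
  plugin: 'entity:node'
  # Machine name of the content type, not its human-readable label.
  default_bundle: article
```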

Machine names are typically used for the values in the Migrate API. We will highlight when and where to find the appropriate ones as we examine the system.

The final folder structure will be:

custom
  migration_example
    migration_example.info.yml
    migrations
      article_migration.yml

YAML is a key-value format that allows for optional element nesting. It is extremely sensitive to indentation and white space. For instance, the colon symbol (:) that separates the key from the value must be followed by at least one space character. Also note that each level of the hierarchy is indented by exactly two spaces. Incorrect spacing or indentation in a YAML file is a common cause of issues when writing migrations.
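The short fragment below illustrates both points. The commented-out lines show a typical mistake and are included only for contrast.

```yaml
# Incorrect: without a space after the colon, "plugin:embedded_data"
# is read as a single string rather than a key and a value.
# source:
#   plugin:embedded_data

# Correct: a space after each colon, two spaces per nesting level.
source:
  plugin: embedded_data
```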

Running the migration

Let’s use Drush to run the migration with the commands provided by Migrate Tools.

drush mim article_migration or drush migrate:import article_migration

If the command runs successfully, you should see a message like this in the terminal:

[notice] Processed 3 items (3 created, 0 updated, 0 failed, 0 ignored) - done with 'article_migration'

With that, the migration is complete.
