Twitter API Integration with AEM Using Talend

14 / Sep / 2017 by Pooja Chauhan 2 comments

Recently, I came across an interesting use case: fetch tweets from Twitter and post them to an AEM instance. Since the volume of tweet content can be huge and the use case resembles a migration, involving extraction, transformation, and loading of content, I decided to use an ETL tool called Talend.


Talend is a leading open source integration software provider. A Twitter components pack based on the twitter4j library is available for it. The pack can connect to both the Twitter API (useful if you want to gather past tweets to build a big dataset, although it returns tweets from the last week at most, as mentioned in the Twitter Developer Documentation) and the Twitter Stream API (which streams live tweets).

We just need to define the queries in the “tTwitterStreamInput” or “tTwitterInput” component, depending on which tweet source we need, and then post the results to AEM using an HTTP POST request.

Below are a few simple steps to fetch the tweets and post them to the AEM instance:


Step 1: Create your own Twitter application, as you cannot use the APIs anonymously. Follow the instructions from this link to do the same.

Step 2: Install the Twitter components pack in your Talend instance. These components are not available in Talend by default, so you’ll have to download and install them manually. Follow the instructions from this link to install the components in Talend.

Step 3: Restart your instance and create a new job. Drag tTwitterOAuth, tTwitterStreamInput, tJavaRow and tTwitterOAuthClose from the components palette. Connect them as shown in the diagram below:

[Image: Talend job posting tweets to AEM over HTTP POST]

Select the tTwitterOAuth component, which provides the connector for authenticating against a Twitter App using the Twitter OAuth authentication system, and fill in its fields with the strings from your Twitter App's API keys page. In this context, API and Consumer are synonyms. You can choose the connection type here based on your requirement.

[Image: tTwitterOAuth authorization settings]

I have used tTwitterStreamInput, which outputs only structured data. You can also use the tTwitterInput component, which provides the whole JSON response from the Twitter API.

Write your query in the tTwitterStreamInput component, create a schema, and map the output columns.

[Image: tTwitterStreamInput query settings]

[Image: tTwitterStreamInput schema]

In the same component, you can limit the number of tweets fetched in one job.

[Image: tTwitterStreamInput tweet limit setting]

Connect the output to the tJavaRow component, where custom code can be written to post the data to the AEM instance.

I created an nt:unstructured node in the JCR for each tweet through an HTTP POST request.

[Image: tJavaRow code creating an nt:unstructured node]
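The original code is only visible in the screenshot, so here is a minimal sketch of how such a request could look in plain Java. It assumes AEM's default Sling POST servlet, which creates a node at the posted path and stores form parameters as properties. The host, the parent path `/content/tweets`, the `tweet-<id>` node naming, and the property names `user` and `text` are all hypothetical; adjust them to your instance and schema.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

class TweetPoster {

    // Hypothetical values: change host and parent path to match your AEM instance.
    static final String AEM_HOST = "http://localhost:4502";
    static final String PARENT_PATH = "/content/tweets";

    /** URL-encodes a single form key or value as UTF-8. */
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Builds an application/x-www-form-urlencoded body from the properties. */
    static String formBody(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    /** Properties for one tweet node; "user" and "text" are illustrative names. */
    static Map<String, String> tweetProps(String user, String text) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("jcr:primaryType", "nt:unstructured");
        props.put("user", user);
        props.put("text", text);
        return props;
    }

    /** URL of the node to create: one node per tweet, named after the tweet id. */
    static String nodeUrl(long tweetId) {
        return AEM_HOST + PARENT_PATH + "/tweet-" + tweetId;
    }

    /** POSTs one tweet to AEM with basic authentication; returns the HTTP status. */
    static int postTweet(long tweetId, String user, String text,
                         String aemUser, String aemPassword) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(nodeUrl(tweetId)).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        String auth = Base64.getEncoder().encodeToString(
                (aemUser + ":" + aemPassword).getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + auth);
        conn.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(formBody(tweetProps(user, text))
                    .getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }
}
```

In a Talend job, the body of `postTweet` could be pasted into the tJavaRow component, or the class could be saved as a routine and called once per row with values taken from `input_row`; the column names depend on the schema you defined in tTwitterStreamInput. A 2xx status indicates that AEM accepted the request.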

Alternatively, you can use the tHttpRequest component to post the tweets to your AEM instance.

Select the connection to be closed in the tTwitterOAuthClose component. It closes the connection when the sub-job completes.

[Image: tTwitterOAuthClose component settings]

Hope you find the blog helpful!


comments (2)

    1. Pooja Chauhan

      What limit have you set in the Advanced settings tab of tTwitterStreamInput?
      The job should close after reaching that limit.

