Recently, I came across an interesting use case: fetching tweets from Twitter and posting them to an AEM instance. Since the volume of tweet content can be large, and the use case resembles a migration (extraction, transformation, and loading of content), I decided to use an ETL tool called Talend.
Talend is a leading provider of open source integration software. It has a Twitter components pack, based on the twitter4j library, that can connect to both the Twitter API (useful if you want to gather past tweets to build a large dataset, although you can fetch tweets from the last week at most, as mentioned in the Twitter Developer Documentation) and the Twitter Stream API (which streams live tweets).
We just need to define the queries in the “tTwitterStreamInput” or “tTwitterInput” component, depending on where the tweets should come from, and then we can post the results to AEM using an HTTP POST request.
Below are a few simple steps to fetch the tweets and post them to the AEM instance:
Step 2: Install the Twitter components pack in your Talend instance. These components are not available in Talend by default, so you'll have to download and install them manually. Follow the instructions from this link to install the components in Talend.
Step 3: Restart your instance and create a new job. Drag tTwitterOAuth, tTwitterStreamInput, tJavaRow, and tTwitterOAuthClose from the components palette. Connect them as per the below diagram:
Select the tTwitterOAuth component, which provides the connection used to authenticate against a Twitter App through Twitter's OAuth authentication system, and fill in its fields with the strings from your Twitter App's API keys page. In this context, "API" and "Consumer" are synonyms. You can choose the connection type here based on your requirement.
Write your query in the tTwitterStreamInput component, create a schema, and map the output columns.
In this component, you can also limit the number of tweets fetched in one job.
Connect the output to the tJavaRow component, in which custom code can be written to post the data to the AEM instance.
I created an nt:unstructured node in the JCR for each tweet through an HTTP POST request.
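The exact code I used in tJavaRow is not shown here, but the idea can be sketched in plain Java. This is a minimal sketch, assuming that the AEM instance has the default Sling POST servlet enabled and accepts Basic authentication. The class name TweetPoster and the property names (user, text, createdAt) are made up for illustration and are not part of Talend or AEM.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Talend or AEM.
class TweetPoster {

    // Collects the form fields for one tweet. jcr:primaryType tells the
    // Sling POST servlet which node type to create; the other property
    // names are assumptions made for this sketch.
    static Map<String, String> tweetFields(String user, String text, String createdAt) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("jcr:primaryType", "nt:unstructured");
        fields.put("user", user);
        fields.put("text", text);
        fields.put("createdAt", createdAt);
        return fields;
    }

    // Encodes the fields as an application/x-www-form-urlencoded body.
    static String formEncode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                body.add(URLEncoder.encode(e.getKey(), "UTF-8")
                        + "=" + URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a JVM, so this should not happen.
            throw new IllegalStateException(ex);
        }
        return body.toString();
    }

    // Sends the fields to the given node URL and returns the HTTP status code.
    static int post(String nodeUrl, String username, String password,
                    Map<String, String> fields) throws IOException {
        byte[] body = formEncode(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) new URL(nodeUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        String credentials = Base64.getEncoder().encodeToString(
                (username + ":" + password).getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + credentials);
        conn.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }
}
```

A call could look like TweetPoster.post("http://localhost:4502/content/tweets/" + tweetId, "admin", "admin", fields), where the host, the /content/tweets path, and the credentials are placeholders for your own setup. Inside a tJavaRow, the field values would come from the columns of the incoming row rather than from hard-coded strings.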
Alternatively, you can use the tHttpRequest component to post the tweets to your AEM instance.
In the tTwitterOAuthClose component, select the connection to be closed. It will close the connection when the sub-job completes.
I hope you find this blog helpful!