Parsing tweet for Hashtags, Usernames and URLs in Java

24 / Sep / 2012 by Vishal Sahu 4 comments

Hi,

As I am working on Twitter integration in my current project, I needed to display tweets returned by a Twitter API search on my view layer. When we query the Twitter API, it returns the tweet text as a plain string that contains hashtags, Twitter usernames and links to external resources.


While displaying them on the UI layer, I wanted proper links for every element:

#. Hashtags should link to a Twitter search for tweets containing the same tag.
#. Usernames should link to the profile of that user on Twitter.
#. External URLs should link to the resources they point to.


I searched a lot and found several approaches that are useful if you want to parse the tweet on the server side, and thought them worth sharing.
Below is the method I used to parse a tweet. It returns the final HTML, which I can use directly in my UI.

[java]

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String parse(String tweetText) {
    // Nothing to link in a null or empty tweet
    if (tweetText == null || tweetText.isEmpty()) {
        return tweetText;
    }

    // Search for URLs (only the first "http:" URL is linked)
    if (tweetText.contains("http:")) {
        int indexOfHttp = tweetText.indexOf("http:");
        int spaceIndex = tweetText.indexOf(" ", indexOfHttp);
        // The URL runs up to the next space, or to the end of the tweet
        int endPoint = (spaceIndex != -1) ? spaceIndex : tweetText.length();
        String url = tweetText.substring(indexOfHttp, endPoint);
        String targetUrlHtml = "<a href='" + url + "' target='_blank'>" + url + "</a>";
        tweetText = tweetText.replace(url, targetUrlHtml);
    }

    // Search for Hashtags: "#" at the start of the text or after whitespace
    String patternStr = "(?:\\s|\\A)#+([A-Za-z0-9_-]+)";
    Pattern pattern = Pattern.compile(patternStr);
    Matcher matcher = pattern.matcher(tweetText);
    String result = "";
    while (matcher.find()) {
        // Drop the leading whitespace captured by the pattern
        result = matcher.group().trim();
        String search = result.replace("#", "");
        String searchHTML = "<a href='http://search.twitter.com/search?q=" + search + "'>" + result + "</a>";
        tweetText = tweetText.replace(result, searchHTML);
    }

    // Search for Users: "@" at the start of the text or after whitespace
    patternStr = "(?:\\s|\\A)@+([A-Za-z0-9_-]+)";
    pattern = Pattern.compile(patternStr);
    matcher = pattern.matcher(tweetText);
    while (matcher.find()) {
        result = matcher.group().trim();
        String rawName = result.replace("@", "");
        String userHTML = "<a href='http://twitter.com/" + rawName + "'>" + result + "</a>";
        tweetText = tweetText.replace(result, userHTML);
    }
    return tweetText;
}
[/java]

The above code returns the tweet text with these elements wrapped in HTML anchor tags, ready to render in the UI.


This worked for me.
Hope it helps.
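
One thing to keep in mind with the method above: it links only the first http: URL, and because String.replace substitutes every occurrence of the matched text, a hashtag that is a prefix of another one (for example #java and #javascript) can end up with the wrong link. A possible variation is to scan the tweet once with a single regular expression and replace each match where it was found. The following is only a minimal sketch of that idea. The class name TweetLinkifier and the method name linkify are hypothetical, the https handling is my assumption, and like the method above it does not HTML-escape the tweet text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any Twitter library.
class TweetLinkifier {

    // group 1 = URL, group 2 = hashtag without "#", group 3 = username without "@".
    // (?<!\S) means "at the start of the text or right after whitespace".
    private static final Pattern TOKEN = Pattern.compile(
            "(https?://\\S+)"
            + "|(?<!\\S)#([A-Za-z0-9_-]+)"
            + "|(?<!\\S)@([A-Za-z0-9_-]+)");

    static String linkify(String tweetText) {
        if (tweetText == null || tweetText.isEmpty()) {
            return tweetText;
        }
        Matcher matcher = TOKEN.matcher(tweetText);
        StringBuffer html = new StringBuffer();
        while (matcher.find()) {
            String link;
            if (matcher.group(1) != null) {
                // External URL: link to the resource itself
                String url = matcher.group(1);
                link = "<a href='" + url + "' target='_blank'>" + url + "</a>";
            } else if (matcher.group(2) != null) {
                // Hashtag: link to a Twitter search for the tag
                String tag = matcher.group(2);
                link = "<a href='http://search.twitter.com/search?q=" + tag + "'>#" + tag + "</a>";
            } else {
                // Username: link to the user's profile
                String name = matcher.group(3);
                link = "<a href='http://twitter.com/" + name + "'>@" + name + "</a>";
            }
            // quoteReplacement keeps "$" and "\" in the link text literal
            matcher.appendReplacement(html, Matcher.quoteReplacement(link));
        }
        matcher.appendTail(html);
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.println(linkify("Loving #java and #javascript by @vishal http://t.co/abc"));
    }
}
```

Since every match is rewritten at its own position, already generated markup is never searched again, and a "#" inside a URL is consumed as part of the URL instead of being treated as a hashtag.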


Useful Links:

https://dev.twitter.com/docs/tco-url-wrapper
http://stackoverflow.com/questions/8451846/actual-twitter-format-for-hashtags-not-your-regex-not-his-code-the-actual
http://www.simonwhatley.co.uk/parsing-twitter-usernames-hashtags-and-urls-with-javascript



comments (4)

  1. Danilo

    I've done a different implementation. Here is my code 🙂

    private String elaborateTwitterText(String text) {
        String newText = text;
        for (String key : new String[]{"#", "@", "http://", "https://"}) {
            int findIndex = 0;
            int lastIndex = 0;
            while (findIndex != -1) {
                findIndex = text.indexOf(key, lastIndex);
                lastIndex = findIndex;
                if (findIndex != -1 && lastIndex < text.length()) {
                    String tag = null;
                    try {
                        tag = text.substring(findIndex, text.indexOf(' ', lastIndex));
                    } catch (StringIndexOutOfBoundsException e) {
                        // No ' ' found so substringing till the end of the string
                        tag = text.substring(findIndex);
                    }
                    switch (key) {
                        case "#":
                            newText = newText.replace(tag, "" + tag + "");
                            break;
                        case "@":
                            newText = newText.replace(tag, "" + tag + "");
                            break;
                        default:
                            newText = newText.replace(tag, "" + tag + "");
                            break;
                    }
                }
                lastIndex++;
            }
        }
        log.info("Elaborated text: " + newText);
        return newText;
    }

