Technology

Normalization Forms for Accented Characters in Java

Text normalization is the process of "standardizing" text into a particular form so that searching, indexing and other kinds of analytical processing can be performed on it. When working with large quantities of text, we often encounter accented characters such as é and â. Unicode provides multiple ways to represent such characters. For example, we can...
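The excerpt stops before showing code, so here is a minimal sketch (not the author's code) of the problem it describes, using the JDK's `java.text.Normalizer`. The same visible "é" can be stored as one precomposed code point (U+00E9) or as a plain "e" followed by a combining acute accent (U+0301). The class name and the `canonicallyEqual` helper are hypothetical, chosen for illustration.

```java
import java.text.Normalizer;

public class NormalizationFormsDemo {
    // Hypothetical helper: compares two strings after converting both to NFC
    // (canonical composition), so different encodings of the same text match.
    static boolean canonicallyEqual(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String composed = "\u00E9";    // é as a single precomposed code point
        String decomposed = "e\u0301"; // 'e' + COMBINING ACUTE ACCENT

        // Both render as "é", but the raw char sequences differ.
        System.out.println(composed.equals(decomposed));                     // false
        System.out.println(composed.length() + " vs " + decomposed.length()); // 1 vs 2

        // NFD splits é into base letter + combining mark; NFC recombines them.
        String nfd = Normalizer.normalize(composed, Normalizer.Form.NFD);
        System.out.println(nfd.equals(decomposed));                  // true
        System.out.println(canonicallyEqual(composed, decomposed));  // true
    }
}
```

Normalizing every string to one form (NFC is a common choice for storage) before indexing or comparing means both encodings are treated as the same word.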

by Sachin
Tag: Text Normalization
29-Jun-2012

Technology

Normalizing Accented Words

We often need to work with data aggregated from different sources, and before we analyse it, we usually have to normalize it to a common standard. A normalization process typically includes removing special characters and converting all text to lower case. We can also have rules specifying that words like "saint" will always be...
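The steps named in this excerpt can be sketched as follows. This is one common approach, assumed here rather than taken from the post. It decomposes text with NFD, strips the combining accent marks (Unicode category `M`), lower-cases the result, and replaces remaining special characters with spaces. The "saint" rule is cut off in the excerpt, so it is not attempted. The class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

public class AccentNormalizer {
    // Hypothetical helper: folds accents, lower-cases, and removes special characters.
    static String normalize(String input) {
        // 1. Decompose so accented letters become base letter + combining mark(s).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // 2. Drop all combining marks (Unicode general category M).
        String noAccents = decomposed.replaceAll("\\p{M}+", "");
        // 3. Lower-case with a locale-neutral rule.
        String lower = noAccents.toLowerCase(Locale.ROOT);
        // 4. Replace anything that is not a letter, decimal digit or whitespace with a space,
        //    then collapse runs of whitespace and trim the ends.
        return lower.replaceAll("[^\\p{L}\\p{Nd}\\s]", " ")
                    .replaceAll("\\s+", " ")
                    .trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("Café Déjà-Vu!")); // cafe deja vu
        // Composed and decomposed inputs end up identical after normalization.
        System.out.println(normalize("\u00E9").equals(normalize("e\u0301"))); // true
    }
}
```

Stripping accents is deliberately lossy (for example, "résumé" and "resume" become the same token), which suits search and matching. Note that letters such as "ø" or "ł" have no canonical decomposition and pass through step 2 unchanged.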

by Sachin
Tag: Text Normalization
14-Jun-2012