Technology

Normalization Forms for Accented Characters in Java

Text normalization is the process of “standardizing” text to a certain form, so as to enable searching, indexing, and other types of analytical processing on it. When working with large quantities of text, we often encounter characters with accents, such as é and â. Unicode provides multiple ways to create such characters. For example we […]
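As a rough illustration of the point about multiple representations, the sketch below uses the standard `java.text.Normalizer` class to show that é can be stored either as one precomposed code point or as a base letter plus a combining accent, and that normalizing both to the same form makes them compare equal. The class and method names (`NormalizationFormsDemo`, `toNfc`, `toNfd`) are hypothetical and chosen only for this example; the full post may take a different approach.

```java
import java.text.Normalizer;

// Hypothetical demo class, not taken from the original post.
class NormalizationFormsDemo {

    // Compose characters where possible (NFC).
    static String toNfc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    // Split characters into base letters and combining marks (NFD).
    static String toNfd(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9"; // é as a single code point
        String decomposed = "e\u0301"; // e followed by a combining acute accent

        // The raw strings differ even though they render the same.
        System.out.println(precomposed.equals(decomposed));

        // After normalizing both to NFC, they are equal.
        System.out.println(toNfc(precomposed).equals(toNfc(decomposed)));

        // NFD splits the precomposed form into two chars.
        System.out.println(toNfd(precomposed).length());
    }
}
```

Comparing or indexing text only after both sides have been brought to the same form is one way to avoid missed matches between visually identical strings.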

Sachin

Technology

Normalizing Accented Words

We often need to work on data aggregated from different sources, and before we analyse it, we usually need to normalize it to a certain standard. A normalization process typically includes removing special characters and converting all text to lower case. We can also have certain rules that words like “saint” will always […]
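A minimal sketch of the steps mentioned above, assuming that "normalizing accented words" means stripping accents and lower-casing: the text is decomposed with `java.text.Normalizer`, the combining marks are removed with a regular expression, and the result is lower-cased. The names `AccentStripper` and `normalizeWord` are hypothetical, and the word-specific rules hinted at in the excerpt are not covered here.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not taken from the original post.
class AccentStripper {

    static String normalizeWord(String input) {
        // Decompose so each accent becomes a separate combining mark.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Remove all combining marks (Unicode category M).
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        // Lower-case with a fixed locale to avoid locale-specific surprises.
        return withoutMarks.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalizeWord("Café"));
        System.out.println(normalizeWord("Señor"));
    }
}
```

Letters that have no decomposition, such as ø or ß, pass through this approach unchanged, so they would need separate handling if they matter for the data at hand.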

Sachin