Monday, March 18, 2013

Normalizing Unicode

import java.text.Normalizer
/**
 * Problem: characters with accents or other adornments can be encoded in several different ways in Unicode.
 * From a user's point of view, notations that logically mean the same thing are the same text, so text
 * search should make no distinction between them. It is therefore important to store text in a normalized
 * Unicode form. The code below shows how to check whether text is normalized and how to normalize it.
 */
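object NormalizationTest {
  def main(args: Array[String]): Unit = {
    // The ohm symbol is written as U+2126 (OHM SIGN), which is not in NFC form.
    // NFC replaces it with U+03A9 (GREEK CAPITAL LETTER OMEGA), which looks identical.
    val text = "16-bit transceiver with direction pin, 30 \u2126 series termination resistors;"
    println(text)
    // Check whether the original text is already in NFC form
    println(Normalizer.isNormalized(text, Normalizer.Form.NFC))
    // Normalize to NFC and check again
    val normalizedText = Normalizer.normalize(text, Normalizer.Form.NFC)
    println(normalizedText)
    println(Normalizer.isNormalized(normalizedText, Normalizer.Form.NFC))
  }
}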
/**
* Output printed to console:
* -------------------------------
*
* 16-bit transceiver with direction pin, 30 Ω series termination resistors;
* false
* 16-bit transceiver with direction pin, 30 Ω series termination resistors;
* true
*/
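
The two printed lines look identical, so it is not obvious why the first check returns false. The sketch below makes the difference visible by printing code points, and shows how normalizing both sides could make a search match. It assumes the stored text uses U+2126 (OHM SIGN); the object name and the helpers `nfc`, `containsNormalized` and `codePoints` are hypothetical names made up for this illustration, not part of `java.text.Normalizer`.

```scala
import java.text.Normalizer

object NormalizedSearchSketch {
  // Hypothetical helper: return the NFC form, skipping the work if already normalized
  def nfc(s: String): String =
    if (Normalizer.isNormalized(s, Normalizer.Form.NFC)) s
    else Normalizer.normalize(s, Normalizer.Form.NFC)

  // Hypothetical helper: substring search that ignores differences in Unicode notation
  def containsNormalized(haystack: String, needle: String): Boolean =
    nfc(haystack).contains(nfc(needle))

  // Hypothetical helper: render each code point as U+XXXX
  def codePoints(s: String): String =
    s.codePoints().toArray().map(cp => f"U+$cp%04X").mkString(" ")

  def main(args: Array[String]): Unit = {
    val ohmSign = "\u2126"
    println(codePoints(ohmSign))      // U+2126
    println(codePoints(nfc(ohmSign))) // U+03A9

    // Assumed data: stored text uses OHM SIGN, the query uses GREEK CAPITAL LETTER OMEGA
    val stored = "30 \u2126 series termination resistors"
    val query = "30 \u03A9"
    println(stored.contains(query))            // false
    println(containsNormalized(stored, query)) // true
  }
}
```

Normalizing once when the text is stored, rather than on every search as done here, is probably the cheaper choice in practice; the query still has to be normalized at search time.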