Thursday, May 22, 2008

Non-XML Characters and Java

To hell with non-XML characters. After they annoyed me and wasted a lot of my time, I finally found the answer from who else but Google. Strange is the importance of Google in a techie's life, to the point that I have started to address it as a person.
Stuff crawled from the net, in spite of all the standards in the world, yields data and encodings from planet Mars. Then a poor guy from Earth like me, whose boundaries have been defined inches away from his skin, has to battle this unknown enemy in the not-so-data-friendly territory of Java. The parser kept throwing exceptions and knocking me out of the fight, until I finally found the chink in its armor. The solution was simple: choose your battles. If you don't know the enemy, simply avoid it :) The method below drops every character that falls outside the ranges XML 1.0 allows.


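public static String stripNonValidXMLCharacters(String in) {
     if (in == null || in.isEmpty()) return "";
     StringBuilder out = new StringBuilder(in.length());
     // Walk the string by code point rather than by char. A Java char is only
     // 16 bits wide, so a char-based loop can never match the 0x10000-0x10FFFF
     // range and would strip both halves of every surrogate pair.
     for (int i = 0; i < in.length(); ) {
         int current = in.codePointAt(i);
         i += Character.charCount(current);
         // Keep only the characters permitted by the XML 1.0 Char production.
         if ((current == 0x9) ||
             (current == 0xA) ||
             (current == 0xD) ||
             ((current >= 0x20) && (current <= 0xD7FF)) ||
             ((current >= 0xE000) && (current <= 0xFFFD)) ||
             ((current >= 0x10000) && (current <= 0x10FFFF)))
             out.appendCodePoint(current);
     }
     return out.toString();
}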
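
To see the effect end to end, here is a rough sketch that runs a crawled-looking string through the JDK's built-in DOM parser before and after filtering. The class name XmlCleanDemo and the helpers isXmlChar, clean and parses are made up for this illustration, and clean is only a compact Java 8+ stand-in for the filter above, not a replacement for it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XmlCleanDemo {

    // True if the code point is allowed by the XML 1.0 Char production.
    public static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Hypothetical compact filter: keeps only the code points XML 1.0 allows.
    public static String clean(String in) {
        if (in == null) return "";
        return in.codePoints()
                 .filter(XmlCleanDemo::isXmlChar)
                 .collect(StringBuilder::new,
                          StringBuilder::appendCodePoint,
                          StringBuilder::append)
                 .toString();
    }

    // Returns true if the JDK's default DOM parser accepts the string.
    public static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the input
        } catch (Exception e) {
            throw new RuntimeException(e); // configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        // A control character (U+0001) hiding inside otherwise fine markup.
        String crawled = "<doc>hello\u0001world</doc>";
        System.out.println("raw parses:     " + parses(crawled));
        System.out.println("cleaned parses: " + parses(clean(crawled)));
    }
}
```

The default parser may also print its own fatal-error line to stderr for the raw input. Filtering before parsing throws the offending characters away silently, which suits crawled text where they carry no meaning; if you need to know what was dropped, log it inside the filter.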
