
Answer by fuweichin for String length in bytes in JavaScript

Here is a self-contained, efficient method for counting the UTF-8 bytes of a string. It needs no platform APIs and allocates no intermediate buffer.

    // Count the UTF-8 bytes of a string.
    function byteLengthOf(s) {
      // Assumes the string is a sequence of UTF-16 code units, as JavaScript strings are.
      var n = 0;
      for (var i = 0, l = s.length; i < l; i++) {
        var hi = s.charCodeAt(i);
        if (hi < 0x0080) {          // [0x0000, 0x007F]
          n += 1;
        } else if (hi < 0x0800) {   // [0x0080, 0x07FF]
          n += 2;
        } else if (hi < 0xD800) {   // [0x0800, 0xD7FF]
          n += 3;
        } else if (hi < 0xDC00) {   // [0xD800, 0xDBFF]: high surrogate
          var lo = s.charCodeAt(++i);
          if (i < l && lo >= 0xDC00 && lo <= 0xDFFF) { // followed by [0xDC00, 0xDFFF]
            n += 4;
          } else {
            throw new Error("UCS-2 String malformed");
          }
        } else if (hi < 0xE000) {   // [0xDC00, 0xDFFF]: unpaired low surrogate
          throw new Error("UCS-2 String malformed");
        } else {                    // [0xE000, 0xFFFF]
          n += 3;
        }
      }
      return n;
    }

    var s = "\u0000\u007F\u07FF\uD7FF\uDBFF\uDFFF\uFFFF";
    console.log("expect byteLengthOf(s) to be 14, actually it is %s.", byteLengthOf(s));
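To sanity-check a hand-written counter like the one above, one option is to let the platform do the encoding and count the resulting bytes. The sketch below assumes a global `TextEncoder` is available (modern browsers and recent Node.js versions); the helper name `utf8LengthViaEncoder` is made up for illustration and is not part of the answer.

```javascript
// Cross-check sketch: encode the string to UTF-8 and count the bytes.
// Assumption: a global TextEncoder exists in the runtime.
function utf8LengthViaEncoder(s) {
  return new TextEncoder().encode(s).length;
}

// One sample per UTF-8 sequence length: U+0061, U+00E9, U+20AC, U+1F600.
var samples = ["a", "\u00E9", "\u20AC", "\uD83D\uDE00"];
var lengths = samples.map(utf8LengthViaEncoder);
console.log(lengths.join(",")); // prints "1,2,3,4"
```

Unlike the loop above, this allocates a byte array as large as the encoded string, so it serves better as a test oracle than as a replacement.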

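Note that byteLengthOf throws an error if the input string is malformed, that is, if it contains an unpaired surrogate (a high surrogate not followed by a low surrogate, or a low surrogate on its own).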
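If throwing is not desirable, a possible variation is to count every unpaired surrogate as 3 bytes, the UTF-8 size of the replacement character U+FFFD, which is also what `TextEncoder` substitutes for such code units. This is only a sketch under that assumption; the name `byteLengthLenient` is hypothetical and not part of the original answer.

```javascript
// Hypothetical non-throwing variant: an unpaired surrogate is counted as
// 3 bytes, the UTF-8 size of the replacement character U+FFFD.
function byteLengthLenient(s) {
  var n = 0;
  for (var i = 0, l = s.length; i < l; i++) {
    var hi = s.charCodeAt(i);
    if (hi < 0x0080) {
      n += 1;
    } else if (hi < 0x0800) {
      n += 2;
    } else if (hi >= 0xD800 && hi < 0xDC00 && i + 1 < l) {
      // High surrogate with at least one code unit after it.
      var lo = s.charCodeAt(i + 1);
      if (lo >= 0xDC00 && lo <= 0xDFFF) {
        n += 4; // valid pair: one 4-byte code point
        i++;    // skip the low surrogate
      } else {
        n += 3; // unpaired high surrogate, counted as U+FFFD
      }
    } else {
      // BMP code unit >= 0x0800, or an unpaired surrogate counted as U+FFFD.
      n += 3;
    }
  }
  return n;
}

console.log(byteLengthLenient("\uD800a")); // prints 4: 3 for the lone surrogate, 1 for "a"
```

For well-formed strings this returns the same count as the strict version; the two differ only on input that the strict version rejects.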

