ColdFusion: How do I get the byte size of a string?
I’m sure just about everybody who’s at least a little bit familiar with ColdFusion knows about the Len() function for getting the character length of a string. What if you want the byte length (byte size) of a string, though? There appears to be no built-in function for that. I’ve seen posts say that, as long as you’re using ASCII characters exclusively, you can simply use Len() and assume that each character is a single byte. But what if you have a mix of ASCII and non-ASCII characters?
Today, UTF-8 is pretty much the de facto character encoding standard, and it can represent both Latin-based and non-Latin characters. This means a string can contain Chinese, Japanese, Hebrew, Russian, and many other scripts. When it does, you cannot rely on Len() matching the actual byte size of the string: UTF-8 uses anywhere from 1 to 4 bytes per character, and only ASCII characters fit in a single byte.
Fortunately, there’s a relatively easy way to get the byte size of a string: convert it into a byte array and then pass that array to the ArrayLen() function.
<!--- For some reason my WordPress blog doesn't like characters that are not Latin-based, but pretend the following string contains something like Japanese characters --->
<cfset theString = "I contain characters that are not Latin-based">
<!--- Convert the string into a byte array. Naming the encoding keeps the result predictable; getBytes() with no argument uses the JVM's default charset --->
<cfset theByteArray = theString.getBytes("UTF-8")>
<!--- The length of the byte array tells us how many bytes are in the string --->
<cfset numBytes = ArrayLen(theByteArray)>
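To see the difference between character count and byte count, here is a small sketch. The variable names are just for illustration, and the sample text is built with Chr() on the assumption that you'd rather not depend on the template's file encoding. It also assumes the underlying Java getBytes() method is reachable from your CFML engine, as in the snippet above.

```cfml
<!--- "abc" followed by two kanji (U+65E5 and U+672C), built with Chr() to avoid file-encoding issues --->
<cfset sample = "abc" & Chr(26085) & Chr(26412)>

<!--- Character count versus byte count in two different encodings --->
<cfset charCount  = Len(sample)>
<cfset utf8Bytes  = ArrayLen(sample.getBytes("UTF-8"))>
<cfset utf16Bytes = ArrayLen(sample.getBytes("UTF-16LE"))>

<cfoutput>
Characters: #charCount#<br>
UTF-8 bytes: #utf8Bytes#<br>
UTF-16LE bytes: #utf16Bytes#
</cfoutput>
```

This should report 5 characters, 9 UTF-8 bytes (1 byte for each ASCII letter and 3 for each kanji), and 10 UTF-16LE bytes. The byte size depends on which encoding you ask for, so it helps to name the one your storage or transport actually uses.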
Special thanks to Greg Ecklund for bringing the getBytes() method to my attention.