On the Size of a String

July 21, 2011 § Leave a comment


In computer programs, number constants can be interesting and bewildering things.  Trying to figure out why one was chosen over another can be really confusing.  For a good while I was confused as to why ActiveRecord would set a string attribute to be a VARCHAR(255) in the database.  It limits the size of the string attributes to 255 characters long.  256 is a bit more natural when choosing constants in computer science.  255 is commonly used to denoted the last index in a 0-based array of 256 elements.  So why 255?  The short answer is “because of InnoDB and UTF-8 character sets.”

InnoDB has a limitation on the size of a key for a single column index of 767 bytes.  When the table is encoded with a UTF-8 character set, each characters has the possibility of using 3-bytes to represent its intended character.  That means in order to be able to fully index a UTF-8 encoded varchar column, the string must be able to be represented in 767 bytes.  767 / 3 = 255 2/3.  This means that the largest UTF-8 encoded varchar column can be 255 characters long, hence the ActiveRecord default string attribute size.

Problems on the Way

As bigger and bigger pushes are made for complete internationalization, we’ll see more things encoded with UTF-16 and UTF-32.  Characters using these encodings might require up to 4 bytes to represent their value.  When this happens, ActiveRecord will need to reduce the size of indexable string attributes to 191 characters.

For Fun

Here is a truly awesome magic number that seems to come out of no where, 0x5f3759d5.

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

What’s this?

You are currently reading On the Size of a String at The On Blog.

meta

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: