Understanding storage sizes for MySQL TEXT data types
Posted by: AJ Welch
TEXT data objects, as their name implies, are useful for storing long-form text strings in a MySQL database. The four TEXT data object types are built for storing and displaying substantial amounts of information, as opposed to other data object types that are helpful with tasks like sorting and searching columns or handling smaller configuration-based options for a larger project.
The different TEXT objects offer maximum capacities ranging from 255 bytes to 4 GB and are not designed for storing computational values. It’s common to see these used to store product descriptions for a sales site, property summaries for a realty database, and long-form article text on a news website. TEXT objects are best used when VARCHAR and other string-based data objects are insufficient to handle storing the desired amount of information. However, the smallest TEXT type, TINYTEXT, has the same 255-character capacity as a VARCHAR(255) column.
TEXT objects differentiate themselves from other string storage types by removing the requirement to specify a storage length, by not stripping trailing bytes when selected, and by not padding unused character space, which keeps disk storage efficient. Since TEXT objects are not stored in the server’s memory, they require data overhead for retrieval. The sizes below are byte limits. The character counts assume one byte per character, so text that contains multi-byte UTF-8 characters fits fewer characters than listed.
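The byte and character counts only match when every character takes a single byte. As a rough illustration, the Python sketch below compares the two for a few sample strings. The helper name `utf8_size` and the sample strings are made up for this example, and the sketch measures Python strings only; it does not query MySQL.

```python
# Compare character count with UTF-8 byte count for a few sample strings.
# "caf\u00e9" is "cafe" with a precomposed e-acute (one character, two bytes).
samples = ["hello", "caf\u00e9", "日本語", "😀"]


def utf8_size(text: str) -> int:
    """Return the number of bytes the string occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


for s in samples:
    print(f"{s!r}: {len(s)} characters, {utf8_size(s)} bytes")

# A 255-byte limit holds 255 single-byte characters,
# but only 85 characters that each need three bytes.
TINYTEXT_MAX_BYTES = 255
three_byte_capacity = TINYTEXT_MAX_BYTES // utf8_size("日")
print(three_byte_capacity)
```

When sizing a column for non-English text, it is safer to budget by bytes than by characters.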
TINYTEXT: 255 characters - 255 B
The TINYTEXT data object is the smallest of the TEXT family and is built to efficiently store short information strings. This type can store up to 255 bytes (expressed as 2^8 - 1) or 255 characters and requires 1 byte of overhead. This object can be used to store things like short summaries, URL links, and other shorter objects. TINYTEXT shines over VARCHAR when storing data that’s under 255 characters with an inconsistent length and no need to be used for sorting criteria.
TEXT: 65,535 characters - 64 KB
The standard TEXT data object is capable of handling typical long-form text content. TEXT data objects top out at 64 KB (expressed as 2^16 - 1) or 65,535 characters and require 2 bytes of overhead. This is large enough to hold text for something like an article, but would not be sufficient for holding the text of an entire book.
MEDIUMTEXT: 16,777,215 characters - 16 MB
The MEDIUMTEXT data object is useful for storing larger text strings like white papers, books, and code backups. These data objects can be as large as 16 MB (expressed as 2^24 - 1) or 16,777,215 characters and require 3 bytes of overhead storage.
LONGTEXT: 4,294,967,295 characters - 4 GB
The LONGTEXT data object is for use in extreme text string storage use cases. It is a viable option when the MEDIUMTEXT object is not big enough. Computer programs and applications often reach text lengths in the LONGTEXT range. These data objects can be as large as 4 GB (expressed as 2^32 - 1) and store up to 4,294,967,295 characters with 4 bytes of overhead storage.
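The four limits follow one pattern: a type with n bytes of overhead uses them as a length prefix, so it can hold at most 2^(8n) - 1 bytes. The Python sketch below reproduces the limits and picks the smallest type that fits a given string. The function names `max_bytes` and `smallest_text_type` are hypothetical helpers for this illustration, and the string is measured as UTF-8.

```python
# Bytes of length-prefix overhead for each TEXT type, smallest first.
TEXT_TYPES = {"TINYTEXT": 1, "TEXT": 2, "MEDIUMTEXT": 3, "LONGTEXT": 4}


def max_bytes(prefix_bytes: int) -> int:
    """Largest length an n-byte prefix can record: 2^(8n) - 1."""
    return 2 ** (8 * prefix_bytes) - 1


def smallest_text_type(value: str) -> str:
    """Return the smallest TEXT type whose byte limit fits the UTF-8 value."""
    size = len(value.encode("utf-8"))
    for name, prefix in TEXT_TYPES.items():
        if size <= max_bytes(prefix):
            return name
    raise ValueError("value exceeds the LONGTEXT limit")


for name, prefix in TEXT_TYPES.items():
    print(f"{name}: {max_bytes(prefix):,} bytes + {prefix} byte(s) of overhead")

print(smallest_text_type("a" * 256))
```

A string of 256 single-byte characters is already one byte too long for TINYTEXT, so the last line reports TEXT.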
TEXT vs. BLOB
BLOBs are an alternative type of data storage that share matching naming and capacity mechanisms with TEXT objects. However, BLOBs are binary strings with no character set or collation, so they are compared and sorted by the numeric values of their bytes, while TEXT objects are treated as character strings and sorted according to the collation of their character set. This differentiation is important for sorting information. BLOBs are used to store data files like images, videos, and executables.
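To see why this matters for sorting, the Python sketch below orders the same words two ways. Sorting by raw bytes mimics how binary strings compare. Sorting with `str.casefold` is only a stand-in for a case-insensitive collation, an assumption made for illustration; real MySQL collations have their own rules.

```python
words = ["banana", "Apple", "cherry", "apple", "Banana"]

# BLOB-like ordering: compare the numeric values of the raw bytes,
# so every uppercase ASCII letter sorts before every lowercase one.
binary_order = sorted(words, key=lambda w: w.encode("utf-8"))

# TEXT-like ordering: approximate a case-insensitive collation by
# comparing case-folded strings (ties keep their original order).
text_order = sorted(words, key=str.casefold)

print(binary_order)
print(text_order)
```

In the byte-wise result, "Banana" lands ahead of "apple" because the byte for "B" is smaller than the byte for "a".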
Usage notes
- Using TEXT fields for select and search queries will incur performance hits because the server will call the objects individually and scan them during the query instead of paging data stored in memory.
- Enabling strict SQL mode enforces the maximum lengths: entered data that exceeds the limit is rejected with an error. Without strict mode, the data is truncated to fit and a warning is issued.
- TEXT columns require an index prefix length and can’t have literal DEFAULT values, unlike CHAR and VARCHAR objects. MySQL 8.0.13 and later accept a default for a TEXT column only when it is written as an expression.
- Estimating size by word count: assume the average English word is 4.5 letters long and needs 1 extra character for spacing. For example, a site that consists of 500-word articles would use about 2,750 characters on average for the article text data. TINYTEXT’s 255-character capacity is insufficient for this use case, while TEXT’s 65,535-character capacity offers storage for articles of over 11,900 words based on the average criteria.
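The word-count estimate can be worked through in a few lines of Python. This is a sketch of the rule of thumb only: the constants are the averages quoted in the note, the function names are made up for the example, and one byte per character is assumed.

```python
AVG_WORD_LETTERS = 4.5  # average English word length from the note
SPACE_PER_WORD = 1      # one extra character per word for spacing
CHARS_PER_WORD = AVG_WORD_LETTERS + SPACE_PER_WORD


def estimated_chars(word_count: int) -> float:
    """Estimate the characters needed to store an article of this many words."""
    return word_count * CHARS_PER_WORD


def max_words(capacity_chars: int) -> int:
    """Estimate how many average words fit in the given character capacity."""
    return int(capacity_chars // CHARS_PER_WORD)


print(estimated_chars(500))  # 500-word article
print(max_words(255))        # TINYTEXT capacity
print(max_words(65535))      # TEXT capacity
```

By this estimate TINYTEXT holds roughly 46 words, while TEXT holds about 11,915.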