notes on SQL data compression

be very afraid – i just went to edit this post – and the original was totally gone.  WTF.

There is going to be a webinar on data compression by Jes Schultz-Borland of Brent Ozar’s group, and i wanted to get a little background from Brad M. McGehee

All the good stuff, i mean the CRUDy stuff on MS SQL server occurs in RAM in memory.  The rest is stored on physical disk.  These files as well as the files created by backup, can not live on a compressed windows drive, but the files themselves can be compressed.

Data compression is table by table and can be done at the row or page level.  When the Database engine wants something it can’t find, it asks the Storage engine to get it.  Storage looks first in data cache and later to the disk  for the data, and then if necessary reads it from the disk.  This disk I/O is slow and also involves moving the data from the input to the disk cache and into memory for the Database engine.  If the data is compressed, there is less physical data to move.  Weighted against the additional requirement for the CPU to get involved, frequently the movement of the smaller file(s) results in better performance. Also the pages in cache will be smaller which means more data in the same cache space.  We might be able to get along with smaller, cheaper drives, or if we have SSD disks, smaller or better use of it.

Row-level data compression is done for each row of data, and occurs by taking fixed-length data types and storing them in variable lengths of storage.  This finds and used wasted space.  It can not be done for some datatypes, such as nvarchar() and varchar() (my favorites).  In practice the mdf get smaller on disk, the datacache is smaller and the SQL pages of 8K in size may hold more rows of data so that memory is conserved.

Page-level data compression starts with row compression on the page, then continues with both prefiz and

 

—————-

 

data analysis occurs in memory in MS SQL Server. Data compression occurs either when creating a backup or in data stored on the data as a mdf file. It does not affect the processing of data in MS SQL, however it can save space and moving smaller files can speed up I/O some processes and speed up queries. it takes CPU resoruces and will slow write-speed.

Data compression of files (table stored as heap, as clustered index, non-clustered index, indexed view or partitioned tables and indexes) can be set by the DBA at the row level or page level. Each of the file types and each of the files are compressed individually.

Compression is of the file types on disk and in the buffer cache and is done by the storage engine. When MS SQL relational engine needs the data, the storage engine looks for it the cache or on the disk. If necessary, it reads it from the disk and copies it to memory in the data cache. There, it is unncompressed as it is ‘handed off’ to the relational engine. Savings in i/o sometimes make it faster than fetching uncompressed data. After the activity, the data is released back to the storage engine, which compresses it to the cache (which saves buffer cache space), and eventually it is written back to disk, already in the compressed form.

Row compression – does not store null or zero (blank?) values, stores fixed length data types as variable and as small as possible (bigint, char) and reduces the amoung of metadata (data loss?).

page level compression takes more CPU than row compression – row compression to stuff as much data into a page, AND the prefix is compressed (common at the being of a column are stored in the compression information structure (CI)) and dictionary compression run over the entire page (DC – repeated values are stored once int he CI and referenced repeatedly).

Space savings vary with the actual data.

You can use SSMS\Storage\compression to test how much space would be saved in a table with row or page compression. The wizard can run immediately, create an agent job, or generate a t-sql script. The compression can be CPI and I/O intensive. Also use sp_estimate_data_compression_savings (http://sqlfool.com/2011/06/estimate-compression-savings/)
http://www.bradmcgehee.com/2010/03/an-introduction-to-data-compression-in-sql-server-2008/

Advertisements
This entry was posted in Uncategorized. Bookmark the permalink.