MySQL

MySQL: Binary Log Transaction Compression

6 min read

8 months ago

Introduction

MySQL 8.0.20 introduced a binary log transaction compression feature designed to save storage space and reduce network bandwidth usage by applying compression to each transaction. This feature can be enabled dynamically without restarting the server by setting the binlog_transaction_compression system variable to ON, as it is OFF by default. Additionally, the compression level of the zstd algorithm can be adjusted using the binlog_transaction_compression_level_zstd variable. By default, MySQL uses a compression level of 3, balancing resource usage and compression ratio. Detailed information on the implementation can be found in work log WL#3549.

Limitations

While MySQL's binary log transaction compression offers several benefits, it does not apply to all types of events. Excluded events include:

  • Non-transactional engines.
  • Transactions that end up in incident events.
  • Compressed transactions cannot exceed (MAX_ALLOWED_PACKET - bytes used for event header) bytes of compressed payload.
  • Only ROW-based format transactions are compressed.

Monitoring

The effects of binary log transaction compression can be monitored using the Performance Schema table binary_log_transaction_compression_stats. This table provides various statistics, such as the data compression ratio for the monitored period and the impact of compression on the server's most recent transaction. The table can be truncated to reset the statistics. For example:

mysql> SELECT * FROM performance_schema.binary_log_transaction_compression_stats\G *************************** 1. row ***************************                             LOG_TYPE: BINARY                     COMPRESSION_TYPE: ZSTD                  TRANSACTION_COUNTER: 5149             COMPRESSED_BYTES_COUNTER: 297060723039           UNCOMPRESSED_BYTES_COUNTER: 878445078063               COMPRESSION_PERCENTAGE: 66                 FIRST_TRANSACTION_ID: 79311bb8-161a-11ef-8838-061f91832189:79606   FIRST_TRANSACTION_COMPRESSED_BYTES: 57695187 FIRST_TRANSACTION_UNCOMPRESSED_BYTES: 170604987          FIRST_TRANSACTION_TIMESTAMP: 2024-05-20 15:03:50.059050                  LAST_TRANSACTION_ID: 79311bb8-161a-11ef-8838-061f91832189:84754    LAST_TRANSACTION_COMPRESSED_BYTES: 57709187  LAST_TRANSACTION_UNCOMPRESSED_BYTES: 170604987           LAST_TRANSACTION_TIMESTAMP: 2024-05-22 01:43:30.939051

In this example, the compression ratio is around 66%, demonstrating significant storage savings.

Readyset

We are pleased to announce that starting at Readyset version stable-240523 users can benefit from MySQL Binary Log Transaction Compression support. As part of introducing support in Readyset, we also extended MySQL rust driver to be able to decode compressed events. This work has been submitted as a contribution to the driver and has been released to the public in the rust driver version v0.31.0.

Test Scenarios

To test the potential savings of compressing binary logs, we conducted tests on two different EC2 instances: a smaller instance (t2.medium) with 2 CPUs and 4GB of RAM, and a larger instance (c4.4xlarge) with 16 CPUs and 30GB of RAM. The steps to reproduce this test are as follows:

1) Create a large table and populate it: 

CREATE DATABASE IF NOT EXISTS test; USE test; DROP TABLE IF EXISTS `large_test_table`; CREATE TABLE IF NOT EXISTS `large_test_table` (    `i` bigint(11) NOT NULL AUTO_INCREMENT,    `s` TEXT DEFAULT NULL,    `t` datetime NOT NULL,    `g` bigint(11) NOT NULL,    PRIMARY KEY (`i`)  ) ENGINE=InnoDB DEFAULT CHARSET=utf8;   INSERT INTO large_test_table VALUES (NULL, uuid(), time(now()), (FLOOR(1 + RAND() * 60))); INSERT INTO large_test_table SELECT NULL, uuid(), time(now()), (FLOOR(1 + RAND() * 60)) FROM large_test_table; -- Repeat the insert statement multiple times to populate the table with a large number of rows.

2-) Run the workload from an application server:

while true; do mysql -usysbench -psysbench -h 54.157.135.13 -P 3306 -A test -e "update large_test_table SET s=HEX(SHA2(CONCAT(NOW(), RAND(), UUID()), 512)), t= time(now()), g= (FLOOR( 1 + RAND( ) *60 )) WHERE i < 500000;"; sleep 4; done

Case 1: Small Server (2 CPUs)

In the initial run, binary log compression was disabled. By utilizing mysqlbinlog, along with shell and Perl scripts, we can count the number of transactions within a single binary log. In the example below, we can see that a single binary log contains 2,123,891 transactions.

# mysqlbinlog --base64-output=decode-rows -vvv binlog.000210 |perl -ne 'BEGIN{%r = ();};/^### ((?:DELETE|INSERT|UPDATE).*)$/ and $r{$1}++;END{foreach my $key (keys %r){print($r{$key}," ",$key,"\n");}}'|sort -nr|head -n 200 2123891 UPDATE `test`.`joinit`

Network utilization was approximately 23GB:

And CPU usage was around 22%:

After enabling binary log compression, the number of transactions in a single binary log increased to 5,764,847, representing a 270% increase:

root@ip-172-31-50-161:/var/lib/mysql# mysqlbinlog --base64-output=decode-rows -vvv binlog.000223 |perl -ne 'BEGIN{%r = ();};/^### ((?:DELETE|INSERT|UPDATE).*)$/ and $r{$1}++;END{foreach my $key (keys %r){print($r{$key}," ",$key,"\n");}}'|sort -nr|head -n 200 5764847 UPDATE `test`.`joinit`

We also observed a 40% reduction in network usage:

While CPU usage increased from 22% to 39%:

Case 2: Large Server (16 CPUs)

With the larger server, we repeated the same steps. In the first run, the binary log contained the same number of transactions:

root@ip-172-31-50-161:/var/lib/mysql# mysqlbinlog --base64-output=decode-rows -vvv binlog.000649 |perl -ne 'BEGIN{%r = ();};/^### ((?:DELETE|INSERT|UPDATE).*)$/ and $r{$1}++;END{foreach my $key (keys %r){print($r{$key}," ",$key,"\n");}}'|sort -nr|head -n 200 2123891 UPDATE `test`.`joinit`

The network usage was stabilized at 63GB/hour:

The CPU usage was stable at 3.85%:

In the second run with binary log compression enabled, we observed a similar increase in the number of transactions stored in the binary log, as anticipated:

# mysqlbinlog --base64-output=decode-rows -vvv binlog.000730 |perl -ne 'BEGIN{%r = ();};/^### ((?:DELETE|INSERT|UPDATE).*)$/ and $r{$1}++;END{foreach my $key (keys %r){print($r{$key}," ",$key,"\n");}}'|sort -nr|head -n 200 5764847 UPDATE `test`.`joinit`

Network usage was reduced by 68%:

CPU usage showed a slight increase:

Results


Below are the charts summarizing these numbers:

Summary

Binary log compression demonstrated significant savings in disk and network usage. The compression gains in storage size were consistent due to the same algorithm and workload. However, network performance varied because the increased CPU load reduced the capacity to process the workload. Additionally, servers with larger capacity benefited more from compression as they experienced less CPU impact. These observations may explain Oracle's decision to leave binlog_transaction_compression disabled by default.

Users utilizing Binlog Transaction Compression on MySQL server are now able to see the same benefits of reducing network usage when using Readyset.