Getting Started with QueryPilot in a Self-Hosted Environment
This page details the configuration of Readyset QueryPilot when running in a self-hosted environment.
If you are curious about Readyset QueryPilot, and would like to try it out in your own environment, please contact us at hello@readyset.io!
Workflow
Readyset QueryPilot automatically executes the following steps:
- Lock a file on disk (configured by
lock_file
) to prevent multiple instances of Readyset QueryPilot from executing concurrently. - If
operation_health_check = true
, querymysql_servers
and check all servers that havecomment LIKE '%Readyset%'
(case insensitive) andhostgroup = <id of readyset_hostgroup>
in the ProxySQL configuration. For each Readyset instance, fetch its current status and update its ProxySQL status accordingly:Online
- Adjust the server status toONLINE
in ProxySQL.Maitenance Mode
- Adjust the server status toOFFLINE_SOFT
in ProxySQL.Snapshot In Progress
- Adjust the server status toSHUNNED
in ProxySQL.
- If
operation_query_discovery = true
, query the tablestats_mysql_query_digest
and find queries executed atsource_hostgroup
byreadyset_user
. Check if each query is supported by Readyset. The queries are considered in order of priority specified by the Query Discovery configurations. - If the query is supported, cache it in Readyset by executing
CREATE CACHE FROM <query>
. - If
warmup_time_s
is NOT configured, a new query rule will be added, redirecting this query to Readyset. - If
warmup_time_s
is configured, a new query rule will be added to mirror this query to Readyset. The query will still be proxied to the original hostgroup. - Once
warmup_time_s
seconds have elapsed since the query was mirrored, the query rule will be updated to redirect the query to Readyset instead of mirroring.
Configuration
Assuming you have your ProxySQL (opens in a new tab) already configured with MySQL you will need to create a new hostgroup and add Readyset to this hostgroup:
INSERT INTO mysql_servers (hostgroup_id, hostname, port, comment) VALUES (99, '127.0.0.1', 3307, 'Readyset');
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
NOTE: It's required to add Readyset
as a comment to the server to be able to identify it in Readyset QueryPilot.
To configure Readyset QueryPilot to run, execute the following on the ProxySQL admin interface:
INSERT INTO scheduler (active, interval_ms, filename, arg1) VALUES (1, 10000, '/usr/bin/readyset_query_pilot', '--config=/etc/readyset_query_pilot.cnf');
LOAD SCHEDULER TO RUNTIME;
SAVE SCHEDULER TO DISK;
Configure /etc/readyset_query_pilot.cnf
as follow:
proxysql_user
- (Required) - ProxySQL admin userproxysql_password
- (Required) - ProxySQL admin passwordproxysql_host
- (Required) - ProxySQL admin hostproxysql_port
- (Required) - ProxySQL admin portreadyset_user
- (Required) - Readyset application userreadyset_password
- (Required) - Readyset application passwordreadyset_database
- (Optional) - Readyset application databasesource_hostgroup
- (Required) - Hostgroup running your Read workloadreadyset_hostgroup
- (Required) - Hostgroup where Readyset is configurewarmup_time_s
- (Optional) - Time in seconds to mirror a query supported before redirecting the query to Readyset (Default0
- no mirror)lock_file
- (Optional) - Lock file to prevent two instances of Readyset QueryPilot from running at the same time (Default"/tmp/readyset_query_pilot.lock"
)operation_health_check
- (Optional) - Whether to perform health checks on Readyset instances, shunning them if they become unhealthy (Defaulttrue
)operation_query_discovery
- (Optional) - Whether to automatic query discovery, caching, and routing to Readyset (Defaulttrue
)operation_rules_from_caches
- (Optional) - Whether to bring the ProxySQL deep cache rules into alignment with whatever deep caches (previously created by Readyset QueryPilot) currently exist on Readyset, adding or dropping them as necessary (Defaulttrue
)operation_shallow_cache
- (Optional) - Whether to fallback to creating shallow caches if deep caching via Readyset is not supported (Defaulttrue
)number_of_queries
- (Optional) - Number of queries to cache whenoperation_query_discovery
is enabled (Default10
)query_discovery_mode
/query_discovery_min_execution
/query_discovery_min_row_sent
/query_discovery_wait_s
- (Optional) - Query Discovery configurations. The options are described in Query Discovery (Default"count_star"
/0
/0
/0
)shallow_cache_ttl_ms
- (Optional) - The desired TTL in milliseconds when using shallow caching (Default10000
)shallow_cache_size_mb
- (Optional) - The memory capacity to use for shallow caching (Default256
)denylist_path
- (Optional) - Path to the denylist file that QueryPilot will use to deny queries from being cached (Default/etc/readyset_query_pilot_denylist
)
Query Discovery
Below are a set of configuration options related to finding queries that are cacheable:
query_discovery_mode
: (Optional) - Mode to discover queries to automatically cache in Readyset. The options are described in Query Discovery Mode (Default"count_star"
)query_discovery_min_execution
: (Optional) - Minimum number of executions of a query to be considered a candidate to be cached (Default0
)query_discovery_min_row_sent
: (Optional) - Minimum number of rows sent by a query to be considered a candidate to be cached (Default0
)query_discovery_wait_s
- (Optional) - Only discover queries after ProxySQL has proxied queries for at least this long (Default0
)
Query Discovery Mode
The query_discovery_mode
setting can be customized to instruct Readyset QueryPilot how it should prioritize the queries it considers when automatically making caching decisions.
The options are:
-
"count_star"
- Total Number of Query Executions- Formula:
count_star = total_executions
- Description: This metric gives the total number of times the query has been executed. It is valuable for understanding how frequently the query runs. A high
count_star
value suggests that the query is executed often.
- Formula:
-
"sum_time"
- Total Time Spent Executing the Query- Formula:
sum_time = total_execution_time
- Description: This metric represents the total cumulative time spent (measured in microseconds) executing the query across all its executions. It provides a clear understanding of how much processing time the query is consuming over time. A high total execution time can indicate that the query is either frequently executed or is time-intensive to process.
- Formula:
-
"sum_rows_sent"
- Total Number of Rows Sent by the Query- Formula:
sum_rows_sent = total_rows_sent
- Description: This metric provides the total number of rows sent to the client across all executions of the query. It helps you understand the query’s output volume and the amount of data being transmitted.
- Formula:
-
"mean_time"
- Average Query Execution Time- Formula:
mean_time = sum_time / count_star
- Description: The mean time gives you an idea of the typical performance (measured in microseconds) of the query over all executions. It provides a central tendency of how long the query generally takes to execute.
- Formula:
-
"execution_time_distance"
- Time Distance Between Query Executions- Formula:
execution_time_distance = max_time - min_time
- Description: This shows the spread between the fastest and slowest executions of the query (measured in microseconds). A large range might indicate variability in system load, input sizes, or external factors affecting performance.
- Formula:
-
"query_throughput"
- Query Throughput- Formula:
query_throughput = count_star / sum_time
- Description: This shows how many queries are processed per unit of time (measured in microseconds). It’s useful for understanding system capacity and how efficiently the database is handling the queries.
- Formula:
-
"worst_best_case"
- Worst Best-Case Query Performance- Formula:
worst_best_case = max(min_time)
- Description: The
min_time
metric gives the fastest time the query was ever executed (measured in microseconds). It reflects the best-case performance scenario, which could indicate the query’s performance under optimal conditions.
- Formula:
-
"worst_worst_case"
- Worst Worst-Case Query Performance- Formula:
worst_worst_case = max(max_time)
- Description: The
max_time
shows the slowest time the query was executed (measured in microseconds). This can indicate potential bottlenecks or edge cases where the query underperforms, which could be due to larger data sets, locks, or high server load.
- Formula:
-
"distance_mean_max"
- Distance Between Mean Time and Max Time (mean_time vs max_time)- Formula:
distance_mean_max = max_time - mean_time
- Description: The distance between the mean execution time and the maximum execution time provides insight into how much slower the worst-case execution is compared to the average (measured in microseconds). A large gap indicates significant variability in query performance, which could be caused by certain executions encountering performance bottlenecks, such as large datasets, locking, or high system load.
- Formula:
Denylist
Readyset QueryPilot supports a denylist mechanism to deny specific queries from being automatically cached. The default denylist file path is /etc/readyset_query_pilot_denylist
, but this can be changed with the denylist_path
configuration setting.
Each line specifies a new denylist rule. Queries can be denied based on database, their ProxySQL digest, and whether or not eligible for deep or shallow caching.
An example denylist file is below:
# Each line in the denylist defines one deny rule for QueryPilot. Comments can
# be specified with the # character.
#
# A single rule can deny a specific database, digest, cache type, or any
# combination thereof.
#
# To compose a deny rule, write one or more space-delimited key=value pairs.
# These pairs are then AND'd together to define a single deny rule. The pairs
# can be introduced in any order.
#
# The following keys are valid:
#
# - db # Specifies a database
# - digest # Specifies a ProxySQL digest
# - type # Specifies the cache type (deep or shallow)
#
# Some examples are below:
# Deny all caches in database db1
db=db1
# Deny shallow caches in database db2
db=db2 type=shallow
# Deny this digest everywhere
digest=0x2153C80B94EC555E
# Deny deep caches for this digest in database db4
db=db4 digest=0x2153C80B94EC555E type=deep