The Intro

The Pain(s)

For all of WordPress’s vast ecosystem and celebrated diversity of choices and other spoils, the Plugins – for better or worse recently and unceremoniously joined by Themes in that effort – constantly write vast amounts of diverse data.

This turns the hardest-working, most overlooked and underappreciated MVPs – the databases – from shiny machines into squeaky, bloated messes dragging us into the depths of maintenance hell, with pissed-off users and clients in tow.

Without a significant time investment, along with great care, diligence and TLC, the above more or less gradually evolves into an incomprehensible and dangerous mess (with 40-ish % of all currently live sites powered by WP, probably more than occasionally a business-critical one), even for far-beyond-average WP operators.

When we eventually (or inevitably, for people who like i’s in their words) get there – after an UNO-reverse on a botched update, we stumble upon an apparently also-botched magic-spell effect on a DB index which now has 200% overhead, and finally try to catch a Pokémon with a query as basic as SELECT πŸ€” FROM … πŸ₯Ή FROMmmm … 😑 … no wait, it’s actually FROM ….. 🀬 – let alone write a half-decent function, view, etc.
See, I didn’t even have to say #truestory bro

And a special place in hell is reserved for plugins creating 15 tables for basic data with trivial differences, and for those with unnecessarily chatty logging and “creative” analytics features recording 20 MB/h on a slow day – the ones who skipped a class right after “M for Meh” and came back just after “O for Optimize the damn thing”.

Joking aside, the reverse process is often possible. It’s also often seriously complex, tedious and expensive.

But what if there were a way to…

There isn’t. Hence this:

The Concept

But what if we… no seriously, what if we actually had a good start, or at least some help along the way?

I present to you the WordPress DataBase Router Thing – to be used for data segregation, smart data placement and support for multiple independent data stores …

… or its more friendly form – The WPDBRT πŸ˜‚

(Please don’t sue me, Mullenweg – both for the unlicensed use of WP and for likely misspelling your name)

The Attempt at a Remedy

Contextual Data Router – a database abstraction layer to selectively route archival data, and data from certain plugins, to a separate database. This moves the performance overhead away from the “main” WordPress database, which is used for core content like posts, pages, and users.

It’s not a wet dream – hear me out and take a look at:

The Technical Approach

It’s only not a wet dream thanks to this amazing guy John Blackbourn (@johnbillion) who – as smart as I wish I was – has already proven some of the difficult parts and technical challenges solvable in his Query Monitor plugin.

As the smallest thank-you I could possibly offer, a few words on the Query Monitor plugin (which, by the way, is completely free, so this is not a shameless promo plug). The purpose of Query Monitor is to make it easy to identify which plugins, scripts and queries are causing performance issues, errors and other sins, with the goal of helping with debugging – or at least bringing the peace of knowing it’s not your fault, and it’s not a feature, it’s a bug. And who doesn’t hate bugs, especially the butterflies.

Why am I so thankful? Because he saved me from trying to reinvent the wheel and further reduced the initial difficulty, as we could potentially leverage and take inspiration from his solutions in Query Monitor:

  • Related Hooks with filters or actions attached – the plugin monitors database queries, hooks and actions, PHP, scripts, and their connections in great detail and, more importantly, with great precision.
  • Even more significantly, the Symlinked DB component – db.php – alleviates the most complex part (possibly a showstopper), i.e. the actual abstraction layer (not just a wrapper) for wp-db.php, which we could lean on to:
    • Overcome the efficiency concerns in routing queries to a separate database by creating a symbolic link to an alternative wp-db.php file that handles the new database connection.
    • Control queries with a custom class that extends WordPress’s core database class (wpdb). This allows us to discover, capture and modify queries in order to divert them to another database.
  • Cherry on top, and a gift that keeps on giving: Query Monitor is open source and available on GitHub.

(also thanks to Steve Jobs for making me a great artist!)

With the technical part briefly set aside, it’s time for the smart part.

Practical considerations, initial workflow & feature implementation ideas

The high-level architecture assumes three core components and exactly as many functional process stages:

Stage 1 – Spy and learn

This is the intelligence-gathering phase – the plugin would subscribe to the activation and deactivation events of other plugins.

Using the model of the Related Hooks, it monitors other plugins’ execution and the queries they perform.
Unique events would be recorded along with their key metrics – in terms of database ops, those would be the quantity and diversity of database operations, from schema complexity (number of tables, CPTs, object types, unique keys, relations, etc.) to the frequency and volume of data writes.
This defines the main entities – plugins, their operations and the relevant metrics.
Basic data filtering happens in this immediate phase – as we are only interested in database operations, non-DB-related actions are discarded.
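To make the subscription concrete, here is a minimal sketch. The `activated_plugin` and `deactivated_plugin` actions are real WordPress hooks; the `$observed` registry is hypothetical, and the `add_action`/`do_action` shims exist only so the sketch runs outside WordPress.

```php
<?php
// Sketch: subscribing to other plugins' lifecycle events.
// 'activated_plugin' and 'deactivated_plugin' are real WordPress actions;
// the shims below are stand-ins so the sketch runs outside WordPress.
if (!function_exists('add_action')) {
    $GLOBALS['wp_hooks'] = [];
    function add_action(string $hook, callable $cb): void {
        $GLOBALS['wp_hooks'][$hook][] = $cb;
    }
    function do_action(string $hook, ...$args): void {
        foreach ($GLOBALS['wp_hooks'][$hook] ?? [] as $cb) { $cb(...$args); }
    }
}

$observed = [];

// A freshly activated plugin goes under surveillance.
add_action('activated_plugin', function (string $plugin) use (&$observed): void {
    $observed[$plugin] = ['status' => 'watching', 'unique_events' => 0];
});

// Deactivation ends the watch but keeps what was learned.
add_action('deactivated_plugin', function (string $plugin) use (&$observed): void {
    if (isset($observed[$plugin])) {
        $observed[$plugin]['status'] = 'dormant';
    }
});

// Simulated lifecycle (WordPress itself would fire these):
do_action('activated_plugin', 'some-plugin/some-plugin.php');
do_action('deactivated_plugin', 'some-plugin/some-plugin.php');
```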

In order to reduce overhead and improve performance, I see a multi-tiered approach as optimal – a listener, a filter, and a recorder:

  • Atomizing their operations reduces individual processing complexity and speeds up the workflow.
  • A secondary benefit is extra buffering “space” for the potential (expected) high volume of events.
  • Lastly, the sequential nature of the process creates an opportunity for parallelization and resource allocation/distribution – for example, spawning or relocating “workers” toward the most congested step (Data Collection/Processing/Recording) to prevent data loss and to operate in resource-constrained environments and periods.
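A minimal sketch of that three-tier shape, with hypothetical class and field names – the filter drops non-DB events and the recorder persists only events it hasn’t fingerprinted before, which is what keeps steady-state operation cheap:

```php
<?php
// Sketch of the listener -> filter -> recorder tiers. Names are
// hypothetical; an in-memory array stands in for the persistent store.
final class EventPipeline
{
    private array $seen = [];      // fingerprints of already-recorded events
    private array $recorded = [];  // stand-in for the persistent store

    // Tier 1: listener – accepts every raw event, cheap and unconditional.
    public function listen(array $event): void
    {
        $dbEvent = $this->filter($event);
        if ($dbEvent !== null) {
            $this->record($dbEvent);
        }
    }

    // Tier 2: filter – keep only database operations, discard the rest.
    private function filter(array $event): ?array
    {
        return ($event['type'] ?? '') === 'db' ? $event : null;
    }

    // Tier 3: recorder – persist unique events only; repeats are skipped.
    private function record(array $event): void
    {
        $fingerprint = $event['source'] . '|' . $event['query_shape'];
        if (!isset($this->seen[$fingerprint])) {
            $this->seen[$fingerprint] = true;
            $this->recorded[] = $event;
        }
    }

    public function recordedCount(): int
    {
        return count($this->recorded);
    }
}

$pipeline = new EventPipeline();
$pipeline->listen(['type' => 'db',   'source' => 'plugin-a', 'query_shape' => 'INSERT INTO wp_logs']);
$pipeline->listen(['type' => 'http', 'source' => 'plugin-a', 'query_shape' => '-']);                    // filtered out
$pipeline->listen(['type' => 'db',   'source' => 'plugin-a', 'query_shape' => 'INSERT INTO wp_logs']);  // duplicate, skipped
```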

This process is “intense” and resource-heavy in bursts – in the initial phase of a new plugin, when its arsenal of events is introduced, they need to be captured and processed in all three stages. As they become business as usual – reducing the number of “unique” events – the “recording” step will be needed less and less and practically omitted. This is the best scenario, as at this point I anticipate recording being the slowest step – for example, getting the data from RAM to a buffer (object storage?) and recording it to disk (database).

The saving grace is that installing and activating a new plugin is a “significant event” in and of itself – some slowdown may be expected, or the user may at least consider introducing such changes to the system in off-peak hours. That provides an opportunity for this stage to be “greedy”, as that’s when the bulk of the processing happens: a burst of new events to be processed, with some “unique” events occurring during the setup – like creating the DB tables for the new plugin, initial data writes, etc. – which need to be processed and handed to the downstream stages for their functional decision-making.

Stage 2. Setting up the Abstraction Layer – The Clandestine Twin Operator

Instead of reinventing the wheel, we’ll rely on the Symlinked DB feature to create a symlinked database file.
This custom class is the “route” for queries in the “router”. It extends WordPress’s core wpdb database class. Technically, it sits in front of wpdb and provides the same type of “services” as wpdb – passing data to and from either of the data stores.
It’s a core functional foundation of the router, but essentially just a transport layer, “transparently” handling the data by intercepting and/or diverting it, as fundamentally the interacting function doesn’t care where the data ends up or comes from. It’s a grunt – it doesn’t require much logic, just fast reflexes and high throughput.
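A sketch of what the twin could look like. In WordPress this class would live in the wp-content/db.php drop-in and extend the real wpdb; here a tiny stub parent stands in so the shape is self-contained, and both should_divert() and divert_query() are placeholders for the Stage 3 logic:

```php
<?php
// Sketch only: in WordPress this class would live in the wp-content/db.php
// drop-in and extend the real wpdb. The stub parent below stands in so the
// shape runs on its own; its string return values are purely illustrative.
if (!class_exists('wpdb')) {
    class wpdb {
        public function query($query) { return 'main:' . $query; }
    }
}

class Routed_DB extends wpdb
{
    // The transport layer holds no routing smarts of its own – it only
    // forwards each query to one of the two stores.
    public function query($query)
    {
        if ($this->should_divert($query)) {
            return $this->divert_query($query);   // secondary store
        }
        return parent::query($query);             // main WP database
    }

    // Placeholder: the real decision logic belongs to the Stage 3 router.
    protected function should_divert($query): bool
    {
        return str_contains($query, 'wp_redirection_logs'); // illustrative rule
    }

    // Placeholder: would execute the query on the secondary connection.
    protected function divert_query($query)
    {
        return 'secondary:' . $query;
    }
}

$db       = new Routed_DB();
$kept     = $db->query('SELECT * FROM wp_posts');
$diverted = $db->query('INSERT INTO wp_redirection_logs (url) VALUES (1)');
```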

Stage 3. Developing the Routing Logic – The Chess Player

Goals and tasks of the Router logic are to:

  1. Monitor functions and hooks (actions, filters) for database query requests (intents).
  2. Inspect the requests and query strings (maybe even the backtrace) to determine if the query is coming from one of the plugins we want to divert.
  3. Intercept queries. If a query matches the criteria, route the data, i.e. execute the query against the secondary database (a secondary or an external MySQL database). This part is the “brain” and decision-maker of the Router per se.
  4. It should be able to “learn” – but strictly in terms of pattern matching; there is no room (resources) or time (actual milliseconds) for any kind of intelligence – to adjust and adapt to changing data or usage.
    Like its namesake, the network router, it should have its own dedicated persistent memory for storing its rules and “maps” – of where a particular database table or piece of data lives.
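A sketch of that persistent “map”, with hypothetical names throughout – a plain JSON file stands in for whatever store (its own table, an option, a file) the router would actually use:

```php
<?php
// Sketch of the router's persistent "map". All names are hypothetical;
// a JSON file stands in for whatever store the router would really use.
final class RouteMap
{
    /** @var array<string,string> table name => destination handle */
    private array $routes = [];

    public function learn(string $table, string $destination): void
    {
        $this->routes[$table] = $destination;
    }

    public function destinationFor(string $table): string
    {
        // Unknown tables always stay on the main database.
        return $this->routes[$table] ?? 'main';
    }

    public function save(string $path): void
    {
        file_put_contents($path, json_encode($this->routes));
    }

    public static function load(string $path): self
    {
        $map = new self();
        $map->routes = json_decode((string) file_get_contents($path), true) ?: [];
        return $map;
    }
}

// Learn a rule, persist it, and read it back – surviving a "reboot".
$map = new RouteMap();
$map->learn('wp_redirection_logs', 'secondary');
$store = tempnam(sys_get_temp_dir(), 'wpdbrt');
$map->save($store);
$reloaded = RouteMap::load($store);
```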

This was actually the first use case that sparked this whole idea – for a single plugin to have its tables and/or data (e.g. only certain keys) placed in different databases – for instance, the actual data that’s useful, frequently needed or processed along with other data in one DB, and logs and generally “Tier 2” data in another.

The benefits of such an approach are aplenty:

Not mixing critical data with archival data and junk single-use metadata in a single table – I’m looking at you, post_meta, wp_options, user_meta.

Reducing the size and operational complexity (indexes, functions, views) of the “main” database, gaining either better performance or lower resource demands.

Even if the router is not smart at all on its own and requires manually creating the rules and routes (which will most likely be the case in the first version), the effort is still beneficial, even in the mid-term – in performance, manageability, and maintainability.

The end goal, of course, is to alleviate and eventually prevent the need for users’ interactions, analysis and manual operations – with the wet dream being it migrating data between two tables on its own when it recognizes a benefit in doing so.

Clean-up – by knowing exactly which queries and which database tables, keys, etc. a plugin created and/or used, it can provide the tools, or at least the means, to easily and safely clean up any leftover data and tables, as well as code like functions and filters scattered across functions.php and system files, usually left behind as ghosts of the plugins past.

Since it’s already tapped into the other plugins’ lifecycle events – install, activate, deactivate, uninstall – in the long term it could potentially include and automate that process: like Revo Uninstaller, but for WordPress.

Alternative configurations of the Router could unlock additional use cases – for instance, if instead of a secondary data store it is configured to function as a parallel one, with real-time data duplication and immediate consistency (compared to the “eventual consistency” of traditional replication) – for time-sensitive data (benefiting from reduced temporal discrepancy between different records), redundancy for important data (e.g. transaction records), additional bandwidth for high-volume read-write operations, reliable mutual consistency for multi-party operations, or simple data duplication for backup (JBODB πŸ˜‚).

Again, I’m trying not to reinvent the wheel, and I’m aware that some of the proposed solutions and/or mentioned use cases are not entirely novel – they’re available in MySQL as sharding, federated tables, and many types of replication. Yet! Look at it from the perspective of WordPress rather than an enterprise environment, with the goal and intention (beyond helping ourselves) of democratizing these methods: enabling these use cases for people who may not have Giga Chad chops for complex database designs and setups – the Average Joe navigating the inherent reality of a diverse spectrum of hosting providers (to put it politely), in which even some of the core MySQL features are limited or disabled, and the same goes for locked PHP configurations and modules.

With that out of the way, let’s tackle:

The WordPress Database Router Architecture | The Outline

The architecture for diverting queries and data from the WordPress database is based on the custom database abstraction layer as a core component. Instead of replacing WordPress’ entire database system, this layer acts as an intelligent router, intercepting and directing database queries to either the main WordPress database or a secondary one.

The foundation of this architecture is built upon the Symlinked DB approach and a custom wp-db.php extending WP’s core wpdb class. Doing so allows all database queries to be funneled through our workflow before they are executed.

Key Methods and Concepts

  1. Intercepting Queries

The method that intercepts every query sent by WordPress will have access to the full query string and, if processing time/resources permit, the PHP backtrace – the sequence of function calls that led to the query.

Method:
A query() method within the custom class would be overridden to perform checks on a query.

              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚     query($query)     β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                          β”‚
                          β–Ό
         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         β”‚  Check condition: should_divert?  β”‚
         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚                              β”‚
              β–Ό                              β–Ό
 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
 β”‚   YES β†’ Divert query  β”‚       β”‚   NO β†’ Run parent::query  β”‚
 β”‚   to secondary DB     β”‚       β”‚   on main WP database     β”‚
 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
            β”‚                                β”‚
            β–Ό                                β–Ό
 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
 β”‚  return divert_query  β”‚       β”‚   return parent::query    β”‚
 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  2. Identifying Queries to Divert

Diverting core WordPress queries is avoided by implementing a simple logic function – an allow-list ruleset omits the system queries and disqualifies unspecified functions. This also reduces the overhead of excess evaluation of passing queries.

Method:

The should_divert($query) method could use a variety of checks:

Query String Analysis: Check for specific table names (e.g., wp_redirection_logs, wp_options) or specific keywords (e.g., INSERT, DELETE) associated with data-heavy plugins.

Backtrace Analysis: Analyze the backtrace to identify plugin or theme function which originated the query.
This would be the most reliable approach – an information-rich and precise way to evaluate the query.
It’s been validated IRL and exists in the db.php symlink feature, but its scope there is far less demanding compared to the database-operations workflow we’re planning, hence I’m still unsure if it’s feasible (currently TBD & TB tested) – it’s supposed to be the way I like my sex: fast and cheap on resources.

Hook Analysis: Intercept actions and filters before the query is even constructed to divert data before it reaches the database.
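A sketch of should_divert() combining the first two checks – the table and plugin names in the allow-lists are purely illustrative:

```php
<?php
// Sketch of should_divert() combining the query-string and backtrace
// checks. The table and plugin names in the allow-lists are illustrative.
function should_divert(string $query, array $backtrace = []): bool
{
    // Check 1 – query string analysis: allow-list of diverted tables.
    $diverted_tables = ['wp_redirection_logs', 'wp_plugin_analytics'];
    foreach ($diverted_tables as $table) {
        if (stripos($query, $table) !== false) {
            return true;
        }
    }

    // Check 2 – backtrace analysis: did a flagged plugin originate the call?
    $diverted_plugins = ['redirection', 'chatty-analytics'];
    foreach ($backtrace as $frame) {
        foreach ($diverted_plugins as $plugin) {
            if (strpos($frame['file'] ?? '', "/plugins/{$plugin}/") !== false) {
                return true;
            }
        }
    }

    // Default: anything unrecognized stays on the main database.
    return false;
}
```

The default-deny return is what keeps core WordPress queries safe: anything not explicitly allow-listed is never diverted.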

  3. Diverting the Data

Once a query has been identified for diversion, the system connects to a secondary database and executes the query there. This can be a separate MySQL/MariaDB database or even a lightweight file-based database like SQLite.

Method:

A divert_query($query) method would handle this process:

Connect to Secondary Database: Establish a new database connection specifically for diverted queries.

Execute the Query: Run the query against the secondary database.

Return a Mock Result: the results and responses of the diverted query need to be identical to the original response structure, as if the query had run on the main database, to prevent the functions and plugins from throwing errors. For example, INSERT queries would typically return a true value and an insert ID.
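A sketch of the diversion step with the mock-result requirement in mind. An in-memory SQLite database (via PDO) stands in for the secondary store so the example is self-contained; divert_query() and its return shapes are my assumptions modeled on wpdb’s behavior (affected-row count for writes, result rows for reads, false on error):

```php
<?php
// Sketch of divert_query(): the return value mimics wpdb's shapes so the
// calling code cannot tell the query ran elsewhere. An in-memory SQLite
// database via PDO stands in for the secondary store; in production this
// would be a second MySQL/MariaDB connection.
function divert_query(string $query, PDO $secondary)
{
    try {
        $statement = $secondary->query($query);
    } catch (PDOException $e) {
        return false; // mirror wpdb: false on error
    }
    if ($statement === false) {
        return false; // older PDO error modes report failure this way
    }

    if (preg_match('/^\s*(INSERT|REPLACE|UPDATE|DELETE)\s/i', $query)) {
        // Writes: wpdb->query() reports the number of affected rows.
        return $statement->rowCount();
    }

    // Reads: return the rows, analogous to wpdb->get_results().
    return $statement->fetchAll(PDO::FETCH_ASSOC);
}

// Self-contained demo against an in-memory "secondary" database:
$secondary = new PDO('sqlite::memory:');
$secondary->exec('CREATE TABLE wp_redirection_logs (id INTEGER PRIMARY KEY, url TEXT)');
$affected = divert_query("INSERT INTO wp_redirection_logs (url) VALUES ('/old-path')", $secondary);
$rows     = divert_query('SELECT url FROM wp_redirection_logs', $secondary);
```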

Conclusion: The Unlikely Future of the “Un-Bloated” WordPress

The unhealed revolutionary in me would say that WPDBRT isn’t just about moving a few logging tables to a secondary database; it’s about reclaiming the resource and performance sovereignty of our WordPress sites.

By treating the database as a distributed resource rather than a single, monolithic point of failure, we move away from the “maintenance hell” toward a modular, scalable data architecture.

We are essentially applying microservices logic to the most legacy part of the WordPress stack.

The “kind of a” Roadmap:

  • The Alpha Pilot: Testing the should_divert() logic against high-velocity plugins (like Redirection or basic analytics) to measure the actual millisecond overhead of the router itself.
  • The SQLite Edge: Exploring if “Tier 2” archival data could be routed to a local SQLite file instead of a second MySQL instance, potentially offering a “zero-config” performance boost for users on budget hosting.
  • The Fail-Safe Protocol: Refining the “Mock Result” system to ensure that if the secondary database goes offline, the Router can gracefully degrade or temporarily cache queries without crashing the entire site.

The wheel is already round – thanks to the real-deal guys like John Blackbourn – I’m just trying to make it steer in two directions at once.

I wish I could say: “Matt, if you’re reading this: my bags are packed, and my db.php is symlinked.” Unfortunately, it’s not up to us – we are exposed to the very same rich and diverse ecosystem of even more rich and diverse approaches to implementing a bunch of stuff in a purely arbitrary manner, making this kind of solution (even the idea) completely exposed and fragile, and not really fit for high-performance real-world use.

But hey, who doesn’t like a good mental exercise of not sleeping for a few nights to chase an elusive idea 🀷🏻

