How I Cut Our CI/CD Pipeline from 35 to 5 Minutes (And Saved the Team's Sanity)

11 min read
Senior Backend Engineer with 18+ years of experience building scalable web applications, primarily in PHP and Node.js. I operate as a T-shaped developer, combining clean backend architecture with a practical understanding of cloud infrastructure (AWS) and edge computing. I focus on system reliability and performance. A recent highlight: I reduced CI pipeline time from 30 to under 5 minutes by identifying I/O bottlenecks (moving operations to tmpfs) and optimizing test execution. I use AI-assisted tools (LLMs, coding assistants) to speed up prototyping, debugging, and everyday development tasks. I work closely with DevOps teams and understand production environments, which helps me deliver reliable and maintainable systems.

You push a critical hotfix, switch branches—and your CI is still running 30 minutes later. In my current project, our backend pipeline ran across a massive monorepo (~7,000 unit, ~600 integration, ~150 API E2E tests, ~200MB vendor) — and it bloated to a painful 35+ minutes.

It was killing productivity. The whole team was frustrated, but no one had the time to dive into the runner infrastructure. My roots are in Linux sysadmin work, and I’ve always carried that 'tinkerer' DNA into my software engineering. I don't stop at "it works" — I want to understand the full execution path and how it interacts with the OS and the hardware.

Driven by that urge to look under the hood, I decided against throwing more AWS resources at the problem. Instead, I dug into what was actually slowing things down. This isn’t a full how-to. It’s a practical look at what actually moved the needle — and what didn’t — when we cut our pipeline down to ~5 minutes.

1. Bypassing AWS EBS Limits with RAM Disks (tmpfs)

Initially, I suspected our test databases were the main I/O bottleneck. But a quick look at our EC2 metrics revealed the truth: the CPU was barely breaking a sweat, while disk I/O was completely maxed out. The real killer? Logs and temporary file processing. We deliberately run our CI tests in debug mode so developers get a full stack trace instantly upon failure.

Writing massive debug logs and processing large volumes of temporary files on EBS under concurrent load was grinding the runners to a halt.

The fix: I mounted both the MySQL data directories and our application's temporary storage directly to RAM disks. To squeeze every last drop of performance, I also bypassed MySQL's data safety mechanisms (since we don't care about data loss if a CI container crashes) and added a size limit to avoid OOM kills by the kernel.

# docker-compose.ci.yml
services:
  database:
    image: mysql:8.0
    # Relax durability guarantees; fine when CI data is disposable
    command: --innodb_flush_log_at_trx_commit=0 --sync_binlog=0
    tmpfs:
      # Keep the MySQL data directory in RAM, capped to avoid kernel OOM kills
      - /var/lib/mysql:rw,noexec,nosuid,size=1G
  app:
    image: my-app:test
    tmpfs:
      # Debug logs and the app cache are the heaviest writers, so keep them in RAM too
      - /app/var/logs:rw,size=500M
      - /app/var/cache:rw,size=500M

Caveat: This works perfectly as long as your runners are not oversubscribed. tmpfs shifts the bottleneck from I/O to memory capacity. We did hit OOM issues early on when running too many concurrent jobs — proper memory limits per container turned out to be mandatory. Otherwise, you’re just trading slow pipelines for unstable, randomly crashing ones.
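
For reference, here is a minimal sketch of those guard rails as additions to the services above. The limit values are illustrative, not our exact numbers.

# docker-compose.ci.yml (sketch: per-container memory limits; values are illustrative)
services:
  database:
    # tmpfs pages are charged to the container's memory cgroup,
    # so the limit has to cover MySQL itself plus the 1G data directory
    mem_limit: 3g
  app:
    mem_limit: 2g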

What surprised me the most was how skewed the gains were — tmpfs alone accounted for the majority of the speedup. This wasn’t an incremental improvement — it was the turning point that made the rest of the optimizations actually matter.

The Trade-off:

  • Lost: Durability (which we don't care about in CI).
  • Gained: Massive I/O speed for logging and DB operations.

2. The Artifact Trade-Off and S3 Caching

GitLab’s default GIT_STRATEGY: clone forces every concurrent job to clone the entire repository. In a massive polyglot monorepo — where pulling the code means downloading other unrelated microservices, frontend apps, and heavy assets just to run backend tests — this is brutal.

I refactored the pipeline so only the first setup job fetches the code (using GIT_DEPTH: 1). Here is where I made a conscious architectural decision: I configured the job to install PHP dependencies and package the exact state (vendor/) into a GitLab artifact.

Wait, isn't passing vendor as an artifact an anti-pattern? In massive Node/PHP projects, yes. The artifact can weigh 1GB and zipping/unzipping it takes longer than a fresh install. However, in our specific context, our vendor footprint was manageable. This is a classic case where not going strictly by the book is the better choice. Building a full Docker image just for CI dependencies would have added build time and maintenance overhead without solving the real bottleneck. Combined with migrating our GitLab cache and artifacts to a dedicated AWS S3 bucket, pulling the vendor artifact with high bandwidth and low latency was significantly faster than hitting external package registries across 10 concurrent jobs.
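
For the cache side of that migration, the relevant part of the runner's config.toml looks roughly like this; the bucket name and region are placeholders, and artifact object storage is configured on the GitLab instance itself rather than on the runner.

# config.toml (GitLab Runner): sketch of the distributed S3 cache
[runners.cache]
  Type = "s3"
  Shared = true
  [runners.cache.s3]
    ServerAddress = "s3.amazonaws.com"
    BucketName = "my-ci-cache"          # placeholder
    BucketLocation = "eu-central-1"     # placeholder
    # AccessKey/SecretKey omitted: an EC2 instance profile can provide credentials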

Pro-tip: The Authoritative Classmap. Instead of a standard install, I combined dependency resolution with autoloader optimization. By using --classmap-authoritative, I forced Composer to generate a static class map and stop any further filesystem lookups for missing classes. In a large-scale application, this eliminates thousands of redundant I/O operations during test execution.

# .gitlab-ci.yml
setup_and_build:
  stage: build
  variables:
    GIT_DEPTH: 1
  cache:
    key:
      files:
        - composer.lock
    paths:
      - vendor/
  script:
    # One command to rule them all: install + authoritative classmap
    - composer install --prefer-dist --no-progress --classmap-authoritative
  artifacts:
    paths:
      - vendor/
    expire_in: 1 hour

Downstream jobs (like parallel test suites or static analysis) no longer touch Git or external registries. They just download the optimized artifact from the build job and execute immediately. In practice, this removed an entire class of network-bound variability from the pipeline.
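
A minimal sketch of such a downstream consumer (the job name and suite split are illustrative):

# .gitlab-ci.yml (sketch of a downstream job; names are illustrative)
unit_tests:
  stage: test
  variables:
    GIT_STRATEGY: none          # skip Git in this job (assumes the sources are already in the runner workspace or shipped via the artifact)
  needs:
    - setup_and_build           # downloads the vendor/ artifact, then starts immediately
  script:
    - vendor/bin/phpunit --testsuite unit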

The Trade-off:

  • Lost: Clean, from-scratch dependency isolation per job.
  • Gained: Bypassing network bottlenecks and redundant Git/Composer operations.

3. The Parallelization Trap (And Why Paratest Failed Us)

A common piece of advice for slow PHP pipelines is: "Just use Paratest to run it in parallel!" We tried. We shaved off a few seconds from our unit tests, but integration tests crashed and burned.

Why? Because parallel execution exposes the sins of legacy test architecture. Parallelization isn’t a performance optimization—it’s an architectural test. Our integration tests lacked proper state isolation and collided over database records. True parallelization requires architectural changes: wrapping tests in database transactions that rollback automatically, or dynamically provisioning isolated database schemas per test thread. If your tests rely on shared state, parallelization doesn’t make them faster — it just makes them fail faster.
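
As an illustration of the first approach, here is a minimal sketch of a transaction-per-test base class, assuming PDO and PHPUnit; class names and environment variables are illustrative.

<?php
// Sketch: every test runs inside a transaction that is rolled back afterwards,
// so parallel workers never see each other's writes. Names are illustrative.
use PHPUnit\Framework\TestCase;

abstract class IsolatedDatabaseTestCase extends TestCase
{
    protected \PDO $db;

    protected function setUp(): void
    {
        parent::setUp();
        $this->db = new \PDO(
            (string) getenv('TEST_DB_DSN'),
            (string) getenv('TEST_DB_USER'),
            (string) getenv('TEST_DB_PASSWORD')
        );
        $this->db->beginTransaction();   // the test works on a private snapshot
    }

    protected function tearDown(): void
    {
        if ($this->db->inTransaction()) {
            $this->db->rollBack();       // discard everything the test wrote
        }
        parent::tearDown();
    }
}

For the per-schema variant, Paratest exposes a TEST_TOKEN environment variable per worker, which can be used to suffix database or schema names so each thread gets its own namespace.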

4. Honorable Mention: Watch Your Docker Limits

As we sped things up and ran more tests concurrently using our RAM disks, we hit a bizarre wall. It manifested as intermittent EMFILE ("Too many open files") errors during peak parallel load.

Our Docker daemon on the self-hosted AWS runners was still using the default file descriptor limit (ulimit -n 1024). Instead of waiting for an external DevOps team to manually edit the daemon.json on the EC2 instances, I implemented the fix directly via Infrastructure as Code in our compose file.

# docker-compose.ci.yml
services:
  app:
    # ... image and tmpfs config ...
    ulimits:
      nofile:
        soft: 65536   # raise the default 1024 file descriptor limit
        hard: 65536

This was a good reminder: once you remove one bottleneck, the next one is usually just beneath it — often at the OS level.

5. OPcache in CLI and the JIT Dilemma

Usually, OPcache is something you tune for your web servers and completely ignore in CLI environments. But since we already had our blazing-fast tmpfs RAM disks set up, I decided to run an experiment.

We run three different test suites sequentially within a single CI job. I configured PHP to dump its OPcache directly into the RAM disk (opcache.file_cache=/tmp/.ci-opcache) and enabled it for CLI. The result? The first suite primed the cache, giving the second and third suites a significant cold-start boost.

But here is where it gets interesting: the JIT trade-off.

Naturally, I tried enabling PHP 8's JIT compiler. In our environment, this significantly reduced the benefit of the OPcache file cache. While standard opcodes can still be persisted to the file cache, PHP's JIT-compiled machine code cannot. To be strict with facts: the JIT buffer lives entirely in shared memory and is tightly coupled to the runtime environment. Unlike opcodes, the actual machine code is not dumped to the OPcache file cache, so it has to be regenerated on each run.

Since CLI runs spawn separate processes, the file cache is the only practical way to reuse compiled opcodes across test processes. JIT can't piggyback on that mechanism — its machine code still has to be regenerated in every process. And because our integration and API E2E tests are heavily I/O-bound rather than CPU-bound, the instant cold start from the RAM-backed file cache delivered a much bigger win than JIT ever could.

However, you can't just turn on OPcache in a massive project and call it a day. You have to fine-tune it, keeping in mind that with a RAM disk, you are paying with RAM twice (once for the active OPcache memory, and once for the storage in /tmp/.ci-opcache). Here is our configuration. (Note: I'm showing this in php.ini format for readability, but in our pipeline, we just passed these as -d flags directly to the PHP CLI command).

# Custom php.ini for test jobs
opcache.enable=1
opcache.enable_cli=1
opcache.file_cache=/tmp/.ci-opcache
opcache.memory_consumption=256
opcache.interned_strings_buffer=64
opcache.max_accelerated_files=60000
opcache.validate_timestamps=0
opcache.enable_file_override=1
opcache.save_comments=1
opcache.file_update_protection=0
opcache.jit=off
opcache.jit_buffer_size=0
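
Passed as CLI flags, that looks roughly like this; the PHPUnit invocation and suite name are illustrative.

# Sketch: the same settings as -d flags on the test command
php -d opcache.enable=1 -d opcache.enable_cli=1 \
    -d opcache.file_cache=/tmp/.ci-opcache \
    -d opcache.validate_timestamps=0 \
    -d opcache.jit=off -d opcache.jit_buffer_size=0 \
    vendor/bin/phpunit --testsuite integration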

Why these specific values?

  • Capacity (max_accelerated_files=60000 & interned_strings_buffer=64): Big projects mean tons of files, especially in vendor/. If you have 40k files, you have 40k namespaces and class names. You need enough buffer space to hold that string map. PHP internally adjusts this value to a nearby prime number, so instead of chasing an exact value, I opted for a safe upper bound.

  • Memory (memory_consumption=256): I didn't just pull this number out of thin air. I calculated it by checking the actual size of the /tmp/.ci-opcache dump for our project, adding the buffer size, and leaving a safe headroom. Don't just set this to some massive number randomly, or you will exhaust your runner's memory fast.

  • Killing I/O (validate_timestamps=0): This is crucial. In CI, your code is idempotent. It doesn't change during the run. Disabling timestamp validation stops PHP from wasting I/O to check if files were modified.

  • Instant caching (file_update_protection=0): This disables the safety delay for caching new files. We want our fresh CI files cached instantly without any grace periods - this is just a precaution, even though files extracted from the artifact are likely older than the default 2-second threshold.

💡 The "Sanity Check" Moment: The Relative Path Trap During the final rollout, we hit a few No such file or directory errors. My first thought? "It must be the OPcache configuration." The reality? The optimization simply exposed a technical debt in our test bootstraps. A recent directory shift combined with hardcoded relative paths (../../../../) meant our tests were looking for files in the system root.

The lesson: High-performance optimizations like enable_file_override=1 and classmap-authoritative require strict path discipline — ideally absolute paths. Fixing our paths resolved the issue in our case, but it’s worth noting that enable_file_override allows OPcache to short-circuit filesystem lookups and rely entirely on its internal cache. This behavior can conflict with tools like Composer, which expect standard checks (like file_exists()) to reflect the real filesystem state, not OPcache’s internal lookup table. If you run into unexplained filesystem issues, this setting should be one of the first things to revisit.

Side note on static analysis: I also tested OPcache and JIT on our PHPStan jobs. The gains were basically within the margin of error. PHPStan 2.x is already incredibly well-optimized, and since we reuse the PHPStan result cache via GitLab artifacts anyway, it was already flying. A good reminder that not every tool needs low-level engine tweaks if it already does its own caching well.

The Trade-off:

  • Lost: JIT CPU optimizations.
  • Gained: Instant cold-starts across multiple test suites via a RAM-backed file cache.

6. Removing Cruft & The Infamous sleep(1)

Lastly, I audited the code itself. We removed operations that only made sense for local development (like seeding default developer accounts).

More importantly, I found a hardcoded sleep(1) in a PHP class - a legacy band-aid for a concurrency issue. In production, a 1-second delay in a queue-consuming worker might be invisible. But in the test suite, running across 60 iterations with mocked data? That's 60 seconds of pure, wasted waiting.

Conclusion

Cutting a pipeline from 35+ to around 5 minutes rarely comes from a single silver bullet. It’s about combining infrastructure pragmatism (tmpfs, smart caching, bypassing OS limits) with a deep audit of what your application is actually doing under the hood.

But the real business value? Throughput and infrastructure costs. Since our GitLab runners are self-hosted on AWS EC2, we pay for instance uptime, not serverless compute seconds.

Metric                        | Before | After
Pipeline Time                 | 35 min | 5 min
Pipelines / Hour (Per Node)   | 1.7    | 12.0
Node Capacity Increase        | -      | ~700%

This allowed us to drastically scale down the total number of EC2 instances required to handle the engineering team's daily load, completely eliminating CI queue times during peak hours.

The biggest takeaway? CI performance problems are rarely about compute — they’re about eliminating unnecessary I/O, redundant work, and the hidden bottlenecks in your stack.

The final result? A lower AWS bill, a snappy 5-minute feedback loop, and a much happier engineering team.
