{
    "componentChunkName": "component---src-templates-blog-template-js",
    "path": "/blog/how-we-decreased-pipeline-time",
    "result": {"data":{"markdownRemark":{"html":"<p>At Loadsmart, we use <a href=\"https://circleci.com/\">CircleCI</a> as a pipeline tool to assert the new code’s correctness and ensure high-quality code before integration.</p>\n<p>Some time ago, we realized that the pipeline for one of our largest codebases was taking 45min on average to be finished. Of those, 22min running the backend tests only.\nIt means that, for each PR, developers would wait around 45min to get feedback on their changes, delaying the time to deliver code to production.</p>\n<p>We have decided to create a Working Group to improve this. This post aims to share our journey to reduce the total execution time of our backend tests.</p>\n<h2>What is a Working Group?</h2>\n<p><a href=\"https://github.com/loadsmart/culture/blob/main/practices/working-groups.md\">Working Groups</a> are a great format when it comes to solve multi-squad tech debts, as they are short-lived, temporary, diverse, and they are formed to solve a cross-organization issue. Cool, right? Would you like to know more about Loadsmart Engineering culture? <a href=\"https://engineering.loadsmart.com/blog/our-engineering-culture\">Check it out</a>.</p>\n<p>In this initiative, the team was formed by three full-stack engineers, two backend engineers, and one site reliability engineer. 
All of them came from different squads.</p>\n<h2>Goals</h2>\n<p>As with all Loadsmart initiatives, this one needed clear goals backed by data, so that everyone could track its progress.</p>\n<p>For this specific effort, we set the following goals:</p>\n<ul>\n<li>\n<p>Reduce average execution time of backend tests by 50%</p>\n<ul>\n<li>How to measure: CircleCI backend tests step execution time</li>\n</ul>\n</li>\n<li>\n<p>Reduce average time from code-ready to code-deployed by 50%</p>\n<ul>\n<li>How to measure: CircleCI backend tests step + QA deploy + Staging deploy + Production deploy</li>\n</ul>\n</li>\n</ul>\n<h2>Methodology</h2>\n<p>We followed a scientific method for problem-solving based on the following steps:</p>\n<ul>\n<li>Identify a problem:\nSeveral potential issues in our code could be impacting the backend test step execution time.\nHowever, we had to pick one to focus on. In this step, we listed several problems and estimated which one would be the best to start with.</li>\n<li>Research:\nOnce we had a clear definition of the problem, we researched how the software community was handling it.</li>\n<li>Hypothesis:\nBased on the results of the previous steps, we gathered hypotheses in the format of \"what if...\".\nFor example, \"what if we change the Django default test runner?\".</li>\n<li>Experimentation:\nFor each hypothesis, we evaluated the effort required to test it. If it qualified as a valid one, we prototyped it.\nWe limited the prototype to the minimal set of changes that enabled an initial evaluation. Our goal here was to change as little as possible and get the maximum result.</li>\n<li>Results and conclusions:\nOnce we had a prototype working as expected, we started gathering data. 
If it showed promising results, we could invest more effort in it.</li>\n</ul>\n<h2>Actions and Results</h2>\n<h3>Baseline</h3>\n<p>As a baseline, we considered the total execution time for the backend test step: 22 minutes.</p>\n<ol>\n<li>Map and split slowest Django apps</li>\n</ol>\n<p>We had a large number of tests spread across several Django apps.</p>\n<p>We came up with the following hypothesis: What if we split the faster and slower apps into different jobs?</p>\n<p>We started by measuring the execution time of each Django app. Then, we manually created 2 test jobs: the first would have the slowest tests and the second would have the fastest. We tried to split them equally by total execution time, i.e., the second job, with the faster tests, would have more tests than the first.</p>\n<p>We got the following results, showing an improvement of about 7 minutes:</p>\n<ul>\n<li>Job 1 - 14 min 35 s</li>\n<li>Job 2 - 14 min 15 s</li>\n</ul>\n<p>This showed that splitting tests to run in parallel was a valid path. However, manually breaking them into multiple jobs was not sustainable, since we'd keep adding more and more tests daily.</p>\n<ol start=\"2\">\n<li>No migration before backend tests steps</li>\n</ol>\n<p>While measuring the test execution time more closely, we realized that the pipeline was taking about 8 minutes to run Django migrations (more than 1400 migration files), since we had a lot of databases and cross-references.</p>\n<p>Since we were using <a href=\"https://pytest-django.readthedocs.io/en/latest/index.html\">pytest-django</a>, which can create the test database directly from the current model definitions (its <code class=\"language-text\">--no-migrations</code> option), we could remove the migration steps.</p>\n<p>After removing them, we saw a number of tests failing. It turned out that we were using Django migrations for data migration too. 
To fix it, we created several pytest fixtures to provide the required data instead of inserting it into the database through migrations.</p>\n<p>By the end of this step, we had improved the total pipeline execution time by 8 minutes.</p>\n<ol start=\"3\">\n<li>Blocking external calls</li>\n</ol>\n<p>Our code had a lot of dependencies on external services. During the previous test investigations, we realized that some tests were not properly mocking those external calls.</p>\n<p>We confirmed it by using <a href=\"https://github.com/miketheman/pytest-socket\">pytest-socket</a>. We found 3 different types of external calls:</p>\n<ul>\n<li>External APIs calls (port 80 or 443): 113/15685 test cases</li>\n<li>ElasticSearch calls (port 9200): 176/15685 test cases</li>\n<li>Redis cache calls (port 6389): 11132/15685 test cases</li>\n</ul>\n<p>Before investing time in fixing those mocks, we first disabled all those tests to check whether it would be a worthwhile investment of our time.\nExternal API and ElasticSearch calls proved to be worth the investment. For Redis calls, the impact on the codebase could be huge and the time reduction we would get wouldn't be significant.</p>\n<p>After properly mocking the external API and ElasticSearch calls, we saw a 35% improvement in execution time from fixing just 1.84% of the tests.</p>\n<p>To prevent this from happening again, we configured <a href=\"https://github.com/miketheman/pytest-socket\">pytest-socket</a> in our codebase.</p>\n<ol start=\"4\">\n<li>Upgrade dependencies (including pytest)</li>\n</ol>\n<p>While adding <a href=\"https://github.com/miketheman/pytest-socket\">pytest-socket</a>, we noticed that some of the project dependencies were outdated.\nHypothesis: Could using the latest versions of the project dependencies improve performance?</p>\n<p>It was a huge effort to update them all and make sure that everything was working as expected. 
For example, when upgrading pytest from version 2 to version 6, we had to change the order of some decorators: <code class=\"language-text\">@freeze_time</code> needed to be placed after the <code class=\"language-text\">@pytest.fixture</code> decorator.</p>\n<p>By the end of this effort, we did not see any performance improvement. However, it was really valuable to have all dependencies updated, since the latest versions can include performance and security fixes.</p>\n<ol start=\"5\">\n<li>Running tests in parallel</li>\n</ol>\n<p>After all these attempts, we went back to the initial investigation results. Looking at the CircleCI documentation, we found the feature for <a href=\"https://circleci.com/docs/2.0/parallelism-faster-jobs/\">running tests in parallel</a>.</p>\n<p>CircleCI provides the option to split tests by timing, which was exactly what we were looking for. By doing so, we had our big moment: the test step decreased from about 22 minutes to 6 minutes.</p>\n<p>However, <a href=\"https://coveralls.io/\">Coveralls</a> started failing, since each parallel job produced only a portion of the coverage data. 
We had to combine the multiple coverage files into a single one:</p>\n<div class=\"gatsby-highlight\" data-language=\"yaml\"><pre class=\"language-yaml\"><code class=\"language-yaml\">    <span class=\"token comment\"># ...</span>\n    <span class=\"token punctuation\">-</span> <span class=\"token key atrule\">run</span><span class=\"token punctuation\">:</span>\n        <span class=\"token key atrule\">name</span><span class=\"token punctuation\">:</span> Run Python tests\n        <span class=\"token key atrule\">command</span><span class=\"token punctuation\">:</span> <span class=\"token punctuation\">|</span><span class=\"token scalar string\">\n            TEST_FILES=$(circleci tests glob \"head/**/test_*.py\" | circleci tests split --split-by=timings)\n            pytest --no-migrations --junitxml=build/report_${CIRCLE_NODE_INDEX}.xml --cov-append --cov=head --cov-config .coveragerc --cov-report term --cov-report xml --capture=no -n 15 $TEST_FILES</span>\n    <span class=\"token punctuation\">-</span> <span class=\"token key atrule\">run</span><span class=\"token punctuation\">:</span> <span class=\"token punctuation\">|</span><span class=\"token scalar string\">\n        mv build/coverage.xml build/coverage_${CIRCLE_NODE_INDEX}.xml\n        mv .coverage build/.coverage_${CIRCLE_NODE_INDEX}</span>\n    <span class=\"token punctuation\">-</span> <span class=\"token key atrule\">store_test_results</span><span class=\"token punctuation\">:</span>\n        <span class=\"token key atrule\">path</span><span class=\"token punctuation\">:</span> build\n    <span class=\"token comment\">#...</span>\n    <span class=\"token punctuation\">-</span> <span class=\"token key atrule\">run</span><span class=\"token punctuation\">:</span>\n        <span class=\"token key atrule\">name</span><span class=\"token punctuation\">:</span> Publish coverage\n        <span class=\"token key atrule\">command</span><span class=\"token punctuation\">:</span> <span class=\"token punctuation\">|</span><span 
class=\"token scalar string\">\n            coverage combine build/.coverage_*\n            coveralls</span>\n    <span class=\"token comment\">#...</span></code></pre></div>\n<h3>Bonus action:</h3>\n<p>CircleCI also provides a great feature to <a href=\"https://circleci.com/docs/2.0/docker-layer-caching/\">cache Docker layers</a>. It may save you some more time.\nIn our case, we were already using it.</p>\n<h2>Lessons learned</h2>\n<ul>\n<li>Tech debt should not be ignored, and fixes are WORTH the investment.</li>\n</ul>\n<p>In this particular large codebase, this Working Group found several tech debts. We were able to fix some of them, but this is something we as engineers should keep an eye on, so that debt does not linger for that long. As time goes by, those debts can become costly to pay.</p>\n<ul>\n<li>\n<p>Test levels</p>\n<p>We should be careful about different test levels. We should be aware of the <a href=\"https://martinfowler.com/articles/practical-test-pyramid.html\">Test Pyramid</a> to have a clear definition of the test boundaries. Should a test mock external dependencies? Should we test things all together (integration tests)?</p>\n<p>Automated tests with external dependencies are slower than others. That is why it is really important to have a clear and well-defined Quality Strategy. What are the tools, practices, and processes we have to make sure that we are delivering good code?</p>\n</li>\n<li>\n<p>Try to keep dependencies as updated as possible</p>\n<p>Our software evolves over time, and so do the libraries it depends on. Based on this, we should constantly evaluate the tradeoffs of upgrading dependencies.</p>\n</li>\n</ul>\n<p>By the end of this Working Group, we achieved our expected goals: the backend tests step now runs in about 6 minutes (a 73% improvement).</p>\n<p>Even better than that, we had a great moment to reflect on our practices and raise improvement items.</p>\n<p>Do you like solving challenges like this one? 
We have many open positions at the moment. Check out our <a href=\"https://github.com/loadsmart/culture\">engineering culture</a> and the <a href=\"https://loadsmart.com/careers/\">careers page</a>.</p>","excerpt":"At Loadsmart, we use CircleCI as a pipeline tool to assert the new code’s correctness and ensure high-quality code before integration. Some time ago, we realized that the pipeline for one of our largest codebases was taking 45min on average to be finished. Of those, 22min running the backend tests only.\nIt means that, for each PR, developers would wait around 45min to get feedback on their changes, delaying the time to deliver code to production. We have decided to create a Working Group to…","frontmatter":{"date":"February 02, 2022","path":"/blog/how-we-decreased-pipeline-time","title":"How we decreased the execution time of our backend test suite","comments":true,"author":"Gustavo Rodrigues"},"fields":{"readingTime":{"text":"8 min read"}}}},"pageContext":{}},
    "staticQueryHashes": ["63159454"]}