Performance in Jira front-end: solving bundle duplicates with Webpack and yarn

This is the third and final blog in our series on our recent work reducing the amount of JavaScript downloaded from the Jira cloud front-end (Read part 1 and part 2). This blog describes the work that was done to create a Webpack plugin to de-duplicate transitive dependencies. We found that this plugin delivers about a 10% reduction in bundle sizes across key pages, such as the Jira Issue view.

Setting the scene

Some terminology used in the story:

Direct dependencies: packages on which your project relies explicitly. Typically installed via yarn add package-name. The full list of those can be found in package.json in dependencies in the root of the project.

Transitive dependencies: packages on which your project relies implicitly. These are the dependencies your direct dependencies rely on. Typically you won’t see them in package.json, but they can be seen, for example, in a yarn.lock file.

Duplicated dependencies: transitive dependencies with mismatched versions. If one of the project dependencies has a button package version 4.0.0 as a transitive dependency and another has the same button version 3.0.0, both of those versions will be installed and the button dependency will be duplicated.

Deduplication: the process of eliminating duplicated dependencies according to their semver versions (major.minor.patch). Typically, a range of versions within the same major version contains no breaking changes, so only the latest version within this range needs to be installed. For example, button versions 4.0.0 and 4.5.6 can be “deduplicated,” so that only version 4.5.6 is installed.

yarn.lock file: an auto-generated file that contains the full list of all direct and transitive dependencies and their exact versions in yarn-based projects.

The problem with duplicated dependencies

“Duplicated” dependencies are inevitable in any mid- to large-scale project that relies on npm packages. When a project has dozens of “direct” dependencies, and every one of those has its own dependencies, the final number of all packages (direct and transitive) installed in a project can be in the hundreds. In this situation, it is more likely than not that some of the dependencies will be duplicated.

In Jira, we have ~250 direct “production” dependencies (that number doesn’t include “dev” dependencies), that in total install 1,070 unique libraries, or 2,845 if we count duplicates.

Considering that all of those are bundled together and served to customers, it’s important to minimize the number of duplicates to reduce the final JavaScript size. This is where the deduplication process comes into play.

Deduplication in yarn

Consider a project that, among its direct dependencies, has modal-dialog@3.0.0 and button@2.5.0, and modal-dialog brings button@2.4.1 as a transitive dependency. If nothing is deduplicated, both versions of button will exist in the project, and in yarn.lock we will see something like this:

modal-dialog@^3.0.0:
  version "3.0.0"
  resolved "exact-link-to-where-download-modal-dialog-3.0.0-from"
  dependencies:
    button@^2.4.1

button@^2.5.0:
  version "2.5.0"
  resolved "exact-link-to-where-download-2.5.0-version-from"

button@^2.4.1:
  version "2.4.1"
  resolved "exact-link-to-where-download-2.4.1-version-from"

Now, we know that if packages button@^2.4.1 and button@^2.5.0 follow semver correctly, they are compatible. Therefore we can tell yarn to grab the same 2.5.0 version for both of those buttons, that is, “deduplicate” them. In the yarn.lock file, we’ll then see this:

modal-dialog@^3.0.0:
  version "3.0.0"
  resolved "exact-link-to-where-download-modal-dialog-3.0.0-from"
  dependencies:
    button@^2.4.1

button@^2.4.1, button@^2.5.0:
  version "2.5.0"
  resolved "exact-link-to-where-download-2.5.0-version-from"
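The compatibility check behind this step can be sketched in a few lines. This is only an illustration of the idea, not yarn's actual implementation: the helper names (parseVersion, compareVersions, satisfiesCaret, pickSharedVersion) are hypothetical, and only caret ranges of the form ^major.minor.patch with a major version of 1 or higher are handled.

```javascript
// Parse "2.5.0" into [2, 5, 0].
function parseVersion(version) {
  return version.split('.').map(Number);
}

// Negative if a < b, positive if a > b, 0 if they are equal.
function compareVersions(a, b) {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i += 1) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// True if `version` satisfies a caret range such as "^2.4.1":
// same major version, and not lower than the base version of the range.
// Assumption: major >= 1 (caret ranges behave differently for 0.x).
function satisfiesCaret(version, range) {
  const base = range.replace(/^\^/, '');
  return (
    parseVersion(version)[0] === parseVersion(base)[0] &&
    compareVersions(version, base) >= 0
  );
}

// Highest available version that satisfies every range,
// or null when no single version can serve all of them.
function pickSharedVersion(ranges, availableVersions) {
  const candidates = availableVersions
    .filter((v) => ranges.every((r) => satisfiesCaret(v, r)))
    .sort(compareVersions);
  return candidates.length > 0 ? candidates[candidates.length - 1] : null;
}

console.log(pickSharedVersion(['^2.4.1', '^2.5.0'], ['2.4.1', '2.5.0'])); // 2.5.0
console.log(pickSharedVersion(['^1.3.0', '^2.5.0'], ['1.3.0', '2.5.0'])); // null
```

The first call mirrors the yarn.lock example above: both ranges can be served by 2.5.0. The second call shows that ranges from different major versions have no shared version.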

Deduplication in yarn – non-compatible version

The above deduplication technique generally works quite well. But what will happen if a project has non-semver deduplicatable transitive dependencies? If, for example, our project has modal-dialog@3.0.0, button@2.5.0 and editor@5000.0.0 as direct dependencies, and those bring button@1.3.0 and button@1.0.0 as transitive dependencies?

Using the same technique, we can deduplicate the buttons from the 1.x.x range. In the yarn.lock file, we will see this:

modal-dialog@^3.0.0:
  version "3.0.0"
  resolved "exact-link-to-where-download-modal-dialog-3.0.0-from"
  dependencies:
    button@^1.0.0

editor@^5000.0.0:
  version "5000.0.0"
  resolved "exact-link-to-where-download-editor-5000.0.0-from"
  dependencies:
    button@^1.3.0

button@^2.5.0:
  version "2.5.0"
  resolved "exact-link-to-where-download-2.5.0-version-from"
  
button@^1.0.0, button@^1.3.0:
  version "1.3.0"
  resolved "exact-link-to-where-download-1.3.0-version-from"

Two versions of buttons are unavoidable, and in this case, there’s usually nothing we can do other than upgrade the versions of modal-dialog and editor to the versions that have button from 2.x.x range and can be deduplicated properly. Typically, in this case, we stop, say that our project has “two versions of buttons” and move on with our lives.

But what if we dig a little bit further and check out how exactly those two buttons are installed on disk and bundled in our code?

Duplicated dependencies install

When we install our dependencies via classic yarn or npm (pnpm or yarn 2.0 are not considered here), npm hoists everything that is possible up to the root node_modules. If, for example, in our project above both editor and modal-dialog have a dependency on the same deduped version of tooltip, but our project does not, npm will install it at the root of the project.

Inside the node_modules folder we’ll see this structure:

/node_modules
  /editor
  /modal-dialog
  /tooltip

Because of that, we can be sure that we only have one version of tooltip in the project, even if two completely different dependencies depend on slightly different versions of it, unless the versions are not semver compatible and cannot be deduped that easily. Essentially, the situation in the project with the buttons from above will appear as follows (note that now there is also the top-level button@2.5.0, which cannot be deduplicated with 1.3.0):

/node_modules
  /editor
    /node_modules
      /button-1.3.0
  /modal-dialog
    /node_modules
      /button-1.3.0
  /button-2.5.0

Even if dependencies are deduped on the yarn.lock level and we “officially” have only two versions of buttons in yarn.lock, every single package with button 1.3.0 as a dependency will install its own copy of it.

Duplicated dependencies and webpack

Behind the scenes, Webpack builds a graph of all your files and their dependencies based on what’s installed and required in your node_modules, using the normal node resolution algorithm. TL;DR: every time a file in editor does import Button from 'button', node will try to find this button in the closest node_modules, starting with the parent folder of the file where the request originated. The same goes for modal-dialog. As a result, from Webpack’s perspective, the final requests for button point to two different locations: one inside editor’s nested node_modules and another inside modal-dialog’s.

Webpack is not going to check whether they are exactly the same; it will treat them as unique files and bundle them together. Our duplicated button just got double duplicated.

The “2 versions of button in the project” in final bundles can turn into “dozens of copies of the same button.”
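To make the lookup concrete, here is a simplified sketch of the "closest node_modules" walk. It is not node's or Webpack's real resolver: resolvePackage is a hypothetical helper, and the existingDirs set stands in for the file system.

```javascript
const path = require('path');

// Walk up from the importing file's folder and return the first
// node_modules/<packageName> folder that exists.
// `existingDirs` is a hypothetical stand-in for the file system.
function resolvePackage(importingFile, packageName, existingDirs) {
  let dir = path.posix.dirname(importingFile);
  while (true) {
    const candidate = path.posix.join(dir, 'node_modules', packageName);
    if (existingDirs.has(candidate)) return candidate;
    const parent = path.posix.dirname(dir);
    if (parent === dir) return null; // reached the file system root
    dir = parent;
  }
}

const existingDirs = new Set([
  '/project/node_modules/editor/node_modules/button',
  '/project/node_modules/modal-dialog/node_modules/button',
  '/project/node_modules/button',
]);

const fromEditor = resolvePackage(
  '/project/node_modules/editor/index.js', 'button', existingDirs);
const fromModal = resolvePackage(
  '/project/node_modules/modal-dialog/index.js', 'button', existingDirs);
const fromApp = resolvePackage('/project/src/app.js', 'button', existingDirs);

console.log(fromEditor); // /project/node_modules/editor/node_modules/button
console.log(fromModal);  // /project/node_modules/modal-dialog/node_modules/button
console.log(fromApp);    // /project/node_modules/button
```

The same import of 'button' ends up at three different folders depending on where the importing file lives, which is why the bundler sees three separate modules.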

Deduplication in Webpack – first attempt

Since those buttons are exactly the same, the very first question that comes to mind is: is it possible to take advantage of that and “trick” Webpack into recognising it?

Webpack provides a rich plugin interface with access to almost everything you can imagine (and to some things that you cannot). At its core, most of its features are built with plugins as well, and it exports a lot of them for others to use.

One of those plugins is NormalModuleReplacementPlugin, which gives us the ability to replace one file with another file during build time based on a regular expression. And this is exactly what we need!

First, detect all duplicated dependencies by grabbing a list of all packages within node_modules, filtering those that have node_modules in their install path more than once (basically all the “nested” packages from the install section above), and grouping them by their version from package.json.

"@atlaskit/theme@9.5.1": [
  "/project/node_modules/@atlaskit/avatar-group/node_modules/@atlaskit/theme",
  "/project/node_modules/@atlaskit/datetime-picker/node_modules/@atlaskit/theme",
  "/project/node_modules/@atlaskit/locale/node_modules/@atlaskit/theme",
  "/project/node_modules/@atlaskit/textarea/node_modules/@atlaskit/theme",
  "/project/node_modules/@atlaskit/user-picker/node_modules/@atlaskit/theme"
],
"@atlaskit/tooltip@15.2.3": [
  "/project/node_modules/@atlaskit/avatar-group/node_modules/@atlaskit/tooltip",
  "/project/node_modules/@atlaskit/user-picker/node_modules/@atlaskit/tooltip"
],
"@emotion/core@10.0.17": [
  "/project/node_modules/@atlaskit/avatar-group/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/button/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/calendar/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/checkbox/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/datetime-picker/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/editor-common/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/editor-core/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/editor-json-transformer/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/emoji/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/flag/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/locale/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/lozenge/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/media-card/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/media-editor/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/media-picker/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/media-ui/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/media-viewer/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/mention/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/modal-dialog/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/profilecard/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/renderer/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/select/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/task-decision/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/textfield/node_modules/@emotion/core",
  "/project/node_modules/@atlaskit/user-picker/node_modules/@emotion/core",
  "/project/node_modules/@storybook/addon-knobs/node_modules/@emotion/core",
  "/project/node_modules/react-select/node_modules/@emotion/core"
],

Example from real data. Everything that is under the same version is a duplicate
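The detection step can be sketched as follows. The input list is hypothetical (a real implementation would walk node_modules and read each package.json), and dropping groups that contain a single folder is an assumption made here, on the basis that a nested package with only one copy has nothing to be merged with.

```javascript
// Hypothetical input: every installed package with its folder and the
// name/version read from its package.json.
const installed = [
  { dir: '/project/node_modules/button', name: 'button', version: '2.5.0' },
  { dir: '/project/node_modules/editor', name: 'editor', version: '5000.0.0' },
  { dir: '/project/node_modules/editor/node_modules/button', name: 'button', version: '1.3.0' },
  { dir: '/project/node_modules/modal-dialog', name: 'modal-dialog', version: '3.0.0' },
  { dir: '/project/node_modules/modal-dialog/node_modules/button', name: 'button', version: '1.3.0' },
];

// Number of "node_modules" segments in an install path.
function countNodeModules(dir) {
  return dir.split('/').filter((segment) => segment === 'node_modules').length;
}

// Group nested packages by "name@version".
function findDuplicates(packages) {
  const groups = {};
  for (const pkg of packages) {
    if (countNodeModules(pkg.dir) < 2) continue; // hoisted to the root, skip
    const key = `${pkg.name}@${pkg.version}`;
    (groups[key] = groups[key] || []).push(pkg.dir);
  }
  // Assumption: only groups with more than one folder count as duplicates.
  return Object.fromEntries(
    Object.entries(groups)
      .filter(([, dirs]) => dirs.length > 1)
      .map(([key, dirs]) => [key, dirs.sort()])
  );
}

const duplicates = findDuplicates(installed);
console.log(JSON.stringify(duplicates, null, 2));
```

Sorting each group keeps the "first" entry stable between runs, which matters once that entry is used as the replacement target.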

Second, replace all occurrences of the “same” package with the very first one from the list.

This solution works, and reduces bundle sizes in Jira by approximately 8%. The whole implementation is relatively simple – only about 100 lines of code – but we haven’t solved all our challenges quite yet.
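A minimal sketch of the "replace with the first one" rule, assuming the duplicates are already grouped as in the example data above. The buildReplacementMap helper is hypothetical; in a real build the resulting map could drive the callback form of NormalModuleReplacementPlugin, which is not shown here.

```javascript
// Hypothetical input: duplicates grouped by "name@version".
const duplicates = {
  'button@1.3.0': [
    '/project/node_modules/editor/node_modules/button',
    '/project/node_modules/modal-dialog/node_modules/button',
  ],
};

// Map every duplicate folder to the first folder of its group.
function buildReplacementMap(groups) {
  const replacements = new Map();
  for (const dirs of Object.values(groups)) {
    const [canonical, ...rest] = dirs;
    for (const dir of rest) replacements.set(dir, canonical);
  }
  return replacements;
}

const replacements = buildReplacementMap(duplicates);
for (const [from, to] of replacements) {
  console.log(`${from} -> ${to}`);
}
```

The first folder of each group never appears as a key, so requests that already point at it are left untouched.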

Deduplication in Webpack – the non-determinism problem

While the solution above worked well, it had an unfortunate side effect: Webpack started to generate assets in a non-deterministic way. On every single re-build, it was either moving some pieces of those duplicated modules around or just generating new internal ids. Any possible explanation relating to the code from our side was eliminated quite fast. Something weird was happening within Webpack internals themselves.

Debugging what the hell is going on, understanding why, and releasing a solution that fixed it for good took a week-long deep dive into the internals of NormalModuleReplacementPlugin and Webpack itself. Our key findings are described in the next section.

Considering the potential impact of the resulting plugin, it was extracted from Jira and published to npm as a package.

Findings and other curiosities

The non-deterministic behavior is reproducible non-deterministically

If you try to reproduce the non-deterministic part on a small synthetic example, you most likely won’t be able to. We were only able to reproduce it in our toy repo when we imported the entire Jira Issue Editor into it. A combination of just a few atlaskit components didn’t do it (we tried with modal dialog + button + tooltip + badge; theme and analytics are duplicated a lot between those). Neither reducing the chunk size to a minimum to force Webpack to split the code nor adding manual async imports helped.

Interestingly, the hook that you need to listen to in order to override requests for files is called in a different order on every build, but the final assets on small examples (and without deduping, for that matter) are deterministic.

NormalModuleReplacementPlugin was not built for this purpose.

First of all, it executes the RegExp only on request property of the result and replaces only request as well (check out the source: https://github.com/webpack/webpack/blob/master/lib/NormalModuleReplacementPlugin.js). However, there are many more properties in the result object that contain information about the origin of the module (and in theory need to be replaced as well), one of which is context – where the request actually originated. And if a file is requested relatively, it will have a relative path in the request property as well.

{
  request: "./styled",
  context: "/project/node_modules/editor/node_modules/button"
}

The final “path” to the file is a resolution of both, and in order to properly detect duplicates, we need to watch them both and replace all the fields that have the duplicated information (including context), which NormalModuleReplacementPlugin does not do.

Nevertheless, NormalModuleReplacementPlugin can actually be used if there is a need

Not everything we said in the “finding” above is entirely correct. What turned out to be enough, in the end, was to replace request with the absolute path resolved from both request and context.

From this:

{
  request: "./styled",
  context: "/project/node_modules/editor/node_modules/button"
}

to this:

{
  request: "/project/node_modules/modal-dialog/node_modules/button/styled",
  context: "/project/node_modules/editor/node_modules/button"
}

And Webpack is okay with it and able to correctly bundle it!
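The rewrite from the first object to the second can be sketched like this. It is an illustration under assumptions, not the plugin's source: rewriteRequest and the replacements map are hypothetical, and a plain prefix match is used for brevity.

```javascript
const path = require('path');

// replacements: Map of duplicate package folder -> folder to use instead.
function rewriteRequest(result, replacements) {
  // Only relative and absolute requests name a file directly; bare
  // package names such as "button" are left for the resolver.
  if (!result.request.startsWith('.') && !result.request.startsWith('/')) {
    return result;
  }
  // The real location of the file is the resolution of context + request.
  const absolute = path.posix.resolve(result.context, result.request);
  for (const [duplicateDir, canonicalDir] of replacements) {
    if (absolute === duplicateDir || absolute.startsWith(duplicateDir + '/')) {
      // Replace only `request`; `context` is left as it was.
      return {
        ...result,
        request: canonicalDir + absolute.slice(duplicateDir.length),
      };
    }
  }
  return result;
}

const replacements = new Map([
  [
    '/project/node_modules/editor/node_modules/button',
    '/project/node_modules/modal-dialog/node_modules/button',
  ],
]);

const rewritten = rewriteRequest(
  { request: './styled', context: '/project/node_modules/editor/node_modules/button' },
  replacements
);
console.log(rewritten.request);
// /project/node_modules/modal-dialog/node_modules/button/styled
```

Because request becomes an absolute path, the untouched context no longer influences where the file is loaded from.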

“Naive” replace (string.startsWith) won’t work

Even if, in theory, packages are “the same,” in reality, they are not, and the difference is called “transitive dependencies of transitive dependencies.” The simplest example of this case can be represented as the following folder structure:

/node_modules
  /editor
    /node_modules
      /button-1.3.0
      /icon-1.0.0 // on the same level as button above
  /modal-dialog
    /node_modules
      /button-1.3.0
        /node_modules
          /icon-1.0.0 // nested within button since on the level above there is another icon
      /icon-2.0.0
  /button-2.5.0

and final requests to icon@1.x will be:

/project/node_modules/editor/node_modules/icon-1.0.0
/project/node_modules/modal-dialog/node_modules/button-1.3.0/node_modules/icon-1.0.0

Considering that button@1.x is a duplicate, we need to replace the button in modal-dialog with the button from editor. A “naive” startsWith will replace /project/node_modules/modal-dialog/node_modules/button-1.3.0/ with /project/node_modules/editor/node_modules/button-1.3.0/, so the path to the last icon will be transformed into /project/node_modules/editor/node_modules/button-1.3.0/node_modules/icon-1.0.0, but there is no icon at this path.
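The failure is easy to reproduce on the folder structure above. In the sketch below, existingPaths is a hypothetical stand-in for the file system, and guardedReplace (only swap the prefix when the target really exists) is one possible safeguard chosen for illustration; it is not necessarily what the published plugin does.

```javascript
// Hypothetical stand-in for the file system: folders that exist on disk.
const existingPaths = new Set([
  '/project/node_modules/editor/node_modules/button-1.3.0',
  '/project/node_modules/editor/node_modules/icon-1.0.0',
  '/project/node_modules/modal-dialog/node_modules/button-1.3.0',
  '/project/node_modules/modal-dialog/node_modules/button-1.3.0/node_modules/icon-1.0.0',
]);

const from = '/project/node_modules/modal-dialog/node_modules/button-1.3.0';
const to = '/project/node_modules/editor/node_modules/button-1.3.0';

// Naive version: swap the prefix whenever it matches.
function naiveReplace(requestPath) {
  return requestPath.startsWith(from)
    ? to + requestPath.slice(from.length)
    : requestPath;
}

// Guarded version (an assumption, see above): keep the original path
// when the rewritten one does not exist.
function guardedReplace(requestPath) {
  const candidate = naiveReplace(requestPath);
  return existingPaths.has(candidate) ? candidate : requestPath;
}

const iconPath = `${from}/node_modules/icon-1.0.0`;
console.log(existingPaths.has(naiveReplace(iconPath))); // false
console.log(guardedReplace(iconPath) === iconPath);      // true
console.log(guardedReplace(from) === to);                // true
```

With the guard, the button itself is still redirected to the copy under editor, while the nested icon keeps its original, valid path.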

So, what was the final reason for the non-deterministic behavior? It is actually a combination of the findings described above.

Bringing it together

Now that it’s solved and the plugin has been battle-tested in Jira, everyone else can use it to shrink their bundles a bit. In Jira, it gave us approximately a 10% reduction in overall bundle size and about a 300ms TTI improvement in the Issue View. The plugin is available here: https://github.com/atlassian-labs/webpack-deduplication-plugin

All you need to do is add it to your Webpack plugins, and it will JustWork™:

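const path = require('path');
const { WebpackDeduplicationPlugin } = require('webpack-deduplication-plugin');

// cacheDir and rootPath - absolute paths to a cache directory
// and to the root of your project (the values below are only examples)
const rootPath = path.resolve(__dirname);
const cacheDir = path.resolve(__dirname, 'node_modules/.cache/webpack-deduplication');

// somewhere in the webpack settings, in the plugins area:
module.exports = {
  // ...the rest of your webpack configuration
  plugins: [
    new WebpackDeduplicationPlugin({ cacheDir, rootPath }),
  ],
};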

Final thoughts

These three stories are just a few chapters from the Jira front-end performance journey. They illustrate some of the approaches we take in Jira to find these opportunities. We have seen some success from these changes, but we also know there is more work to be done, and we’ll continue to work across all parts of our tech stack to improve the performance of Jira Cloud.
