Introduction to SBOM management on embedded Linux


In this article, we will learn the basic concepts of SBOM (Software Bill of Materials) and how it can be generated and managed on embedded Linux with the Yocto Project.

SBOM might not be a term everyone is familiar with, but it is becoming a very important part of any product that has software on it, including embedded devices. Governments and market regulators have already started requesting it, and it might soon become mandatory for IoT and safety-critical devices.

But why is it becoming so important? Let’s start with some concepts…

What is SBOM?

BOM (Bill of Materials) is a familiar term in engineering, used to describe the list of materials required to build a particular piece of hardware. SBOM stands for Software Bill of Materials and simply extends this concept to the software shipped with the hardware.

When you buy a product like a snack or a candy bar, you might want to know the ingredients that make up that product. How much sugar does it have? Does it contain milk or gluten? This kind of information can guide your decisions, such as whether you should consume the product and what the consequences to your body might be.

Well, SBOM is the same thing! When you consume a product that has software on it, you might want to know the ingredients that make up that software, so you can decide whether you should use it, how up to date it is, whether it has any vulnerabilities, and so on.

An SBOM has all the information required to describe the components in a software project, including the source code, licenses, dependencies, changes applied on top of the upstream project, fixes for known vulnerabilities, etc.

[Figure: SBOM - Software Bill of Materials]

But why is this important for the software industry in general, and for embedded devices in particular?

Why do we need SBOM?

When I started my career in the 90s, it was very common to write the entire firmware for an embedded system from scratch. Nowadays, a substantial part of our work is software integration!

Take, for example, the firmware of a modern microcontroller-based device. When starting a new project, you will typically begin by integrating different software stacks from different providers, including the silicon vendor SDK (drivers, HALs, APIs, etc.), protocol stacks (USB, Ethernet, etc.), a real-time operating system (FreeRTOS, Zephyr, QNX, etc.), and many other open-source and/or commercial libraries.

On embedded Linux, you should probably multiply this by 100. The amount of open-source software integrated into a modern embedded Linux system is huge! A distro built with the Yocto Project might include several hundred software packages!

So developing, deploying and managing complex software stacks like a Linux distribution is not simple at all. We need a very good strategy to manage the Software Supply Chain, and SBOM is part of this process.

Let’s say someone finds a critical vulnerability in a specific version of OpenSSL. How easily can you identify whether any of your products are impacted, so you can quickly take action (e.g. update the software)? In the end, it’s not only your own software that matters. An embedded Linux system is built on top of a complex software supply chain (libraries, bootloaders, kernels, compilers, development tools, etc.), and everything matters!

Attacks on the software supply chain are growing year after year. Vulnerabilities keep being discovered across many different attack surfaces (log4j, Ripple20, Foreshadow, Heartbleed, Meltdown, Stagefright, etc.). How do you know whether your product is affected?

That is why we need SBOM. And that is why governments and market sectors are pushing for legislation and guidelines on SBOM management.

For example, the US Government issued an executive order (EO 14028, May 2021) calling for an SBOM for software acquired by the Federal Government, the healthcare industry is increasingly adopting SBOMs, and SBOM is part of the ENISA Guidelines for Securing the Internet of Things.

Now that we have a clear idea of why we need SBOM to better manage the lifecycle of a software project, let’s see what an SBOM looks like…

What does an SBOM look like?

An SBOM is a formal description of all the components (libraries, executables, modules, etc.) used to build a piece of software, and of the relationships between them. These software components can be open source or proprietary, and the data about them can be widely available or somewhat restricted. But SBOM is not a file format itself; it refers to the process of creating this inventory, or catalog, of software components, while the formats are defined by separate standards.

Two of the most popular standards for SBOM management are CycloneDX and SPDX.

CycloneDX is an OWASP Foundation project, designed in 2017 for use with OWASP Dependency-Track, an open-source analysis platform that identifies risk in the software supply chain.

SPDX is an open-source project hosted by the Linux Foundation and supported by the Yocto Project. And since our focus here is embedded Linux and the Yocto Project, let’s have a look at the SPDX standard.

The SPDX standard

SPDX (Software Package Data Exchange) is an open standard for communicating software bill of material information, including provenance, license, security, and other related information.

It started in 2010 as a Linux Foundation workgroup, originally focused on license compliance and later expanding to SBOM generation. The first version of the specification (SPDX 1.0) was released in 2011, and ten years later SPDX was published as an ISO standard (ISO/IEC 5962:2021).

The specification is freely available on the project’s website and the sources can be cloned from its Git repository.

SPDX makes it possible to describe all components of a software project in several formats, including tag-value, YAML, JSON, RDF/XML, and even .xls spreadsheets.

The best way to understand SPDX files is to generate them! So let’s play with the Yocto Project now…

Producing SPDX files

Generating an accurate SBOM is not an easy task, and there are different ways to do it.

For example, one could take the firmware of an embedded device, and via static analysis and reverse engineering tools, try to identify its software components. But some pieces of information might be difficult to collect with this technique, like provenance, dependencies, build tools, and changes applied on top of upstream software.

So the best way to do this analysis is during the build, with a build system like OpenEmbedded/Yocto Project. A build system is more authoritative because it builds everything from source, and much more accurate since no guessing or heuristics are needed to identify most of the information.

And the Yocto Project is really a pioneer in this area, supporting the generation of SPDX files for several years now!

To enable SPDX generation in the Yocto Project, we just need to inherit the create-spdx class in a configuration file (e.g. conf/local.conf):

INHERIT += "create-spdx"

A few extra variables are available to customize the behavior of the create-spdx class:

  • SPDX_INCLUDE_SOURCES: add a description of the source files used to build the packages to the SPDX files.
  • SPDX_ARCHIVE_SOURCES: create compressed archives of the sources for packages installed on the target.
  • SPDX_ARCHIVE_PACKAGED: create compressed archives of the files in the generated target packages.
  • SPDX_PRETTY: make the output more human-readable (indentation, newlines, etc).
  • SPDX_ORG: name and contact of the organization that is generating the SPDX files.

After building the image, the SPDX files will be available in JSON format inside tmp/deploy/images/MACHINE/. Depending on the options enabled, extra SPDX files might be generated in tmp/deploy/spdx/MACHINE/.

Example of SPDX files generated for a core-image-minimal build for qemuarm:

$ ls -1 tmp/deploy/images/qemuarm/core-image-minimal-qemuarm.spdx*
tmp/deploy/images/qemuarm/core-image-minimal-qemuarm.spdx.index.json
tmp/deploy/images/qemuarm/core-image-minimal-qemuarm.spdx.json
tmp/deploy/images/qemuarm/core-image-minimal-qemuarm.spdx.tar.zst

The core-image-minimal-qemuarm.spdx.json file is a top-level file that contains a reference to all software packages installed in the image.
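
To see what the top-level document references, a few lines of Python are enough. This is a minimal sketch that relies only on the standard SPDX 2.x externalDocumentRefs field (each entry carries an externalDocumentId and an spdxDocument namespace). The function names are hypothetical, and whether the Yocto image document links every package through this field is an assumption worth checking against your own build.

```python
import json
from typing import Any


def load_spdx_json(path: str) -> dict[str, Any]:
    """Parse an SPDX JSON document from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def list_external_documents(doc: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (externalDocumentId, spdxDocument namespace) pairs.

    Uses the standard SPDX 2.x "externalDocumentRefs" field; an image
    document that references other SPDX documents lists them here.
    """
    return [
        (ref["externalDocumentId"], ref["spdxDocument"])
        for ref in doc.get("externalDocumentRefs", [])
    ]
```

For example, pass load_spdx_json("tmp/deploy/images/qemuarm/core-image-minimal-qemuarm.spdx.json") to list_external_documents() and print each pair.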

The core-image-minimal-qemuarm.spdx.tar.zst file is a compressed archive containing individual SPDX documents for the software components installed in the image.
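
The standard library's tarfile module does not handle zstd on the Python versions most distributions currently ship, so this sketch assumes the archive was first decompressed, for example with zstd -d core-image-minimal-qemuarm.spdx.tar.zst (which produces core-image-minimal-qemuarm.spdx.tar). It also assumes the members are named *.spdx.json, like recipe-netbase.spdx.json; load_spdx_archive is a hypothetical helper name.

```python
import json
import tarfile
from typing import IO, Any


def load_spdx_archive(fileobj: IO[bytes]) -> dict[str, dict[str, Any]]:
    """Load every *.spdx.json member of an uncompressed tar stream.

    Returns a dict mapping the archive member name to the parsed document.
    """
    docs: dict[str, dict[str, Any]] = {}
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        for member in tar.getmembers():
            # Skip directories, links and anything that is not an SPDX JSON file.
            if not member.isfile() or not member.name.endswith(".spdx.json"):
                continue
            extracted = tar.extractfile(member)
            if extracted is None:
                continue
            docs[member.name] = json.load(extracted)
    return docs
```

Usage would look like: with open("core-image-minimal-qemuarm.spdx.tar", "rb") as f: docs = load_spdx_archive(f).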

The core-image-minimal-qemuarm.spdx.index.json is an index file that lists all of the SPDX JSON files in the compressed archive.
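
The index can be used to check that the archive is complete. This sketch assumes the index has a top-level "documents" list whose entries carry a "filename" key; that layout is an assumption, so verify it against the index generated by your own build. Both function names are hypothetical.

```python
from typing import Any


def index_filenames(index: dict[str, Any]) -> list[str]:
    """Return the sorted document file names listed in the index.

    Assumed layout: {"documents": [{"filename": ...}, ...]}.
    """
    return sorted(entry["filename"] for entry in index.get("documents", []))


def missing_from_archive(index: dict[str, Any], archive_names: set[str]) -> list[str]:
    """List index entries that have no matching member name in the archive."""
    return [name for name in index_filenames(index) if name not in archive_names]
```

The index dictionary would come from json.load() on the .spdx.index.json file, and archive_names from the member names of the decompressed tar archive.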

Each SPDX file in the archive describes a software component in detail, including its version, license, installed files, generated packages, dependencies, fixed CVEs, etc.
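
Going back to the OpenSSL question from earlier, the sketch below searches parsed SPDX documents for a package by name and reports its version. It also collects CVE IDs from the package's sourceInfo field, which assumes the create-spdx class records fixed CVEs there as free text (something like "CVEs fixed: CVE-..."); check your own output before relying on it. find_package is a hypothetical name.

```python
import re
from typing import Any, Iterable

CVE_RE = re.compile(r"CVE-\d{4}-\d{4,}")


def find_package(docs: Iterable[dict[str, Any]], name: str) -> list[dict[str, Any]]:
    """Summarize every SPDX package called `name` across the given documents."""
    hits = []
    for doc in docs:
        for pkg in doc.get("packages", []):
            if pkg.get("name") != name:
                continue
            hits.append({
                "document": doc.get("name"),
                "version": pkg.get("versionInfo"),
                # Assumption: fixed CVEs appear as free text in sourceInfo.
                "cves_fixed": sorted(set(CVE_RE.findall(pkg.get("sourceInfo", "")))),
            })
    return hits
```

Here docs is any iterable of parsed SPDX JSON documents, for example the ones extracted from the .spdx.tar.zst archive of each product image; an empty result means the package is not described in that image's SBOM.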

For example, this is the content of recipe-netbase.spdx.json:

{
  "SPDXID": "SPDXRef-DOCUMENT",
  "creationInfo": {
    "comment": "This document was created by analyzing recipe files during the build.",
    "created": "2023-02-20T21:00:59Z",
    "creators": [
      "Tool: OpenEmbedded Core create-spdx.bbclass",
      "Organization: Embedded Labworks (support@e-labworks.com)",
      "Person: N/A ()"
    ],
    "licenseListVersion": "3.14"
  },
  "dataLicense": "CC0-1.0",
  "documentNamespace": "http://spdx.org/spdxdoc/recipe-netbase-7b385d09-d0bb-55f0-80b4-172a1e9035fc",
  "name": "recipe-netbase",
  "packages": [
    {
      "SPDXID": "SPDXRef-Recipe-netbase",
      "copyrightText": "NOASSERTION",
      "description": "This package provides the necessary infrastructure for basic TCP/IP based networking",
      "downloadLocation": "http://ftp.debian.org/debian/pool/main/n/netbase/netbase_6.3.tar.xz",
      "externalRefs": [
        {
          "referenceCategory": "SECURITY",
          "referenceLocator": "cpe:2.3:a:*:netbase:6.3:*:*:*:*:*:*:*",
          "referenceType": "http://spdx.org/rdf/references/cpe23Type"
        }
      ],
      "homepage": "http://packages.debian.org/netbase",
      "licenseConcluded": "NOASSERTION",
      "licenseDeclared": "GPL-2.0-only",
      "licenseInfoFromFiles": [
        "NOASSERTION"
      ],
      "name": "netbase",
      "summary": "Basic TCP/IP networking support",
      "supplier": "Organization: Embedded Labworks (support@e-labworks.com)",
      "versionInfo": "6.3"
    }
  ],
  "relationships": [
    {
      "relatedSpdxElement": "SPDXRef-Recipe-netbase",
      "relationshipType": "DESCRIBES",
      "spdxElementId": "SPDXRef-DOCUMENT"
    }
  ],
  "spdxVersion": "SPDX-2.2"
}
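
Reading documents like this programmatically is straightforward. This minimal sketch pulls the name, version, declared license, download location and CPE identifiers out of each package, and lists the elements the document DESCRIBES, using only field names visible in recipe-netbase.spdx.json above; summarize and described_elements are hypothetical helper names.

```python
from typing import Any

CPE_TYPE = "http://spdx.org/rdf/references/cpe23Type"


def summarize(doc: dict[str, Any]) -> list[dict[str, Any]]:
    """Extract the fields most useful for a quick review from each package."""
    rows = []
    for pkg in doc.get("packages", []):
        # CPE identifiers are stored as SECURITY external references.
        cpes = [
            ref["referenceLocator"]
            for ref in pkg.get("externalRefs", [])
            if ref.get("referenceType") == CPE_TYPE
        ]
        rows.append({
            "spdx_id": pkg["SPDXID"],
            "name": pkg.get("name"),
            "version": pkg.get("versionInfo"),
            "license": pkg.get("licenseDeclared", "NOASSERTION"),
            "source": pkg.get("downloadLocation"),
            "cpe": cpes,
        })
    return rows


def described_elements(doc: dict[str, Any]) -> list[str]:
    """Return the SPDX IDs that the document itself DESCRIBES."""
    return [
        rel["relatedSpdxElement"]
        for rel in doc.get("relationships", [])
        if rel.get("relationshipType") == "DESCRIBES"
        and rel.get("spdxElementId") == "SPDXRef-DOCUMENT"
    ]
```

For recipe-netbase.spdx.json, this yields a single row for netbase 6.3 (GPL-2.0-only) and reports SPDXRef-Recipe-netbase as the described element.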

More information about SBOM management and SPDX generation with the Yocto Project can be found on the project’s website.

After browsing and inspecting the SPDX files, you will realize that a lot of data is generated. And one challenge we have today is how to consume all of this!
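
As a first step, even a small script can answer simple questions about this data. The sketch below (license_histogram is a hypothetical name) counts packages per licenseDeclared value across any iterable of parsed SPDX documents. It counts SPDX packages, so a recipe and the binary packages built from it may each be counted.

```python
from collections import Counter
from typing import Any, Iterable


def license_histogram(docs: Iterable[dict[str, Any]]) -> Counter[str]:
    """Count SPDX packages per declared license expression."""
    counts: Counter[str] = Counter()
    for doc in docs:
        for pkg in doc.get("packages", []):
            # Packages without a declared license are grouped as NOASSERTION.
            counts[pkg.get("licenseDeclared", "NOASSERTION")] += 1
    return counts
```

Calling most_common() on the result gives a quick overview of the licenses shipped in an image.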

How to consume SPDX files?

The SPDX website lists several tools that can consume SPDX files.

One of the most popular is FOSSology, an open-source license compliance software system and toolkit that supports different SBOM formats and standards, including SPDX.

However, at the time of writing, I feel we still lack a nice, easy-to-use open-source tool to consume (visualize, search, analyze, etc.) SPDX files.

I find FOSSology too complex to set up and use, and I’m particularly interested in an open-source tool to consume the SPDX files generated by the Yocto Project. So if you know of one, please let me know! :-)

Anyway, as the need for better control of the software supply chain increases, it’s only a matter of time before people, companies, and the open-source community start creating new tools (or improving existing ones) to make SBOM data easier to consume.

About the author: Sergio Prado has been working with embedded systems for more than 25 years. If you want to know more about his work, please visit the About Me page or Embedded Labworks website.

Please email your comments or questions to hello at sergioprado.blog, or sign up for the newsletter to receive updates.
