Creating Metadata

Last updated on 2025-05-22

Estimated time: 60 minutes

Overview

Questions

  • What is a code.json file?
  • How do you create a code.json file?
  • What are the required fields in the code.json file for a USGS software project?

Objectives

  • Explain what a code.json file is and how it is used.
  • Create a code.json file with the minimum required fields for a USGS software project and software information products.
  • Validate a code.json file.
  • Update a code.json file for a new version of the software.

Introduction


Metadata are descriptive elements in a standardized format that are necessary for identification, discovery, access, and use of information products such as software and data. Metadata answer fundamental questions such as who, what, when, where, why, and how.

Metadata for a software project are stored and maintained in a file called code.json, located at the top level of the project repository in GitLab. The file is written in JavaScript Object Notation (JSON). It provides basic information about the project and its official software information products, and it is aggregated with the metadata from other Department of the Interior software projects to form the Departmental Enterprise Code Inventory, which is required by the Federal Source Code Policy. A code.json file is required for software information products but may also be created for projects without official software information products.

JSON Overview


The JSON data format allows machine-to-machine communication with structured text. JSON is language agnostic.

When discussing the JSON syntax, verbally note which data types need quotes and which do not.

Demonstrate the JSON Syntax in a text editor such as Visual Studio Code or RStudio. Point out that the editor recognizes the .json extension and provides syntax highlighting and indentation assistance.

JSON Syntax:

  • Use key/value pairs

    • keys are strings, indicated by double quotes
    • values can be:
      • strings ("Vlad Dracula"),
      • numbers (1.5),
      • objects ({"key": "value", "key2": "value2"}),
      • arrays ([lists]),
      • boolean (true / false), or
      • null
    • separate keys from values with a colon
      • Format: "key": "value"
      • Example: "name": "Vlad Dracula"
  • Separate key/value pairs with commas:

    JSON

    {
        "name": "Vlad Dracula",
        "organization": "U.S. Geological Survey"
    }
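
As a rough illustration of the value types listed above, the following Python sketch parses a small JSON object with the standard-library json module and prints the Python type that each value becomes. Every key other than "name" is made up for this example and is not part of code.json.

```python
import json

# Sample document exercising each JSON value type listed above.
# Keys other than "name" are hypothetical and exist only for illustration.
text = """
{
    "name": "Vlad Dracula",
    "score": 1.5,
    "contact": {"key": "value", "key2": "value2"},
    "tags": ["a", "b"],
    "active": true,
    "nickname": null
}
"""

record = json.loads(text)

# Report the Python type that each JSON value was parsed into.
for key, value in record.items():
    print(f"{key}: {type(value).__name__}")
```

Note that only the string values carry double quotes in the JSON text; numbers, true, false, and null are written without them.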

Callout

Note: Many other languages (e.g., Python) allow trailing commas; however, trailing commas are considered an error in JSON syntax. For example, the following would give you an error:

JSON

{
  "name": "Vlad Dracula",
  "organization": "U.S. Geological Survey",
}

Generally, a JSON file will contain an object or an array. If it is an object, it will start and end with curly brackets {}. If it is an array, it will start and end with square brackets [].

Let us create a JSON file in our GitLab project space with the filename hello-world.json.

Screenshot showing a red circle around where to click to create a new file in a GitLab repository
Screenshot of adding a new file to a GitLab repository

In the web browser, add the following content to hello-world.json:

JSON

{
    "greeting": "hello-world"
}

Notice that in our example, our JSON represents an object since it starts with curly brackets. Also, notice that the GitLab web editor provides some highlighting and indentation assistance similar to what a desktop editor might provide.

You can use a JSON Validator like JSON Formatter & Validator to format and check your JSON.

Let us try adding a trailing comma in our JSON and validating it:

JSON

{
    "greeting": "hello-world",
}

The JSON Formatter & Validator will tell you what it found wrong and attempt to fix it for you:

OUTPUT

Info: Removed trailing comma.
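
If you prefer to check syntax locally rather than in a web tool, one possible approach is Python's standard-library json module. This is a minimal sketch, not part of the USGS workflow; the helper name check_json is hypothetical. Unlike the online formatter, json.loads does not attempt to repair the input; it only reports where parsing failed, and the exact wording of the message varies between Python versions.

```python
import json

valid = '{"greeting": "hello-world"}'
trailing_comma = '{"greeting": "hello-world",}'


def check_json(text):
    """Return None if text is valid JSON, otherwise a short error message."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno, colno, and msg are attributes of JSONDecodeError.
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


print(check_json(valid))           # no problem found
print(check_json(trailing_comma))  # describes the syntax error
```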

Metadata Template


USGS provides a code.json template (see below) to help you get started writing project metadata. Notice that its top-level element is an array, which is designated by the square brackets.

JSON

[
  {
    "name": "REPOSITORY_NAME",
    "organization": "U.S. Geological Survey",
    "description": "REPOSITORY_DESCRIPTION",
    "version": "RELEASE_VERSION",
    "status": "RELEASE_STATUS",

    "permissions": {
      "usageType": "openSource",
      "licenses": [
        {
          "name": "Public Domain, CC0-1.0",
          "URL": "https://code.usgs.gov/GROUP_HIERARCHY/REPOSITORY_NAME/-/raw/RELEASE_VERSION/LICENSE.md"
        }
      ]
    },

    "homepageURL": "https://code.usgs.gov/GROUP_HIERARCHY/REPOSITORY_NAME",
    "downloadURL": "https://code.usgs.gov/GROUP_HIERARCHY/REPOSITORY_NAME/-/archive/RELEASE_VERSION/REPOSITORY_NAME-RELEASE_VERSION.zip",
    "disclaimerURL": "https://code.usgs.gov/GROUP_HIERARCHY/REPOSITORY_NAME/-/raw/RELEASE_VERSION/DISCLAIMER.md",
    "repositoryURL": "https://code.usgs.gov/GROUP_HIERARCHY/REPOSITORY_NAME.git",
    "vcs": "git",

    "laborHours": 0,

    "tags": [
      "TOPIC_TAG_1",
      "TOPIC_TAG_2"
    ],

    "languages": [
      "PROGRAMMING_LANG_1",
      "PROGRAMMING_LANG_2"
    ],

    "contact": {
      "name": "REPOSITORY_ADMINISTRATOR_NAME",
      "email": "REPOSITORY_ADMINISTRATOR_EMAIL"
    },

    "date": {
      "metadataLastUpdated": "YYYY-MM-DD"
    }
  }
]

Mention that you are going to use the GitLab IDE to create and edit a code.json file. Learners may use whatever editor they are comfortable with; however, you will not provide assistance with custom editors.

Create a Metadata File in GitLab

Create a new code.json file at the top level of your GitLab repository:

Screenshot showing a red circle around where to click to create a new file in a GitLab repository
Screenshot of adding a new file to a GitLab repository

Paste the template JSON into the file, add a commit message, and click Commit changes:

Screenshot of writing a commit message for adding the code.json file to a GitLab repository
Screenshot of adding a code.json template to a GitLab repository

Check in with learners to make sure they were able to create the new file and add the template text.

Add Project-Specific Information

Now, edit the code.json file to include project-specific information. While viewing the code.json file in GitLab, click Edit and Edit single file:

Screenshot of clicking Edit on the code.json file in the web browser
Screenshot of editing a file in GitLab

Replace the ALL_CAPS placeholders with meaningful values for the project. For the purposes of this exercise, the project includes code for modeling the co-occurrence of Vampires and Werewolves on Mars. The project team is actively developing the code. Eventually, they will release a USGS software information product in the public domain. This particular metadata object will document the entire project as opposed to a single product, so use “main” as the version. The project uses machine learning / artificial intelligence techniques and the code is written in Python.

GROUP_HIERARCHY is the group name under which your project is nested in GitLab. The GROUP_HIERARCHY may be one level if you are working out of a personal space (e.g., vdracula) or it may be a nested hierarchy (e.g., ecosystems/FRESC).
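
The URL placeholders in the template follow a regular pattern, so they can be filled in mechanically. The sketch below assumes the URL patterns shown in the template; the function name build_urls and the "licenseURL" key are hypothetical (in code.json the license link lives under permissions, in the licenses array, as "URL").

```python
def build_urls(group_hierarchy, repository_name, release_version,
               host="https://code.usgs.gov"):
    """Fill in the URL patterns from the code.json template."""
    base = f"{host}/{group_hierarchy}/{repository_name}"
    archive = f"{repository_name}-{release_version}.zip"
    return {
        # Hypothetical key; stored as permissions.licenses[].URL in code.json.
        "licenseURL": f"{base}/-/raw/{release_version}/LICENSE.md",
        "homepageURL": base,
        "downloadURL": f"{base}/-/archive/{release_version}/{archive}",
        "disclaimerURL": f"{base}/-/raw/{release_version}/DISCLAIMER.md",
        "repositoryURL": f"{base}.git",
    }


urls = build_urls("vdracula", "vampires-and-werewolves", "main")
for key, url in urls.items():
    print(f"{key}: {url}")
```

A nested hierarchy such as ecosystems/FRESC can be passed as the first argument in the same way.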

Below are the field definitions for code.json and examples of how the template can be updated:

  • name: Should be a short, human readable name for the project. This should match the value provided when creating the project in GitLab. The best practice is to use lowercase words with hyphens separating them.

JSON

"name": "vampires-and-werewolves"
  • organization: Must always be "U.S. Geological Survey"; casing and punctuation are important. No updates are needed to the template.

JSON

"organization": "U.S. Geological Survey"
  • description: This may be a longer description of the project. It should be no more than 1-2 sentences. Verbose descriptions may exist in the README.md file.

JSON

"description": "Code for modeling the co-occurrence of Vampires and Werewolves on Mars"
  • version: This should be a semantic version number for the product (e.g., 1.0.0) or the DEFAULT_BRANCH name (e.g., main or master), depending on whether the metadata object references an information product or the project. The version number should not include a leading v (e.g., v1.0.0) or other identifier. During the review process, a Git branch (the release candidate branch) must exist with the same name as the version (e.g., 1.0.0). Upon publication, the version branch is converted to a tag. (We will discuss release tags in more detail in a future episode.)

JSON

"version": "main"
  • status: Must be one of the enumerated values listed below. There are no official definitions for these terms in code.gov; however, Wikipedia provides some good definitions, which are paraphrased below.
    • Ideation: planning phase of a software project.
    • Development: work on software project prior to formal testing.
    • Alpha: initial testing phase, often done within the project team or organization.
    • Beta: feature complete testing phase that follows Alpha testing, often available to users outside project team or organization.
    • Release Candidate: a Beta version with the potential to be ready for production. In USGS, a release candidate would be going through formal review and approval.
    • Production: the product has passed all stages of testing. In USGS, a production release has been reviewed and approved.
    • Archival: a version of the software that is no longer supported.

JSON

"status": "Development"
  • permissions
    • usageType: Must be one of the following enumerated values, which describe the usage permissions for the release:
      1. openSource: Open source
      2. governmentWideReuse: Government-wide reuse
      3. exemptByLaw: The sharing of the source code is restricted by law or regulation, including—but not limited to—patent or intellectual property law, the Export Asset Regulations, the International Traffic in Arms Regulation, and the Federal laws and regulations governing classified information
      4. exemptByNationalSecurity: The sharing of the source code would create an identifiable risk to the detriment of national security, confidentiality of Government information, or individual privacy
      5. exemptByAgencySystem: The sharing of the source code would create an identifiable risk to the stability, security, or integrity of the agency’s systems or personnel
      6. exemptByAgencyMission: The sharing of the source code would create an identifiable risk to agency mission, programs, or operations
      7. exemptByCIO: The CIO believes it is in the national interest to exempt sharing the source code
      8. exemptByPolicyDate: The release was created prior to the M-16-21 policy (August 8, 2016)
    • licenses: An array of license objects, each with the following keys:
      • name: The name of the license under which the product is released (e.g., Public Domain, CC0-1.0). In most cases, the appropriate license for USGS products is Public Domain, CC0-1.0, but sometimes (e.g., when some of the code is from outside sources or collaborators) different licenses are required. For more information on selecting an appropriate license see the Licensing episode in this Lesson.
      • URL: A link to the LICENSE.md file stored in this project
        • Must reference the main or master branch (this will differ for an official product, which should point to the immutable tagged version)
        • Must use the raw variant of the file, which provides access to the plain text of the file and not the GitLab-formatted text. To get the raw variant of a file, click into the file, and click the Open raw button next to the Download button: Screenshot of red circle around a button that will Open Raw version of a file

JSON

"permissions": {
      "usageType": "openSource",
      "licenses": [
        {
          "name": "Public Domain, CC0-1.0",
          "URL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/LICENSE.md"
        }
      ]
    }
  • homepageURL*: A link to the project homepage
    • May point to the project on GitLab, but will not include the .git extension
    • May point to a project home page elsewhere as long as it is publicly accessible (or soon-to-be publicly accessible, once you have gone through the release process) and in an approved location (e.g., usgs.gov webpage as opposed to a personal website)

JSON

"homepageURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves"
  • downloadURL: A link to download a ZIP archive of the project source code
    • Must point to the main or master branch (this will differ for an official product, which should point to the immutable tagged version)
    • In GitLab, you can get the download URL by selecting Code, right-clicking zip (under Download source code), and choosing Copy link: Screenshot of red circle showing where to click Code, zip, and Copy link buttons to get the download url

JSON

"downloadURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/archive/main/vampires-and-werewolves-main.zip"
  • disclaimerURL: A link to the DISCLAIMER.md file stored in this project
    • Must use the raw variant of the file, which provides access to the plain text of the file and not the GitLab-formatted text
    • Must point to the main or master branch (this will differ for an official product, which should point to the immutable tagged version)

JSON

"disclaimerURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/DISCLAIMER.md"
  • repositoryURL*: A link to this project on GitLab
    • Must include the .git extension

*Note: homepageURL and repositoryURL are different. repositoryURL should end with .git whereas the homepageURL should not.

JSON

"repositoryURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves.git"
  • vcs: A lowercase string with the name of the version control system that is being used. For USGS, this will be git. No updates are needed to the template.

JSON

"vcs": "git"
  • laborHours: An estimate of the total labor hours spent by your organization across the current version and all previous versions (i.e., the value is cumulative), including labor performed by federal employees and contractors. Your best guess is fine. If the number is not known, the recommendation is to use -1.

JSON

"laborHours": 0
  • tags: An array of topical/domain tags relevant to the project
    • Consider using the USGS Thesaurus or other controlled vocabularies to improve browse functionality in the code inventory.
    • These tags can be used to help people narrow down searches for software, so consider terms that will help direct potential users to your project
    • If the project supports AI/ML research and development, this array must include the tag usg-artificial-intelligence. This tag is short for U.S. Government Artificial Intelligence (i.e., do not use “usgs-artificial-intelligence”).

JSON

"tags": [
      "usg-artificial-intelligence",
      "vampires",
      "werewolves",
      "mars"
    ]
  • languages: An array of the programming languages used within this project (e.g., “Python”, “R”, “C++”). There is not a controlled vocabulary, so use your best judgement on how to represent the programming languages in your project.

JSON

"languages": [
      "Python"
    ]
  • contact: Point of contact information for the software information product.

JSON

"contact": {
      "name": "Vlad Dracula",
      "email": "vdracula@usgs.gov"
    }
  • date
    • metadataLastUpdated: An ISO 8601 date (YYYY-MM-DD) indicating when the metadata item within the code.json file was last modified. Be sure to update this value whenever you modify any of the other key/value pairs for this metadata item. Note that you must use two digits for the month and day (e.g., 2024-08-09 is correct; 2024-8-9 is not).

JSON

"date": {
      "metadataLastUpdated": "2024-05-29"
    }
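
Several of the field rules above can be checked mechanically. The following sketch covers only a handful of them and is not an official validator; the function name check_release and its messages are hypothetical, and the enumerated values are copied from the lists above.

```python
import re
from datetime import date

STATUSES = {"Ideation", "Development", "Alpha", "Beta",
            "Release Candidate", "Production", "Archival"}
USAGE_TYPES = {"openSource", "governmentWideReuse", "exemptByLaw",
               "exemptByNationalSecurity", "exemptByAgencySystem",
               "exemptByAgencyMission", "exemptByCIO", "exemptByPolicyDate"}


def check_release(release):
    """Return a list of problems found in one code.json metadata object."""
    problems = []
    if release.get("organization") != "U.S. Geological Survey":
        problems.append("organization must be exactly 'U.S. Geological Survey'")
    if re.match(r"[vV]\d", str(release.get("version", ""))):
        problems.append("version should not have a leading 'v'")
    if release.get("status") not in STATUSES:
        problems.append("status is not one of the enumerated values")
    if release.get("permissions", {}).get("usageType") not in USAGE_TYPES:
        problems.append("usageType is not one of the enumerated values")
    if not str(release.get("repositoryURL", "")).endswith(".git"):
        problems.append("repositoryURL must end with .git")
    if str(release.get("homepageURL", "")).endswith(".git"):
        problems.append("homepageURL must not end with .git")
    stamp = str(release.get("date", {}).get("metadataLastUpdated", ""))
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", stamp):
        problems.append("metadataLastUpdated must be YYYY-MM-DD")
    else:
        try:
            date.fromisoformat(stamp)  # rejects dates such as 2024-13-40
        except ValueError:
            problems.append("metadataLastUpdated is not a real calendar date")
    return problems


good = {
    "organization": "U.S. Geological Survey",
    "version": "main",
    "status": "Development",
    "permissions": {"usageType": "openSource"},
    "homepageURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves",
    "repositoryURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves.git",
    "date": {"metadataLastUpdated": "2024-05-29"},
}
bad = dict(good, version="v1.0.0", homepageURL=good["repositoryURL"],
           date={"metadataLastUpdated": "2024-8-9"})

print(check_release(good))
for problem in check_release(bad):
    print(problem)
```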

Personal Space in GitLab

In the examples above, the URLs that we are generating reference Vlad Dracula’s or your own personal GitLab space. In reality, you cannot make a repository public that is located under a personal username. Instead, public repositories need to be located under a public group. The current recommendation is to have groups at the USGS Mission Area level (e.g., Ecosystems) and then subgroups at the USGS Science Center level. Project repositories will then be located within the Science Center subgroup. To avoid needing to rename all of your URLs, it is a best practice to start projects within these public groups and maintain more restrictive permissions at the project level.

This is what the full code.json file should look like after making the updates above:

JSON

[
  {
    "name": "vampires-and-werewolves",
    "organization": "U.S. Geological Survey",
    "description": "Code for modeling the co-occurrence of Vampires and Werewolves on Mars",
    "version": "main",
    "status": "Development",

    "permissions": {
      "usageType": "openSource",
      "licenses": [
        {
          "name": "Public Domain, CC0-1.0",
          "URL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/LICENSE.md"
        }
      ]
    },

    "homepageURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves",
    "downloadURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/archive/main/vampires-and-werewolves-main.zip",
    "disclaimerURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/DISCLAIMER.md",
    "repositoryURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves.git",
    "vcs": "git",

    "laborHours": 0,

    "tags": [
      "usg-artificial-intelligence",
      "vampires",
      "werewolves",
      "mars"
    ],

    "languages": [
      "Python"
    ],

    "contact": {
      "name": "Vlad Dracula",
      "email": "vdracula@usgs.gov"
    },

    "date": {
      "metadataLastUpdated": "2024-05-29"
    }
  }
]
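
One easy mistake is leaving a template placeholder behind. The sketch below walks a parsed code.json structure and reports any remaining ALL_CAPS placeholders. The helper name and the regular expression are assumptions made for this example: the pattern looks for uppercase words joined by underscores plus the literal YYYY-MM-DD, so it may also flag legitimate values that happen to have that shape.

```python
import re

# Matches template placeholders such as REPOSITORY_NAME or TOPIC_TAG_1,
# plus the YYYY-MM-DD date placeholder. LICENSE.md and DISCLAIMER.md are
# not matched because they contain no underscore.
PLACEHOLDER = re.compile(r"\b[A-Z]+(?:_[A-Z0-9]+)+\b|YYYY-MM-DD")


def find_placeholders(value, path="$"):
    """Recursively collect (path, placeholder) pairs from parsed JSON."""
    found = []
    if isinstance(value, dict):
        for key, item in value.items():
            found.extend(find_placeholders(item, f"{path}.{key}"))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            found.extend(find_placeholders(item, f"{path}[{index}]"))
    elif isinstance(value, str):
        for match in PLACEHOLDER.findall(value):
            found.append((path, match))
    return found


# A partially edited metadata object, for illustration only.
sample = [{
    "name": "vampires-and-werewolves",
    "version": "RELEASE_VERSION",
    "tags": ["TOPIC_TAG_1", "mars"],
    "date": {"metadataLastUpdated": "YYYY-MM-DD"},
}]

for path, placeholder in find_placeholders(sample):
    print(f"{path}: {placeholder}")
```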

Challenge

Use JSON Formatter & Validator to format and check your JSON. What errors were present in your JSON? Note that this tool only validates against the JSON syntax and does not validate against the code.gov metadata schema.

Additional code.json fields

Additional fields are also available. See the official code.gov metadata schema for additional details. Note that you should only add fields from the “releases” array within this schema. The full code.gov metadata schema includes other fields that are necessary for building the Enterprise Code Inventory, but those should not be included in the individual project code.json files. Fields that are not documented in the official code.gov metadata schema cannot be included in the code.json files.

Updating Metadata for Initial Software Information Product


Remember that the top-level element in the code.json file is an array. This means it may contain more than one object for your project. The recommended practice is to order metadata objects with the DEFAULT_BRANCH (e.g., main) appearing first, followed by the most recently released version. For an initial software information product release, it would look something like this:

JS

[
 {
 // ... main, status Development
 },
 {
 // ... release 1.0.0, status Production
 }
]

Metadata evolve over time. A common misconception is that the metadata in the main branch should describe only the main branch code and not any other branches. In fact, the code.json file in the DEFAULT_BRANCH (e.g., main) should contain a metadata object for each version of the project (official or otherwise). The code.json file in the tag for a specific version should contain metadata for that version and all preceding versions; in this way, it matches the metadata in the main branch at the time the version was created.
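
The recommended ordering can be sketched in a few lines of Python. This example assumes that every release version is a plain numeric semantic version (e.g., 1.10.0) and that, once several releases exist, they are listed from newest to oldest; the source only states that the default branch comes first, followed by the most recent release. The function names are hypothetical.

```python
def version_key(version):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def order_releases(releases, default_branch="main"):
    """Put the default-branch object first, then releases newest to oldest."""
    branch = [r for r in releases if r["version"] == default_branch]
    tagged = [r for r in releases if r["version"] != default_branch]
    tagged.sort(key=lambda r: version_key(r["version"]), reverse=True)
    return branch + tagged


# Only the "version" key is shown; real objects carry all required fields.
releases = [{"version": "1.2.0"}, {"version": "main"}, {"version": "1.10.0"}]
ordered = order_releases(releases)
print([r["version"] for r in ordered])
```

Comparing versions as tuples of integers avoids the string-sorting pitfall where "1.10.0" would sort before "1.2.0".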

Releasing an Initial Software Information Product

You are ready to release an initial version of your software information product. In the code.json file, copy the main branch’s release object and paste it directly below the original within the code.json array (the main branch release object serves as a template for further changes). Add a comma after the closing } of the first object to separate the two objects. In the second object, update the version field to 1.0.0 (or whatever version number you are using; 1.0.0 is not required) and the status field to Production. Also update the URL fields in the second object to use that version number instead of main in the RELEASE_VERSION section of each URL. Finally, update the laborHours and metadataLastUpdated fields.

JSON

[
  {
    "name": "vampires-and-werewolves",
    "organization": "U.S. Geological Survey",
    "description": "Code for modeling the co-occurrence of Vampires and Werewolves on Mars",
    "version": "main",
    "status": "Development",

    "permissions": {
      "usageType": "openSource",
      "licenses": [
        {
          "name": "Public Domain, CC0-1.0",
          "URL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/LICENSE.md"
        }
      ]
    },

    "homepageURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves",
    "downloadURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/archive/main/vampires-and-werewolves-main.zip",
    "disclaimerURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/DISCLAIMER.md",
    "repositoryURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves.git",
    "vcs": "git",

    "laborHours": 200,

    "tags": [
      "usg-artificial-intelligence",
      "vampires",
      "werewolves",
      "mars"
    ],

    "languages": [
      "Python"
    ],

    "contact": {
      "name": "Vlad Dracula",
      "email": "vdracula@usgs.gov"
    },

    "date": {
      "metadataLastUpdated": "2024-06-15"
    }
  },
  {
    "name": "vampires-and-werewolves",
    "organization": "U.S. Geological Survey",
    "description": "Code for modeling the co-occurrence of Vampires and Werewolves on Mars",
    "version": "1.0.0",
    "status": "Production",

    "permissions": {
      "usageType": "openSource",
      "licenses": [
        {
          "name": "Public Domain, CC0-1.0",
          "URL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/1.0.0/LICENSE.md"
        }
      ]
    },

    "homepageURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves",
    "downloadURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/archive/1.0.0/vampires-and-werewolves-1.0.0.zip",
    "disclaimerURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/1.0.0/DISCLAIMER.md",
    "repositoryURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves.git",
    "vcs": "git",

    "laborHours": 200,

    "tags": [
      "usg-artificial-intelligence",
      "vampires",
      "werewolves",
      "mars"
    ],

    "languages": [
      "Python"
    ],

    "contact": {
      "name": "Vlad Dracula",
      "email": "vdracula@usgs.gov"
    },

    "date": {
      "metadataLastUpdated": "2024-06-15"
    }
  }
]

The version of the code.json file that was created in the exercise above will be included in the 1.0.0 branch, once the branch is created, and ultimately the immutable tagged product, as well as in the main branch.
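
The copy-and-edit steps in this exercise can also be sketched programmatically. The following is an illustration under stated assumptions rather than a USGS-provided tool: the function name make_release is hypothetical, and the URL rewriting relies on the /-/raw/ and /-/archive/ patterns used in the template.

```python
import copy


def make_release(main_object, version, labor_hours, updated,
                 default_branch="main"):
    """Copy the default-branch object and retarget it at a release version."""
    release = copy.deepcopy(main_object)  # leave the original untouched
    name = release["name"]
    release["version"] = version
    release["status"] = "Production"
    release["laborHours"] = labor_hours
    release["date"]["metadataLastUpdated"] = updated

    def retarget(url):
        # Swap the branch segment after "/-/raw/" or "/-/archive/",
        # then fix the archive file name suffix.
        for marker in ("/-/raw/", "/-/archive/"):
            url = url.replace(f"{marker}{default_branch}/", f"{marker}{version}/")
        return url.replace(f"{name}-{default_branch}.zip", f"{name}-{version}.zip")

    for lic in release["permissions"]["licenses"]:
        lic["URL"] = retarget(lic["URL"])
    for field in ("downloadURL", "disclaimerURL"):
        release[field] = retarget(release[field])
    return release


base = "https://code.usgs.gov/vdracula/vampires-and-werewolves"
main_object = {
    "name": "vampires-and-werewolves",
    "version": "main",
    "status": "Development",
    "permissions": {"usageType": "openSource", "licenses": [
        {"name": "Public Domain, CC0-1.0", "URL": f"{base}/-/raw/main/LICENSE.md"}]},
    "homepageURL": base,
    "downloadURL": f"{base}/-/archive/main/vampires-and-werewolves-main.zip",
    "disclaimerURL": f"{base}/-/raw/main/DISCLAIMER.md",
    "repositoryURL": f"{base}.git",
    "laborHours": 0,
    "date": {"metadataLastUpdated": "2024-05-29"},
}

release = make_release(main_object, "1.0.0", 200, "2024-06-15")
print(release["downloadURL"])
```

The homepageURL and repositoryURL are left alone because they do not contain a version segment.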

Note about Status Field

There are no set rules for what status needs to be assigned to a given version or branch of a project. The goal is to communicate to users, as clearly as possible, how thoroughly particular code has been tested, reviewed, and approved, and how you anticipate them using the project and products. For example, if you have testing, reviews, and approvals built into your development process such that the main branch is always the latest and greatest and should be the go-to code to use, then the main branch might be labeled with a status of ‘Production’. If instead the content in the main branch is not formally approved until a release branch is created, then the main branch might maintain a status of ‘Development’ to encourage users to use the most recent formal version.

Likewise, if a previous version of a product is still relevant and usable, it may continue to have a status label of ‘Production’. If, however, the newer version corrects some bugs and should be used instead of a previous version, then the previous version should have its status updated to ‘Archival’.

Key Points

  • A code.json file is formatted in JavaScript Object Notation (JSON) and contains project metadata. The code.json file is saved at the top level of the project repository.
  • USGS compiles all of the code.json files for public products in GitLab into an inventory that is required by Federal policy.
  • You can use the code.json file template above to begin creating your project and product metadata with the required fields.