Automated Version Control


  • Version control is like an unlimited ‘undo’.
  • Version control also allows many people to work in parallel.

Setting Up Git


  • Use git config with the --global option to configure a user name, email address, editor, and other preferences once per machine.
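This is a minimal sketch of the one-time setup, written in Python so it can run unattended. It assumes Git is installed and on the PATH. It points HOME at a throwaway directory so that --global writes to a scratch ~/.gitconfig instead of your real one. The name, email, and editor values are placeholders.

```python
import os
import subprocess
import tempfile

with tempfile.TemporaryDirectory() as fake_home:
    # Redirect the "global" config into a scratch HOME so real settings are untouched.
    env = {k: v for k, v in os.environ.items()
           if k not in ("XDG_CONFIG_HOME", "GIT_CONFIG_GLOBAL")}
    env["HOME"] = fake_home

    def git_config(*args):
        """Run `git config --global ...` against the scratch HOME."""
        result = subprocess.run(["git", "config", "--global", *args],
                                env=env, check=True, capture_output=True, text=True)
        return result.stdout.strip()

    # One-time, per-machine preferences (placeholder values).
    git_config("user.name", "Jane Doe")
    git_config("user.email", "jdoe@example.com")
    git_config("core.editor", "nano")

    # Read the settings back from the global file.
    configured_name = git_config("user.name")
    configured_email = git_config("user.email")
    global_file_exists = os.path.isfile(os.path.join(fake_home, ".gitconfig"))

print(configured_name, configured_email, global_file_exists)
```

On your own machine you would type the same git config --global commands directly in a terminal, without the HOME redirection.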

Creating a Repository


  • git init initializes a repository.
  • Git stores all of its repository data in the .git directory.
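This short sketch initializes a repository in a temporary directory and lists what Git placed inside .git. It assumes Git is on the PATH.

```python
import os
import subprocess
import tempfile

with tempfile.TemporaryDirectory() as project:
    # Turn an ordinary directory into a Git repository.
    subprocess.run(["git", "init"], cwd=project, check=True, capture_output=True)

    # All repository data (commits, branches, config) lives under .git.
    git_dir = os.path.join(project, ".git")
    has_git_dir = os.path.isdir(git_dir)
    git_dir_contents = sorted(os.listdir(git_dir))

print(has_git_dir)
print(git_dir_contents)
```

Because the history lives only in .git, deleting that directory discards the repository's history while leaving the working files in place.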

Tracking Changes


  • git status shows the status of a repository.
  • Files can be stored in a project’s working directory (which users see), the staging area (where the next commit is being built up), and the local repository (where commits are permanently recorded).
  • git add puts files in the staging area.
  • git commit saves the staged content as a new commit in the local repository.
  • Write a commit message that accurately describes your changes.
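The sketch below moves one file through the three places described above and prints git status --porcelain at each step. It assumes Git is on the PATH. The file name, commit message, and author identity are placeholders, and the identity is passed through environment variables so no git config is needed.

```python
import os
import subprocess
import tempfile

# Placeholder identity so commits work on a machine without git config.
ENV = {**os.environ,
       "GIT_AUTHOR_NAME": "Jane Doe", "GIT_AUTHOR_EMAIL": "jdoe@example.com",
       "GIT_COMMITTER_NAME": "Jane Doe", "GIT_COMMITTER_EMAIL": "jdoe@example.com"}


def git(*args, cwd):
    """Run one git command in cwd and return its standard output."""
    result = subprocess.run(["git", *args], cwd=cwd, env=ENV,
                            check=True, capture_output=True, text=True)
    return result.stdout


with tempfile.TemporaryDirectory() as repo:
    git("init", cwd=repo)
    with open(os.path.join(repo, "notes.txt"), "w") as f:
        f.write("First observations\n")

    # Working directory only: Git reports an untracked file ("??").
    status_untracked = git("status", "--porcelain", cwd=repo)

    # Staging area: the file is queued for the next commit ("A ").
    git("add", "notes.txt", cwd=repo)
    status_staged = git("status", "--porcelain", cwd=repo)

    # Local repository: the staged snapshot is recorded; nothing is left pending.
    git("commit", "-m", "Add first observations to notes.txt", cwd=repo)
    status_committed = git("status", "--porcelain", cwd=repo)
    log = git("log", "--oneline", cwd=repo)

print(repr(status_untracked), repr(status_staged), repr(status_committed))
print(log)
```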

Exploring History


  • git diff displays differences between commits.
  • git checkout recovers old versions of files.
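As a sketch, assuming Git is on the PATH and using a hypothetical notes.txt, the following makes two commits, compares them with git diff, and then uses git checkout to bring back the file as it was one commit earlier.

```python
import os
import subprocess
import tempfile

# Placeholder identity so commits work on a machine without git config.
ENV = {**os.environ,
       "GIT_AUTHOR_NAME": "Jane Doe", "GIT_AUTHOR_EMAIL": "jdoe@example.com",
       "GIT_COMMITTER_NAME": "Jane Doe", "GIT_COMMITTER_EMAIL": "jdoe@example.com"}


def git(*args, cwd):
    """Run one git command in cwd and return its standard output."""
    result = subprocess.run(["git", *args], cwd=cwd, env=ENV,
                            check=True, capture_output=True, text=True)
    return result.stdout


def write(path, text):
    with open(path, "w") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as repo:
    path = os.path.join(repo, "notes.txt")
    git("init", cwd=repo)
    write(path, "temperature: 10\n")
    git("add", "notes.txt", cwd=repo)
    git("commit", "-m", "Record first reading", cwd=repo)
    write(path, "temperature: 12\n")
    git("commit", "-am", "Correct the reading", cwd=repo)

    # Differences between the previous commit (HEAD~1) and the latest (HEAD).
    diff = git("diff", "--no-color", "HEAD~1", "HEAD", "--", "notes.txt", cwd=repo)

    # Recover the older version; it lands in the working directory and staging area.
    git("checkout", "HEAD~1", "--", "notes.txt", cwd=repo)
    with open(path) as f:
        restored = f.read()

print(diff)
print(restored)
```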

Ignoring Things


  • The .gitignore file tells Git what files to ignore.
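Here is a small sketch, assuming Git is on the PATH, of a .gitignore that skips raw data files and a generated results folder. The patterns and file names are hypothetical examples. The script checks the effect with git status and git check-ignore.

```python
import os
import subprocess
import tempfile


def git(*args, cwd):
    """Run one git command in cwd and return its standard output."""
    result = subprocess.run(["git", *args], cwd=cwd,
                            check=True, capture_output=True, text=True)
    return result.stdout


def write(path, text):
    with open(path, "w") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as repo:
    git("init", cwd=repo)
    # Hypothetical rules: ignore every .dat file and the whole results/ folder.
    write(os.path.join(repo, ".gitignore"), "*.dat\nresults/\n")
    write(os.path.join(repo, "analysis.py"), "print('analysis')\n")
    write(os.path.join(repo, "raw.dat"), "1 2 3\n")
    os.mkdir(os.path.join(repo, "results"))
    write(os.path.join(repo, "results", "summary.csv"), "mean,2\n")

    # Ignored paths no longer show up as untracked.
    status = git("status", "--porcelain", cwd=repo)

    # check-ignore prints the given paths that are ignored; it exits with 1
    # when none are, so check=True is deliberately not used here.
    check = subprocess.run(["git", "check-ignore", "raw.dat", "analysis.py"],
                           cwd=repo, capture_output=True, text=True)
    ignored = check.stdout.split()

print(status)
print(ignored)
```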

Remotes in GitLab


  • A local Git repository can be connected to one or more remote repositories.
  • Use the HTTPS protocol to connect to remote repositories.
  • git push copies changes from a local repository to a remote repository.
  • git pull copies changes from a remote repository to a local repository.
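The sketch below walks through push and pull without a network. A local bare repository stands in for the GitLab project, which is an assumption made only so the example runs offline. With GitLab, you would pass the project's HTTPS clone URL to git remote add instead of a file path. A second clone plays a collaborator who pushes a change that the first copy then pulls. Git must be on the PATH; names and identities are placeholders.

```python
import os
import subprocess
import tempfile

# Placeholder identity so commits work on a machine without git config.
ENV = {**os.environ,
       "GIT_AUTHOR_NAME": "Jane Doe", "GIT_AUTHOR_EMAIL": "jdoe@example.com",
       "GIT_COMMITTER_NAME": "Jane Doe", "GIT_COMMITTER_EMAIL": "jdoe@example.com"}


def git(*args, cwd):
    """Run one git command in cwd and return its standard output."""
    result = subprocess.run(["git", *args], cwd=cwd, env=ENV,
                            check=True, capture_output=True, text=True)
    return result.stdout


def write(path, text):
    with open(path, "w") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the GitLab project: a bare repository on local disk.
    remote = os.path.join(tmp, "remote.git")
    git("init", "--bare", remote, cwd=tmp)

    # Local repository with one commit on a branch named main.
    local = os.path.join(tmp, "local")
    os.mkdir(local)
    git("init", cwd=local)
    write(os.path.join(local, "README.md"), "Project notes\n")
    git("add", "README.md", cwd=local)
    git("commit", "-m", "Add README", cwd=local)
    git("branch", "-M", "main", cwd=local)

    # Connect the remote, then push: local commits are copied to it.
    git("remote", "add", "origin", remote, cwd=local)
    git("push", "-u", "origin", "main", cwd=local)
    remote_log = git("log", "--oneline", "main", cwd=remote)

    # A collaborator pushes a change from their own copy...
    other = os.path.join(tmp, "other")
    git("clone", "--branch", "main", remote, other, cwd=tmp)
    write(os.path.join(other, "README.md"), "Project notes\nUpdated elsewhere\n")
    git("commit", "-am", "Update README", cwd=other)
    git("push", "origin", "main", cwd=other)

    # ...and pull copies it from the remote into the local repository.
    git("pull", "--ff-only", "origin", "main", cwd=local)
    with open(os.path.join(local, "README.md")) as f:
        pulled = f.read()

print(remote_log)
print(pulled)
```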

Branching and Merging


  • A branching workflow keeps your main branch clean and leaves room for mistakes, fixes, and reviews before changes are merged into main.
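This is a minimal sketch of that workflow, assuming Git is on the PATH. A change is made on its own branch, main stays untouched until the branch is merged, and the merge then brings the change in. The branch name review-readme and the file contents are hypothetical.

```python
import os
import subprocess
import tempfile

# Placeholder identity so commits work on a machine without git config.
ENV = {**os.environ,
       "GIT_AUTHOR_NAME": "Jane Doe", "GIT_AUTHOR_EMAIL": "jdoe@example.com",
       "GIT_COMMITTER_NAME": "Jane Doe", "GIT_COMMITTER_EMAIL": "jdoe@example.com"}


def git(*args, cwd):
    """Run one git command in cwd and return its standard output."""
    result = subprocess.run(["git", *args], cwd=cwd, env=ENV,
                            check=True, capture_output=True, text=True)
    return result.stdout


def write(path, text):
    with open(path, "w") as f:
        f.write(text)


def read(path):
    with open(path) as f:
        return f.read()


with tempfile.TemporaryDirectory() as repo:
    readme = os.path.join(repo, "README.md")
    git("init", cwd=repo)
    write(readme, "status: draft\n")
    git("add", "README.md", cwd=repo)
    git("commit", "-m", "Add README", cwd=repo)
    git("branch", "-M", "main", cwd=repo)

    # Do the work on a separate branch so main stays clean during review.
    git("checkout", "-b", "review-readme", cwd=repo)
    write(readme, "status: reviewed\n")
    git("commit", "-am", "Mark README as reviewed", cwd=repo)

    # Back on main, the change is not there yet.
    git("checkout", "main", cwd=repo)
    main_before_merge = read(readme)

    # After review, merge the branch into main.
    git("merge", "--no-edit", "review-readme", cwd=repo)
    main_after_merge = read(readme)

print(repr(main_before_merge), repr(main_after_merge))
```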

Collaborating


  • git clone copies a remote repository to create a local repository with a remote called origin automatically set up.
  • Branches are an important part of collaborating with others in Git repositories.
  • Ensure that you establish a collaborative workflow for your project team to use.
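The following sketch shows what git clone sets up, assuming Git is on the PATH. A local repository named shared-project stands in for the team's GitLab project. That stand-in and every name here are assumptions for illustration. The clone gets the project's files plus a remote called origin that points back at the source.

```python
import os
import subprocess
import tempfile

# Placeholder identity so commits work on a machine without git config.
ENV = {**os.environ,
       "GIT_AUTHOR_NAME": "Jane Doe", "GIT_AUTHOR_EMAIL": "jdoe@example.com",
       "GIT_COMMITTER_NAME": "Jane Doe", "GIT_COMMITTER_EMAIL": "jdoe@example.com"}


def git(*args, cwd):
    """Run one git command in cwd and return its standard output."""
    result = subprocess.run(["git", *args], cwd=cwd, env=ENV,
                            check=True, capture_output=True, text=True)
    return result.stdout


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the shared project.
    shared = os.path.join(tmp, "shared-project")
    os.mkdir(shared)
    git("init", cwd=shared)
    with open(os.path.join(shared, "README.md"), "w") as f:
        f.write("Team project\n")
    git("add", "README.md", cwd=shared)
    git("commit", "-m", "Add README", cwd=shared)

    # Cloning creates a full local repository and registers the source as origin.
    mine = os.path.join(tmp, "my-copy")
    git("clone", shared, mine, cwd=tmp)
    remotes = git("remote", cwd=mine).split()
    origin_url = git("remote", "get-url", "origin", cwd=mine).strip()
    has_readme = os.path.isfile(os.path.join(mine, "README.md"))

print(remotes, origin_url, has_readme)
```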

Conflicts


  • Conflicts occur when two or more people change the same lines of the same file.
  • The version control system does not allow people to overwrite each other’s changes blindly, but highlights conflicts so that they can be resolved.
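To see what a conflict looks like, the sketch below changes the same line of a hypothetical data.txt on two branches and then merges them. It assumes Git is on the PATH. Git stops the merge, writes conflict markers into the file, and waits. The conflict is resolved by editing the file to the agreed content, staging it, and committing. The branch name and values are placeholders.

```python
import os
import subprocess
import tempfile

# Placeholder identity so commits work on a machine without git config.
ENV = {**os.environ,
       "GIT_AUTHOR_NAME": "Jane Doe", "GIT_AUTHOR_EMAIL": "jdoe@example.com",
       "GIT_COMMITTER_NAME": "Jane Doe", "GIT_COMMITTER_EMAIL": "jdoe@example.com"}


def git(*args, cwd):
    """Run one git command in cwd and return its standard output."""
    result = subprocess.run(["git", *args], cwd=cwd, env=ENV,
                            check=True, capture_output=True, text=True)
    return result.stdout


def write(path, text):
    with open(path, "w") as f:
        f.write(text)


def read(path):
    with open(path) as f:
        return f.read()


with tempfile.TemporaryDirectory() as repo:
    data = os.path.join(repo, "data.txt")
    git("init", cwd=repo)
    write(data, "threshold = 5\n")
    git("add", "data.txt", cwd=repo)
    git("commit", "-m", "Set initial threshold", cwd=repo)
    git("branch", "-M", "main", cwd=repo)

    # One person changes the line on a branch...
    git("checkout", "-b", "alternate-threshold", cwd=repo)
    write(data, "threshold = 7\n")
    git("commit", "-am", "Raise threshold to 7", cwd=repo)

    # ...while the same line changes differently on main.
    git("checkout", "main", cwd=repo)
    write(data, "threshold = 6\n")
    git("commit", "-am", "Raise threshold to 6", cwd=repo)

    # Git refuses to pick a winner: the merge fails and markers are written.
    merge = subprocess.run(["git", "merge", "alternate-threshold"], cwd=repo,
                           env=ENV, capture_output=True, text=True)
    conflicted = read(data)

    # Resolve by writing the agreed content, then stage and commit the merge.
    write(data, "threshold = 6\n")
    git("add", "data.txt", cwd=repo)
    git("commit", "-m", "Merge alternate-threshold, keep threshold = 6", cwd=repo)
    final_status = git("status", "--porcelain", cwd=repo)
    resolved = read(data)

print(merge.returncode)
print(conflicted)
```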

Open Science


  • Open scientific work is more useful and more highly cited than closed work
  • Publishing code is a critical part of making science reproducible
  • If your code is good enough to produce scientific results, then it is good enough to publish

Policy


  • Software may be publicly accessible as an open source software project and/or as an official USGS software information product.

  • While both a project and product may be public, only the official USGS software information product is citable by other publications.

  • Governing policies cascade from Federal to local levels. Check with your supervisor to ensure compliance with all local policies.

Licensing


  • A LICENSE file is often used in a repository to indicate how the contents of the repo may be used by others.
  • USGS software products require a LICENSE.md file in the project root of your repository.
  • Non-derivative USGS software products can use the CC0 1.0 license.
  • If you need a different license, consult the solicitor’s office to determine the appropriate license.

Citation


  • Create a DOI for your software information product.
  • Add a suggested citation to your repository.

Commonly Included Files


  • USGS software products typically contain “boilerplate” files.
  • Some of these files, like DISCLAIMER.md, are mandatory and must be included in all USGS software products. Others are optional.
  • Examples of these files can be found in existing projects and on the Commonly Included Files page of this lesson.
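Before a release, a quick local check for the top-level files this lesson names can catch an omission early. The sketch below is not an official USGS checklist. The file list covers only DISCLAIMER.md, LICENSE.md, and code.json, and the function name missing_files is hypothetical.

```python
import tempfile
from pathlib import Path

# Top-level files named in this lesson (not an exhaustive or official list).
CHECKED_FILES = ["DISCLAIMER.md", "LICENSE.md", "code.json"]


def missing_files(project_root):
    """Return the checked files that are absent from the project root."""
    root = Path(project_root)
    return [name for name in CHECKED_FILES if not (root / name).is_file()]


with tempfile.TemporaryDirectory() as project:
    # Simulate a repository that has a license and metadata but no disclaimer.
    (Path(project) / "LICENSE.md").write_text("License text goes here\n")
    (Path(project) / "code.json").write_text("[]\n")
    missing = missing_files(project)

print(missing)
```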

Creating Metadata


  • A code.json file is formatted in JavaScript Object Notation (JSON) and contains project metadata. It is saved at the top level of the project.
  • USGS compiles all of the code.json files for public products in GitLab into an inventory that is required by Federal policy.
  • You can use the code.json template provided in this lesson to begin creating your project and product metadata with the required fields.
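The sketch below writes a code.json at the project root as a JSON array holding one object per version, then reads it back to confirm it parses. The keys name, version, and description are placeholders chosen for illustration. The lesson's code.json template, not this example, defines the actual required fields.

```python
import json
import tempfile
from pathlib import Path

# Placeholder metadata for one release; the real required keys come from the
# lesson's code.json template.
release = {
    "name": "example-tool",
    "version": "1.0.0",
    "description": "Short description of what the software does.",
}

with tempfile.TemporaryDirectory() as project:
    path = Path(project) / "code.json"  # top level of the project
    # The file holds a JSON array; each object describes one version.
    path.write_text(json.dumps([release], indent=2) + "\n")
    loaded = json.loads(path.read_text())

print(json.dumps(loaded, indent=2))
```

Round-tripping the file through json.loads is a cheap way to catch syntax errors such as a trailing comma before the file is committed.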

Software Review for Authors


  • A release candidate branch is named with the version number for the anticipated software information product release
  • Once you have a release candidate branch, update the DISCLAIMER.md
  • A GitLab review issue provides structure for reviewers and makes it easier for them to conduct a review
  • A PDF of the final review issue can serve as the review and reconciliation documentation in the USGS Information Product Data System

Software Review for Reviewers


  • An administrative review is required for all open-source software projects
  • A technical code review and a scientific domain review are required for official USGS software products
  • There are many ways to conduct and document a software review. One way is by creating a GitLab issue with comments documenting the review

Publishing


  • Submit a new issue as the first step in publishing a software product
  • Create a static Git tag and associated release
  • Activate your DOI using the URL of the Git release
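The Git half of that step can be sketched as follows, assuming Git is on the PATH. The script creates an annotated tag on the release commit and pushes it, because git push does not send tags unless asked. A local bare repository stands in for GitLab, and the tag name v1.0.0 is a placeholder. Creating the GitLab release from the tag and activating the DOI happen in GitLab and the DOI tooling, which are not shown.

```python
import os
import subprocess
import tempfile

# Placeholder identity; the tagger of an annotated tag uses the committer identity.
ENV = {**os.environ,
       "GIT_AUTHOR_NAME": "Jane Doe", "GIT_AUTHOR_EMAIL": "jdoe@example.com",
       "GIT_COMMITTER_NAME": "Jane Doe", "GIT_COMMITTER_EMAIL": "jdoe@example.com"}


def git(*args, cwd):
    """Run one git command in cwd and return its standard output."""
    result = subprocess.run(["git", *args], cwd=cwd, env=ENV,
                            check=True, capture_output=True, text=True)
    return result.stdout


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the GitLab project.
    remote = os.path.join(tmp, "remote.git")
    git("init", "--bare", remote, cwd=tmp)

    repo = os.path.join(tmp, "project")
    os.mkdir(repo)
    git("init", cwd=repo)
    with open(os.path.join(repo, "README.md"), "w") as f:
        f.write("Release candidate\n")
    git("add", "README.md", cwd=repo)
    git("commit", "-m", "Prepare release", cwd=repo)
    git("remote", "add", "origin", remote, cwd=repo)

    # An annotated tag permanently labels this commit as the release.
    git("tag", "-a", "v1.0.0", "-m", "Release version 1.0.0", cwd=repo)
    # Tags are not pushed by default, so push this one explicitly.
    git("push", "origin", "v1.0.0", cwd=repo)

    tagged_commit = git("rev-list", "-n", "1", "v1.0.0", cwd=repo).strip()
    head_commit = git("rev-parse", "HEAD", cwd=repo).strip()
    remote_tags = git("ls-remote", "--tags", "origin", cwd=repo)

print(tagged_commit == head_commit)
print(remote_tags)
```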

Continuing Your Project


  • A good workflow can streamline open-source project development while ensuring compliance with governing policies
  • Certain criteria require releasing a subsequent version, but authors may also release new versions at their own discretion
  • Subsequent versions are released in a manner very similar to the initial version
  • The code.json file should be updated to include another object within the array that describes the new version.
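Following the last bullet, this is a minimal sketch of adding a new version to an existing code.json. It assumes the file holds a JSON array of version objects and uses placeholder keys (name, version). It appends the new object, refuses to add a version number that is already listed, and writes the file back. The helper name add_version is hypothetical.

```python
import json
import tempfile
from pathlib import Path


def add_version(code_json_path, new_release):
    """Append a new version object to the array stored in code.json."""
    path = Path(code_json_path)
    releases = json.loads(path.read_text())
    # Guard against describing the same version twice.
    if any(r.get("version") == new_release["version"] for r in releases):
        raise ValueError(f"version {new_release['version']} is already listed")
    releases.append(new_release)
    path.write_text(json.dumps(releases, indent=2) + "\n")
    return releases


with tempfile.TemporaryDirectory() as project:
    path = Path(project) / "code.json"
    path.write_text(json.dumps([{"name": "example-tool", "version": "1.0.0"}]))
    releases = add_version(path, {"name": "example-tool", "version": "1.1.0"})

versions = [r["version"] for r in releases]
print(versions)
```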