Content from Automated Version Control


Last updated on 2025-05-22

Overview

Questions

  • What is version control and why should I use it?

Objectives

  • Explain the benefits of an automated version control system.
  • Explain the basics of how automated version control systems work.

We’ll start by exploring how version control can be used to keep track of what one person did and when. Even if you aren’t collaborating with other people, automated version control is much better than trying to figure out which of the following is your most recent version:

  • GrantReport_Final.docx
  • GrantReport_Final-SupervisoryReview.docx
  • GrantReport_ReviewWithChanges.docx
  • GrantReport_Finalv3.docx
  • GrantReport_Final_for_Review.docx

We’ve all been in this situation before: it seems unnecessary to have multiple nearly-identical versions of the same document. Some word processors let us deal with this a little better, such as Microsoft Word’s Track Changes, Google Docs’ version history, or LibreOffice’s Recording and Displaying Changes.

Version control systems start with a base version of the document and then record changes you make each step of the way. You can think of it as a recording of your progress: you can rewind to start at the base document and play back each change you made, eventually arriving at your more recent version.

Changes Are Saved Sequentially

Once you think of changes as separate from the document itself, you can then think about “playing back” different sets of changes on the base document, ultimately resulting in different versions of that document. For example, two users can make independent sets of changes on the same document.

Different Versions Can be Saved

Unless multiple users make changes to the same section of the document - a conflict - you can incorporate two sets of changes into the same base document.

Multiple Versions Can be Merged

A version control system is a tool that keeps track of these changes for us, effectively creating different versions of our files. It allows us to decide which changes will be made to the next version (each record of these changes is called a commit), and keeps useful metadata about them. The complete history of commits for a particular project and their metadata make up a repository. Repositories can be kept in sync across different computers, facilitating collaboration among different people.

The Long History of Version Control Systems

Automated version control systems are nothing new. Tools like RCS, CVS, or Subversion have been around since the early 1980s and are used by many large companies. However, many of these are now considered legacy systems (i.e., outdated) due to various limitations in their capabilities. More modern systems, such as Git and Mercurial, are distributed, meaning that they do not need a centralized server to host the repository. These modern systems also include powerful merging tools that make it possible for multiple authors to work on the same files concurrently.

Paper Writing

  • Imagine you drafted an excellent paragraph for a paper you are writing, but later ruin it. How would you retrieve the excellent version of your paragraph? Is it even possible?

  • Imagine you have 5 co-authors. How would you manage the changes and comments they make to your paper? If you use LibreOffice Writer or Microsoft Word, what happens if you accept changes made using the Track Changes option? Do you have a history of those changes?

  • Recovering the excellent version is only possible if you created a copy of the old version of the paper.

  • Collaborative writing with traditional word processors is cumbersome. Either every collaborator has to work on a document sequentially (slowing down the process of writing), or you have to send out a version to all collaborators and manually merge their comments into your document. The ‘track changes’ or ‘record changes’ option can highlight changes for you and simplifies merging, but as soon as you accept changes you will lose their history. You will then no longer know who suggested that change, why it was suggested, or when it was merged into the rest of the document. Even online word processors like Google Docs or Microsoft Office Online do not fully resolve these problems.

Key Points

  • Version control is like an unlimited ‘undo’.
  • Version control also allows many people to work in parallel.

Content from Setting Up Git


Last updated on 2025-05-22

Overview

Questions

  • How do I get set up to use Git?

Objectives

  • Configure Git the first time it is used on a computer.
  • Explain the meaning of the --global configuration flag.

When we use Git on a new computer for the first time, we need to configure a few things. Below are some configurations we will set as we get started with Git:

  • our name and email address,
  • what our preferred text editor is,
  • and that we want to use these settings globally (i.e. for every project).

On a command line, Git commands are written as git verb options, where verb is what we actually want to do and options is additional information which may be needed for the verb. So here is how Dracula sets up his new laptop:

BASH

$ git config --global user.name "Vlad Dracula"
$ git config --global user.email "vdracula@usgs.gov"

Please use your own name and email address instead of Dracula’s. This user name and email will be associated with your subsequent Git activity, which means that any changes pushed to GitHub, BitBucket, GitLab or another Git host server after this lesson will include this information.

For this lesson, we will be interacting with GitLab and so the email address used should be your USGS email.

Line Endings

As with other keys, when you press Enter (or Return on Macs), your computer encodes this input as a character (or two). Different operating systems use different character(s) to represent the end of a line. Windows uses the combination of the carriage return and linefeed characters, whereas Unix and Mac use only linefeed. These differences can cause otherwise identical files to look different to Git. The solution is to automatically strip the carriage return characters when you move files from Windows to the other systems and add them back when you move files in the other direction. You can read more about this issue in the Pro Git book.

You can change the way Git recognizes and encodes line endings by setting core.autocrlf with git config. The following settings are recommended:

On macOS and Linux:

BASH

$ git config --global core.autocrlf input

And on Windows:

BASH

$ git config --global core.autocrlf true
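To see what this setting does, here is a minimal sketch that can be run in a throwaway repository. The scratch directory created with mktemp and the file name notes.txt are assumptions made for illustration. The option is set without --global so that your real configuration is not touched. Git may print a warning that CRLF will be replaced by LF; that is expected here.

```shell
# Scratch repository in a temporary directory (hypothetical location).
scratch="$(mktemp -d)"
git init --quiet "$scratch"
cd "$scratch"

# Set the option for this repository only (no --global).
git config core.autocrlf input

# Create a file with Windows-style (CRLF) line endings and stage it.
printf 'first line\r\nsecond line\r\n' > notes.txt
git add notes.txt

# i/ describes the staged copy, w/ describes the file in the working directory.
git ls-files --eol notes.txt
```

The listing should report i/lf for the staged copy and w/crlf for the working file: with the input setting, Git stores linefeed-only line endings while leaving your own file as it was.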

Git opens a text editor whenever it needs you to write or confirm a message, for example when you commit without the -m option or when you conclude a merge after resolving a conflict (discussed later). To set your favorite editor, choose one of the following configuration commands:

Editor | Configuration command
Atom | $ git config --global core.editor "atom --wait"
nano | $ git config --global core.editor "nano -w"
BBEdit (Mac, with command line tools) | $ git config --global core.editor "bbedit -w"
Sublime Text (Mac) | $ git config --global core.editor "/Applications/Sublime\ Text.app/Contents/SharedSupport/bin/subl -n -w"
Sublime Text (Win, 32-bit install) | $ git config --global core.editor "'c:/program files (x86)/sublime text 3/sublime_text.exe' -w"
Sublime Text (Win, 64-bit install) | $ git config --global core.editor "'c:/program files/sublime text 3/sublime_text.exe' -w"
Notepad (Win) | $ git config --global core.editor "c:/Windows/System32/notepad.exe"
Notepad++ (Win, 32-bit install) | $ git config --global core.editor "'c:/program files (x86)/Notepad++/notepad++.exe' -multiInst -notabbar -nosession -noPlugin"
Notepad++ (Win, 64-bit install) | $ git config --global core.editor "'c:/program files/Notepad++/notepad++.exe' -multiInst -notabbar -nosession -noPlugin"
Kate (Linux) | $ git config --global core.editor "kate"
Gedit (Linux) | $ git config --global core.editor "gedit --wait --new-window"
Scratch (Linux) | $ git config --global core.editor "scratch-text-editor"
Emacs | $ git config --global core.editor "emacs"
Vim | $ git config --global core.editor "vim"
VS Code | $ git config --global core.editor "code --wait"

It is possible to reconfigure the text editor for Git whenever you want to change it.

Exiting Vim

Note that Vim is the default editor for many programs. If you haven’t used Vim before and wish to exit a session without saving your changes, press Esc, then type :q! and press Enter (or Return on Macs). If you want to save your changes and quit, press Esc, then type :wq and press Enter (or Return on Macs).

Git (2.28+) allows configuration of the name of the branch created when you initialize any new repository. Dracula decides to use that feature to set it to main so it matches the cloud service he will eventually use.

BASH

$ git config --global init.defaultBranch main

Default Git branch naming

Source file changes are associated with a “branch.” For new learners in this lesson, it’s enough to know that branches exist, and this lesson uses one branch.

By default, Git will create a branch called master when you create a new repository with git init (as explained in the next Episode). The software development community has moved to adopt the term main instead.

In 2020, most Git code hosting services transitioned to using main as the default branch. As an example, any new repository that is opened in GitHub or the USGS GitLab defaults to main. However, Git has not yet made the same change. As a result, local repositories must be manually configured to have the same default branch name as most cloud services.
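As a quick check of how this setting affects new repositories, the sketch below creates a throwaway repository and prints the name of its initial branch. It assumes Git 2.28 or newer; the scratch directory from mktemp is hypothetical. The -c option applies init.defaultBranch to this one command only, so nothing is written to your configuration.

```shell
# Scratch location (hypothetical); -c applies the setting to this one command only.
scratch="$(mktemp -d)"
git -c init.defaultBranch=main init --quiet "$scratch"

# Print the name of the branch the new repository starts on.
git -C "$scratch" symbolic-ref --short HEAD
```

Once init.defaultBranch is set globally as shown earlier, a plain git init should behave the same way without the -c option.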

The five commands we just ran above only need to be run once: the flag --global tells Git to use the settings for every project, in your user account, on this computer.
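By contrast, a setting made without --global is stored in a single repository and takes precedence there. The following is a minimal sketch of that behaviour; the scratch repository and the example.org address are assumptions made for illustration, and your global settings are left alone.

```shell
# Scratch repository in a temporary directory (hypothetical location).
scratch="$(mktemp -d)"
git init --quiet "$scratch"
cd "$scratch"

# No --global: the value is written to .git/config of this repository only.
git config user.email "vlad@example.org"   # hypothetical address

git config --local --list    # settings stored in this repository
git config user.email        # effective value here: the local one wins
```

This can be handy if one project needs a different email address from the rest of your work.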

Let’s review those settings and test our core.editor right away:

BASH

$ git config --global --edit

Let’s close the file without making any additional changes. Remember, since typos in the config file will cause issues, it’s safer to view the configuration with:

BASH

$ git config --list

And if necessary, change your configuration using the same commands to choose another editor or update your email address. This can be done as many times as you want.

Proxy

Typically, your work in USGS will not require the use of a proxy. In the unusual case that your group requires it, you may also need to tell Git about the proxy:

BASH

$ git config --global http.proxy proxy-url
$ git config --global https.proxy proxy-url

To disable the proxy, use

BASH

$ git config --global --unset http.proxy
$ git config --global --unset https.proxy

Git Help and Manual

Always remember that if you forget the subcommands or options of a git command, you can access the relevant list of options by typing git <command> -h, or access the corresponding Git manual by typing git <command> --help, e.g.:

BASH

$ git config -h
$ git config --help

While viewing the manual, remember the : is a prompt waiting for commands and you can press Q to exit the manual.

More generally, you can get the list of available git commands and further resources of the Git manual by typing:

BASH

$ git help

There are many development environments that have built-in integrations with Git to streamline the most common Git operations. This lesson does not go into details on using these integrations, but here are some resources that you can explore on your own:

  • RStudio: https://docs.posit.co/ide/user/ide/guide/tools/version-control.html
  • Visual Studio Code: https://code.visualstudio.com/docs/sourcecontrol/overview

Key Points

  • Use git config with the --global option to configure a user name, email address, editor, and other preferences once per machine.

Content from Creating a Repository


Last updated on 2025-05-22

Overview

Questions

  • Where does Git store information?

Objectives

  • Create a local Git repository.
  • Describe the purpose of the .git directory.

Once Git is configured, we can start using it.

We will continue with the story of Wolfman and Dracula who are modeling the co-occurrences of vampires and werewolves on Mars.

First, let us create a new directory in the Desktop folder for our work and then change the current working directory to the newly created one:

BASH

$ cd ~/Desktop
$ mkdir vampires-and-werewolves
$ cd vampires-and-werewolves

Then we tell Git to make vampires-and-werewolves a repository -- a place where Git can store versions of our files:

BASH

$ git init

It is important to note that git init will create a repository that can include subdirectories and their files—there is no need to create separate repositories nested within the vampires-and-werewolves repository, whether subdirectories are present from the beginning or added later. Also, note that the creation of the vampires-and-werewolves directory and its initialization as a repository are completely separate processes.

If we use ls to show the directory’s contents, it appears that nothing has changed:

BASH

$ ls

But if we add the -a flag to show everything, we can see that Git has created a hidden directory within vampires-and-werewolves called .git:

BASH

$ ls -a

OUTPUT

.	..	.git

Git uses this special subdirectory to store all the information about the project, including the tracked files and sub-directories located within the project’s directory. If we ever delete the .git subdirectory, we will lose the project’s history.
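If you are curious about what is inside .git, the sketch below looks at a freshly initialized throwaway repository (the scratch directory from mktemp is an assumption). It is for reading only: editing these files by hand is not part of this lesson.

```shell
# Scratch repository in a temporary directory (hypothetical location).
scratch="$(mktemp -d)"
git init --quiet "$scratch"
cd "$scratch"

ls .git          # typically includes HEAD, config, objects and refs
cat .git/HEAD    # names the branch that is currently checked out
```

The exact listing varies between Git versions, but the objects directory is where the saved versions of your files end up.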

Next, we will change the default branch to be called main. This might be the default branch depending on your settings and version of git. See the setup episode for more information on this change.

BASH

$ git branch -m main

We can check that everything is set up correctly by asking Git to tell us the status of our project:

BASH

$ git status

OUTPUT

On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)

If you are using a different version of git, the exact wording of the output might be slightly different.

Places to Create Git Repositories

Along with tracking information about the vampires and werewolves modeling project on Mars (the project we have already created), Dracula would also like to track information about vampires and werewolves on various moons. Despite Wolfman’s concerns, Dracula creates a moons project inside his vampires-and-werewolves project with the following sequence of commands:

BASH

$ cd ~/Desktop   # return to Desktop directory
$ cd vampires-and-werewolves     # go into vampires-and-werewolves directory, which is already a Git repository
$ ls -a          # ensure the .git subdirectory is still present in the vampires-and-werewolves directory
$ mkdir moons    # make a subdirectory vampires-and-werewolves/moons
$ cd moons       # go into moons subdirectory
$ git init       # make the moons subdirectory a Git repository
$ ls -a          # ensure the .git subdirectory is present indicating we have created a new Git repository

Is the git init command, run inside the moons subdirectory, required for tracking files stored in the moons subdirectory?

No. Dracula does not need to make the moons subdirectory a Git repository because the vampires-and-werewolves repository can track any files, sub-directories, and subdirectory files under the vampires-and-werewolves directory. Thus, in order to track all information about moons, Dracula only needed to add the moons subdirectory to the vampires-and-werewolves directory.

Additionally, Git repositories can interfere with each other if they are “nested”: the outer repository will try to version-control the inner repository. Therefore, it is best to create each new Git repository in a separate directory. To be sure that there is no conflicting repository in the directory, check the output of git status. If it looks like the following, you are good to go to create a new repository as shown above:

BASH

$ git status

OUTPUT

fatal: Not a git repository (or any of the parent directories): .git
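Another way to check, offered here as a sketch rather than as part of the exercise, is to ask Git which repository a directory belongs to. The scratch layout below is hypothetical and only mirrors the directory names used above. Outside of any repository, the same command fails with a "not a git repository" message.

```shell
# Hypothetical scratch layout mirroring the exercise.
outer="$(mktemp -d)/vampires-and-werewolves"
mkdir -p "$outer/moons"
git init --quiet "$outer"

# From inside moons, ask Git which repository this directory belongs to.
cd "$outer/moons"
git rev-parse --show-toplevel    # prints the path of the enclosing repository
```

Because the printed path ends in vampires-and-werewolves, files in moons are already covered by the outer repository.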

Correcting git init Mistakes

Wolfman explains to Dracula how a nested repository is redundant and may cause confusion down the road. Dracula would like to remove the nested repository. How can Dracula undo his last git init in the moons subdirectory?

Background

Removing files from a Git repository needs to be done with caution. But we have not learned yet how to tell Git to track a particular file; we will learn this in the next episode. Files that are not tracked by Git can easily be removed like any other “ordinary” files with

BASH

$ rm filename

Similarly a directory can be removed using rm -r dirname or rm -rf dirname. If the files or folder being removed in this fashion are tracked by Git, then their removal becomes another change that we will need to track, as we will see in the next episode.

Solution

Git keeps all of its files in the .git directory. To recover from this little mistake, Dracula can just remove the .git folder in the moons subdirectory by running the following command from inside the vampires-and-werewolves directory:

BASH

$ rm -rf moons/.git

But be careful! Running this command in the wrong directory will remove the entire Git history of a project you might want to keep. Therefore, always check your current directory using the command pwd.
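The checks described above can be sketched as follows. The layout is recreated in a temporary directory (an assumption for illustration) so that the commands can be tried safely before being used on real work.

```shell
# Hypothetical scratch layout: an outer repository with a nested one in moons.
outer="$(mktemp -d)/vampires-and-werewolves"
mkdir -p "$outer/moons"
git init --quiet "$outer"
git init --quiet "$outer/moons"

cd "$outer"
pwd                   # confirm we are in vampires-and-werewolves
ls -d moons/.git      # confirm the directory we are about to delete exists
rm -rf moons/.git     # remove only the nested repository's data
ls -a moons           # .git no longer appears inside moons
```

The outer .git directory is untouched, so the history of vampires-and-werewolves is preserved.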

Key Points

  • git init initializes a repository.
  • Git stores all of its repository data in the .git directory.

Content from Tracking Changes


Last updated on 2025-05-22

Overview

Questions

  • How do I record changes in Git?
  • How do I check the status of my version control repository?
  • How do I record notes about what changes I made and why?

Objectives

  • Go through the modify-add-commit cycle for one or more files.
  • Explain where information is stored at each stage of that cycle.
  • Distinguish between descriptive and non-descriptive commit messages.

First let us make sure we are still in the right directory. You should be in the vampires-and-werewolves directory.

BASH

$ cd ~/Desktop/vampires-and-werewolves

Let us create a file called mars.txt that contains some notes about the Red Planet’s suitability for vampires and werewolves. We will use nano to edit the file; you can use whatever editor you like. In particular, this does not have to be the core.editor you set globally earlier. But remember, the bash command to create or edit a new file will depend on the editor you choose (it might not be nano). For a refresher on text editors, check out “Which Editor?” in The Unix Shell lesson.

BASH

$ nano mars.txt

Type the text below into the mars.txt file:

OUTPUT

Cold, dry, and everything is red, vampires' favorite color

Let us first verify that the file was properly created by running the list command (ls):

BASH

$ ls

OUTPUT

mars.txt

mars.txt contains a single line, which we can see by running:

BASH

$ cat mars.txt

OUTPUT

Cold, dry, and everything is red, vampires' favorite color

If we check the status of our project again, Git tells us that it has noticed the new file:

BASH

$ git status

OUTPUT

On branch main

No commits yet

Untracked files:
   (use "git add <file>..." to include in what will be committed)

	mars.txt

nothing added to commit but untracked files present (use "git add" to track)

The “untracked files” message means that there is a file in the directory that Git is not keeping track of. We can tell Git to track a file using git add:

BASH

$ git add mars.txt

and then check that the right thing happened:

BASH

$ git status

OUTPUT

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   mars.txt

Git now knows that it is supposed to keep track of mars.txt, but it has not recorded these changes as a commit yet. To get it to do that, we need to run one more command:

BASH

$ git commit -m "Start notes on Mars suitability for vampires and werewolves"

OUTPUT

[main (root-commit) f22b25e] Start notes on Mars suitability for vampires and werewolves
 1 file changed, 1 insertion(+)
 create mode 100644 mars.txt

When we run git commit, Git takes everything we have told it to save by using git add and stores a copy permanently inside the special .git directory. This permanent copy is called a commit (or revision) and its short identifier is f22b25e. Your commit may have another identifier.

We use the -m flag (for “message”) to record a short, descriptive, and specific comment that will help us remember later on what we did and why. If we just run git commit without the -m option, Git will launch nano (or whatever other editor we configured as core.editor) so that we can write a longer message.

Good commit messages start with a brief (fewer than 50 characters) statement about the changes made in the commit. Generally, the message should complete the sentence “If applied, this commit will ...”. If you want to go into more detail, add a blank line between the summary line and your additional notes. Use this additional space to explain why you made changes and/or what their impact will be.
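One way to write a summary line plus additional notes without opening an editor is to pass -m twice; Git joins the values as separate paragraphs. The sketch below tries this in a throwaway repository. The scratch directory and the wording of both messages are assumptions made for illustration.

```shell
# Scratch repository with an identity set for this repository only.
scratch="$(mktemp -d)"
git init --quiet "$scratch"
cd "$scratch"
git config user.name "Vlad Dracula"
git config user.email "vdracula@usgs.gov"

echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt

# The first -m is the summary line; the second becomes a separate paragraph.
git commit --quiet \
    -m "Start notes on Mars suitability" \
    -m "Record first impressions so we can compare planets later."

git log -1 --format=%s    # summary line only
git log -1 --format=%b    # additional notes only
```

For anything longer than a sentence or two, running git commit without -m and writing the message in your editor is usually more comfortable.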

If we run git status now:

BASH

$ git status

OUTPUT

On branch main
nothing to commit, working tree clean

it tells us everything is up to date. If we want to know what we have done recently, we can ask Git to show us the project’s history using git log:

BASH

$ git log

OUTPUT

commit f22b25e3233b4645dabd0d81e651fe074bd8e73b
Author: Vlad Dracula <vdracula@usgs.gov>
Date:   Thu Aug 22 09:51:46 2013 -0400

    Start notes on Mars suitability for vampires and werewolves

git log lists all commits made to a repository in reverse chronological order. The listing for each commit includes the commit’s full identifier (which starts with the same characters as the short identifier printed by the git commit command earlier), the commit’s author, when it was created, and the log message Git was given when the commit was created.
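The relationship between the short and the full identifier can be checked directly. This is a minimal sketch in a throwaway repository; the scratch directory and the commit message are assumptions, and the identifiers you see will differ from the ones shown in this lesson.

```shell
# Scratch repository with a single commit (hypothetical location).
scratch="$(mktemp -d)"
git init --quiet "$scratch"
cd "$scratch"
git config user.name "Vlad Dracula"
git config user.email "vdracula@usgs.gov"
echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt
git commit --quiet -m "Start notes on Mars"

short="$(git rev-parse --short HEAD)"   # abbreviated identifier
full="$(git rev-parse HEAD)"            # full identifier, as listed by git log
echo "short: $short"
echo "full:  $full"

# Confirm that the full identifier begins with the short one.
case "$full" in
    "$short"*) echo "the full identifier starts with the short one" ;;
esac
```

Either form can be used wherever Git asks for a commit, as long as the short one is unambiguous within the repository.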

Where Are My Changes?

If we run ls at this point, we will still see just one file called mars.txt. That is because Git saves information about files’ history in the special .git directory mentioned earlier so that our filesystem does not become cluttered (and so that we cannot accidentally edit or delete an old version).

Now suppose Dracula adds more information to the file. (Again, we will edit with nano and then cat the file to show its contents; you may use a different editor, and do not need to cat.)

BASH

$ nano mars.txt
$ cat mars.txt

OUTPUT

Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves

When we run git status now, it tells us that a file it already knows about has been modified:

BASH

$ git status

OUTPUT

On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   mars.txt

no changes added to commit (use "git add" and/or "git commit -a")

The last line is the key phrase: “no changes added to commit”. We have changed this file, but we have not told Git we will want to save those changes (which we do with git add) nor have we saved them (which we do with git commit). So let us do that now. It is good practice to always review our changes before saving them. We do this using git diff. This shows us the differences between the current state of the file and the most recently saved version:

BASH

$ git diff

OUTPUT

diff --git a/mars.txt b/mars.txt
index df0654a..315bf3a 100644
--- a/mars.txt
+++ b/mars.txt
@@ -1 +1,2 @@
 Cold, dry, and everything is red, vampires' favorite color
+The two moons may be a problem for werewolves

The output is cryptic because it is actually a series of commands for tools like editors and patch telling them how to reconstruct one file given the other. If we break it down into pieces:

  1. The first line tells us that Git is producing output similar to the Unix diff command comparing the old and new versions of the file.
  2. The second line tells exactly which versions of the file Git is comparing; df0654a and 315bf3a are unique computer-generated labels for those versions.
  3. The third and fourth lines once again show the name of the file being changed.
  4. The remaining lines are the most interesting: they show us the actual differences and the lines on which they occur. In particular, the + marker in the first column shows where we added a line.

After reviewing our change, it is time to commit it:

BASH

$ git commit -m "Add information about suitability of Mars for werewolves"

OUTPUT

On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   mars.txt

no changes added to commit (use "git add" and/or "git commit -a")

Whoops: Git will not commit because we did not use git add first. Let us fix that:

BASH

$ git add mars.txt
$ git commit -m "Add information about suitability of Mars for werewolves"

OUTPUT

[main 34961b1] Add information about suitability of Mars for werewolves
 1 file changed, 1 insertion(+)

Git insists that we add files to the set we want to commit before actually committing anything. This allows us to commit our changes in stages and capture changes in logical portions rather than only large batches. For example, suppose we are adding a few citations to relevant research to our thesis. We might want to commit those additions, and the corresponding bibliography entries, but not commit some of our work drafting the conclusion (which we have not finished yet).

To allow for this, Git has a special staging area where it keeps track of things that have been added to the current changeset but not yet committed.

Staging Area

If you think of Git as taking snapshots of changes over the life of a project, git add specifies what will go in a snapshot (putting things in the staging area), and git commit then actually takes the snapshot, and makes a permanent record of it (as a commit). If you do not have anything staged when you type git commit, Git will prompt you to use git commit -a or git commit --all, which is kind of like gathering everyone to take a group photo! However, it is almost always better to explicitly add things to the staging area, because you might commit changes you forgot you made. (Going back to the group photo simile, you might get an extra with incomplete makeup walking on the stage for the picture because you used -a!) Try to stage things manually, or you might find yourself searching for “git undo commit” more than you would like!
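To make the idea of choosing what goes into a snapshot concrete, here is a sketch in which two files are changed but only one is staged and committed. The scratch repository, the second file venus.txt and the commit messages are assumptions made for illustration.

```shell
# Scratch repository with two committed files (hypothetical location and names).
scratch="$(mktemp -d)"
git init --quiet "$scratch"
cd "$scratch"
git config user.name "Vlad Dracula"
git config user.email "vdracula@usgs.gov"
echo "notes on Mars" > mars.txt
echo "notes on Venus" > venus.txt
git add mars.txt venus.txt
git commit --quiet -m "Start planet notes"

# Change both files, but stage only one of them.
echo "two moons" >> mars.txt
echo "very hot" >> venus.txt
git add mars.txt

git diff --staged --name-only    # staged for the next commit: mars.txt
git diff --name-only             # modified but not staged: venus.txt

git commit --quiet -m "Note the moons of Mars"
git status --short               # venus.txt is still listed as modified
```

The change to venus.txt stays in the working directory until it is staged and committed in its own right.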

The Git Staging Area

Let us watch as our changes to a file move from our editor to the staging area and into long-term storage. First, we will add another line to the file:

BASH

$ nano mars.txt
$ cat mars.txt

OUTPUT

Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves
Mummies will appreciate the lack of humidity

BASH

$ git diff

OUTPUT

diff --git a/mars.txt b/mars.txt
index 315bf3a..b36abfd 100644
--- a/mars.txt
+++ b/mars.txt
@@ -1,2 +1,3 @@
 Cold, dry, and everything is red, vampires' favorite color
 The two moons may be a problem for werewolves
+Mummies will appreciate the lack of humidity

So far, so good: we have added one line to the end of the file (shown with a + in the first column). Now let us put that change in the staging area and see what git diff reports:

BASH

$ git add mars.txt
$ git diff

There is no output: as far as Git can tell, there is no difference between what it has been asked to save permanently and what is currently in the directory. However, if we do this:

BASH

$ git diff --staged

OUTPUT

diff --git a/mars.txt b/mars.txt
index 315bf3a..b36abfd 100644
--- a/mars.txt
+++ b/mars.txt
@@ -1,2 +1,3 @@
 Cold, dry, and everything is red, vampires' favorite color
 The two moons may be a problem for werewolves
+Mummies will appreciate the lack of humidity

it shows us the difference between the last committed change and what is in the staging area. Let us save our changes:

BASH

$ git commit -m "Discuss suitability of Mars' climate for mummies"

OUTPUT

[main 005937f] Discuss suitability of Mars' climate for mummies
 1 file changed, 1 insertion(+)

check our status:

BASH

$ git status

OUTPUT

On branch main
nothing to commit, working tree clean

and look at the history of what we have done so far:

BASH

$ git log

OUTPUT

commit 005937fbe2a98fb83f0ade869025dc2636b4dad5 (HEAD -> main)
Author: Vlad Dracula <vdracula@usgs.gov>
Date:   Thu Aug 22 10:14:07 2013 -0400

    Discuss suitability of Mars' climate for mummies

commit 34961b159c27df3b475cfe4415d94a6d1fcd064d
Author: Vlad Dracula <vdracula@usgs.gov>
Date:   Thu Aug 22 10:07:21 2013 -0400

    Add information about suitability of Mars for werewolves

commit f22b25e3233b4645dabd0d81e651fe074bd8e73b
Author: Vlad Dracula <vdracula@usgs.gov>
Date:   Thu Aug 22 09:51:46 2013 -0400

    Start notes on Mars suitability for vampires and werewolves

Word-based diffing

Sometimes, e.g. in the case of text documents, a line-wise diff is too coarse. That is where the --color-words option of git diff comes in very useful, as it highlights the changed words using colors.
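Because colors do not show up in plain text, the sketch below uses the closely related --word-diff option, which marks removed words as [-word-] and added words as {+word+}. This is a swapped-in illustration, not the option named above; the scratch repository and the edited sentence are assumptions.

```shell
# Scratch repository with one committed line (hypothetical location).
scratch="$(mktemp -d)"
git init --quiet "$scratch"
cd "$scratch"
git config user.name "Vlad Dracula"
git config user.email "vdracula@usgs.gov"
echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt
git commit --quiet -m "Start notes on Mars"

# Change a single word, then compare word by word instead of line by line.
echo "Cold, dry, and everything is crimson" > mars.txt
git diff --word-diff
```

In a terminal, git diff --color-words presents the same comparison with colors instead of bracket markers.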

Paging the Log

When the output of git log is too long to fit on your screen, Git uses a program to split it into pages of the size of your screen. When this “pager” is called, you will notice that the last line on your screen is a :, instead of your usual prompt.

  • To get out of the pager, press Q.
  • To move to the next page, press Spacebar.
  • To search for some_word in all pages, press / and type some_word. Navigate through the matches by pressing N.

Limit Log Size

To avoid having git log cover your entire terminal screen, you can limit the number of commits that Git lists by using -N, where N is the number of commits that you want to view. For example, if you only want information from the last commit you can use:

BASH

$ git log -1

OUTPUT

commit 005937fbe2a98fb83f0ade869025dc2636b4dad5 (HEAD -> main)
Author: Vlad Dracula <vdracula@usgs.gov>
Date:   Thu Aug 22 10:14:07 2013 -0400

    Discuss suitability of Mars' climate for mummies

You can also reduce the quantity of information using the --oneline option:

BASH

$ git log --oneline

OUTPUT

005937f (HEAD -> main) Discuss suitability of Mars' climate for mummies
34961b1 Add information about suitability of Mars for werewolves
f22b25e Start notes on Mars suitability for vampires and werewolves

You can also combine the --oneline option with others. One useful combination adds --graph to display the commit history as a text-based graph and to indicate which commits are associated with the current HEAD, the current branch main, or other Git references:

BASH

$ git log --oneline --graph

OUTPUT

* 005937f (HEAD -> main) Discuss suitability of Mars' climate for mummies
* 34961b1 Add information about suitability of Mars for werewolves
* f22b25e Start notes on Mars suitability for vampires and werewolves

Directories

Two important facts you should know about directories in Git.

  1. Git does not track directories on their own, only files within them. Try it for yourself:

BASH

$ mkdir spaceships
$ git status
$ git add spaceships
$ git status

Note, our newly created empty directory spaceships does not appear in the list of untracked files even if we explicitly add it (via git add) to our repository. This is the reason why you will sometimes see .gitkeep files in otherwise empty directories. Unlike .gitignore, these files are not special and their sole purpose is to populate a directory so that Git adds it to the repository. In fact, you can name such files anything you like.

  2. If you create a directory in your Git repository and populate it with files, you can add all files in the directory at once by:

BASH

git add <directory-with-files>

Try it for yourself:

BASH

$ touch spaceships/apollo-11 spaceships/sputnik-1
$ git status
$ git add spaceships
$ git status

Before moving on, we will commit these changes.

BASH

$ git commit -m "Add some initial thoughts on spaceships"
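The two facts about directories can be checked in a throwaway repository. The following is a minimal sketch, assuming Git is installed; the temporary directory and the placeholder name .gitkeep are illustrative choices, not part of the lesson's project.

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository so nothing real is touched
cd "$repo"
git init -q

mkdir spaceships
# An empty directory is invisible to Git: the status list stays empty.
empty_status=$(git status --porcelain)

# A placeholder file (the name .gitkeep is only a convention) makes it visible.
touch spaceships/.gitkeep
git add spaceships                # adding the directory stages every file inside it
staged=$(git diff --cached --name-only)

echo "status with empty directory: '$empty_status'"
echo "staged after adding placeholder: $staged"
```

Only the placeholder file is staged; the directory itself is never recorded as a separate entry.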

To recap, when we want to add changes to our repository, we first need to add the changed files to the staging area (git add) and then commit the staged changes to the repository (git commit):

The Git Commit Workflow

Choosing a Commit Message

Which of the following commit messages would be most appropriate for the last commit made to mars.txt?

  1. “Changes”
  2. “Added line ‘Mummies will appreciate the lack of humidity’ to mars.txt”
  3. “Discuss suitability of Mars’ climate for mummies”

Answer 1 is not descriptive enough, and the purpose of the commit is unclear. Answer 2 duplicates what git diff already shows for this commit. Answer 3 is good: short, descriptive, and imperative.

Committing Changes to Git

Which command(s) below would save the changes of myfile.txt to my local Git repository?

  1. BASH

       $ git commit -m "my recent changes"
  2. BASH

       $ git init myfile.txt
       $ git commit -m "my recent changes"
  3. BASH

       $ git add myfile.txt
       $ git commit -m "my recent changes"
  4. BASH

       $ git commit -m myfile.txt "my recent changes"
  1. Would only create a commit if files have already been staged.
  2. Would try to create a new repository.
  3. Is correct: first add the file to the staging area, then commit.
  4. Would try to commit a file “my recent changes” with the message myfile.txt.
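To see the difference between options 1 and 3 for yourself, here is a minimal sketch in a throwaway repository. It assumes Git is installed; the file contents are made up for illustration.

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vdracula@usgs.gov"

echo "first draft" > myfile.txt

# Option 1: nothing is staged, so git commit refuses and exits with an error status.
if git commit -q -m "my recent changes" >/dev/null 2>&1; then
    commit_without_add="succeeded"
else
    commit_without_add="failed"
fi

# Option 3: stage the file first, then commit.
git add myfile.txt
git commit -q -m "my recent changes"
commits=$(git rev-list --count HEAD)

echo "commit without git add: $commit_without_add"
echo "commits after add + commit: $commits"
```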

Committing Multiple Files

The staging area can hold changes from any number of files that you want to commit as a single snapshot.

  1. Add some text to mars.txt noting your decision to consider adding mummies to your model
  2. Create a new file mummies.txt with your initial thoughts about including co-occurrences of mummies in your model
  3. Add changes from both files to the staging area, and commit those changes.

The output below from cat mars.txt reflects only content added during this exercise. Your output may vary.

First we make our changes to the mars.txt and mummies.txt files:

BASH

$ nano mars.txt
$ cat mars.txt

OUTPUT

Maybe we should also consider including mummies in our model.

BASH

$ nano mummies.txt
$ cat mummies.txt

OUTPUT

Mummies often co-occur with vampires and werewolves in stories. We should definitely include mummies in our co-occurrence model. 

Now you can add both files to the staging area. We can do that in one line:

BASH

$ git add mars.txt mummies.txt

Or with multiple commands:

BASH

$ git add mars.txt
$ git add mummies.txt

Now the files are ready to commit. You can check that using git status. If you are ready to commit use:

BASH

$ git commit -m "Write plans to add mummies to model"

OUTPUT

[main cc127c2] Write plans to add mummies to model
 2 files changed, 2 insertions(+)
 create mode 100644 mummies.txt

bio Repository

  • Create a new Git repository on your computer called bio.
  • Write a three-line biography for yourself in a file called me.txt, commit your changes
  • Modify one line, add a fourth line
  • Display the differences between its updated state and its original state.

If needed, move out of the vampires-and-werewolves folder:

BASH

$ cd ..

Create a new folder called bio and ‘move’ into it:

BASH

$ mkdir bio
$ cd bio

Initialize git:

BASH

$ git init

Create your biography file me.txt using nano or another text editor. Once in place, add and commit it to the repository:

BASH

$ git add me.txt
$ git commit -m "Add biography file" 

Modify the file as described (modify one line, add a fourth line). To display the differences between its updated state and its original state, use git diff:

BASH

$ git diff me.txt

Key Points

  • git status shows the status of a repository.
  • Files can be stored in a project’s working directory (which users see), the staging area (where the next commit is being built up) and the local repository (where commits are permanently recorded).
  • git add puts files in the staging area.
  • git commit saves the staged content as a new commit in the local repository.
  • Write a commit message that accurately describes your changes.

Content from Exploring History


Last updated on 2025-05-22

Overview

Questions

  • How can I identify old versions of files?
  • How do I review my changes?
  • How can I recover old versions of files?

Objectives

  • Explain what the HEAD of a repository is and how to use it.
  • Identify and use Git commit numbers.
  • Compare various versions of tracked files.
  • Restore old versions of files.

As we saw in the previous episode, we can refer to commits by their identifiers. You can refer to the most recent commit of the working directory by using the identifier HEAD.

We have been adding one line at a time to mars.txt, so it is easy to track our progress by looking at the file. Let us do that using HEAD. Before we start, let us make a change to mars.txt, adding yet another line.

BASH

$ nano mars.txt
$ cat mars.txt

OUTPUT

Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves
Mummies will appreciate the lack of humidity
Why are we talking about mummies?

Now, let us see what we get.

BASH

$ git diff HEAD mars.txt

OUTPUT

diff --git a/mars.txt b/mars.txt
index b36abfd..0848c8d 100644
--- a/mars.txt
+++ b/mars.txt
@@ -1,3 +1,4 @@
 Cold, dry, and everything is red, vampires' favorite color
 The two moons may be a problem for werewolves
 Mummies will appreciate the lack of humidity
+Why are we talking about mummies?

which is the same as what you would get if you leave out HEAD (try it). The real benefit comes when you refer to previous commits. We do that by adding ~1 (where “~” is “tilde”, pronounced [til-duh]) to refer to the commit one before HEAD.

BASH

$ git diff HEAD~1 mars.txt

If we want to see the differences between older commits we can use git diff again, but with the notation HEAD~1, HEAD~2, and so on, to refer to them:

BASH

$ git diff HEAD~3 mars.txt

OUTPUT

diff --git a/mars.txt b/mars.txt
index df0654a..b36abfd 100644
--- a/mars.txt
+++ b/mars.txt
@@ -1 +1,4 @@
 Cold, dry, and everything is red, vampires' favorite color
+The two moons may be a problem for werewolves
+Mummies will appreciate the lack of humidity
+Why are we talking about mummies?

We could also use git show which shows us what changes we made at an older commit as well as the commit message, rather than the differences between a commit and our working directory that we see by using git diff.

BASH

$ git show HEAD~3 mars.txt

OUTPUT

commit f22b25e3233b4645dabd0d81e651fe074bd8e73b
Author: Vlad Dracula <vdracula@usgs.gov>
Date:   Thu Aug 22 09:51:46 2013 -0400

    Start notes on Mars suitability for vampires and werewolves

diff --git a/mars.txt b/mars.txt
new file mode 100644
index 0000000..df0654a
--- /dev/null
+++ b/mars.txt
@@ -0,0 +1 @@
+Cold, dry, and everything is red, vampires' favorite color

In this way, we can build up a chain of commits. The most recent end of the chain is referred to as HEAD; we can refer to previous commits using the ~ notation, so HEAD~1 means “the previous commit”, while HEAD~123 goes back 123 commits from where we are now.
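If you are unsure which commit a relative name such as HEAD~2 points to, git rev-parse will resolve it to a full identifier. The sketch below builds a three-commit chain in a throwaway repository (the line texts are shortened versions of the lesson's, and the commit messages are invented) and checks that HEAD~2 is the first commit.

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vdracula@usgs.gov"

# Three commits, each adding one line to mars.txt.
for line in "Cold, dry, and everything is red" \
            "The two moons may be a problem" \
            "Mummies will appreciate the lack of humidity"; do
    echo "$line" >> mars.txt
    git add mars.txt
    git commit -q -m "Add line: $line"
done

first=$(git rev-list --max-parents=0 HEAD)   # the commit with no parent, i.e. the first one
two_back=$(git rev-parse HEAD~2)             # resolve the relative name to a full ID

git log --oneline
echo "HEAD~2 resolves to $two_back"
```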

We can also refer to commits using those long strings of digits and letters that git log displays. These are unique IDs for the changes, and “unique” really does mean unique: every change to any set of files on any computer has a unique 40-character identifier. Our first commit was given the ID f22b25e3233b4645dabd0d81e651fe074bd8e73b, so let us try this:

BASH

$ git diff f22b25e3233b4645dabd0d81e651fe074bd8e73b mars.txt

OUTPUT

diff --git a/mars.txt b/mars.txt
index df0654a..93a3e13 100644
--- a/mars.txt
+++ b/mars.txt
@@ -1 +1,4 @@
 Cold, dry, and everything is red, vampires' favorite color
+The two moons may be a problem for werewolves
+Mummies will appreciate the lack of humidity
+Why are we talking about mummies?

That is the right answer, but typing out random 40-character strings is annoying, so Git lets us use just the first few characters (typically seven for normal size projects):

BASH

$ git diff f22b25e mars.txt

OUTPUT

diff --git a/mars.txt b/mars.txt
index df0654a..93a3e13 100644
--- a/mars.txt
+++ b/mars.txt
@@ -1 +1,4 @@
 Cold, dry, and everything is red, vampires' favorite color
+The two moons may be a problem for werewolves
+Mummies will appreciate the lack of humidity
+Why are we talking about mummies?
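You do not have to count characters yourself: git rev-parse --short prints an abbreviated identifier, and Git accepts that abbreviation anywhere it accepts the full one. A minimal sketch in a throwaway repository (the IDs you see will differ from the lesson's):

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vdracula@usgs.gov"

echo "Cold, dry, and everything is red, vampires' favorite color" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars suitability for vampires and werewolves"

full=$(git rev-parse HEAD)            # the full identifier
short=$(git rev-parse --short HEAD)   # an abbreviation Git considers unambiguous
resolved=$(git rev-parse "$short")    # the abbreviation resolves back to the full ID

echo "full:  $full"
echo "short: $short"
```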

All right! So we can save changes to files and see what we have changed. Now, how can we restore older versions of things? Let us suppose we change our mind about the last update to mars.txt (questioning the topic of mummies).

git status now tells us that the file has been changed, but those changes have not been staged:

BASH

$ git status

OUTPUT

On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

    modified:   mars.txt

no changes added to commit (use "git add" and/or "git commit -a")

We can put things back the way they were by using git restore:

BASH

$ git restore mars.txt
$ cat mars.txt

OUTPUT

Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves
Mummies will appreciate the lack of humidity

git restore (available in Git 2.23 and later) has an older counterpart, git checkout, which can do the same job: as you might guess from its name, git checkout checks out (i.e., restores) an old version of a file. In the example above, we recovered the version of the file recorded in HEAD, which is the last saved commit. If we want to go back even further, we can give git checkout a commit identifier:

BASH

$ git checkout f22b25e mars.txt

BASH

$ cat mars.txt

OUTPUT

Cold, dry, and everything is red, vampires' favorite color

BASH

$ git status

OUTPUT

On branch main
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

    modified:   mars.txt

Notice that the changes are currently in the staging area. Again, we can put things back the way they were by using git checkout:

BASH

$ git checkout HEAD mars.txt
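The round trip above, going back to an older version of one file and then returning to the latest one, can be replayed in a throwaway repository. This is a minimal sketch with two invented commits; the variable first plays the role of f22b25e.

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vdracula@usgs.gov"

echo "Cold, dry, and everything is red, vampires' favorite color" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars"
first=$(git rev-parse --short HEAD)

echo "The two moons may be a problem for werewolves" >> mars.txt
git add mars.txt
git commit -q -m "Add note about werewolves"

git checkout "$first" mars.txt            # bring back the older version of this one file
old_lines=$(grep -c "" mars.txt)          # number of lines in the older version
staged=$(git diff --cached --name-only)   # the restored version is already staged

git checkout HEAD mars.txt                # return to the latest committed version
new_lines=$(grep -c "" mars.txt)
after=$(git status --porcelain)           # empty when nothing is modified or staged

echo "older version: $old_lines line(s); latest version: $new_lines line(s)"
```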

Do Not Lose Your HEAD

Above we used

BASH

$ git checkout f22b25e mars.txt

to revert mars.txt to its state after the commit f22b25e. But be careful! The checkout command has other important functions, and Git will misunderstand your intentions if you do not type the command accurately. For example, suppose you forget mars.txt in the previous command:

BASH

$ git checkout f22b25e

ERROR

Note: checking out 'f22b25e'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

 git checkout -b <new-branch-name>

HEAD is now at f22b25e Start notes on Mars suitability for vampires and werewolves

The “detached HEAD” is like “look, but do not touch” here, so you should not make any changes in this state. After investigating your repository’s past state, reattach your HEAD with git checkout main.
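If you are ever unsure whether your HEAD is detached, you can ask Git directly. The sketch below is one way to do it, assuming a throwaway repository with two invented commits: git symbolic-ref -q HEAD succeeds when HEAD points at a branch and fails quietly when it is detached. The helper name head_state is hypothetical.

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vdracula@usgs.gov"

echo "first line" > mars.txt
git add mars.txt
git commit -q -m "First commit"
echo "second line" >> mars.txt
git add mars.txt
git commit -q -m "Second commit"

# Hypothetical helper: report whether HEAD currently points at a branch.
head_state() {
    if git symbolic-ref -q HEAD >/dev/null; then
        echo "attached"
    else
        echo "detached"
    fi
}

branch=$(git symbolic-ref --short HEAD)      # the branch we start on (main in the lesson)
first=$(git rev-list --max-parents=0 HEAD)   # ID of the first commit

git checkout -q "$first"                     # file name forgotten: HEAD is now detached
state_after_checkout=$(head_state)

git checkout -q "$branch"                    # reattach, as with "git checkout main"
state_after_return=$(head_state)

echo "after checking out a commit: $state_after_checkout"
echo "after checking out the branch: $state_after_return"
```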

It is important to remember that we must use the commit number that identifies the state of the repository before the change we are trying to undo. A common mistake is to use the number of the commit in which we made the change we are trying to discard. In the example below, we want to retrieve the state from before the most recent commit (HEAD~1), which is commit f22b25e:

Git Checkout

So, to put it all together, here is how Git works in cartoon form:

A cartoon of how Git works within one local repository
https://doi.org/10.6084/m9.figshare.1328266.v1 (Non-Federal link)/ Created by Daisie Huang and provided on figshare, CC BY 4.0

Simplifying the Common Case

If you read the output of git status carefully, you will see that it includes this hint:

OUTPUT

(use "git checkout -- <file>..." to discard changes in working directory)

As it says, git checkout without a version identifier restores files to the state saved in HEAD. The double dash -- is needed to separate the names of the files being recovered from the command itself: without it, Git would try to use the name of the file as the commit identifier.
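The role of the double dash is easiest to see with a file whose name could be mistaken for something else. In this minimal sketch (a contrived, hypothetical setup) we create a file named exactly like the current branch, edit it, and then use -- so that Git treats the name as a file to restore.

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vdracula@usgs.gov"

branch=$(git symbolic-ref --short HEAD)   # e.g. main
echo "notes" > "$branch"                  # a file that happens to share the branch's name
git add "$branch"
git commit -q -m "Add a file named like the branch"

echo "unwanted edit" >> "$branch"

# Everything after "--" is treated as a file name, never as a commit or branch.
git checkout -- "$branch"
content=$(cat "$branch")

echo "file content after restore: $content"
```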

The fact that files can be reverted one by one tends to change the way people organize their work. If everything is in one large document, it is hard (but not impossible) to undo changes to the introduction without also undoing changes made later to the conclusion. If the introduction and conclusion are stored in separate files, on the other hand, moving backward and forward in time becomes much easier.

Recovering Older Versions of a File

Jennifer has made changes to the Python script that she has been working on for weeks, and the modifications she made this morning “broke” the script and it no longer runs. She has spent about an hour trying to fix it, with no luck…

Luckily, she has been keeping track of her project’s versions using Git! Which commands below will let her recover the last committed version of her Python script called data_cruncher.py?

  1. $ git checkout HEAD

  2. $ git checkout HEAD data_cruncher.py

  3. $ git checkout HEAD~1 data_cruncher.py

  4. $ git checkout <unique ID of last commit> data_cruncher.py

  5. Both 2 and 4

The answer is (5)-Both 2 and 4.

The checkout command restores files from the repository, overwriting the files in your working directory. Answers 2 and 4 both restore the latest version in the repository of the file data_cruncher.py. Answer 2 uses HEAD to indicate the latest, whereas answer 4 uses the unique ID of the last commit, which is what HEAD means.

Answer 3 gets the version of data_cruncher.py from the commit before HEAD, which is NOT what we wanted.

Answer 1 does not recover the script. Without a filename, git checkout HEAD asks Git to switch to the commit you are already on, so Git leaves your working directory as it is and simply lists the files that have uncommitted changes. Be careful with this form when it names an older commit (for example, git checkout HEAD~1): as discussed above, that leaves you in a detached HEAD state, and you do not want to be there.

Reverting a Commit

Jennifer is collaborating with colleagues on her Python script. She realizes her last commit to the project’s repository contained an error, and wants to undo it. Jennifer wants to undo correctly so everyone in the project’s repository gets the correct change. The command git revert [erroneous commit ID] will create a new commit that reverses the erroneous commit.

The command git revert is different from git checkout [commit ID] because git checkout returns the files not yet committed within the local repository to a previous state, whereas git revert reverses changes committed to the local and project repositories.

Below are the right steps and explanations for Jennifer to use git revert. What is the missing command?

  1. ________ # Look at the git history of the project to find the commit ID

  2. Copy the ID (the first few characters of the ID, e.g. 0b1d055).

  3. git revert [commit ID]

  4. Type in the new commit message.

  5. Save and close

The command git log lists project history with commit IDs.

The command git show HEAD shows changes made at the latest commit, and lists the commit ID; however, Jennifer should double-check it is the correct commit, and no one else has committed changes to the repository.
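The steps above can be sketched end to end in a throwaway repository. The script contents and commit messages here are invented for illustration, and --no-edit is used so that Git keeps its default revert message instead of opening an editor (steps 4 and 5).

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vdracula@usgs.gov"

echo "print('it works')" > data_cruncher.py
git add data_cruncher.py
git commit -q -m "Add working script"

echo "print('it works'" > data_cruncher.py    # missing parenthesis: the erroneous commit
git add data_cruncher.py
git commit -q -m "Introduce an error"

git log --oneline                        # step 1: look at the history
bad=$(git rev-parse --short HEAD)        # step 2: copy the ID of the erroneous commit
git revert --no-edit "$bad" >/dev/null   # step 3: create a new commit that reverses it

content=$(cat data_cruncher.py)
commits=$(git rev-list --count HEAD)
echo "script after revert: $content"
echo "commits in history: $commits"
```

The erroneous commit stays in the history; the revert is recorded as a third commit on top of it.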

Understanding Workflow and History

What is the output of the last command in

BASH

$ cd vampires-and-werewolves
$ echo "Mummies are beautiful and full of love" > mummies.txt
$ git add mummies.txt
$ echo "Mummies are smelly and gross" >> mummies.txt
$ git commit -m "Comment on Mummy hygiene"
$ git checkout HEAD mummies.txt
$ cat mummies.txt #this will print the contents of mummies.txt to the screen
  1. OUTPUT

      Mummies are smelly and gross
  2. OUTPUT

      Mummies are beautiful and full of love
  3. OUTPUT

      Mummies are beautiful and full of love
      Mummies are smelly and gross
  4. OUTPUT

      Error because you have changed mummies.txt without committing the changes

The answer is 2.

The command git add mummies.txt places the current version of mummies.txt into the staging area. The changes to the file from the second echo command are only applied to the working copy, not the version in the staging area.

So, when git commit -m "Comment on Mummy hygiene" is executed, the version of mummies.txt committed to the repository is the one from the staging area and has only one line.

At this time, the working copy still has the second line (and git status will show that the file is modified). However, git checkout HEAD mummies.txt replaces the working copy with the most recently committed version of mummies.txt.

So, cat mummies.txt will output

OUTPUT

Mummies are beautiful and full of love
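To check this reasoning, we can look at the three copies of the file separately. The sketch below runs the same commands in a throwaway repository and counts the lines in the working copy, in the staging area (git show :mummies.txt), and in the latest commit (git show HEAD:mummies.txt).

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vdracula@usgs.gov"

echo "Mummies are beautiful and full of love" > mummies.txt
git add mummies.txt
echo "Mummies are smelly and gross" >> mummies.txt

working_lines=$(grep -c "" mummies.txt)                # working copy
staged_lines=$(git show :mummies.txt | grep -c "")     # version in the staging area

git commit -q -m "Comment on Mummy hygiene"
committed_lines=$(git show HEAD:mummies.txt | grep -c "")   # version in the commit

echo "working copy: $working_lines, staged: $staged_lines, committed: $committed_lines"
```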

Checking Understanding of git diff

Consider this command: git diff HEAD~9 mars.txt. What do you predict this command will do if you execute it? What happens when you do execute it? Why?

Try another command, git diff [ID] mars.txt, where [ID] is replaced with the unique identifier for your most recent commit. What do you think will happen, and what does happen?

Getting Rid of Staged Changes

git checkout can be used to restore a previous commit when unstaged changes have been made, but will it also work for changes that have been staged but not committed? Make a change to mars.txt, add that change using git add, then use git checkout to see if you can remove your change.

After adding a change, git checkout cannot be used directly. Let us look at the output of git status:

OUTPUT

On branch main
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        modified:   mars.txt

Note that if you do not have the same output you may either have forgotten to change the file, or you have added it and committed it.

Using the command git checkout -- mars.txt now does not give an error, but it does not restore the file either. Git helpfully tells us that we need to use git reset first to unstage the file:

BASH

$ git reset HEAD mars.txt

OUTPUT

Unstaged changes after reset:
M	mars.txt

Now, git status gives us:

BASH

$ git status

OUTPUT

On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   mars.txt

no changes added to commit (use "git add" and/or "git commit -a")

This means we can now use git checkout to restore the file to the previous commit:

BASH

$ git checkout -- mars.txt
$ git status

OUTPUT

On branch main
nothing to commit, working tree clean
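As an alternative to the two-step reset and checkout, naming HEAD explicitly restores both the staging area and the working copy in one command, as we did earlier with git checkout HEAD mars.txt. A minimal sketch in a throwaway repository with an invented unwanted line:

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vdracula@usgs.gov"

echo "Cold, dry, and everything is red, vampires' favorite color" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars"

echo "An unwanted line" >> mars.txt
git add mars.txt                       # the unwanted change is now staged
before=$(git status --porcelain)

git checkout HEAD -- mars.txt          # restore staging area and working copy from HEAD
after=$(git status --porcelain)

echo "before: '$before'"
echo "after:  '$after'"
```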

Explore and Summarize Histories

Exploring history is an important part of Git, and often it is a challenge to find the right commit ID, especially if the commit is from several months ago.

Imagine the vampires-and-werewolves project has more than 50 files. You would like to find a commit that modifies some specific text in mars.txt. When you type git log, a very long list appears. How can you narrow down the search?

Recall that the git diff command allows us to explore one specific file, e.g., git diff mars.txt. We can apply a similar idea here.

BASH

$ git log mars.txt

Unfortunately some of these commit messages are very ambiguous, e.g., update files. How can you search through these files?

Both git diff and git log are very useful and they summarize a different part of the history for you. Is it possible to combine both? Let us try the following:

BASH

$ git log --patch mars.txt

You should get a long list of output, and you should be able to see both commit messages and the difference between each commit.
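One further way to narrow the search, which goes slightly beyond the commands above, is the -S option of git log: it lists only the commits that added or removed a given piece of text. The sketch below is an illustration in a throwaway repository with invented, deliberately vague commit messages.

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vdracula@usgs.gov"

echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars"

echo "placeholder" > mummies.txt
git add mummies.txt
git commit -q -m "update files"

echo "The two moons may be a problem for werewolves" >> mars.txt
git add mars.txt
git commit -q -m "update files"

# Commits that touched mars.txt at all.
file_commits=$(git log --oneline -- mars.txt | grep -c "")
# Commits that added or removed the text "two moons" in mars.txt.
moon_commits=$(git log --oneline -S"two moons" -- mars.txt | grep -c "")

echo "commits touching mars.txt: $file_commits"
echo "commits introducing 'two moons': $moon_commits"
```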

Question: What does the following command do?

BASH

$ git log --patch HEAD~9 *.txt

Key Points

  • git diff displays differences between commits.
  • git checkout recovers old versions of files.

Content from Ignoring Things


Last updated on 2025-05-22

Overview

Questions

  • How can I tell Git to ignore files I don’t want to track?

Objectives

  • Configure Git to ignore specific files.
  • Explain why ignoring files can be useful.

What if we have files that we do not want Git to track for us, like backup files created by our editor or intermediate files created during data analysis? Let’s create a few dummy files:

BASH

$ mkdir results
$ touch a.csv b.csv c.csv results/a.out results/b.out

and see what Git says:

BASH

$ git status

OUTPUT

On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	a.csv
	b.csv
	c.csv
	results/

nothing added to commit but untracked files present (use "git add" to track)

Putting these files under version control would be a waste of disk space. What’s worse, having them all listed could distract us from changes that actually matter, so let’s tell Git to ignore them.

We do this by creating a file in the root directory of our project called .gitignore:

BASH

$ nano .gitignore
$ cat .gitignore

OUTPUT

*.csv
results/

These patterns tell Git to ignore any file whose name ends in .csv and everything in the results directory. (If any of these files were already being tracked, Git would continue to track them.)

Once we have created this file, the output of git status is much cleaner:

BASH

$ git status

OUTPUT

On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	.gitignore

nothing added to commit but untracked files present (use "git add" to track)

The only thing Git notices now is the newly-created .gitignore file. You might think we wouldn’t want to track it, but everyone we’re sharing our repository with will probably want to ignore the same things that we’re ignoring. Let’s add and commit .gitignore:

BASH

$ git add .gitignore
$ git commit -m "Ignore data files and the results folder"
$ git status

OUTPUT

On branch main
nothing to commit, working tree clean

As a bonus, using .gitignore helps us avoid accidentally adding files to the repository that we don’t want to track:

BASH

$ git add a.csv

OUTPUT

The following paths are ignored by one of your .gitignore files:
a.csv
Use -f if you really want to add them.

If we really want to override our ignore settings, we can use git add -f to force Git to add something. For example, git add -f a.csv. We can also always see the status of ignored files if we want:

BASH

$ git status --ignored

OUTPUT

On branch main
Ignored files:
 (use "git add -f <file>..." to include in what will be committed)

        a.csv
        b.csv
        c.csv
        results/

nothing to commit, working tree clean

Ignoring Nested Files

Given a directory structure that looks like:

BASH

results/data
results/plots

How would you ignore only results/plots and not results/data?

If you only want to ignore the contents of results/plots, you can change your .gitignore to ignore only the /plots/ subfolder by adding the following line to your .gitignore:

OUTPUT

results/plots/

This line will ensure only the contents of results/plots are ignored, and not the contents of results/data.

As with most programming issues, there are a few alternative ways that one may ensure this ignore rule is followed. The “Ignoring Nested Files: Variation” exercise has a slightly different directory structure that presents an alternative solution. Further, the discussion page has more detail on ignore rules.

Including Specific Files

How would you ignore all .csv files in your root directory except for final.csv? Hint: Find out what ! (the exclamation point operator) does

You would add the following two lines to your .gitignore:

OUTPUT

# ignore all data files
*.csv
# except final.csv
!final.csv

The exclamation point operator will include a previously excluded entry.

Note also that because you’ve previously committed .csv files in this lesson they will not be ignored with this new rule. Only future additions of .csv files added to the root directory will be ignored.

Ignoring Nested Files: Variation

Given a directory structure that looks similar to the earlier Nested Files exercise, but with a slightly different directory structure:

BASH

results/data
results/images
results/plots
results/analysis

How would you ignore all of the contents in the results folder, but not results/data?

Hint: think a bit about how you created an exception with the ! operator before.

If you want to ignore the contents of results/ but not those of results/data/, you can change your .gitignore to ignore the contents of results folder, but create an exception for the contents of the results/data subfolder. Your .gitignore would look like this:

OUTPUT

# ignore everything in results folder
results/*
# do not ignore results/data/ contents
!results/data/
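You can test these two rules without committing anything. The sketch below recreates the directory structure in a throwaway repository (the file names inside each folder are invented) and lists the untracked files that Git does not ignore.

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q

mkdir -p results/data results/images results/plots results/analysis
touch results/data/a.csv results/images/b.png results/plots/c.png results/analysis/d.txt

# The two rules from the solution, one pattern per line.
printf '%s\n' 'results/*' '!results/data/' > .gitignore

# Untracked files that are NOT ignored.
visible=$(git ls-files --others --exclude-standard)
echo "$visible"
```

Only .gitignore and the file under results/data should be listed.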

Ignoring all data Files in a Directory

Assuming you have an empty .gitignore file, and given a directory structure that looks like:

BASH

results/data/position/gps/a.csv
results/data/position/gps/b.csv
results/data/position/gps/c.csv
results/data/position/gps/info.txt
results/plots

What’s the shortest .gitignore rule you could write to ignore all .csv files in results/data/position/gps? Do not ignore info.txt.

Appending results/data/position/gps/*.csv to your .gitignore will match every file in results/data/position/gps that ends with .csv. The file results/data/position/gps/info.txt will not be ignored.

Ignoring all data Files in the repository

Let us assume you have many .csv files in different subdirectories of your repository. For example, you might have:

BASH

results/a.csv
data/experiment_1/b.csv
data/experiment_2/c.csv
data/experiment_2/variation_1/d.csv

How do you ignore all the .csv files, without explicitly listing the names of the corresponding folders?

In the .gitignore file, write:

OUTPUT

**/*.csv

This will ignore all the .csv files, regardless of their position in the directory tree. You can still include some specific exception with the exclamation point operator.

The Order of Rules

Given a .gitignore file with the following contents:

BASH

*.csv
!*.csv

What will be the result?

The ! modifier will negate an entry from a previously defined ignore pattern. Because the !*.csv entry negates all of the previous .csv files in the .gitignore, none of them will be ignored, and all .csv files will be tracked.
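Because later rules override earlier ones, swapping the two lines changes the result. A minimal sketch in a throwaway repository with two invented .csv files:

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q
touch a.csv b.csv

# Order from the exercise: ignore first, then negate. Nothing ends up ignored.
printf '%s\n' '*.csv' '!*.csv' > .gitignore
negate_last=$(git ls-files --others --exclude-standard | grep -c '\.csv$' || true)

# Reversed order: the ignore rule comes last and wins.
printf '%s\n' '!*.csv' '*.csv' > .gitignore
ignore_last=$(git ls-files --others --exclude-standard | grep -c '\.csv$' || true)

echo "visible .csv files with the negation last: $negate_last"
echo "visible .csv files with the ignore rule last: $ignore_last"
```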

Log Files

You wrote a script that creates many intermediate log-files of the form log_01, log_02, log_03, etc. You want to keep them but you do not want to track them through git.

  1. Write one .gitignore entry that excludes files of the form log_01, log_02, etc.

  2. Test your “ignore pattern” by creating some dummy files of the form log_01, etc.

  3. You find that the file log_01 is very important after all, add it to the tracked files without changing the .gitignore again.

  4. Discuss with your neighbor what other types of files could reside in your directory that you do not want to track and thus would exclude via .gitignore.

  1. Append either log_* or log* as a new entry in your .gitignore.
  3. Track log_01 using git add -f log_01.
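Steps 1 to 3 of the exercise can be sketched as follows, assuming a throwaway repository and three empty dummy log files:

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q

touch log_01 log_02 log_03             # step 2: dummy files to test the pattern
echo 'log_*' > .gitignore              # step 1: one entry that matches all of them

# Untracked files Git still reports: only .gitignore itself.
untracked=$(git ls-files --others --exclude-standard)

git add -f log_01                      # step 3: force-add one ignored file
tracked=$(git ls-files)                # files now in the staging area

echo "untracked and not ignored: $untracked"
echo "tracked: $tracked"
```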

Key Points

  • The .gitignore file tells Git what files to ignore.

Content from Remotes in GitLab


Last updated on 2025-05-22

Overview

Questions

  • How do I safely back up my work to a remote site?
  • How do I share my changes with others on the web?

Objectives

  • Explain what remote repositories are and why they are useful.
  • Push to or pull from a remote repository.

Version control really comes into its own when we begin to collaborate with other people. We already have most of the machinery we need to do this; the only thing missing is to copy changes from one repository to another.

Systems like Git allow us to move work between any two repositories. In practice, though, it is easiest to use one copy as a central hub, and to keep it on the web rather than on someone’s laptop. Most programmers use hosting services like GitHub, Bitbucket or GitLab to hold those main copies.

Let us start by sharing the changes we have made to our current project with the world. To this end we are going to create a remote repository that will be linked to our local repository.

1. Create a remote repository


Log in to USGS GitLab, then click on the icon in the top right corner to create a new project called vampires-and-werewolves:

Creating a Project on GitLab (Step 1)

Select Create blank project

Creating a Project on GitLab (Step 2)

Name your project “vampires-and-werewolves”, select your username as the namespace, uncheck “Initialize repository with a README”, and then click Create project.

Note: Since this repository will be connected to a local repository, it needs to be empty. That is why “Initialize repository with a README” needs to be unchecked. See the “GitLab README files” exercise below for a full explanation of why the project needs to be empty.

Creating a Project on GitLab (Step 3)

As soon as the repository is created, GitLab displays a page with a URL and some information on how to configure your local repository:

Creating a Project on GitLab (Step 4)

This effectively does the following on GitLab’s servers:

BASH

$ mkdir vampires-and-werewolves
$ cd vampires-and-werewolves
$ git init

If you remember back to the earlier episode where we added and committed our earlier work on mars.txt, we had a diagram of the local repository which looked like this:

The Local Repository with Git Staging Area

Now that we have two repositories, we need a diagram like this:

Freshly-Made GitLab Repository

Note that our local repository still contains our earlier work on mars.txt, but the remote repository on GitLab appears empty as it does not contain any files yet.

2. HTTPS Setup


Before Dracula can connect to a remote repository, he needs to set up a way for his computer to authenticate with GitLab so it knows it is him trying to connect to his remote repository.

We are going to set up an “Access token” that we can use to authenticate to GitLab.

In GitLab, click on your user icon and then Preferences.

Where to Find User Settings on GitLab

Once you are in your User Settings, click on Access tokens. If you already have a personal access token and you have it saved, you do not need to follow these steps. If you do not have a personal access token, click on Add new token.

Screenshot of Access tokens page and Add new token button

Add a token name that will be meaningful to you. If you are setting this up as your primary access token for GitLab, you will probably want to select all of the scopes. These scopes establish what you are able to do in GitLab with this personal access token. You may also want to delete the expiration date. If removed, GitLab will automatically set the expiration date to the maximum of one year from the day created. Then, click Create personal access token.

Add a personal access token screen in GitLab with Token name entered and all scopes selected

You will be presented with your new personal access token. Make sure you save it some place secure since you will not be able to access it again through GitLab.

Screenshot of GitLab's presentation of your new personal access token

See the GitLab Documentation for more information on personal access tokens.

When you start interacting with the remote from your computer, if you have not already saved your personal access token, you will be prompted to enter a username and password. The prompt may appear in your command prompt or you may get a pop up on your machine. Your username will be your email address and the password will be your personal access token. See the Password Manager spoiler below for more information on saving your personal access token and handling token expiration.

3. Connect local to remote repository


Now we connect the two repositories. We do this by making the GitLab repository a remote for the local repository. The home page of the repository on GitLab includes the URL string we need to identify it:

Where to Find Repository URL on GitLab

Click on the clipboard icon under ‘Clone with HTTPS’ to use the HTTPS protocol.

HTTPS allows you to communicate with GitLab using the HTTPS protocol. This approach tends to be a little simpler and allows you to use a Personal Access Token (similar to a password) to authenticate. You can use the same Personal Access Token across multiple machines.

SSH is considered slightly more secure and requires setting up a public and a private key. There is a little more overhead to using SSH than HTTPS, especially if working on more than one machine, which is why we teach the HTTPS method in this lesson. That being said, it is not too hard to configure your account to use SSH and the instructions are available at https://docs.gitlab.com/ee/user/ssh.html.

With the URL copied from the browser, go into the local vampires-and-werewolves repository, and run this command:

BASH

$ git remote add origin https://code.usgs.gov/vdracula/vampires-and-werewolves.git

Make sure to use the URL for your repository rather than Vlad’s: the only difference should be your username instead of vdracula.

origin is a local name used to refer to the remote repository. It could be called anything, but origin is a convention that is often used by default in Git and GitLab, so it is helpful to stick with this unless there is a reason not to.

We can check that the command has worked by running git remote -v:

BASH

$ git remote -v

OUTPUT

origin  https://code.usgs.gov/vdracula/vampires-and-werewolves.git (fetch)
origin  https://code.usgs.gov/vdracula/vampires-and-werewolves.git (push)

We will discuss remotes in more detail in a future episode, while talking about how they might be used for collaboration.
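
If git remote -v ever shows an address you did not intend (for example an SSH address of the form git@code.usgs.gov:... when you meant to use HTTPS), the remote can be repointed without removing it. The following is a minimal sketch, assuming a throwaway directory created with mktemp; adding or changing a remote only edits local configuration and does not contact GitLab.

```shell
demo=$(mktemp -d)                 # throwaway directory for this sketch
git init -q "$demo/repo"
cd "$demo/repo"

# Suppose the SSH address was added by mistake.
git remote add origin git@code.usgs.gov:vdracula/vampires-and-werewolves.git
git remote -v

# Point origin at the HTTPS address instead.
git remote set-url origin https://code.usgs.gov/vdracula/vampires-and-werewolves.git
git remote get-url origin         # prints the address now stored for origin
```

As in the main text, replace vdracula with your own username when you do this in your real repository.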

4. Push local changes to a remote


Now that authentication is set up, we can return to the remote. This command will push the changes from our local repository to the repository on GitLab:

BASH

$ git push origin main

Since Dracula set up a personal access token, Git will prompt him for it. If you have already saved your personal access token in Git, it may not prompt for a password.

OUTPUT

Enumerating objects: 16, done.
Counting objects: 100% (16/16), done.
Delta compression using up to 8 threads.
Compressing objects: 100% (11/11), done.
Writing objects: 100% (16/16), 1.45 KiB | 372.00 KiB/s, done.
Total 16 (delta 2), reused 0 (delta 0)
remote: Resolving deltas: 100% (2/2), done.
To https://code.usgs.gov/vdracula/vampires-and-werewolves.git
 * [new branch]      main -> main

Proxy

If the network you are connected to uses a proxy, there is a chance that your last command failed with “Could not resolve hostname” as the error message. To solve this issue, you need to tell Git about the proxy:

BASH

$ git config --global http.proxy http://user:password@proxy.url
$ git config --global https.proxy https://user:password@proxy.url

When you connect to another network that does not use a proxy, you will need to tell Git to disable the proxy using:

BASH

$ git config --global --unset http.proxy
$ git config --global --unset https.proxy

Password Manager

If your operating system has a password manager configured, git push will try to use it when it needs your username and password. For example, this is the default behavior for Git Bash on Windows. If you want to type your username and password at the terminal instead of using a password manager, type:

BASH

$ unset SSH_ASKPASS

in the terminal, before you run git push. Despite the name, Git uses SSH_ASKPASS for all credential entry, so you may want to unset SSH_ASKPASS whether you are using Git via SSH or HTTPS.

You may also want to add unset SSH_ASKPASS at the end of your ~/.bashrc to make Git default to using the terminal for usernames and passwords.

If your personal access token was saved in your password manager and it expires, you will need to generate a new personal access token and open the password manager to delete the saved credential. Then, Git will prompt you for the new password on your next git push.

Our local and remote repositories are now in this state:

GitLab Repository After First Push

The ‘-u’ Flag

You may see a -u option used with git push in some documentation. This option is synonymous with the --set-upstream-to option for the git branch command, and is used to associate the current branch with a remote branch so that the git pull command can be used without any arguments. To do this, simply use git push -u origin main once the remote has been set up.
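
To see what -u records, here is a minimal offline sketch. It assumes a local bare repository (remote.git, a hypothetical stand-in for the GitLab project), a throwaway directory from mktemp, and a placeholder identity; none of these come from the lesson itself.

```shell
demo=$(mktemp -d)
git init -q --bare "$demo/remote.git"      # local stand-in for the GitLab project
git init -q "$demo/repo"
cd "$demo/repo"
git symbolic-ref HEAD refs/heads/main      # call the first branch main on any Git version
git config user.name "Vlad Dracula"        # placeholder identity for this sandbox only
git config user.email "vlad@example.com"
echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars"
git remote add origin "$demo/remote.git"

git push -q -u origin main                 # push and remember origin/main as the upstream
git rev-parse --abbrev-ref '@{upstream}'   # prints origin/main
git pull                                   # works without naming the remote or branch
```

Without -u the push itself would still succeed; only the remembered association, and therefore the argument-free git pull, would be missing.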

We can pull changes from the remote repository to the local one as well:

BASH

$ git pull origin main

OUTPUT

From https://code.usgs.gov/vdracula/vampires-and-werewolves
 * branch            main     -> FETCH_HEAD
Already up-to-date.

Pulling has no effect in this case because the two repositories are already synchronized. If someone else had pushed some changes to the repository on GitLab, though, this command would download them to our local repository.

GitLab GUI

Browse to your vampires-and-werewolves repository on GitLab. On the right side menu under Project Information, find and click on the text that says “XX commits” (where “XX” is some number). Hover over, and click on, the two buttons to the right of each commit. Additionally, click into each commit. What information can you gather/explore from these buttons and views? How would you get that same information in the shell?

The left-most button (with the picture of a clipboard) copies the full identifier of the commit to the clipboard. In the shell, git log will show you the full commit identifier for each commit.

The right-most button lets you view all of the files in the repository at the time of that commit. To do this in the shell, we would need to check out the repository at that particular time. We can do this with git checkout ID where ID is the identifier of the commit we want to look at. If we do this, we need to remember to put the repository back to the right state afterwards!

When you click on the commit name, you will see all of the changes that were made in that particular commit. Green shaded lines indicate additions and red ones removals. In the shell we can do the same thing with git diff. In particular, git diff ID1..ID2 where ID1 and ID2 are commit identifiers (e.g. git diff a3bf1e5..041e637) will show the differences between those two commits.
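
The three shell equivalents can be tried safely in a sandbox. This is a rough sketch, assuming a throwaway repository with two commits and a placeholder identity; the variable names first and second are only illustrative.

```shell
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
git symbolic-ref HEAD refs/heads/main      # call the first branch main on any Git version
git config user.name "Vlad Dracula"        # placeholder identity for this sandbox only
git config user.email "vlad@example.com"
echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars"
echo "The two moons may be a problem for werewolves" >> mars.txt
git commit -q -am "Add concerns about moons"

git log --format='%H %s'                   # full identifier and subject of each commit
first=$(git rev-parse HEAD~1)
second=$(git rev-parse HEAD)
git diff "$first..$second"                 # changes between the two commits

git checkout -q "$first"                   # view the files as of the first commit
cat mars.txt
git switch -q main                         # put the repository back afterwards
```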

Uploading files directly in GitLab browser

GitLab also allows you to skip the command line and upload files directly to your repository without having to leave the browser. When you are in Code –> Repository, you can click the + button in the toolbar at the top of the file tree, then click Upload File under This directory.

GitLab Timestamp

Go to the repo you just created on GitLab and check the timestamps of the files. How does GitLab record times, and why?

GitLab displays timestamps in a human readable relative format (e.g. “22 hours ago” or “three weeks ago”). However, if you hover over the timestamp, you can see the exact time at which the last change to the file occurred.

Push vs. Commit

In this episode, we introduced the “git push” command. How is “git push” different from “git commit”?

When we push changes, we are interacting with a remote repository to update it with the changes we have made locally (often this corresponds to sharing the changes we have made with others). Commit only updates your local repository.
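
One way to watch the difference is to count the commits that exist locally but not yet on the remote. The sketch below assumes a local bare repository (remote.git, a hypothetical stand-in for GitLab) and a placeholder identity; before and after are illustrative variable names.

```shell
demo=$(mktemp -d)
git init -q --bare "$demo/remote.git"      # local stand-in for the GitLab project
git init -q "$demo/repo"
cd "$demo/repo"
git symbolic-ref HEAD refs/heads/main      # call the first branch main on any Git version
git config user.name "Vlad Dracula"        # placeholder identity for this sandbox only
git config user.email "vlad@example.com"
echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars"
git remote add origin "$demo/remote.git"
git push -q -u origin main

# A commit changes only the local repository...
echo "The two moons may be a problem for werewolves" >> mars.txt
git commit -q -am "Add concerns about moons"
before=$(git rev-list --count origin/main..main)
echo "commits not yet on the remote: $before"

# ...and a push copies it to the remote.
git push -q origin main
after=$(git rev-list --count origin/main..main)
echo "commits not yet on the remote: $after"
```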

GitLab README files

In this episode we learned about creating a remote repository on GitLab, but when you initialized your GitLab repo, you did not add a README.md file. If you had, what do you think would have happened when you tried to link your local and remote repositories?

In this case, Git would refuse to merge because the two repositories have unrelated histories. When GitLab creates a README.md file, it performs a commit in the remote repository. When you try to pull the remote repository to your local repository, Git detects that they have histories that do not share a common origin and refuses to merge.

BASH

$ git pull origin main

OUTPUT

warning: no common commits
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.
From https://code.usgs.gov/vdracula/vampires-and-werewolves
 * branch            main     -> FETCH_HEAD
 * [new branch]      main     -> origin/main
fatal: refusing to merge unrelated histories

You can force Git to merge the two repositories with the option --allow-unrelated-histories. Be careful when you use this option and carefully examine the contents of local and remote repositories before merging.

BASH

$ git pull --allow-unrelated-histories origin main

OUTPUT

From https://code.usgs.gov/vdracula/vampires-and-werewolves
 * branch            main     -> FETCH_HEAD
Merge made by the 'recursive' strategy.
README.md | 1 +
1 file changed, 1 insertion(+)
create mode 100644 README.md
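
The situation can be reproduced offline. In this sketch, a second local repository (seed, a hypothetical stand-in for a GitLab project created with a README) plays the remote. Two options are assumptions added here rather than part of the lesson: --no-rebase, because some recent Git versions otherwise stop to ask how divergent branches should be reconciled, and --no-edit, which accepts the default merge message instead of opening an editor.

```shell
demo=$(mktemp -d)

# Stand-in for a GitLab project that was initialized with a README.
git init -q "$demo/seed"
cd "$demo/seed"
git symbolic-ref HEAD refs/heads/main
git config user.name "Vlad Dracula"        # placeholder identity for this sandbox only
git config user.email "vlad@example.com"
echo "# vampires-and-werewolves" > README.md
git add README.md
git commit -q -m "Initial commit"

# Independent local repository with its own first commit.
git init -q "$demo/repo"
cd "$demo/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Vlad Dracula"
git config user.email "vlad@example.com"
echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars"
git remote add origin "$demo/seed"

# The histories share no commit, so the first pull is refused.
git pull --no-rebase origin main || echo "pull refused"

# Explicitly allow the merge of the two unrelated histories.
git pull --no-rebase --no-edit --allow-unrelated-histories origin main
ls                                         # both README.md and mars.txt are present
```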

Key Points

  • A local Git repository can be connected to one or more remote repositories.
  • Use the HTTPS protocol to connect to remote repositories.
  • git push copies changes from a local repository to a remote repository.
  • git pull copies changes from a remote repository to a local repository.

Content from Branching and Merging


Last updated on 2025-05-22

Overview

Questions

  • What are branches in Git and why should I use them?
  • How do I merge a branch back into my main branch?

Objectives

  • Explain why you would want to use a branching workflow, even when you are the only person working on your project.
  • Create a branch within a Git repository.
  • Create a merge request and merge a branch into a main branch.

Git Branches


A Git branch is a version of the repository where you can make and review changes before updating the clean, trusted content of the repository. A branch is a safe place to test things out without impacting your main branch. You are free to make mistakes and have the flexibility to fix them within a branch.

Create a branch

  1. Open Git Bash (Windows) or Terminal (macOS) and navigate to your local repository. Once you are in your project, the current branch will be specified (usually main).
  2. Let us create a new branch:
    • Execute git switch -c my-test-branch
    • The -c flag is what tells Git to create a new branch
    • This will create a new branch with the name you specified that is otherwise an identical copy of the branch you just created it from (in this case, main) and switch you over to the new branch
  3. To switch between branches, execute: git switch <branch-name>
    • We can switch back to the main branch with git switch main
    • Switching branches will automatically load all of the files on that branch into your computer’s project file directory

Callout

git switch and git restore were introduced in 2019 to separate out the functionality of git checkout, which confused many people by doing too many things.

git switch <branch-name> can be used interchangeably with git checkout <branch-name>, but the command-line options can be slightly different. If you are switching to an existing branch, then the two would look the same:

BASH

git switch desired-branch-name
git checkout desired-branch-name

However, if you want to create a new branch, they differ:

BASH

git switch -c desired-new-branch-name
git checkout -b desired-new-branch-name
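
A short sandbox run can confirm that both spellings end in the same place. This sketch assumes a throwaway repository with one commit and a placeholder identity; another-test-branch is a hypothetical name used only here.

```shell
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
git symbolic-ref HEAD refs/heads/main      # call the first branch main on any Git version
git config user.name "Vlad Dracula"        # placeholder identity for this sandbox only
git config user.email "vlad@example.com"
echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars"

git switch -q -c my-test-branch            # newer spelling: create and switch
git branch --show-current                  # prints my-test-branch

git switch -q main
git checkout -q -b another-test-branch     # older spelling, same effect
git branch --show-current                  # prints another-test-branch

git branch --list                          # main plus the two new branches
```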

Create a branch on your own

  1. Create a new branch on your own called 1-my-first-issue
  2. Switch back to main
  3. Switch back to 1-my-first-issue
  1. git switch -c 1-my-first-issue
  2. git switch main
  3. git switch 1-my-first-issue

GitLab issues are a common way of tracking the work that needs to be done on a project. A common branch naming convention is to use the issue number and a short description of what you are doing as the branch name (e.g., <issue number>-<what-you-are-doing>), similar to what you did in this exercise. Another common naming convention is to use lower-spear-case for your branch names.

Make updates to code

This is when you do your work. Create your scripts, organize files/folders, etc. Do all your work in the repository with the correct branch checked out.

Important Note

Repositories should not contain any sensitive information, including personally identifiable information, usernames, passwords, or full file paths. While file paths may not be as obviously sensitive as other examples, they are frequently included in scripts. It is worth mentioning that full file paths also decrease portability of scripts to other users!

BASH

git switch 1-my-first-issue

Let us edit the mars.txt file, again.

BASH

nano mars.txt

Type the text below into the mars.txt file after the last line:

OUTPUT

Two vampires and three werewolves were spotted on Mars.

Add the file to the staging area:

BASH

git add mars.txt

Commit the changes:

BASH

git commit -m "Add information about vampire and werewolf co-occurrences"

Push the changes to remote:

BASH

git push -u origin 1-my-first-issue

The -u flag is shorthand for --set-upstream, which sets the default remote branch for the current local branch. Prior to this push, the remote repository was not aware of the local branch, and the local branch did not have any connection to the remote. Moving forward, this sets the remote-local association for any future git push or git pull attempts.

Git Merge Requests


Merge requests allow for peer code review before merging new code into a branch (usually the main branch).

Creating Merge Requests

There are many ways to create a merge request in GitLab. See GitLab’s Creating merge requests to see them all.

When you push a new branch into GitLab, GitLab will add a banner message about the push and provide a convenient Create merge request button.

Screenshot of the `Create merge request` button

If you use this method to create the merge request, you will not need to specify the source and target branches.

Add a succinct title and description. The description can follow this basic format:

  • Describe why this merge request exists
  • Explain what was changed
  • Explain how the change addresses the issue
  • Provide information on how the reviewer can test your code

Select an Assignee (the person who owns the merge request but is not responsible for reviewing it) and a Reviewer.

Click Create merge request.

Merging Merge Requests

Once a merge request has been created, you can see an overview, the commits that were made, and all of the line-by-line changes that were made to the content.

Screenshot of the Merge Request

After all of the changes have been reviewed, the Reviewer can click Approve and the Assignee can click the Merge button to merge the updates into the main branch.

Challenge

  1. Review your merge request. Can you see the changes that were made? How might you add a comment to a specific line of code?
  2. Merge your changes into main. Are you able to see the updated file in your main branch?

Key Points

  • A branching workflow enables you to keep your main repository clean and allows for mistakes, fixes, and reviews before content is merged into main.

Content from Collaborating


Last updated on 2025-05-22

Overview

Questions

  • How can I use version control to collaborate with other people?

Objectives

  • Clone a remote repository.
  • Collaborate by pushing to a common repository.
  • Describe the basic collaborative workflow.

For the next step, get into pairs. One person will be the “Owner” and the other will be the “Collaborator”. The goal is for the Collaborator to add changes into the Owner’s repository. We will switch roles at the end, so both people will play Owner and Collaborator.

Practicing By Yourself

If you are working through this lesson on your own, you can carry on by opening a second terminal window. This window will represent your partner, working on another computer. You will not need to give anyone access on GitLab, because both ‘partners’ are you.

Update Repository Permissions


The Owner needs to give the Collaborator access. In your repository page on GitLab, click the Manage menu on the left, select Members, click Invite members. Enter your partner’s username or email address in the search box, select a role (either Developer or Maintainer), and click Invite.

screenshot of repository page with Manage then Members selected, showing how to add Collaborators in a GitLab repository

Clone the Repository


Once the Collaborator has access to the repository, they need to download a copy of the Owner’s repository to their machine. This is called “cloning a repo”.

The Collaborator does not want to overwrite their own version of vampires-and-werewolves.git, and so needs to clone the Owner’s repository to a different location than their own repository with the same name. (This is a weird case…you would not normally have two versions of the same Git repo on your local machine.)

To clone the Owner’s repo into their Desktop folder, the Collaborator can copy the repository URL from the repository homepage by clicking Code and Clone with HTTPS.

screenshot of the repository page with the Code menu opened and showing the copy button under Clone with HTTPS

HTTPS allows you to communicate with GitLab using the HTTPS protocol. This approach tends to be a little simpler and allows you to use a Personal Access Token (similar to a password) to authenticate. You can use the same Personal Access Token across multiple machines.

SSH is considered slightly more secure and requires setting up a public and a private key. There is a little more overhead to using SSH than HTTPS, especially if working on more than one machine, which is why we teach the HTTPS method in this lesson. SSH also requires being on the internal USGS network (including GlobalProtect) and will not work for external collaborators. That being said, it is not too hard to configure your account to use SSH and the instructions are available at https://docs.gitlab.com/ee/user/ssh.html.

Then, open bash and enter the following (replacing https://code.usgs.gov/vdracula/vampires-and-werewolves.git with the URL that was just copied):

BASH

$ git clone https://code.usgs.gov/vdracula/vampires-and-werewolves.git ~/Desktop/vdracula-vampires-and-werewolves

Replace ‘vdracula’ with the Owner’s username.

If you choose to clone without the clone path (~/Desktop/vdracula-vampires-and-werewolves) specified at the end, you will clone inside your own vampires-and-werewolves folder! Make sure to navigate to the Desktop folder first.
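
The effect of the destination path can be tried offline. This sketch assumes a small local repository (owner, a hypothetical stand-in for the Owner's GitLab project) and clones it into an explicitly named directory inside a throwaway folder instead of the Desktop.

```shell
demo=$(mktemp -d)

# Stand-in for the Owner's project.
git init -q "$demo/owner"
cd "$demo/owner"
git symbolic-ref HEAD refs/heads/main
git config user.name "Vlad Dracula"        # placeholder identity for this sandbox only
git config user.email "vlad@example.com"
echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars"

# Clone into a named directory so it cannot collide with a folder of the same name.
git clone -q "$demo/owner" "$demo/vdracula-vampires-and-werewolves"
cd "$demo/vdracula-vampires-and-werewolves"
git remote -v                              # origin was set up by the clone
ls                                         # mars.txt came along
```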

Create a New Branch and Make Changes


The Collaborator can now make a change in their clone of the Owner’s repository, exactly the same way as we have been doing before:

BASH

$ cd ~/Desktop/vdracula-vampires-and-werewolves
$ git switch -c pluto-branch
$ nano pluto.txt
$ cat pluto.txt

OUTPUT

It is so a planet!

The Importance of Branches

Using branches in Git becomes even more important when you begin collaborating with others. Branches can help you avoid conflicts and allow others to review your code before merging it with the main branch where it could potentially introduce bugs and conflicts with the work of others on your team. You can also ‘protect’ the default (e.g., main) branch to prevent developers from pushing changes directly to it. If the default branch is protected, the developers must push to a separate branch and then create a merge request to add their changes to the default branch. This workflow ensures that changes to the default branch get reviewed and approved. Learn more about GitLab protected branches in the GitLab Documentation.

Stage, Commit, and Push Changes


BASH

$ git add pluto.txt
$ git commit -m "Add notes about Pluto"

OUTPUT

 1 file changed, 1 insertion(+)
 create mode 100644 pluto.txt

Then push the change to the Owner’s repository on GitLab:

BASH

$ git push -u origin pluto-branch

OUTPUT

Enumerating objects: 4, done.
Counting objects: 4, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 306 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://code.usgs.gov/vdracula/vampires-and-werewolves.git
 * [new branch]      pluto-branch -> pluto-branch

Note that we did not have to create a remote called origin: Git uses this name by default when we clone a repository. (This is why origin was a sensible choice earlier when we were setting up remotes by hand.)

Take a look at the Owner’s repository on GitLab again, and you should be able to see the new branch and commit made by the Collaborator. You may need to refresh your browser to see the new commit.

Create and Comment on a Merge Request

Collaborator: Create a merge request that will merge pluto-branch with main. Set the Owner as the Reviewer.

Owner: Add a comment to the line that was added in pluto.txt. Then, approve and merge the merge request.

Collaborator: Review Branching and Merging Episode “Creating Merge Requests” for a reminder of how to create a merge request in GitLab.

Owner: With GitLab, it is possible to comment on the diff of a merge request. Go to the Changes tab within the merge request. Hover over the line of code to comment and a blue comment icon appears. Click to open a comment window.

Pull Merged Changes to Local Repositories


Once the new code has been merged to the main branch, both the Collaborator and Owner should pull the changes to their local repositories.

To download the changes from GitLab, enter:

BASH

$ git switch main
$ git pull origin main

Now the three repositories (Owner’s local, Collaborator’s local, and Owner’s on GitLab) are back in sync.

A Basic Collaborative Workflow

In practice, it is good to be sure that you have an updated version of the repository you are collaborating on, so you should git pull before making your changes. The basic collaborative workflow would be:

  • update your local repo with git pull origin main,
  • create a feature branch with git switch -c <branch-name>,
  • make your changes and stage them with git add,
  • commit your changes with git commit -m,
  • upload the changes to GitLab with git push -u origin <branch-name>,
  • create a merge request in GitLab, and
  • merge once the feature branch has been reviewed and approved.
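
The steps above can be rehearsed end to end without GitLab. This is a sketch under stated assumptions: a local bare repository (remote.git) stands in for the GitLab project, the identities are placeholders, and a plain git merge stands in for the merge request, since review and the Merge button exist only in GitLab.

```shell
demo=$(mktemp -d)

# Owner's project, published to a local stand-in for GitLab.
git init -q "$demo/owner"
cd "$demo/owner"
git symbolic-ref HEAD refs/heads/main
git config user.name "Vlad Dracula"        # placeholder identity for this sandbox only
git config user.email "vlad@example.com"
echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars"
git clone -q --bare "$demo/owner" "$demo/remote.git"

# Collaborator's clone.
git clone -q "$demo/remote.git" "$demo/collaborator"
cd "$demo/collaborator"
git config user.name "Wolfman"             # placeholder identity for this sandbox only
git config user.email "wolfman@example.com"

git pull -q origin main                    # update the local repo
git switch -q -c pluto-branch              # create a feature branch
echo "It is so a planet!" > pluto.txt      # make changes
git add pluto.txt                          # stage them
git commit -q -m "Add notes about Pluto"   # commit
git push -q -u origin pluto-branch         # upload the branch

# Stand-in for creating, reviewing, and merging a merge request in GitLab.
git switch -q main
git merge -q --no-ff -m "Merge branch 'pluto-branch' into 'main'" pluto-branch
git push -q origin main
```

In a real project with a protected default branch, the last three commands would not be run locally; the merge would happen in GitLab after approval.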

It is better to make many commits with smaller changes rather than one commit with massive changes: small commits are easier to read and review.

Switch Roles and Repeat

Switch roles and repeat the whole process.

Review Changes

The Owner pushed commits to the repository’s main branch without giving any information to the Collaborator. How can the Collaborator find out what has changed on the command line? And on GitLab?

On the command line, the Collaborator can use git fetch origin main to get the remote changes into the local repository, but without merging them. Then by running git diff main origin/main the Collaborator will see the changes output in the terminal.
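
A sandbox version of this check, as a sketch: it assumes a local bare repository (remote.git) in place of GitLab, a placeholder identity, and adds git log main..origin/main, which is not part of the answer above, to list the incoming commits before looking at the diff.

```shell
demo=$(mktemp -d)

# Owner's project and a local stand-in for GitLab.
git init -q "$demo/owner"
cd "$demo/owner"
git symbolic-ref HEAD refs/heads/main
git config user.name "Vlad Dracula"        # placeholder identity for this sandbox only
git config user.email "vlad@example.com"
echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars"
git clone -q --bare "$demo/owner" "$demo/remote.git"
git remote add origin "$demo/remote.git"
git clone -q "$demo/remote.git" "$demo/collaborator"

# The Owner pushes a new commit without telling anyone.
echo "The two moons may be a problem for werewolves" >> mars.txt
git commit -q -am "Add concerns about moons"
git push -q origin main

# The Collaborator downloads the changes without merging them, then inspects them.
cd "$demo/collaborator"
git fetch -q origin main
git log --oneline main..origin/main        # commits on the remote that are not in local main
git diff main origin/main                  # the changes those commits introduce
```

The Collaborator's working files are untouched at this point; a later git pull origin main would bring the changes into the local main branch.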

On GitLab, the Collaborator can go to the repository and click on “Code” -> “Commits” to view the most recent commits pushed to the repository.

Key Points

  • git clone copies a remote repository to create a local repository with a remote called origin automatically set up.
  • Branches are an important part of collaborating with others in Git repositories.
  • Ensure that you establish a collaborative workflow for your project team to use.

Content from Conflicts


Last updated on 2025-05-22

Overview

Questions

  • What do I do when my changes conflict with someone else’s?

Objectives

  • Explain what conflicts are and when they can occur.
  • Resolve conflicts resulting from a merge.

As soon as people can work in parallel, they may end up introducing changes that conflict with one another. This will even happen with a single person: if we are working on a piece of software on both our laptop and a server in the lab, we could make different changes to each copy. Version control helps us manage these conflicts by giving us tools to resolve overlapping changes.

To see how we can resolve conflicts, we must first create one. The file mars.txt currently looks like this in both partners’ copies of our vampires-and-werewolves repository:

BASH

$ cat mars.txt

OUTPUT

Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves
Mummies will appreciate the lack of humidity
Why are we talking about mummies?
Two vampires and three werewolves were spotted on Mars.

Let us add a line to the collaborator’s copy only:

BASH

$ nano mars.txt
$ cat mars.txt

OUTPUT

Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves
Mummies will appreciate the lack of humidity
Why are we talking about mummies?
Two vampires and three werewolves were spotted on Mars.
This line added to Wolfman's copy

and then push the change to GitLab:

BASH

$ git add mars.txt
$ git commit -m "Add a line in our home copy"

OUTPUT

[main dabb4c8] Add a line in our home copy
 1 file changed, 1 insertion(+)

BASH

$ git push origin main

OUTPUT

Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 331 bytes | 331.00 KiB/s, done.
Total 3 (delta 2), reused 0 (delta 0)
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://code.usgs.gov/vdracula/vampires-and-werewolves.git
   29aba7c..dabb4c8  main -> main

Now let us have the owner make a different change to their copy without updating from GitLab:

BASH

$ nano mars.txt
$ cat mars.txt

OUTPUT

Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves
Mummies will appreciate the lack of humidity
Why are we talking about mummies?
Two vampires and three werewolves were spotted on Mars.
We added a different line in the other copy

We can commit the change locally:

BASH

$ git add mars.txt
$ git commit -m "Add a line in my copy"

OUTPUT

[main 07ebc69] Add a line in my copy
 1 file changed, 1 insertion(+)

but Git will not let us push it to GitLab:

BASH

$ git push origin main

OUTPUT

To https://code.usgs.gov/vdracula/vampires-and-werewolves.git
 ! [rejected]        main -> main (fetch first)
error: failed to push some refs to 'https://code.usgs.gov/vdracula/vampires-and-werewolves.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

The Conflicting Changes

Git rejects the push because it detects that the remote repository has new updates that have not been incorporated into the local branch. What we have to do is pull the changes from GitLab, merge them into the copy we are currently working in, and then push that. Let us start by pulling:

BASH

$ git pull origin main

OUTPUT

remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (1/1), done.
remote: Total 3 (delta 2), reused 3 (delta 2), pack-reused 0
Unpacking objects: 100% (3/3), done.
From https://code.usgs.gov/vdracula/vampires-and-werewolves
 * branch            main     -> FETCH_HEAD
    29aba7c..dabb4c8  main     -> origin/main
Auto-merging mars.txt
CONFLICT (content): Merge conflict in mars.txt
Automatic merge failed; fix conflicts and then commit the result.

The git pull command updates the local repository to include those changes already included in the remote repository. After the changes from the remote branch have been fetched, Git detects that changes made to the local copy overlap with those made to the remote repository, and therefore refuses to merge the two versions to stop us from trampling on our previous work. The conflict is marked in the affected file:

BASH

$ cat mars.txt

OUTPUT

Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves
Mummies will appreciate the lack of humidity
Why are we talking about mummies?
Two vampires and three werewolves were spotted on Mars.
<<<<<<< HEAD
We added a different line in the other copy
=======
This line added to Wolfman's copy
>>>>>>> dabb4c8c450e8475aee9b14b4383acc99f42af1d

Our change is preceded by <<<<<<< HEAD. Git has then inserted ======= as a separator between the conflicting changes and marked the end of the content downloaded from GitLab with >>>>>>>. (The string of letters and digits after that marker identifies the commit we have just downloaded.)

It is now up to us to edit this file to remove these markers and reconcile the changes. We can do anything we want: keep the change made in the local repository, keep the change made in the remote repository, write something new to replace both, or get rid of the change entirely. Let us replace both so that the file looks like this:

BASH

$ cat mars.txt

OUTPUT

Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves
Mummies will appreciate the lack of humidity
Why are we talking about mummies?
Two vampires and three werewolves were spotted on Mars.
We removed the conflict on this line

To finish merging, we add mars.txt to the changes being made by the merge and then commit:

BASH

$ git add mars.txt
$ git status

OUTPUT

On branch main
All conflicts fixed but you are still merging.
  (use "git commit" to conclude merge)

Changes to be committed:

	modified:   mars.txt

BASH

$ git commit -m "Merge changes from GitLab"

OUTPUT

[main 2abf2b1] Merge changes from GitLab

Now we can push our changes to GitLab:

BASH

$ git push origin main

OUTPUT

Enumerating objects: 10, done.
Counting objects: 100% (10/10), done.
Delta compression using up to 8 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 645 bytes | 645.00 KiB/s, done.
Total 6 (delta 4), reused 0 (delta 0)
remote: Resolving deltas: 100% (4/4), completed with 2 local objects.
To https://code.usgs.gov/vdracula/vampires-and-werewolves.git
   dabb4c8..2abf2b1  main -> main
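
The whole sequence, from the two competing commits to the merged push, can be reproduced offline. This is a sketch under assumptions: a local bare repository (remote.git) replaces GitLab, the identities are placeholders, mars.txt is shortened to one starting line, and --no-rebase is added so that recent Git versions merge rather than ask how to reconcile the branches. The git diff --name-only --diff-filter=U line is an extra that lists files still in conflict.

```shell
demo=$(mktemp -d)

# Owner's project, a local stand-in for GitLab, and the Collaborator's clone.
git init -q "$demo/owner"
cd "$demo/owner"
git symbolic-ref HEAD refs/heads/main
git config user.name "Vlad Dracula"        # placeholder identity for this sandbox only
git config user.email "vlad@example.com"
echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars"
git clone -q --bare "$demo/owner" "$demo/remote.git"
git remote add origin "$demo/remote.git"
git clone -q "$demo/remote.git" "$demo/collaborator"

# The Collaborator adds a line and pushes first.
cd "$demo/collaborator"
git config user.name "Wolfman"             # placeholder identity for this sandbox only
git config user.email "wolfman@example.com"
echo "This line added to Wolfman's copy" >> mars.txt
git commit -q -am "Add a line in our home copy"
git push -q origin main

# The Owner adds a different line at the same place without pulling.
cd "$demo/owner"
echo "We added a different line in the other copy" >> mars.txt
git commit -q -am "Add a line in my copy"
git push -q origin main || echo "push rejected"
git pull --no-rebase origin main || echo "merge stopped on a conflict"

git diff --name-only --diff-filter=U       # files that still contain conflicts
cat mars.txt                               # shows the conflict markers

# Resolve by rewriting the file, then conclude the merge and push.
printf '%s\n' "Cold, dry, and everything is red" \
              "We removed the conflict on this line" > mars.txt
git add mars.txt
git commit -q -m "Merge changes from GitLab"
git push -q origin main
```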

Git keeps track of what we have merged with what, so we do not have to fix things by hand again when the collaborator who made the first change pulls again:

BASH

$ git pull origin main

OUTPUT

remote: Enumerating objects: 10, done.
remote: Counting objects: 100% (10/10), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 6 (delta 4), reused 6 (delta 4), pack-reused 0
Unpacking objects: 100% (6/6), done.
From https://code.usgs.gov/vdracula/vampires-and-werewolves
 * branch            main     -> FETCH_HEAD
    dabb4c8..2abf2b1  main     -> origin/main
Updating dabb4c8..2abf2b1
Fast-forward
 mars.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

We get the merged file:

BASH

$ cat mars.txt

OUTPUT

Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves
Mummies will appreciate the lack of humidity
Why are we talking about mummies?
Two vampires and three werewolves were spotted on Mars.
We removed the conflict on this line

We do not need to merge again because Git knows someone has already done that.

Git’s ability to resolve conflicts is very useful, but conflict resolution costs time and effort, and can introduce errors if conflicts are not resolved correctly. If you find yourself resolving a lot of conflicts in a project, consider these technical approaches to reducing them:

  • Pull from origin more frequently, especially before starting new work
  • Use topic branches to segregate work, merging to main when complete
  • Make smaller, more atomic commits
  • Push your work when it is done and encourage your team to do the same to reduce work in progress and, by extension, the chance of having conflicts
  • Where logically appropriate, break large files into smaller ones so that it is less likely that two authors will alter the same file simultaneously
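As a rough illustration of the topic-branch suggestion, the sketch below works in a throwaway repository rather than a shared project. The branch name `moons` and the identity settings are made up for the demo. In a real project we would run `git pull origin main` before creating the branch and `git push origin main` after merging.

```shell
# Sketch only: a scratch repository stands in for the shared project.
set -e
cd "$(mktemp -d)"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" on any Git version
git config user.name "Vlad Dracula"     # hypothetical identity for the demo
git config user.email "vlad@example.org"

echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars"

# Do the new work on a topic branch instead of directly on main.
git checkout -q -b moons
echo "The two moons may be a problem for werewolves" >> mars.txt
git commit -q -am "Add concerns about moons"

# When the work is complete, merge the topic branch back into main.
git checkout -q main
git merge -q --no-edit moons
git log --oneline
```

Because the work stays on its own branch until it is finished, main only ever receives complete changes, which keeps the window for overlapping edits small.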

Conflicts can also be minimized with project management strategies:

  • Clarify who is responsible for what areas with your collaborators
  • Discuss what order tasks should be carried out in with your collaborators so that tasks expected to change the same lines will not be worked on simultaneously
  • If the conflicts are stylistic churn (e.g. tabs vs. spaces), establish a governing project convention and, if necessary, enforce it with code style tools (e.g. black for Python, lintr for R)

Solving Conflicts that You Create

Clone the repository created by your instructor. Add a new file to it, and modify an existing file (your instructor will tell you which one). When asked by your instructor, pull their changes from the repository to create a conflict, then resolve it.

Conflicts on Non-textual files

What does Git do when there is a conflict in an image or some other non-textual file that is stored in version control?

Let us try it. Suppose Dracula takes a picture of the Martian surface and calls it mars.jpg.

If you do not have an image file of Mars available, you can create a dummy binary file like this:

BASH

$ head -c 1024 /dev/urandom > mars.jpg
$ ls -lh mars.jpg

OUTPUT

-rw-r--r-- 1 vlad 57095 1.0K Mar  8 20:24 mars.jpg

ls shows us that this created a 1-kilobyte file. It is full of random bytes read from the special file, /dev/urandom.
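Since `ls -lh` rounds sizes for readability, it can be reassuring to check the exact byte count. The sketch below repeats the same `head` command in a scratch directory (the `scratch` and `size` variable names are just for this illustration) and counts the bytes with `wc -c`.

```shell
# Create the same kind of dummy file in a scratch directory.
scratch=$(mktemp -d)
head -c 1024 /dev/urandom > "$scratch/mars.jpg"

# wc -c counts bytes; the arithmetic expansion strips any padding in its output.
size=$(($(wc -c < "$scratch/mars.jpg")))
echo "mars.jpg is $size bytes"
```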

Now, suppose Dracula adds mars.jpg to his repository:

BASH

$ git add mars.jpg
$ git commit -m "Add picture of Martian surface"

OUTPUT

[main 8e4115c] Add picture of Martian surface
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 mars.jpg

Suppose that Wolfman has added a similar picture in the meantime. His is a picture of the Martian sky, but it is also called mars.jpg. When Dracula tries to push, he gets a familiar message:

BASH

$ git push origin main

OUTPUT

To https://code.usgs.gov/vdracula/vampires-and-werewolves.git
 ! [rejected]        main -> main (fetch first)
error: failed to push some refs to 'https://code.usgs.gov/vdracula/vampires-and-werewolves.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

We have learned that we must pull first and resolve any conflicts:

BASH

$ git pull origin main

When there is a conflict on an image or other binary file, git prints a message like this:

OUTPUT

remote: Counting objects: 3, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From https://code.usgs.gov/vdracula/vampires-and-werewolves.git
 * branch            main     -> FETCH_HEAD
   6a67967..439dc8c  main     -> origin/main
warning: Cannot merge binary files: mars.jpg (HEAD vs. 439dc8c08869c342438f6dc4a2b615b05b93c76e)
Auto-merging mars.jpg
CONFLICT (add/add): Merge conflict in mars.jpg
Automatic merge failed; fix conflicts and then commit the result.

The conflict message here is mostly the same as it was for mars.txt, but there is one key additional line:

OUTPUT

warning: Cannot merge binary files: mars.jpg (HEAD vs. 439dc8c08869c342438f6dc4a2b615b05b93c76e)

Git cannot automatically insert conflict markers into an image as it does for text files. So, instead of editing the image file, we must check out the version we want to keep. Then we can add and commit this version.

On the key line above, Git has conveniently given us commit identifiers for the two versions of mars.jpg. Our version is HEAD, and Wolfman’s version is 439dc8c0.... If we want to use our version, we can use git checkout:

BASH

$ git checkout HEAD mars.jpg
$ git add mars.jpg
$ git commit -m "Use image of surface instead of sky"

OUTPUT

[main 21032c3] Use image of surface instead of sky

If instead we want to use Wolfman’s version, we can use git checkout with Wolfman’s commit identifier, 439dc8c0:

BASH

$ git checkout 439dc8c0 mars.jpg
$ git add mars.jpg
$ git commit -m "Use image of sky instead of surface"

OUTPUT

[main da21b34] Use image of sky instead of surface
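As an alternative to naming commits explicitly, Git also accepts `--ours` and `--theirs` with `git checkout` while a merge is in conflict. The sketch below reproduces a similar add/add conflict in a scratch repository, with a local branch named `sky` standing in for Wolfman's pushed commit. The tiny files containing a NUL byte are made-up stand-ins for real images.

```shell
# Sketch only: reproduce a binary add/add conflict locally and keep "our" file.
set -e
cd "$(mktemp -d)"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name "Vlad Dracula"       # hypothetical identity for the demo
git config user.email "vlad@example.org"
echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars"

# Wolfman's side: a branch that adds its own mars.jpg.
git checkout -q -b sky
printf 'sky\0image' > mars.jpg            # the NUL byte makes Git treat the file as binary
git add mars.jpg
git commit -q -m "Add picture of Martian sky"

# Dracula's side: main adds a different mars.jpg.
git checkout -q main
printf 'surface\0image' > mars.jpg
cp mars.jpg ../surface.jpg                # keep a copy to compare against later
git add mars.jpg
git commit -q -m "Add picture of Martian surface"

git merge sky || true                     # the merge stops with a conflict
git diff --name-only --diff-filter=U      # list the files that are still unmerged

git checkout --ours -- mars.jpg           # --theirs would keep Wolfman's file instead
git add mars.jpg
git commit -q -m "Use image of surface instead of sky"
```

Here `--ours` refers to the branch we were on when the merge started (HEAD), and `--theirs` to the commit being merged in.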

We can also keep both images. The catch is that we cannot keep them under the same name. But, we can check out each version in succession and rename it, then add the renamed versions. First, check out each image and rename it:

BASH

$ git checkout HEAD mars.jpg
$ git mv mars.jpg mars-surface.jpg
$ git checkout 439dc8c0 mars.jpg
$ mv mars.jpg mars-sky.jpg

Then, remove the old mars.jpg and add the two new files:

BASH

$ git rm mars.jpg
$ git add mars-surface.jpg
$ git add mars-sky.jpg
$ git commit -m "Use two images: surface and sky"

OUTPUT

[main 94ae08c] Use two images: surface and sky
 2 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 mars-sky.jpg
 rename mars.jpg => mars-surface.jpg (100%)

Now both images of Mars are checked into the repository, and mars.jpg no longer exists.

A Typical Work Session

You sit down at your computer to work on a shared project that is tracked in a remote Git repository. During your work session, you take the following actions, but not in this order:

  • Make changes by appending the number 100 to a text file numbers.txt
  • Update remote repository to match the local repository
  • Celebrate your success with some fancy beverage(s)
  • Update local repository to match the remote repository
  • Stage changes to be committed
  • Commit changes to the local repository

In what order should you perform these actions to minimize the chances of conflicts? Put the actions above in order in the action column of the table below. When you have the order right, see if you can write the corresponding commands in the command column. A few steps are populated to get you started.

| order | action     | command                 |
|-------|------------|-------------------------|
| 1     |            |                         |
| 2     |            | echo 100 >> numbers.txt |
| 3     |            |                         |
| 4     |            |                         |
| 5     |            |                         |
| 6     | Celebrate! | AFK                     |

| order | action         | command                                |
|-------|----------------|----------------------------------------|
| 1     | Update local   | git pull origin main                   |
| 2     | Make changes   | echo 100 >> numbers.txt                |
| 3     | Stage changes  | git add numbers.txt                    |
| 4     | Commit changes | git commit -m "Add 100 to numbers.txt" |
| 5     | Update remote  | git push origin main                   |
| 6     | Celebrate!     | AFK                                    |

Key Points

  • Conflicts occur when two or more people change the same lines of the same file.
  • The version control system does not allow people to overwrite each other’s changes blindly, but highlights conflicts so that they can be resolved.

Content from Open Science


Last updated on 2025-05-22

Overview

Questions

  • What is open science?
  • How is open science valuable?
  • How can version control help me make my work more open?

Objectives

  • Define open science and be able to list attributes or processes that make a research project open.
  • Explain why open science is valuable.
  • Explain how a version control system can be leveraged as an electronic lab notebook for computational work.

In 2023, the U.S. government declared a Year of Open Science and defined open science for federal agencies:

“Open Science is the principle and practice of making research products and processes available to all, while respecting diverse cultures, maintaining security and privacy, and fostering collaborations, reproducibility, and equity.”

But what does this mean in practice? NASA is one agency leading the way in developing a culture of open science with their Transform to Open Science (TOPS) program, including the publication of an Open Science 101 curriculum. Here at USGS, we can practice open science by releasing scientific code with Git version control via a USGS software information product.

Check Out How USGS Celebrated The Year Of Open Science!

Check out the USGS Year of Open Science webpage to learn about the Community for Data Integration’s (CDI) ‘Open Data for Open Science’ workshop and other USGS open science stories.

Let us take a step back. How is open science valuable and how does publishing your code make your research more open?

Making Code Citable

All USGS software information products are citable with a unique Digital Object Identifier (DOI). You will learn how to create the citation and DOI in the later episode on Citation.

Unless your methods are restricted to a single mathematical operation, it is very difficult to make your research fully reproducible without the code used to analyze and generate results. Sharing the analysis code can significantly increase the reproducibility of published papers (Ince et al. 2012, Laurinavichyute et al. 2022). Additionally, open science practices can lead to more citations, potential collaborators, and funding opportunities (McKiernan et al. 2016). This open model accelerates discovery: the more open work is, the more widely it is cited and re-used (Piwowar et al. 2007).

Researchers are also exploring how the FAIR (Findable, Accessible, Interoperable, and Reusable) data standards can apply to research software. Check out the FAIR Principles for Research Software to learn more.

Are you worried that your code is too messy to share? Fear not: here is an open letter from a professional software engineer telling you that it is good enough. In fact, “if your code is good enough to do the job, then it is good enough to release”.

Is My Work Reproducible?

When analysis is conducted using scientific code, domain and code reviews can help to determine reproducibility (and therefore the accuracy and validity) of the results. You will learn more about these types of reviews in a later episode.

However, people who want to work this way may have some questions about how to approach publishing the code. This is one of the (many) reasons we teach version control. When used diligently, version control with Git acts as a shareable electronic lab notebook for computational work:

  • The conceptual stages of your work are documented, including who did what and when. Every step is stamped with an identifier (the commit ID) that is for most intents and purposes unique.
  • You can tie documentation of rationale, ideas, and other intellectual work directly to the changes that spring from them.
  • You can refer to what you used in your research to obtain your computational results in a way that is unique and recoverable.
  • With a version control system such as Git, the entire history of the repository is easy to archive for perpetuity.
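To make the lab-notebook idea concrete, the sketch below builds a two-commit scratch repository and prints one line per commit showing the commit ID, who made it, when, and why. The identity and the column layout are choices made for this illustration only.

```shell
# Sketch only: a scratch repository with two commits by a made-up author.
set -e
cd "$(mktemp -d)"
git init -q notebook
cd notebook
git symbolic-ref HEAD refs/heads/main
git config user.name "Vlad Dracula"       # hypothetical identity for the demo
git config user.email "vlad@example.org"

echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars as a base"
echo "The two moons may be a problem for Wolfman" >> mars.txt
git commit -q -am "Add concerns about effects of Mars' moons on Wolfman"

# One line per commit: short commit ID | author | date | message.
history=$(git log --pretty=format:'%h | %an | %ad | %s' --date=short)
echo "$history"
```

The commit IDs in the first column are what we would quote when recording exactly which version of the code produced a result.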

Challenge: Is There An Advantage To Publishing Scientific Code Using Version Control Software?

Publishing your scientific code as a Git repository is more open or valuable than publishing it as part of a data release. TRUE or FALSE?

True. The advantages of publishing your scripts in a Git repository include:

  • Publishing the history of changes. This keeps a record of what methods were explored, prior versions and approaches, and what did not work well.
  • Keeping track of who authored what. Tracking helps authors receive credit for the work accomplished.
  • Providing an easy way to correct errors or make updates as new information becomes available.
  • Simplifying how others can access and use your code. Anyone can clone your repository and immediately start using your code.

Key Points

  • Open scientific work is more useful and more highly cited than closed
  • Publishing code is a critical part of making science reproducible
  • If your code is good enough to produce scientific results, then it is good enough to publish

Content from Policy


Last updated on 2025-05-22

Overview

Questions

  • What is an official USGS software information product?
  • When am I required to release my software as an official USGS software information product?
  • When may I release my software as an official USGS software information product?

Objectives

  • Identify the difference between a software project and an official USGS software information product.
  • Explain requirements for releasing software as an official USGS software information product.
  • Identify the policy hierarchy relationship among federal, agency, and USGS authorities.

Source code consists of computer commands written in a programming language and meant to be read by people. As a higher-level representation of those commands, it must be assembled, interpreted, or compiled before a computer can execute it as a program.

Example

CPP

// file: hello.cpp
#include <stdio.h>

int main() {
  printf("Hello world!\n");
  return 0;
}

The above is an example of a file called “hello.cpp” that contains source code written in the C++ programming language. While the source code is relatively easily understood by a human, a computer is not able to execute this file directly.

BASH

$ ./hello.cpp
./hello.cpp: line 3: syntax error near unexpected token `('
./hello.cpp: line 3: `int main() {'

Instead this file must be compiled to an executable using a command similar to the following:

BASH

$ gcc -o hello hello.cpp

The resulting file, “hello”, contains binary machine code that can be executed by the computer.

BASH

$ ./hello
Hello world!

While C++ is an explicitly compiled language, other languages are more subtle and may leverage just-in-time compilation or interpretation of the source code. In these more subtle languages there may not be an explicit compilation command. Only the source code file exists, and it is quietly compiled or interpreted behind the scenes upon execution.

Examples of these more subtle languages include Python and shell scripts.
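For contrast with the C++ example, here is a minimal sketch of the same program as a shell script. The file name `hello.sh` and the scratch directory are made up for the illustration. There is no compile step: the `sh` interpreter reads the source file directly.

```shell
# Write a two-line shell script into a scratch directory.
scratch=$(mktemp -d)
cat > "$scratch/hello.sh" <<'EOF'
#!/bin/sh
echo "Hello world!"
EOF

# Run it: the interpreter works from the source code itself.
sh "$scratch/hello.sh"
```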

Source code developed by, or on behalf of, the U.S. Geological Survey that is, or is intended to be, publicly accessible must be stored within a Git repository on the USGS Git Hosting Platform. This Git repository may have multiple branches, tags, and commits. There may also be issue trackers, build artifacts, milestones, etc. Taken together, the artifacts and the prior, ongoing, or upcoming development activities related to the source code are considered a “software project”.

Official Software Information Product


When a software project reaches some level of maturity (e.g., results are used to support a published manuscript), it must be released as an “official USGS software information product”. While software projects may not be cited by other official USGS information product types (e.g., data releases, journal articles, etc.), official USGS software information products are citable. The desire to cite a software project is one example of a situation that requires the author to release the project as an official USGS software information product; however, local policies (e.g., science center or equivalent organizational unit) may define additional criteria that require such a release.

An official USGS software information product reflects a point-in-time snapshot of a software project’s source code and relevant artifacts. This snapshot must be reviewed and receive appropriate approval to be made public. This snapshot is typically created using a Git tag in the repository and an associated GitLab release.
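As a minimal sketch of the tagging step, the commands below create an annotated tag in a scratch repository. The tag name `v1.0.0` and the identity are hypothetical. In a real project the tag would be pushed with `git push origin v1.0.0`, and the GitLab release would be created from it only after the required reviews and approval.

```shell
# Sketch only: mark a point-in-time snapshot with an annotated tag.
set -e
cd "$(mktemp -d)"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name "Vlad Dracula"       # hypothetical identity for the demo
git config user.email "vlad@example.org"

echo "# Vampires and Werewolves" > README.md
git add README.md
git commit -q -m "Add README"

# The annotated tag records who tagged the snapshot, when, and with what message.
git tag -a v1.0.0 -m "Version 1.0.0"
git tag --list
```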

Open Source Software Project Development


A software project may be developed publicly as an open-source software project provided the project complies with all governing policies. Subject to limited exceptions, current policy requires that a software project be made open-source when specific criteria are met, for example:

  1. The project, or results thereof, are deemed sufficient to be used by current or future research project(s)
  2. A project that was contracted through a service contract vehicle is accepted by the federal contracting authority to satisfy contract requirements
  3. The source code in the project is no longer considered truly exploratory or disposable in nature
  4. The library or application produced by the software project is used by USGS or other federal staff on a regular, recurring basis
  5. The library or application produces actionable information at scales and timeframes relevant to decision makers

When developing an open-source project, all contributions to the project must receive, at minimum, an administrative security review before the contribution is integrated into the project repository. Depending on the project, this review provides an opportunity to complete other types of review as well, e.g., technical code review.

Branching vs Forking Workflows

In this course we describe a branching workflow. This workflow is simpler for individual developers to understand when getting started with Git. However, a forking workflow may be better suited for open source project development.

Current policy requires all contributions to open source projects be reviewed before they are integrated with the public project. Since all branches in the public project are themselves public, there is no way to use a branching workflow and comply with current policy. Under a branching workflow, the author must determine a method for sharing and reviewing their code prior to pushing their changes to GitLab; this may be fragile or lead to unversioned changes.

Conversely, a forking workflow enables reviews to occur by way of merge requests from the internal fork repository location to the public upstream repository location prior to integrating said contributions with the public project repository. In this way open source development may continue collaboratively while adhering to current policy requirements.
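The arrangement of remotes in a forking workflow can be sketched locally. In the example below, two bare repositories stand in for the public upstream project and the internal fork. The remote names `origin` and `upstream` follow a common convention, and the branch name `add-moons` is made up. The merge request and review themselves happen in GitLab and are not shown.

```shell
# Sketch only: local bare repositories stand in for the hosted ones.
set -e
base=$(mktemp -d)
cd "$base"
git init -q --bare upstream.git           # stand-in for the public project
git init -q --bare fork.git               # stand-in for the internal fork

git init -q work
cd work
git symbolic-ref HEAD refs/heads/main
git config user.name "Vlad Dracula"       # hypothetical identity for the demo
git config user.email "vlad@example.org"
git remote add origin "$base/fork.git"
git remote add upstream "$base/upstream.git"

# Seed both stand-ins with the same history, as if the fork had just been created.
echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars"
git push -q upstream main
git push -q origin main

# New work goes to the fork only; it reaches upstream after review.
git checkout -q -b add-moons
echo "The two moons may be a problem for werewolves" >> mars.txt
git commit -q -am "Add concerns about moons"
git push -q origin add-moons

git ls-remote --heads origin              # shows main and add-moons
git ls-remote --heads upstream            # shows main only
```

The unreviewed branch exists only in the fork, so nothing becomes part of the public project until a merge request has been approved.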

Governing policies (see below) determine the requirements for both open development practices and release of official USGS software information products. These policies are set at each of the Federal, Agency (Department of the Interior), Bureau (U.S. Geological Survey), and local (e.g., science center or equivalent organizational unit) levels.

In general, the policies are structured in a hierarchy such that higher level policy (e.g., federal policy) provides generalized guidance and lower level policy provides increasing specificity and clarity. Lower level policy may not supersede or conflict with higher level policy.

Challenge: Identifying relevant policies

Select the source(s) of policies that govern the release of official USGS software information products.

A. Federal

B. Departmental

C. Bureau

D. Local

Policies are known to exist for each of (A), (B), and (C) sources. Science Centers and offices may implement additional policies with which you must comply.

Requirements


The following are required to release a software project as an official USGS software information product.

  1. Proper license, disclaimer, and metadata (code.json file)
  2. Appropriate review(s) and approval as defined by current policy
  3. An approved Information Product Data System (IPDS) record
  4. A Git tag within the project corresponding to the official USGS software information product and a GitLab release corresponding to the Git tag
  5. A digital object identifier (DOI)

Other episodes in this lesson detail procedures to satisfy each of the preceding requirements.
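A quick local check can catch missing files before a review starts. The sketch below is only an illustration: the function name `check_release_files` is hypothetical, and the file names (LICENSE.md, DISCLAIMER.md, code.json) are assumptions based on the names used in this lesson, not an official checklist. It does not check reviews, the IPDS record, the tag, or the DOI.

```shell
# check_release_files DIR: report which of the assumed files are missing from DIR.
# Returns 0 when all are present and 1 otherwise.
check_release_files() {
  dir=$1
  missing=0
  for f in LICENSE.md DISCLAIMER.md code.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Usage against a scratch directory that so far contains only a license.
project=$(mktemp -d)
touch "$project/LICENSE.md"
check_release_files "$project" || echo "not ready for release"
```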

Key Points

  • Software may be publicly accessible as an open source software project and/or as an official USGS software information product.

  • While both a project and product may be public, only the official USGS software information product is citable by other publications.

  • Governing policies cascade from Federal to local levels. Check with your supervisor to ensure compliance with all local policies.

Content from Licensing


Last updated on 2025-05-22

Overview

Questions

  • What licensing information should I include with my work?

Objectives

  • Explain why adding licensing information to a repository is important.
  • Choose a proper license.

Under U.S. copyright law, copyright protection automatically arises in original creative works that are fixed in any tangible medium of expression (e.g., a written work on paper, an audio/visual recording on tape, a sculptured work out of marble). However, an original work of the U.S. Government is not eligible for copyright protection in the United States (17 USC 105a). This restriction means that, as USGS employees, any original work that we create in the course of our official duties and responsibilities is automatically in the public domain.

DOI solicitor note 1

Depending on the jurisdiction, the U.S. Government may have foreign copyright protections in U.S. Government work. Further, 17 USC 105a does not prevent the U.S. Government from owning copyright (e.g., if a USGS contractor creates an original creative work under an agreement, copyright arises in the work to the contractor, and the USGS may obtain ownership of the copyright through a contract).

Even so, all software developed by the USGS should include a LICENSE.md file to notify the public of the copyright status of the software. Why add a LICENSE.md at all? If we do not include a license clarifying that we have waived the copyright, thus making this fact explicit, the uncertainty (is there a copyright, is there not?) in the mind of a potential user could inhibit use of the work, reducing its impact and value. When someone reuses a creative work without a license, the author of that work could sue for copyright infringement. A license solves this problem by explicitly granting rights to others (the licensees) that they would otherwise not have (or not know that they have).

What licenses have I already accepted?

Many of the software tools we use on a daily basis (including in this workshop) are released as open-source software. Pick a project on GitHub from the list below, or one of your own choosing. Find its license (usually in a file called LICENSE or COPYING) and talk about how it restricts your use of the software.

  • Git¹, the source-code management tool
  • CPython¹, the standard implementation of the Python language
  • Jupyter¹, the project behind web-based Python notebooks
  • R software¹, read-only mirror of the R software source code

Both R software and Git use the GNU General Public License (GPL), one of the most commonly used series of licenses for free and open-source software. One way in which it differs from the CC0 Public Domain license (more detail on that below) is that it specifies that all derivative work must be distributed under the same or equivalent license terms, which is important for keeping open software open. In other words, an open-source license such as those in the GNU GPL series differs greatly from CC0: the former places certain restrictions on the use, copying, and redistribution of the software, while the latter places no restrictions whatsoever.

What rights are being granted under which conditions differs, often only slightly, from one license to another. The Creative Commons Public Domain Dedication (CC0) is the most commonly used ‘license’ at USGS (currently CC0 1.0¹). It assumes the software is either completely original or uses only other software that is also under the CC0 license. This license places the work as completely as possible in the public domain so that it is free for others to build upon, enhance, or reuse. It should work for most USGS software, assuming that it was developed solely by federal employees and does not include any software developed by others that is not publicly dedicated. The text for this license is included in a callout box below. You can add a LICENSE.md file in the root of your project repository and copy and paste the text below.

DOI solicitor note 2

If the USGS wishes to release software originally created by a federal contractor, it may either:

  1. require the contractor to release the software under a CC0 public domain dedication, or
  2. require the contractor to assign all intellectual property rights, title, and interest in the software to USGS

and then release the software under a CC0 public domain dedication.

For contractor positions that work closely alongside federal positions, please refer to the Contracting Officer for questions concerning how code sharing is addressed in the contract.

Note that you can also use this Copyright Dedication Agreement to formally place materials in the public domain.

CC0 1.0 license text


MARKDOWN


# License

Unless otherwise noted, this project work is in the public domain in the United
States because it is a work of the United States Geological Survey, an agency
of the United States Department of Interior. For more information, see the
official USGS copyright policy at
https://www.usgs.gov/information-policies-and-instructions/copyrights-and-credits

Additionally, the USGS waives all copyright and related rights in the work
worldwide through the CC0 1.0 Universal public domain dedication.


## CC0 1.0 Universal Summary

This is a human-readable summary of the
[Legal Code (read the full text)][1].


### No Copyright

The person or entity who associated a work with this deed has dedicated the
work to the public domain by waiving all of his or her rights to the work
worldwide under copyright law, including all related and neighboring rights,
to the extent allowed by law.

You can copy, modify, distribute and perform the work, even for commercial
purposes, all without asking permission.


### Other Information

In no way are the patent or trademark rights of any person affected by CC0,
nor are the rights that other persons may have in the work or in how the
work is used, such as publicity or privacy rights.

Unless expressly stated otherwise, the person who associated a work with
this deed makes no warranties about the work, and disclaims liability for
all uses of the work, to the fullest extent permitted by applicable law.
When using or citing the work, you should not imply endorsement by the
author or the affirmer.



[1]: https://creativecommons.org/publicdomain/zero/1.0/legalcode

Key Points

  • A LICENSE file is often used in a repository to indicate how the contents of the repo may be used by others.
  • USGS software products require a LICENSE.md file in the project root of your repository.
  • Non-derivative USGS software products can use the CC0 1.0 license.
  • If you need a different license, consult the solicitor’s office to determine the appropriate license.

1: non-Federal link

Content from Citation


Last updated on 2025-05-22

Overview

Questions

  • How do I create a digital object identifier (DOI)?
  • How can I make my work easy to cite?

Objectives

  • Learn how to create a digital object identifier (DOI).
  • Make your work easy to cite.

All USGS Software Information Products are required to have a digital object identifier (DOI) assigned to them. A DOI is a persistent identifier tied to a unique object that you specify. USGS uses the Asset Identifier Service to reserve and manage DOIs for software products. You can reserve a DOI by providing the Title of your Software Product and the USGS Science Center or Program responsible for the software project. Remember not to activate/publish the DOI to DataCite until you have received official approval to release the software product. You will learn more about activating/publishing the DOI in a later episode.

Once you have the DOI, you can write a suggested citation by including the reserved DOI that you receive from the Asset Identifier Service in your citation as a full URL (e.g., https://doi.org/10.5066/xxxxxxxx). For example:

Dracula, V. and Wolfman, L.T., 2024, Vampires and Werewolves, version 1.0.0: U.S. Geological Survey software release, https://doi.org/10.5066/xxxxxxxx.

You can place the suggested citation in the README.md file in your root directory, making it easy to find.
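One way to do this from the command line is sketched below. The DOI is the same placeholder used above, and the section heading "Suggested citation" and the scratch README.md are made up for the illustration. In a real project we would append to the repository's own README.md.

```shell
# Sketch only: build the suggested citation from a reserved DOI and append it to a README.
doi="10.5066/xxxxxxxx"                     # placeholder, not a real DOI
citation="Dracula, V. and Wolfman, L.T., 2024, Vampires and Werewolves, version 1.0.0: U.S. Geological Survey software release, https://doi.org/${doi}."

readme="$(mktemp -d)/README.md"            # scratch file standing in for the project README
printf '## Suggested citation\n\n%s\n' "$citation" >> "$readme"
cat "$readme"
```

Keeping the DOI in one variable means the full URL form is generated consistently wherever the citation is written.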

Try adding a CITATION.md file

Although not required for a USGS Software Information Product, you may want to consider adding a CITATION.md file that describes how to reference or cite your project. You can include a plain text version of the citation that’s easy to copy and paste as well as a BibTeX entry.

Here’s an example of what Dracula would write in his CITATION.md file:


To reference the Vampires and Werewolves software product in a publication, you can cite:

Dracula, V. and Wolfman, L.T., 2024, Vampires and Werewolves, version 1.0.0: U.S. Geological Survey software release, https://doi.org/10.5066/xxxxxxxx.

@software{dracula-vampires-werewolves-2024,
  author      = {Dracula, Vlad and Wolfman, L.T.},
  title       = {Vampires and Werewolves},
  version     = {1.0.0},
  year        = {2024},
  doi         = {10.5066/xxxxxxxx}
}

The second part of that documentation is a BibTeX entry, which can be ingested by some bibliography software. If there is an associated publication, you can add that to CITATION.md too.

To explore this topic in more detail, check out the Software Sustainability Institute blog or the FORCE11 Software Citation Group’s citation principles.

Key Points

  • Create a DOI for your software information product.
  • Add a suggested citation to your repository.

Content from Commonly Included Files


Last updated on 2025-05-22

Overview

Questions

  • What are some files which are usually included in USGS software projects, and what should their content be?

Objectives

  • Raise awareness of common “boilerplate” files which are usually included in software products.
  • Provide usable examples of each of these.

Disclaimers


All USGS software information products must contain appropriate disclaimers. This is unique among the files discussed here. While the others are strongly recommended, they are not required by Fundamental Science Practices (FSP).

The location of the disclaimer must be given as part of the code.json metadata which accompanies USGS software information products (see episode Creating Metadata).

The disclaimer used for open-source software projects must be different from the one used for official USGS software information products.

Provisional disclaimers

The provisional disclaimer must remain in any branch or tag which does not represent an official USGS software information product. The official disclaimer may only be used in tags (or temporarily in release-candidate branches working towards a tag) that represent official USGS software information products.

For more information on this, see the Reviews for Authors lesson on preparing the release branch and the Publishing lesson on managing tags.

Open-source software projects

For an open-source software project, appropriate content for the DISCLAIMER.md may be found in section 11 of the FSP Guidance on Disclaimer Statements Allowed in USGS Science Information Products:

Official USGS Software Information Product

For an official USGS software information product, appropriate content for the DISCLAIMER.md may be found in section 5 of the FSP Guidance on Disclaimer Statements Allowed in USGS Science Information Products:

Readme


When included, a README.md file will be rendered to text on the GitLab page of its project. This file should give a human-readable description of the project. It can also contain details about how to use the project. It should also give pointers to other relevant information about the project, when that information is contained in other files. For example, there might be a “Contributing to this project” section which points to a CONTRIBUTING.md file, or an R library’s README.md might point users to that library’s vignettes.

For some examples of effective README files in USGS projects, see

Contributing


If you are willing to accept contributions from outside your team, you can include a CONTRIBUTING.md which explains your project’s policies and procedures for doing so. An example is below.

Customize the example for your project

Before using the example below, replace the placeholder text for [1] and [2] with the appropriate URLs from your project's repository. The fork and merge request links in the example point to GitLab documentation; if your project is on GitHub, replace them with the equivalent GitHub documentation links.

MARKDOWN

Contributing
============

Contributions are welcome from the community. Questions can be asked on the
[issues page][1]. Before creating a new issue, please take a moment to
search and make sure a similar issue does not already exist. If one does
exist, you can comment (most simply even with just a `:+1:`) to show your
support for that issue.

If you have direct contributions you would like considered for incorporation
into the project you can [fork this
repository](https://docs.gitlab.com/ee/user/project/repository/forking_workflow.html#create-a-fork)
and [submit a merge
request](https://docs.gitlab.com/ee/user/project/merge_requests/creating_merge_requests.html#when-you-work-in-a-fork)
for review. Please note that all contributions will be considered public domain
(see [license][2] for details).

[1]: Replace this text with the URL for your project's issues page
[2]: Replace this text with the URL for your project's license

Code of Conduct


This is a file, typically called CODE_OF_CONDUCT.md, that describes expected conduct from users contributing to the project. At a minimum this file must specify that all contributions to the project must abide by the USGS Code of Scientific Conduct. It is also appropriate for it to include further language specifying expectations for contributors’ behavior as part of the project’s community. A suitable example of such a file’s contents follows:

MARKDOWN

# Contributor Code of Conduct

All contributions to, and interactions surrounding, this project will abide
by the [USGS Code of Scientific Conduct][1].

[1]: https://www2.usgs.gov/fsp/fsp_code_of_scientific_conduct.asp

We are committed to making participation in this project a harassment-free
experience for everyone, regardless of level of experience, gender, gender
identity and expression, sexual orientation, disability, personal
appearance, body size, race, ethnicity, age, or religion.

Examples of unacceptable behavior by participants include the use of sexual
language or imagery, derogatory comments or personal attacks, trolling,
public or private harassment, insults, or other unprofessional conduct.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct. Project maintainers who do
not follow the Code of Conduct may be removed from the project team.

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by opening an issue or contacting one or more of the project
maintainers.

This Code of Conduct is adapted from the Contributor Covenant, version 1.0.0

Which file must be included in all USGS software repositories?

There are conventions for files included in software repositories that explain the purpose of the repository or how its team works. Many of these are recommended but optional. One, however, is mandatory. Which file is mandatory, and why?

The DISCLAIMER.md is mandatory in published USGS software repositories, because it is required by FSP.

Key Points

  • USGS software products typically contain “boilerplate” files.
  • Some of these files, like the DISCLAIMER.md, are mandatory and must be included in all USGS software products. Others are optional.
  • Examples of these files may be found in existing projects, or on this page.

Content from Creating Metadata


Last updated on 2025-05-22

Overview

Questions

  • What is a code.json file?
  • How do you create a code.json file?
  • What are the required fields in the code.json file for a USGS software project?

Objectives

  • Explain what a code.json file is and how it is used.
  • Create a code.json file with the minimum required fields for a USGS software project and software information products.
  • Validate a code.json file.
  • Update a code.json file for a new version of the software.

Introduction


Metadata are descriptive elements in a standardized format that are necessary for identification, discovery, access, and use of information products such as software and data. Metadata answer fundamental questions such as who, what, when, where, why, and how.

Metadata for a software project are stored and maintained in a file called code.json located at the top-level of the project repository in GitLab. This code.json file is in JavaScript Object Notation (JSON) format. The code.json file provides basic information about the project and official software information products and will be aggregated with the information from other Department of the Interior software projects to form the Departmental Enterprise Code Inventory, which is required by the Federal Source Code Policy. The code.json file is required for software information products but may be created for projects without official software information products.

JSON Overview


The JSON data format allows machine-to-machine communication with structured text. JSON is language agnostic.

JSON Syntax:

  • Use key/value pairs

    • keys are strings, indicated by double quotes
    • values can be:
      • strings ("Vlad Dracula"),
      • numbers (1.5),
      • objects ({"key": "value", "key2": "value2"}),
      • arrays (["Python", "R"]),
      • boolean (true / false), or
      • null
    • separate keys from values with a colon
      • Format: "key": "value"
      • Example: "name": "Vlad Dracula"
  • Separate key/value pairs with commas:

    JSON

    {
        "name": "Vlad Dracula",
        "organization": "U.S. Geological Survey"
    }

Callout

Note: Many other languages (e.g., Python) allow trailing commas; however, a trailing comma is an error in JSON syntax. For example, the following would give you an error:

JSON

{
  "name": "Vlad Dracula",
  "organization": "U.S. Geological Survey",
}

Generally, a JSON file will contain an object or an array. If it is an object, it will start and end with curly brackets {}. If it is an array, it will start and end with square brackets [].
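
As a minimal sketch of what these value types look like once a JSON document has been parsed, the snippet below uses Python's standard-library json module. Python is an assumption here (any JSON parser would do), and the sample keys such as "score" and "middleName" are made up for illustration.

```python
import json

# Hypothetical sample covering each JSON value type listed above.
text = """
{
    "name": "Vlad Dracula",
    "score": 1.5,
    "contact": {"email": "vdracula@usgs.gov"},
    "languages": ["Python", "R"],
    "active": true,
    "middleName": null
}
"""

record = json.loads(text)

# A top-level JSON object becomes a dict; a top-level array becomes a list.
print(type(record).__name__)            # dict
print(type(json.loads("[]")).__name__)  # list

# Each JSON value type maps onto a built-in Python type.
for key, value in record.items():
    print(f"{key}: {type(value).__name__}")
```

Strings, numbers, objects, arrays, booleans, and null come back as str, float (or int), dict, list, bool, and None respectively.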

Let us create a JSON file in our GitLab project space with the filename hello-world.json.

Screenshot showing a red circle around where to click to create a new file in a GitLab repository
Screenshot of adding a new file to a GitLab repository

In the web browser, add the following content to hello-world.json:

JSON

{
    "greeting": "hello-world"
}

Notice that in our example, our JSON represents an object since it starts with curly brackets. Also, notice that the GitLab web editor provides some highlighting and indentation assistance similar to what a desktop editor might provide.

You can use a JSON Validator like JSON Formatter & Validator to format and check your JSON.

Let us try adding a trailing comma in our JSON and validating it:

JSON

{
    "greeting": "hello-world",
}

The JSON Formatter & Validator will tell you what it found wrong and attempt to fix it for you:

OUTPUT

Info: Removed trailing comma.
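
If you prefer to check a file locally rather than in a web tool, one possible approach is Python's built-in json module. This is only a sketch: the helper name check_json is hypothetical, and unlike the web validator it reports the problem without repairing it.

```python
import json

valid = '{"greeting": "hello-world"}'
trailing_comma = '{"greeting": "hello-world",}'


def check_json(text):
    """Return (True, None) if text parses as JSON, else (False, message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno point at the place where parsing failed.
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, None


print(check_json(valid))
print(check_json(trailing_comma))
```

The exact wording of the error message varies between Python versions, so rely on the True/False result rather than on the message text.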

Metadata Template


USGS provides a code.json template (see below) to help you get started writing project metadata. Notice that its top-level element is an array, which is designated by the square brackets.

JSON

[
  {
    "name": "REPOSITORY_NAME",
    "organization": "U.S. Geological Survey",
    "description": "REPOSITORY_DESCRIPTION",
    "version": "RELEASE_VERSION",
    "status": "RELEASE_STATUS",

    "permissions": {
      "usageType": "openSource",
      "licenses": [
        {
          "name": "Public Domain, CC0-1.0",
          "URL": "https://code.usgs.gov/GROUP_HIERARCHY/REPOSITORY_NAME/-/raw/RELEASE_VERSION/LICENSE.md"
        }
      ]
    },

    "homepageURL": "https://code.usgs.gov/GROUP_HIERARCHY/REPOSITORY_NAME",
    "downloadURL": "https://code.usgs.gov/GROUP_HIERARCHY/REPOSITORY_NAME/-/archive/RELEASE_VERSION/REPOSITORY_NAME-RELEASE_VERSION.zip",
    "disclaimerURL": "https://code.usgs.gov/GROUP_HIERARCHY/REPOSITORY_NAME/-/raw/RELEASE_VERSION/DISCLAIMER.md",
    "repositoryURL": "https://code.usgs.gov/GROUP_HIERARCHY/REPOSITORY_NAME.git",
    "vcs": "git",

    "laborHours": 0,

    "tags": [
      "TOPIC_TAG_1",
      "TOPIC_TAG_2"
    ],

    "languages": [
      "PROGRAMMING_LANG_1",
      "PROGRAMMING_LANG_2"
    ],

    "contact": {
      "name": "REPOSITORY_ADMINISTRATOR_NAME",
      "email": "REPOSITORY_ADMINISTRATOR_EMAIL"
    },

    "date": {
      "metadataLastUpdated": "YYYY-MM-DD"
    }
  }
]

Create a Metadata File in GitLab

Create a new code.json file at the top level of your GitLab repository:

Screenshot showing a red circle around where to click to create a new file in a GitLab repository
Screenshot of adding a new file to a GitLab repository

Paste the template JSON into the file, add a commit message, and click Commit changes:

Screenshot of writing a commit message for adding the code.json file to a GitLab repository
Screenshot of adding a code.json template to a GitLab repository

Add Project-Specific Information

Now, edit the code.json file to include project-specific information. While viewing the code.json file in GitLab, click Edit and Edit single file:

Screenshot of clicking Edit on the code.json file in the web browser
Screenshot of editing a file in GitLab

Replace the ALL_CAPS placeholders with meaningful values for the project. For the purposes of this exercise, the project includes code for modeling the co-occurrence of Vampires and Werewolves on Mars. The project team is actively developing the code. Eventually, they will release a USGS software information product in the public domain. This particular metadata object will document the entire project as opposed to a single product, so use “main” as the version. The project uses machine learning / artificial intelligence techniques and the code is written in Python.

GROUP_HIERARCHY is the group name under which your project is nested in GitLab. The GROUP_HIERARCHY may be one level if you are working out of a personal space (e.g., vdracula) or it may be a nested hierarchy (e.g., ecosystems/FRESC).

Below are the field definitions for code.json and examples of how the template can be updated:

  • name: Should be a short, human readable name for the project. This should match the value provided when creating the project in GitLab. The best practice is to use lowercase words with hyphens separating them.

JSON

"name": "vampires-and-werewolves"
  • organization: Must always be "U.S. Geological Survey"; casing and punctuation are important. No updates are needed to the template.

JSON

"organization": "U.S. Geological Survey"
  • description: This may be a longer description of the project. It should be no more than 1-2 sentences. Verbose descriptions may exist in the README.md file.

JSON

"description": "Code for modeling the co-occurrence of Vampires and Werewolves on Mars."
  • version: This should be a semantic version number for the product (e.g., 1.0.0) or the DEFAULT_BRANCH name (e.g., main or master), depending on whether the metadata object references an information product or the project. The version number should not include a leading v (e.g., v1.0.0) or other identifier. During the review process, a Git branch (the release candidate branch) must exist with the same name (e.g., 1.0.0). Upon publication, the version branch is converted to a tag. (Release tags are discussed in a later episode.)

JSON

"version": "main"
  • status: Must be one of the enumerated values listed below. There are no official definitions for these terms in code.gov; however, Wikipedia provides some good definitions, which are paraphrased below.
    • Ideation: planning phase of a software project.
    • Development: work on software project prior to formal testing.
    • Alpha: initial testing phase, often done within the project team or organization.
    • Beta: feature complete testing phase that follows Alpha testing, often available to users outside project team or organization.
    • Release Candidate: a Beta version with the potential to be ready for production. In USGS, a release candidate would be going through formal review and approval.
    • Production: the product has passed all stages of testing. In USGS, a production release has been reviewed and approved.
    • Archival: a version of the software that is no longer supported.

JSON

"status": "Development"
  • permissions
    • usageType: One of the following enumerated values, which describes the usage permissions for the release:
      1. openSource: Open source
      2. governmentWideReuse: Government-wide reuse
      3. exemptByLaw: The sharing of the source code is restricted by law or regulation, including, but not limited to, patent or intellectual property law, the Export Administration Regulations, the International Traffic in Arms Regulations, and the Federal laws and regulations governing classified information
      4. exemptByNationalSecurity: The sharing of the source code would create an identifiable risk to the detriment of national security, confidentiality of Government information, or individual privacy
      5. exemptByAgencySystem: The sharing of the source code would create an identifiable risk to the stability, security, or integrity of the agency’s systems or personnel
      6. exemptByAgencyMission: The sharing of the source code would create an identifiable risk to agency mission, programs, or operations
      7. exemptByCIO: The CIO believes it is in the national interest to exempt sharing the source code
      8. exemptByPolicyDate: The release was created prior to the M-16-21 policy (August 8, 2016)
    • licenses: An array of license objects, each with the following fields:
      • name: The name of the license under which the product is released (e.g., Public Domain, CC0-1.0). In most cases, the appropriate license for USGS products is Public Domain, CC0-1.0, but sometimes (e.g., when some of the code is from outside sources or collaborators) different licenses are required. For more information on selecting an appropriate license see the Licensing episode in this Lesson.
      • URL: A link to the LICENSE.md file stored in this project
        • Must reference the main or master branch (this will differ for an official product, which should point to the immutable tagged version)
        • Must use the raw variant of the file, which provides access to the plain text of the file and not the GitLab-formatted text. To get the raw variant of a file, click into the file, and click the Open raw button next to the Download button: Screenshot of red circle around a button that will Open Raw version of a file

JSON

"permissions": {
      "usageType": "openSource",
      "licenses": [
        {
          "name": "Public Domain, CC0-1.0",
          "URL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/LICENSE.md"
        }
      ]
    }
  • homepageURL*: A link to the project homepage
    • May point to the project on GitLab, but will not include the .git extension
    • May point to a project home page elsewhere as long as it is publicly accessible (or soon-to-be publicly accessible, once you have gone through the release process) and in an approved location (e.g., usgs.gov webpage as opposed to a personal website)

JSON

"homepageURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves"
  • downloadURL: A link to download a ZIP archive of the project source code
    • Must point to the main or master branch (this will differ for an official product, which should point to the immutable tagged version)
    • In GitLab, you can get the download URL by selecting Code, right-clicking zip (under Download source code), and choosing Copy Link: Screenshot of red circle showing where to click Code, zip, and Copy link buttons to get the download URL

JSON

"downloadURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/archive/main/vampires-and-werewolves-main.zip"
  • disclaimerURL: A link to the DISCLAIMER.md file stored in this project
    • Must use the raw variant of the file, which provides access to the plain text of the file and not the GitLab-formatted text
    • Must point to the main or master branch (this will differ for an official product, which should point to the immutable tagged version)

JSON

"disclaimerURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/DISCLAIMER.md"
  • repositoryURL*: A link to this project on GitLab
    • Must include the .git extension

*Note: homepageURL and repositoryURL are different. repositoryURL should end with .git whereas the homepageURL should not.

JSON

"repositoryURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves.git"
  • vcs: A lowercase string with the name of the version control system that is being used. For USGS, this will be git. No updates are needed to the template.

JSON

"vcs": "git"
  • laborHours: An estimate of total labor hours spent by your organization across the current version and all previous versions, including labor performed by federal employees and contractors. Labor hours are cumulative across all versions. Your best guess is fine. If not known, the recommendation is to use -1.

JSON

"laborHours": 0
  • tags: An array of topical/domain tags relevant to the project
    • Consider using the USGS Thesaurus or other controlled vocabularies to improve browse functionality in the code inventory.
    • These tags can be used to help people narrow down searches for software, so consider terms that will help direct potential users to your project
    • If the project supports AI/ML research and development, this array must include the tag usg-artificial-intelligence. This tag is short for U.S. Government Artificial Intelligence (i.e., do not use “usgs-artificial-intelligence”).

JSON

"tags": [
      "usg-artificial-intelligence",
      "vampires",
      "werewolves",
      "mars"
    ]
  • languages: An array of the programming languages used within this project (e.g., “Python”, “R”, “C++”). There is not a controlled vocabulary, so use your best judgement on how to represent the programming languages in your project.

JSON

"languages": [
      "Python"
    ]
  • contact: Point of contact information for the software information product.

JSON

"contact": {
      "name": "Vlad Dracula",
      "email": "vdracula@usgs.gov"
    }
  • date
    • metadataLastUpdated: An ISO datestamp (YYYY-MM-DD) of when the metadata item within the code.json file was last modified. Be sure to update this value whenever you modify any of the other key/value pairs for this metadata item. Note that you must use two digits for month and day (e.g., 2024-8-9 is not correct).

JSON

"date": {
      "metadataLastUpdated": "2024-05-29"
    }
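
The URL fields described above all follow one pattern built from the group hierarchy, the repository name, and the version. The sketch below assembles them in Python to show that pattern; the function name build_urls and the "licenseURL" key are hypothetical helpers (in code.json the license link lives under permissions.licenses[].URL), and the host default assumes a project on code.usgs.gov.

```python
def build_urls(group_hierarchy, repository_name, version,
               host="https://code.usgs.gov"):
    """Assemble the code.json URL values for one metadata object.

    version is either the default branch name (e.g., "main") or a
    release version such as "1.0.0".
    """
    base = f"{host}/{group_hierarchy}/{repository_name}"
    return {
        # Helper key only: goes in permissions.licenses[].URL (raw variant).
        "licenseURL": f"{base}/-/raw/{version}/LICENSE.md",
        # No .git extension on the homepage.
        "homepageURL": base,
        "downloadURL": f"{base}/-/archive/{version}/{repository_name}-{version}.zip",
        "disclaimerURL": f"{base}/-/raw/{version}/DISCLAIMER.md",
        # The repository URL must end with .git.
        "repositoryURL": f"{base}.git",
    }


urls = build_urls("vdracula", "vampires-and-werewolves", "main")
for field, url in urls.items():
    print(f"{field}: {url}")
```

Copying the links from the GitLab interface, as described above, remains the safest way to get them right; a helper like this is mainly useful for spotting a URL that does not match the pattern.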

Personal Space in GitLab

In the examples above, the URLs that we are generating reference Vlad Dracula’s or your own personal GitLab space. In reality, you cannot make a repository public that is located under a personal username. Instead, public repositories need to be located under a public group. The current recommendation is to have groups at the USGS Mission Area level (e.g., Ecosystems) and then subgroups at the USGS Science Center level. Project repositories will then be located within the Science Center subgroup. To avoid needing to rename all of your URLs, it is a best practice to start projects within these public groups and maintain more restrictive permissions at the project level.

This is what the full code.json file should look like after making the updates above:

JSON

[
  {
    "name": "vampires-and-werewolves",
    "organization": "U.S. Geological Survey",
    "description": "Code for modeling the co-occurrence of Vampires and Werewolves on Mars",
    "version": "main",
    "status": "Development",

    "permissions": {
      "usageType": "openSource",
      "licenses": [
        {
          "name": "Public Domain, CC0-1.0",
          "URL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/LICENSE.md"
        }
      ]
    },

    "homepageURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves",
    "downloadURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/archive/main/vampires-and-werewolves-main.zip",
    "disclaimerURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/DISCLAIMER.md",
    "repositoryURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves.git",
    "vcs": "git",

    "laborHours": 0,

    "tags": [
      "usg-artificial-intelligence",
      "vampires",
      "werewolves",
      "mars"
    ],

    "languages": [
      "Python"
    ],

    "contact": {
      "name": "Vlad Dracula",
      "email": "vdracula@usgs.gov"
    },

    "date": {
      "metadataLastUpdated": "2024-05-29"
    }
  }
]

Challenge

Use JSON Formatter & Validator to format and check your JSON. What errors were present in your JSON? Note that this tool only validates against the JSON syntax and does not validate against the code.gov metadata schema.
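
Because a syntax validator cannot tell whether the content is sensible, a small script can catch a few common slips. The sketch below is not the official code.gov schema validation: it assumes that every key in the USGS template is required, that leftover placeholders look like ALL_CAPS words joined by underscores, and the function name check_entry is hypothetical.

```python
import json
import re
from datetime import date

# Assumption: the keys in the USGS template are treated as required.
REQUIRED_KEYS = [
    "name", "organization", "description", "version", "status",
    "permissions", "homepageURL", "downloadURL", "disclaimerURL",
    "repositoryURL", "vcs", "laborHours", "tags", "languages",
    "contact", "date",
]
# Template placeholders such as REPOSITORY_NAME or TOPIC_TAG_1.
PLACEHOLDER = re.compile(r"\b[A-Z]+(?:_[A-Z0-9]+)+\b")
# Two-digit month and day are mandatory.
ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def check_entry(entry):
    """Return a list of problems found in one code.json metadata object."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]

    if entry.get("organization") != "U.S. Geological Survey":
        problems.append("organization must be 'U.S. Geological Survey'")

    # Serialize the entry back to text so every value is searched at once.
    for name in sorted(set(PLACEHOLDER.findall(json.dumps(entry)))):
        problems.append(f"unreplaced placeholder: {name}")

    updated = entry.get("date", {}).get("metadataLastUpdated", "")
    match = ISO_DATE.fullmatch(updated)
    if match is None:
        problems.append("metadataLastUpdated must be YYYY-MM-DD")
    else:
        try:
            date(*map(int, match.groups()))
        except ValueError:
            problems.append("metadataLastUpdated is not a real date")
    return problems


# A deliberately incomplete object to show the kinds of messages produced.
sample = {
    "name": "vampires-and-werewolves",
    "organization": "U.S. Geological Survey",
    "description": "REPOSITORY_DESCRIPTION",
    "date": {"metadataLastUpdated": "2024-8-9"},
}
for problem in check_entry(sample):
    print(problem)
```

To check a real file, load it with json.load and call check_entry on each object in the top-level array.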

Additional code.json fields

Additional fields are also available. See the official code.gov metadata schema for additional details. Note that you should only add fields from the “releases” array within this schema. The full code.gov metadata schema includes other fields that are necessary for building the Enterprise Code Inventory, but those should not be included in the individual project code.json files. Fields that are not documented in the official code.gov metadata schema cannot be included in the code.json files.

Updating Metadata for Initial Software Information Product


Remember that the top-level element in code.json file is an array. This means it may contain more than one object for your project. The recommended practice is to order metadata objects with the DEFAULT_BRANCH (e.g., main) appearing first, followed by the most recently released version. For an initial software information product release, it would look something like this:

JS

[
 {
 // ... main, status Development
 },
 {
 // ... release 1.0.0, status Production
 }
]
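
The recommended ordering can be sketched in a few lines of Python. The lesson only states that the default branch comes first, followed by the most recent release; listing older releases from newest to oldest after that is an assumption, as are the helper names version_key and order_entries and the trimmed sample objects.

```python
def version_key(version):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def order_entries(entries, default_branch="main"):
    """Default-branch object first, then releases from newest to oldest."""
    branch = [e for e in entries if e["version"] == default_branch]
    releases = [e for e in entries if e["version"] != default_branch]
    releases.sort(key=lambda e: version_key(e["version"]), reverse=True)
    return branch + releases


# Trimmed metadata objects; real ones carry all the template fields.
entries = [
    {"version": "1.2.0", "status": "Archival"},
    {"version": "main", "status": "Development"},
    {"version": "1.10.0", "status": "Production"},
]
ordered = order_entries(entries)
print([e["version"] for e in ordered])  # ['main', '1.10.0', '1.2.0']
```

Comparing versions as tuples of integers avoids the string-sorting trap in which "1.10.0" would sort before "1.2.0".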

Metadata evolve over time. A common misconception is that the code.json file in the main branch should describe only the main branch code and not any other version. In fact, the code.json file in the DEFAULT_BRANCH (e.g., main) should contain a metadata object for each version of the project (official or otherwise). The code.json file in the tag for a specific version should contain metadata objects for that version and all preceding versions; in this way, it will match the metadata in the main branch at the time the version is created.

Releasing an Initial Software Information Product

You are ready to release an initial version of your software information product. In the code.json file, copy the main branch's release object and paste it directly below the original in the code.json array (the main branch object serves as a template for further changes). Add a comma after the closing } of the first object to separate the two objects. In the second object, update the version field to 1.0.0 (or whatever version number you are using; it is not required to use 1.0.0) and the status field to Production. Additionally, update the URL fields in the second object to use the version number instead of main wherever RELEASE_VERSION appears in the template URLs. Finally, update the laborHours and metadataLastUpdated fields; the example below updates them in both objects.

JSON

[
  {
    "name": "vampires-and-werewolves",
    "organization": "U.S. Geological Survey",
    "description": "Code for modeling the co-occurrence of Vampires and Werewolves on Mars",
    "version": "main",
    "status": "Development",

    "permissions": {
      "usageType": "openSource",
      "licenses": [
        {
          "name": "Public Domain, CC0-1.0",
          "URL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/LICENSE.md"
        }
      ]
    },

    "homepageURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves",
    "downloadURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/archive/main/vampires-and-werewolves-main.zip",
    "disclaimerURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/DISCLAIMER.md",
    "repositoryURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves.git",
    "vcs": "git",

    "laborHours": 200,

    "tags": [
      "usg-artificial-intelligence",
      "vampires",
      "werewolves",
      "mars"
    ],

    "languages": [
      "Python"
    ],

    "contact": {
      "name": "Vlad Dracula",
      "email": "vdracula@usgs.gov"
    },

    "date": {
      "metadataLastUpdated": "2024-06-15"
    }
  },
  {
    "name": "vampires-and-werewolves",
    "organization": "U.S. Geological Survey",
    "description": "Code for modeling the co-occurrence of Vampires and Werewolves on Mars",
    "version": "1.0.0",
    "status": "Production",

    "permissions": {
      "usageType": "openSource",
      "licenses": [
        {
          "name": "Public Domain, CC0-1.0",
          "URL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/1.0.0/LICENSE.md"
        }
      ]
    },

    "homepageURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves",
    "downloadURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/archive/1.0.0/vampires-and-werewolves-1.0.0.zip",
    "disclaimerURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/1.0.0/DISCLAIMER.md",
    "repositoryURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves.git",
    "vcs": "git",

    "laborHours": 200,

    "tags": [
      "usg-artificial-intelligence",
      "vampires",
      "werewolves",
      "mars"
    ],

    "languages": [
      "Python"
    ],

    "contact": {
      "name": "Vlad Dracula",
      "email": "vdracula@usgs.gov"
    },

    "date": {
      "metadataLastUpdated": "2024-06-15"
    }
  }
]

The version of the code.json file that was created in the exercise above will be included in the 1.0.0 branch, once the branch is created, and ultimately the immutable tagged product, as well as in the main branch.
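
The copy-and-edit steps in this exercise can also be scripted. The following is a rough sketch rather than an official USGS tool: make_release_entry is a hypothetical helper, it assumes the URLs follow the GitLab patterns shown in the template, and the sample object is trimmed to the fields that change.

```python
import copy


def make_release_entry(main_entry, version, labor_hours, updated,
                       default_branch="main"):
    """Copy the default-branch object and retarget it at a release version."""
    entry = copy.deepcopy(main_entry)  # leave the original object untouched
    entry["version"] = version
    entry["status"] = "Production"
    entry["laborHours"] = labor_hours
    entry["date"]["metadataLastUpdated"] = updated
    name = entry["name"]

    def retarget(url):
        # Swap the branch name for the version in the raw and archive URLs.
        url = url.replace(f"/-/raw/{default_branch}/", f"/-/raw/{version}/")
        return url.replace(
            f"/-/archive/{default_branch}/{name}-{default_branch}.zip",
            f"/-/archive/{version}/{name}-{version}.zip",
        )

    entry["downloadURL"] = retarget(entry["downloadURL"])
    entry["disclaimerURL"] = retarget(entry["disclaimerURL"])
    for lic in entry["permissions"]["licenses"]:
        lic["URL"] = retarget(lic["URL"])
    return entry


base = "https://code.usgs.gov/vdracula/vampires-and-werewolves"
main_entry = {
    "name": "vampires-and-werewolves",
    "version": "main",
    "status": "Development",
    "permissions": {
        "usageType": "openSource",
        "licenses": [{"name": "Public Domain, CC0-1.0",
                      "URL": f"{base}/-/raw/main/LICENSE.md"}],
    },
    "downloadURL": f"{base}/-/archive/main/vampires-and-werewolves-main.zip",
    "disclaimerURL": f"{base}/-/raw/main/DISCLAIMER.md",
    "laborHours": 200,
    "date": {"metadataLastUpdated": "2024-06-15"},
}

release = make_release_entry(main_entry, "1.0.0", 200, "2024-06-15")
print(release["downloadURL"])
```

The deep copy matters because the nested permissions and date objects would otherwise be shared between the two metadata objects, and editing the release would silently change the main branch object as well.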

Note about Status Field

There are no set rules for what status needs to be assigned to a given version or branch of a project. The goal is to do the best to communicate to users how thoroughly particular code has been tested, reviewed, and approved, and how you might anticipate them using the project and products. For example, if you have testing, reviews, and approvals built into your development process such that the main branch is always the latest and greatest and should be the go-to code to use, then the main branch might be labeled with a status of ‘Production’. If instead the content in the main branch is not formally approved until a release branch is created, then the main branch might maintain a status of ‘Development’ to encourage users to use the most recent formal version.

Likewise, if a previous version of a product is still relevant and usable, it may continue to have a status label of ‘Production’. If, however, the newer version corrects some bugs and should be used instead of a previous version, then the previous version should have its status updated to ‘Archival’.

Key Points

  • A code.json file is a file formatted in JavaScript Object Notation (JSON) and contains project metadata. The code.json file is saved at the top-level of the project.
  • USGS compiles all of the code.json files for public products in GitLab into an inventory that is required by Federal policy.
  • You can use the code.json file template above to begin creating your project and product metadata with the required fields.

Content from Software Review for Authors


Last updated on 2025-05-22

Overview

Questions

  • How do I prepare my code for a software review?
  • What information do I need to provide to my reviewer(s)?
  • How should I reconcile reviewer comments?
  • How can I document the review to meet Fundamental Science Practices requirements?

Objectives

  • Create a release candidate branch in preparation for a software review.
  • Update the DISCLAIMER.md to the approved language.
  • Develop a GitLab issue to facilitate the software review.
  • Reconcile reviewer comments by updating and rebasing code.

Software Review Overview


After you have completed your code development, added the required files to your repository, and are ready to publish your software product, you will need to request a review. As noted in the Policy episode, official USGS software information products must be reviewed and approved before they can be released. The review must include an administrative, code, and domain review. We will go into more depth on how to conduct these reviews in the next episode Software Review for Reviewers.

Release Candidate Branch


A release candidate branch should be created to help facilitate the review process. The release candidate branch should have the same name as the eventual release tag. For example, if you are preparing to release version 1.0.0 of your software product, the release candidate branch will be 1.0.0.
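
Since the branch name doubles as the version recorded in code.json, it may help to check the name before creating the branch. The sketch below assumes that a semantic version means MAJOR.MINOR.PATCH with numeric parts only and no leading v, following the version field guidance in the Creating Metadata episode; the function name is_release_name is hypothetical.

```python
import re

# Assumption: MAJOR.MINOR.PATCH, digits only, no leading zeros, no "v" prefix.
SEMVER = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def is_release_name(name):
    """Return True if name is a bare semantic version such as 1.0.0."""
    return SEMVER.fullmatch(name) is not None


for candidate in ["1.0.0", "v1.0.0", "1.0", "release-1.0.0"]:
    print(candidate, is_release_name(candidate))
```

Only the first candidate passes; the others would produce a branch name that does not match the version value expected in the metadata.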

Open Git Bash (Windows) or Terminal (macOS) and navigate to your local repository. Make sure you are on your main branch and that it is up to date. We will use --ff-only so that the pull stops with an error, rather than creating a merge commit, if your local main branch has diverged from the remote.

BASH

git switch main
git pull --ff-only origin main

Create a release candidate branch

BASH

git switch -c 1.0.0

Push the release candidate branch to the remote

BASH

git push -u origin 1.0.0

Disclaimer


Once you have created the release candidate branch, you should update the DISCLAIMER.md from the provisional to the approved language in the release candidate branch only. In the main branch, this file can continue to reflect the provisional wording. Language for approved software can be found in section 5 of the FSP Guidance on Disclaimer Statements Allowed in USGS Science Information Products.

Please note that the release candidate branch should not be merged back into the main branch at any point. During the review process and up until publication, changes can be promoted from the main branch into the release candidate branch via a rebase (described below).

GitLab Issue for Review


There is not a single prescribed way to perform a software review. In this lesson, we offer an example for how you can help a reviewer structure and document their review.

Create a GitLab Review Issue

Screenshot of the Issues menu item and Issues page in GitLab

Example review issue templates are available:

  • Traditional Software Product Review Template
  • Scientific Software Product Review Template (e.g., for a code repository used in the publication of a report or manuscript that addresses a research question)

Include the following information in your issue:

  • a brief description of the project and the repository structure
  • special instructions for the reviewer
  • a checklist of elements to review

The checklist provided in the review issue template linked above may have checks that are not applicable to your project. Help your reviewer by only including checks that are relevant to your project. The next episode will outline the review process from the perspective of the reviewer.

Create a Review Issue

  1. Create an issue in your repository to help your reviewer(s) structure their review. You may choose to set up issue templates in your repository and include a review template, as demonstrated within this project and documented on the GitLab Description Templates page. Alternatively, you can copy the Markdown from the example template linked above and paste it into a new issue.

  2. Think about a project that you have worked on in the past. With a partner or small group, discuss the following questions:

    1. Which parts of the review template would you keep for the project?
    2. What elements or checks would you add?
    3. Are there any parts of the template that you do not understand?

Reconcile Review


Once the reviewer(s) complete the review, you will need to reconcile the review. Create a separate branch or multiple branches to make any necessary code changes based on the reviewer’s feedback.

Merge the changes into your main branch and then rebase your release candidate branch onto main. Rebasing replays a series of commits on top of a new base commit, which gives a more linear git commit history than merging (learn more about when to rebase instead of merge). It will look like this:

BASH

git switch main
git pull --ff-only origin main
git switch 1.0.0
git rebase main
git push origin 1.0.0
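To see what the rebase does to the history, the following sketch replays the same steps in a throwaway local repository with no remote. The file names and commit messages are hypothetical, and the commit on main stands in for a reviewed merge request.

```shell
# Throwaway repository for illustration only; names and messages are hypothetical.
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo Author"
git config user.email "demo@example.com"

echo "provisional" > DISCLAIMER.md
git add DISCLAIMER.md
git commit -q -m "Initial commit"

# Release candidate branch with its own disclaimer commit
git switch -q -c 1.0.0
echo "approved" > DISCLAIMER.md
git commit -q -a -m "Approve disclaimer"

# Reviewer feedback lands on main (normally through a merge request)
git switch -q main
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Address review comment"

# Replay the release candidate's own commit on top of the updated main
git switch -q 1.0.0
git rebase -q main

# The history is linear: main's commits first, then the disclaimer commit
git log --oneline --graph 1.0.0
```

Because a rebase rewrites the release candidate's own commits, a plain push may be rejected as non-fast-forward if the branch was pushed before the rebase. In that case `git push --force-with-lease origin 1.0.0` is a commonly used cautious option, subject to your project's branch protection settings.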

You can reference the commit hash for each change that you made in response to the reviewer’s comments and then resolve the comment thread.

Callout

Note that the referenced commit hash must be pushed to GitLab before you submit the comment; otherwise, GitLab will not link the hash and it will look like gibberish text in the comment.
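One way to confirm that a commit has reached the remote before pasting its hash into a comment is to ask Git which remote-tracking branches contain it. The sketch below assumes a throwaway clone whose "origin" is a local bare repository standing in for GitLab; the file name and commit messages are placeholders.

```shell
# Local stand-in for GitLab; paths, file names, and messages are hypothetical.
set -eu
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git init -q "$work/clone"
cd "$work/clone"
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo Author"
git config user.email "demo@example.com"
git remote add origin "$work/origin.git"

echo "first" > notes.txt
git add notes.txt
git commit -q -m "First change"
git push -q origin main

echo "second" >> notes.txt
git commit -q -a -m "Second change (not pushed yet)"
sha=$(git rev-parse HEAD)

# List remote-tracking branches that already contain the commit
before=$(git branch -r --contains "$sha")
git push -q origin main
after=$(git branch -r --contains "$sha")

echo "before push: '${before}'"
echo "after push:  '${after}'"
```

An empty result means the commit exists only locally. Remote-tracking branches reflect the last fetch or push, so running `git fetch origin` first gives a more current answer in a shared project.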

Once all changes are made and your release candidate branch is up to date, close the review issue and state the commit hash on which it ended.

Print the final review issue to PDF and upload the PDF as the review and reconciliation documentation for the USGS Information Product Data System.

Callout

You may want to include a link to the original issue in the review artifact that is loaded into the USGS Information Product Data System. Please remember that the workflow presented in this and the next episode is just one of many ways to complete a software review and reconciliation. Please check with your Science Center leadership to see if they have different requirements.

Key Points

  • A release candidate branch is named with the version number for the anticipated software information product release
  • Once you have a release candidate branch, update the DISCLAIMER.md
  • A GitLab review issue provides structure for reviewers and makes it easier for them to conduct a review
  • A PDF of the final review issue can serve as the review and reconciliation documentation in the USGS Information Product Data System

Content from Software Review for Reviewers


Last updated on 2025-05-22

Overview

Questions

  • What are my responsibilities as a reviewer of software?
  • How do I conduct a software review?

Objectives

  • Explain the topics that need to be covered during review.
  • Conduct a software review.

Software Review Overview


All USGS open-source software projects must undergo an administrative review. Official USGS software products must undergo two additional forms of review: technical code review and scientific (domain) review. For an official overview, see Types of Software Review.

Administrative Review

The administrative reviewer’s duty is to make sure that the entire history of the project is free of potential security or privacy violations. There are several types of information which must not be present in code released to the public:

  • Personally identifiable information (PII)
  • Absolute file system paths
  • Internal server host names or IP addresses
  • Usernames or passwords

This review must be done for every commit in the released software. This can be a very onerous requirement if it is to be done all at once. For that reason, collaborative workflows where changes to the codebase go through merge requests are a very good way of making sure that the administrative review has been adequately done.
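The sketch below illustrates why the review has to cover every commit rather than only the current files. It uses a throwaway repository in which a made-up password is committed and then removed; the file name, the value "hunter2", and the search pattern are all hypothetical. A keyword search like this can support an administrative review but does not replace reading the changes.

```shell
# Throwaway repository for illustration only; the "secret" is made up.
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo Author"
git config user.email "demo@example.com"

# A credential is committed by mistake, then removed in a later commit
echo 'db_password = "hunter2"' > settings.ini
git add settings.ini
git commit -q -m "Add settings"
echo 'db_password = ""' > settings.ini
git commit -q -a -m "Remove password"

# The current files look clean (git grep finds nothing) ...
current_hits=$(git grep -c 'hunter2' || true)

# ... but every commit whose diff adds or removes the value is still in the history
history_hits=$(git log --all --oneline -G 'hunter2')

echo "matches in current files: '${current_hits}'"
echo "commits touching the value:"
echo "$history_hits"
```

Both commits are listed by `git log -G`, because the first adds the matching line and the second removes it. Anyone who clones the repository can still recover the value from the first commit.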

It is acceptable for team members to review each other’s contributions, even if they are both listed as authors of the software. Reviewing a merge request is done by people who are not authors on that specific code; they are only “authors” of the project generally. Therefore, mutual in-team reviews are a convenient way to comply with this requirement.

Technical code review

The technical code review focuses on such concerns as adherence to coding standards and other measures of code quality. This review is required for all official USGS software products, but not for provisional products. Unlike administrative review, this does not need to be done for every commit of the released software.

Typical focuses in technical code review include:

  • checking for adherence to explicit coding standards, such as conventions for naming variables and functions
  • ensuring that unit tests pass
  • inspecting for vulnerabilities or bugs

Some of these areas of concern are amenable to automation. For instance, linter software can test for adherence to coding standards, calculate measures of code complexity, and identify common bug-prone patterns.

Scientific (domain) review

Scientific software requires a domain review as well. Like the technical code review, the domain review only needs to be done on the end product, not on individual commits. What constitutes an appropriate domain review will vary a great deal depending on the domain, but generally involves checking for scientific flaws or errors. It is similar to a peer review for a scientific publication: checking that the methods are applied correctly and are appropriate for the scientific question. You can leverage community resources, such as CDI or your local colleagues, for insight into scientific reviews in your domain.

How To Make The Review Streamlined


The person requesting the review may have already set up a way for you to do the review. For instance, if following the instructions in Review for Authors, they will have created a GitLab issue where you can conduct your review.

There are many possible ways to do reviews, including methods such as Word documents which do not involve git at all. But if the author has not specified how to conduct the review, one good way to do it is to create your own GitLab issue. Using either the issue title or a label, make clear what kind of review you are doing, and what version or tag of the software it pertains to. Now you can put your comments, requests for changes, or approval into this issue. This keeps everything tied into the code repository, rather than in email or Teams chats, where it could get lost.

Adding Comments to Your Review

Let us try adding a comment to a review issue as a reviewer and as an author.

Reviewer Role:

  • Navigate to Code -> Commits
    Screenshot of the menu to use to navigate to your Commits
  • Select release branch 1.0.0
    Screenshot of the menu to use to navigate to your 1.0.0 branch commits
  • Copy commit SHA
    Screenshot of the `Copy commit SHA` button
  • Start new comment in review issue: “Starting review as of commit [paste commit SHA]”
  • Click Comment
    Screenshot of a comment on the review issue with the commit SHA pasted
  • In the example exercise here, you start your review with the metadata file and note that a date needs to be updated. Navigate to the code.json file in branch 1.0.0
  • Right click on the line number next to ‘metadataLastUpdated’ and select Copy Link
    Screenshot of right-clicking on a line of code and selecting `Copy Link`
  • In a new comment, paste the link into the review issue and add a comment (e.g., “Make sure to update the metadataLastUpdated date before you submit for publication”)
  • Select the dropdown next to Comment and choose Start thread, then click the Start thread button
    Screenshot of using the dropdown menu next to `Comment` to `Start thread`

Author Role:

For reconciling a review, you can create a branch to address all comments. In this exercise, we just have one comment to address:

  • Create a feature branch to address the comment

BASH

  git switch -c review-recon-1.0.0
  • Open editor and make the change

BASH

nano code.json
  • Stage, commit, and push the changes

BASH

git add code.json
git commit -m "Update metadataLastUpdated date per review comment"
git push -u origin review-recon-1.0.0
  • Create merge request and merge to main in GitLab
  • Rebase changes into your release candidate branch

BASH

git switch main
git pull --ff-only origin main
git switch 1.0.0
git rebase main
git push origin 1.0.0
  • Copy commit SHA (Note: you should always copy the commit SHA after rebasing your release candidate branch)
  • Reply to the comment in the review issue with the commit SHA for the commit in which the comment was addressed
    Screenshot of checking the `Resolve thread` checkbox and clicking `Reply` in an issue thread

Key Points

  • An administrative review is required for all open-source software projects
  • A technical code review and a scientific domain review are required for official USGS software products
  • There are many ways to conduct and document a software review. One way is by creating a GitLab issue with comments documenting the review

Content from Publishing


Last updated on 2025-05-22

Overview

Questions

  • How do I get my Git repository published once it is ready and has been approved?
  • What static objects should I create in Git for the final release?
  • How do I make the DOI point to the correct Git object?

Objectives

  • Create a ticket to request publication of your software.
  • Create a tag and a “release” in your Git repo.
  • Publish a DOI.

Publishing Software Overview


Congratulations! You have added all the required files, completed the software reviews, and are ready to publish the software in your Git repository!

Checklist

As a reminder, at this point you should have an appropriate version of the following files:

  • LICENSE.md
  • DISCLAIMER.md
  • README.md
  • code.json
  • optional: CITATION.md

And completed the following tasks:

  • Created a DOI
  • Completed the review process
  • Obtained IPDS approval
  • Started a release candidate branch

Initiate a Request to Publish your Code


After double checking the above list and reviewing the software release checklist, navigate to the USGS GitLab Software Management repository using a link provided by your instructor. To initiate a request, open an issue on this project and in the fields seen below, add a descriptive title and select the “GitLab Official Release” template under the “Description” field:

Screenshot of Issue Template for Publishing a Git repository

Selecting “GitLab Official Release” will pre-populate the text box with a template that includes sections for you to fill out. If you have followed along so far, you should have all the information requested. Update the template text with information relevant to your request:

For fields requesting textual input, examples are provided between backticks `e.g. example`; replace the content between the backticks with your answer.

Fields with a checkbox (a space between two square brackets) are asking you to acknowledge or agree to associated text; replace the space between the brackets with an x to indicate you agree/acknowledge.

Do not edit the /label lines as they may delay notification/processing of your request.

When done, click “Create Issue” at the bottom.

Callout

The template may ask for the username of the approving official. If they have a GitLab username, you can tag them using the @ symbol: @vdracula. If they do not, you can write their email address instead: vdracula@usgs.gov.

Once an administrator sees the issue, they run an automated final validation tool to provide feedback on errors. That is why you will see this note at the bottom of their message:

Screenshot of the note that explains a comment was automatically generated

Often, the feedback concerns errors from the code.json file not containing the correct URLs. Reviewing the Creating Metadata lesson may help clarify how to fix these errors.

Discussion

As part of this course, you will not be submitting your Vampires and Werewolves project for publication, but you can take a look at current Git projects that have been submitted.

Navigate to the Software Management Issues page using the link provided by your instructor. Do you see any requests for publishing software? Click on a few and see how they filled out the template. What responses did they receive? What did they need to edit before publication? Do the requested edits align with what you’ve learned so far? How would you fix the errors?

Create a Git Tag


Once you have corrected all errors and received approval via the GitLab issue, your next step is to create a static Git tag and delete the release candidate branch. A tag is a human-readable name that points to a specific commit ID and does not change with subsequent updates or commits. Because of this stability, it is used for the official version of the software.

To create a tag, navigate to the left-hand menu and select “Tags” under “Code”, then click “New Tag”:

Screenshot of creating a new Git tag

Then, fill out the tag information with the tag name as the version name, select the release candidate branch (which should have the same name), and write a brief description:

Screenshot of New Tag page
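The lesson creates the tag in the GitLab web interface. As a rough command-line sketch of the same idea, the commands below create an annotated tag with the same name as the release candidate branch in a throwaway repository, then delete the branch and confirm that the tag still points at the same commit. The repository contents and tag message are hypothetical.

```shell
# Throwaway repository for illustration only; contents and messages are hypothetical.
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo Author"
git config user.email "demo@example.com"

echo "release content" > README.md
git add README.md
git commit -q -m "Initial commit"
git switch -q -c 1.0.0                      # release candidate branch

rc_commit=$(git rev-parse refs/heads/1.0.0)

# Annotated tag named like the branch; the full ref name avoids ambiguity
git tag -a 1.0.0 -m "Version 1.0.0" refs/heads/1.0.0

# Delete the release candidate branch; the tag keeps pointing at the commit
git switch -q main
git branch -q -d 1.0.0
tag_commit=$(git rev-parse 'refs/tags/1.0.0^{commit}')

echo "branch tip was: $rc_commit"
echo "tag points at:  $tag_commit"
```

While a branch and a tag share the name 1.0.0, a short name such as `git push origin 1.0.0` is ambiguous. Spelling out the ref, as in `git push origin refs/tags/1.0.0`, makes the intent explicit.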

Create a Release from the Tag


On the next page, create a release from the tag. This release will be used to activate the DOI, i.e., the DOI will point to the release (not the tag, the release candidate branch, or the main branch).

Screenshot of where to click "Create Release"

Add a title for the release, which can be the same as the tag and version number. There is a Description box for any notes you may want to add, which you can edit at any point. For example, if you publish an updated version of the repository, you may want to come back and redirect users to the most up-to-date version.

Screenshot of page used to create a Git release

Once done, click “Create Release”. Then use the URL to activate the DOI. The URL should be in this format: https://code.usgs.gov/GROUP_HIERARCHY/REPOSITORY_NAME/-/releases/RELEASE-NAME
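As a small illustration of the format above, the following sketch assembles a release URL from its parts. The values are taken from this course's example project and are assumptions for any other repository; substitute your own group path, repository name, and release name.

```shell
# Example values only; replace them with your own project's details.
group_hierarchy="vdracula"
repository_name="vampires-and-werewolves"
release_name="1.0.0"

release_url="https://code.usgs.gov/${group_hierarchy}/${repository_name}/-/releases/${release_name}"
echo "$release_url"
```

Copying the URL from the browser after creating the release remains the most reliable source; the sketch only shows how the pieces fit together.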

Now that you have the static tag and release, delete the release candidate branch by navigating to “Branches” under “Code” on the left-hand menu, then click the three vertical dots on the release candidate branch, and click “Delete branch”:

Screenshot of deleting a Git branch

Why do we create a Release?

Why is it preferable to point the DOI to a release rather than a tag or branch? You cannot edit a tag or the release branch, but you can add notes or updates to a release. These notes may be useful to the user in the case that there is a more updated version or other information you wish to share.

Create a Git tag and associated release

Follow the above instructions to create a tag and release in your Vampires and Werewolves repository.

Activate your DOI


Use the USGS Asset Identifier Service to manage the DOI you created in the citation lesson. Before you can activate the DOI, you will need to include the creators, publication year, URL to the Release page, IPDS number, and related publication (if applicable). If you would like your scientific software information product to be displayed on the USGS website, you will also need to include a brief description of the product in the DOI. Once you have filled in this required information, the “Publish Approved Release to DataCite” button on the left-hand menu will become active and you will be able to click on it:

Screenshot of DOI activation menu

Note: As part of this course, we are not creating and publishing DOIs.

Disseminate in IPDS


Your last step, as with any USGS product, is to Disseminate the record in IPDS. Follow your Center’s policy on how to disseminate the product.

Key Points

  • Submit a new issue as the first step in publishing a software product
  • Create a static Git tag and associated release
  • Activate your DOI using the URL of the Git release

Content from Continuing Your Project


Last updated on 2025-05-22

Overview

Questions

  • How do I follow policy when developing an open-source project?
  • When should I release updated versions of my project?
  • How do I prepare my project for subsequent releases?

Objectives

  • Continue development on your open-source project.
  • Release subsequent versions of your open-source project.

In many cases, work must continue on a project after it becomes publicly accessible. This may be following an official USGS Software Information Product release, or following a more informal open-source release process. In any case, the USGS supports open-source project development with some conditions.

Continue Open-Source Project Development


When developing an open-source project, all modifications must receive, at minimum, one administrative security review before being incorporated into the open-source project. This review must ensure no sensitive or personally identifiable information is exposed by incorporating these changes.

There are workflows supporting this review process. Previously in this course, we introduced a branching workflow, which must be modified in order to align with policy during open-source development. One modified workflow that aligns with policy requirements is called a “Forking Workflow”.

Forking Workflow

With a forking workflow, each developer on the project creates a private personal copy, or fork, of the shared open-source (public) project. This fork is often referred to as the developer’s “origin” and the shared open-source project is often referred to as the “upstream”.

A forking workflow is also beneficial because it removes barriers to new collaborator contributions. Rather than needing to individually grant access to each potential collaborator, anyone can fork the open-source project and submit a merge request to contribute.

What is in a name?

The terms “origin” and “upstream” are conventions within the broader software development community for referencing the remote repository locations. These could be called anything, but following the convention improves shared understanding across development teams.

To view all your remote locations and their aliases using the command line, try

BASH

git remote -v

The forking workflow is similar to the branching workflow except the branches are created within the developer’s origin and the merge requests are from the developer’s origin to the shared upstream repository. Let us see how this works.

Diagram showing an Upstream and Origin as part of USGS GitLab, and a Local Clone as part of the Local Workstation. Arrows go back and forth between Upstream and Origin and between Origin and Local Clone. A dashed line goes between the Local Clone and Upstream.
Forking Workflow Diagram

In the diagram above we see an upstream and origin location within the USGS GitLab platform. Within the developer’s local workstation we see a local clone where the developer will work. A high-level overview of the workflow is as follows:

  1. Developer creates a personal fork called an origin
  2. Developer configures a local clone on their workstation to use their fork
  3. Developer continues project development on branches within the local clone
  4. Developer pushes completed branches from their local clone to their origin
  5. Developer submits a merge request from the branch in their origin to the default branch in the upstream. A maintainer reviews and optionally merges the changes.

1. Create a fork

Creating a developer fork is a one-time process for each developer. The developer will fork the upstream repository to create their origin repository. This is completed within the GitLab interface by navigating to the upstream location and clicking the “Fork” button in the upper right area of the page.

Screenshot of GitLab UI showing location of fork button

It is important to click the “Fork” text and not the number to the right of the “Fork” text as these have different effects. On the next screen the developer must provide some information about their fork and then click the “Fork project” button near the bottom.

Screenshot of GitLab UI for creating a fork

Primarily, the developer must “Select a namespace” where the fork will be created. Typically they would select their personal user namespace. It is uncommon to change the project name, project slug, or project description. Typically all branches should be included in the fork and the visibility can be either “Private” or “Internal”; however, “Public” will be disabled.

Visibility Matters

Personal forks are not allowed to be made publicly accessible. Only the shared upstream project location may be publicly accessible. However, when the fork has a more restrictive visibility than the upstream, GitLab often makes incorrect default assumptions when the developer subsequently creates merge requests. GitLab will assume the merge request is from the developer fork and to the developer fork, which is incorrect. For this reason, it is important to pay attention when creating the merge request later.

2. Configure local clone

The local clone may be configured in one of two different ways. If the developer had previously cloned the repository from what is now called the upstream, we can rename the existing remote to be called “upstream” and then add a new remote called “origin”. Alternatively, if the developer does not yet have a local clone of the project, they can clone their origin and add an “upstream”. The end result is the same.

The ORIGIN_URL and UPSTREAM_URL values may be copied from the GitLab web interface by navigating to the corresponding project page, selecting the “Code” drop down option and then clicking the copy icon for the “Clone with HTTPS” option.

Screenshot of GitLab UI for obtaining ORIGIN_URL
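The two configurations described above can be sketched as follows. To keep the example self-contained, two local bare repositories stand in for the GitLab projects, so the ORIGIN_URL and UPSTREAM_URL values here are temporary paths rather than real HTTPS URLs, and the clone directory names are hypothetical.

```shell
# Local bare repositories stand in for the GitLab projects (hypothetical paths).
set -eu
work=$(mktemp -d)
UPSTREAM_URL="$work/upstream.git"
ORIGIN_URL="$work/origin.git"
git init -q --bare "$UPSTREAM_URL"
git init -q --bare "$ORIGIN_URL"

# Option 1: the developer already has a clone of what is now the upstream
git clone -q "$UPSTREAM_URL" "$work/existing-clone"
cd "$work/existing-clone"
git remote rename origin upstream      # the shared project becomes "upstream"
git remote add origin "$ORIGIN_URL"    # the personal fork becomes "origin"
git remote -v

# Option 2: the developer has no local clone yet
git clone -q "$ORIGIN_URL" "$work/fresh-clone"
cd "$work/fresh-clone"
git remote add upstream "$UPSTREAM_URL"
git remote -v
```

Either way, `git remote -v` should list both an origin pointing at the fork and an upstream pointing at the shared project.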

3. Continue project development

Within your local clone and personal origin, development continues following the branching workflow as described in the previous “Branching and Merging” episode. The developer creates different branches for each logical group of changes and commits them locally.

4. Push completed branches

When local development work is ready for integration, the developer pushes their local branch to their developer origin. If the developer previously pushed with the -u or --set-upstream flags as described in the “Branching and Merging” episode, it is important to reset these now since the “origin” is pointing to a new location. More simply, you may always explicitly specify what is pushed to where using:

BASH

git push origin 1-my-first-issue

Callout

In the above command, 1-my-first-issue is the name of the branch that is pushed and origin is the remote destination to where that branch is pushed.
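If you prefer to reset the tracking information rather than name the remote every time, pushing once with -u to the new origin is one way to do it. The sketch below assumes a throwaway clone with two local bare repositories standing in for the upstream project and the personal fork; all paths and file names are hypothetical.

```shell
# Local bare repositories stand in for the GitLab projects (hypothetical paths).
set -eu
work=$(mktemp -d)
git init -q --bare "$work/upstream.git"
git init -q --bare "$work/origin.git"
git init -q "$work/clone"
cd "$work/clone"
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo Author"
git config user.email "demo@example.com"
git remote add upstream "$work/upstream.git"
git remote add origin "$work/origin.git"

echo "work" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"
git switch -q -c 1-my-first-issue

# Earlier state: the branch tracks the shared project
git push -q -u upstream 1-my-first-issue
tracking_before=$(git rev-parse --abbrev-ref '1-my-first-issue@{upstream}')

# Push to the personal fork and make it the new tracking target
git push -q -u origin 1-my-first-issue
tracking_after=$(git rev-parse --abbrev-ref '1-my-first-issue@{upstream}')

echo "before: $tracking_before"
echo "after:  $tracking_after"
```

After this, a bare `git push` on the branch goes to the fork. `git branch -vv` shows the tracking target for every local branch if you want to check the others.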

5. Integrate changes

The developer should open a merge request from the development branch in their origin repository to the upstream default branch (e.g., main). To do this, first navigate a web browser to the developer origin project page on USGS GitLab. Then, select “Code” and “Merge requests” from the navigation menu on the left. Next click the “New merge request” button.

Screenshot of GitLab UI for creating a new merge request

On the next screen, select the correct “Source branch” and “Target branch” information and then click “Compare branches and continue”.

Screenshot of GitLab UI for finalizing new merge request

In the “Source branch”, the developer fork location should be selected in the first drop down box. This should be the default if opening a merge request from the developer fork project page. The second drop down box in this section does not default to anything and the desired development branch should be selected.

In the target branch, it is important that the correct upstream location is selected. In the screenshot, the “mlangseth” location is selected as the upstream. The default branch in the selected target location will be selected by default; this is typically correct but may be different for specific development teams.

Visibility (still) matters

If the visibility of the origin and upstream match, GitLab will select the correct values for the source and target repository locations. In general, this will not be the case following this open-source continuing development guide. It is for this reason you must carefully select the correct repository locations when on this screen.

On the final screen, you are given the option to provide a custom merge request title, description, labels, assignments, etc. Complete these choices appropriately and click the “Create merge request” button at the bottom to create the final merge request.

This new merge request can now be reviewed, commented on, reconciled, and integrated in the same manner as was described in the previous “Branching and Merging” episode.

Subsequent Releases


Following some amount of development on the open-source project, it may become appropriate and/or necessary to release a new version of the software project as a new official USGS software information product. The new version of the project is subject to the same review and approval requirements as if it were the first or only release of the project. A new Information Product Data System (IPDS) record, a new digital object identifier (DOI), and updated metadata (code.json) are all required.

Triggering a subsequent release

When may a subsequent version of the software project be released as a new official USGS software information product?

When must a subsequent version of the software project be released as a new official USGS software information product?

In general, the triggering criteria for a subsequent release of a software project as an official USGS software information product are the same as for the original release of the software project.

A subsequent version of the software development project may be released as a new official USGS software information product at the author’s discretion.

A subsequent version of the software development project must be released as a new official USGS software information product if this new version is desired to be cited and/or results thereof are intended to be used to support some other official USGS information product.

Preparing Metadata

For releasing subsequent software information products, modify the code.json file in the main branch. Update the status field for the previous version to Archival, if applicable. Multiple versions may be in Production at once.

Copy the text from the previously released object in the code.json and paste it between the main branch object and the previously released object (still within the array []). Add a comma after the closing brace (}) for the object to separate it from the previous product.

Update the version, status, permissions.licenses[].URL, downloadURL, disclaimerURL, and laborHours in this object to document the newest version. Additionally, update the metadataLastUpdated for any metadata objects that have been modified, including the metadata object for this newest version.
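A missing comma between objects is an easy mistake to make when copying and pasting, and it makes the whole file invalid JSON. As a quick check, the sketch below runs two tiny stand-in files through Python's built-in `json.tool` module. It assumes `python3` is installed, and the file names and the abbreviated objects are hypothetical rather than a complete code.json.

```shell
# Tiny stand-in files for illustration; a real code.json has many more fields.
set -eu
demo=$(mktemp -d)
cd "$demo"

# Two objects with the separating comma missing
cat > broken.json <<'EOF'
[
  { "version": "main", "status": "Development" }
  { "version": "2.0.0", "status": "Production" }
]
EOF

# The same array with the comma restored
cat > fixed.json <<'EOF'
[
  { "version": "main", "status": "Development" },
  { "version": "2.0.0", "status": "Production" }
]
EOF

# json.tool exits with a non-zero status when the file cannot be parsed
for f in broken.json fixed.json; do
  if python3 -m json.tool "$f" > /dev/null 2>&1; then
    echo "$f: valid JSON"
  else
    echo "$f: invalid JSON"
  fi
done
```

Running `python3 -m json.tool code.json` without the redirection also prints the position of the first syntax error. This only checks syntax, not whether the field values are correct.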

Remember from the Creating Metadata episode that the top-level element in a code.json file is an array. If a project has been under development for a long time, there may be multiple released versions. In this case, objects should be ordered with the DEFAULT_BRANCH (e.g., main) appearing first, followed by the most recently released version, and so on in reverse chronological order. For example:

JSON

[
  {
    // ... main (DEFAULT_BRANCH), status Development
  },
  {
    // ... release 3.0.0, status Production
  },
  {
    // ... release 2.0.0, status Archival
  },
  {
    // ... release 1.0.0, status Archival
  }
]

In the hypothetical example code.json file above, the release tag for version 1.0.0 would include metadata only for that product (in addition to the DEFAULT_BRANCH metadata), and that product would likely have a status of Production. Once you release version 2.0.0, the array would contain three objects: first the DEFAULT_BRANCH metadata with a status of Development, then 2.0.0 with a status of Production, and finally 1.0.0 with a status of Archival. Because released tags are never edited, the code.json file in the 1.0.0 tagged version would not change and would still list that version as Production. In the main branch, however, the code.json file must be updated to include new software information products. The code.json file may also include metadata objects marking other milestone tagged versions in addition to those associated with official USGS software information products.

Update the code.json File for Subsequent Release

Update the code.json file within the main branch to prepare to release version 2.0.0. What fields did you need to update? How many objects are now in your JSON array? Did you need to change anything in the version 1.0.0 object? What about the main object?

JSON

[
  {
    "name": "vampires-and-werewolves",
    "organization": "U.S. Geological Survey",
    "description": "Code for modeling the co-occurrence of Vampires and Werewolves on Mars",
    "version": "main",
    "status": "Development",

    "permissions": {
      "usageType": "openSource",
      "licenses": [
        {
          "name": "Public Domain, CC0-1.0",
          "URL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/LICENSE.md"
        }
      ]
    },

    "homepageURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves",
    "downloadURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/archive/main/vampires-and-werewolves-main.zip",
    "disclaimerURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/DISCLAIMER.md",
    "repositoryURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves.git",
    "vcs": "git",

    "laborHours": 0,

    "tags": [
      "usg-artificial-intelligence",
      "vampires",
      "werewolves",
      "mars"
    ],

    "languages": [
      "Python"
    ],

    "contact": {
      "name": "Vlad Dracula",
      "email": "vdracula@usgs.gov"
    },

    "date": {
      "metadataLastUpdated": "2024-06-15"
    }
  },
  {
    "name": "vampires-and-werewolves",
    "organization": "U.S. Geological Survey",
    "description": "Code for modeling the co-occurrence of Vampires and Werewolves on Mars",
    "version": "2.0.0",
    "status": "Production",

    "permissions": {
      "usageType": "openSource",
      "licenses": [
        {
          "name": "Public Domain, CC0-1.0",
          "URL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/2.0.0/LICENSE.md"
        }
      ]
    },

    "homepageURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves",
    "downloadURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/archive/2.0.0/vampires-and-werewolves-main.zip",
    "disclaimerURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/2.0.0/DISCLAIMER.md",
    "repositoryURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves.git",
    "vcs": "git",

    "laborHours": 300,

    "tags": [
      "usg-artificial-intelligence",
      "vampires",
      "werewolves",
      "mars"
    ],

    "languages": [
      "Python"
    ],

    "contact": {
      "name": "Vlad Dracula",
      "email": "vdracula@usgs.gov"
    },

    "date": {
      "metadataLastUpdated": "2024-07-01"
    }
  },
  {
    "name": "vampires-and-werewolves",
    "organization": "U.S. Geological Survey",
    "description": "Code for modeling the co-occurrence of Vampires and Werewolves on Mars",
    "version": "1.0.0",
    "status": "Archival",

    "permissions": {
      "usageType": "openSource",
      "licenses": [
        {
          "name": "Public Domain, CC0-1.0",
          "URL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/1.0.0/LICENSE.md"
        }
      ]
    },

    "homepageURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves",
    "downloadURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/archive/1.0.0/vampires-and-werewolves-main.zip",
    "disclaimerURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/1.0.0/DISCLAIMER.md",
    "repositoryURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves.git",
    "vcs": "git",

    "laborHours": 200,

    "tags": [
      "usg-artificial-intelligence",
      "vampires",
      "werewolves",
      "mars"
    ],

    "languages": [
      "Python"
    ],

    "contact": {
      "name": "Vlad Dracula",
      "email": "vdracula@usgs.gov"
    },

    "date": {
      "metadataLastUpdated": "2024-07-01"
    }
  }
]

The 2.0.0 object was added between the main and 1.0.0 release objects. The following fields were updated for the 2.0.0 object: version, status, permissions.licenses[].URL, downloadURL, disclaimerURL, metadataLastUpdated, and laborHours. There are now 3 objects in the code.json array. The status and the metadataLastUpdated fields were updated in the 1.0.0 object. Nothing was updated in the main object.

Key Points

  • A good workflow can streamline open-source project development while ensuring compliance with governing policies
  • While specific criteria necessitate releasing subsequent versions, this may also be done at the author’s discretion
  • Subsequent versions are released in a manner very similar to the initial version
  • The code.json file should be updated to include another object within the array that describes the new version