Content from Automated Version Control
Last updated on 2025-05-22
Overview
Questions
- What is version control and why should I use it?
Objectives
- Explain the benefits of an automated version control system.
- Explain the basics of how automated version control systems work.
We’ll start by exploring how version control can be used to keep track of what one person did and when. Even if you aren’t collaborating with other people, automated version control is much better than trying to figure out which of the following is your most recent version:
- GrantReport_Final.docx
- GrantReport_Final-SupervisoryReview.docx
- GrantReport_ReviewWithChanges.docx
- GrantReport_Finalv3.docx
- GrantReport_Final_for_Review.docx
We’ve all been in this situation before: it seems unnecessary to have multiple nearly-identical versions of the same document. Some word processors let us deal with this a little better, such as Microsoft Word’s Track Changes, Google Docs’ version history, or LibreOffice’s Recording and Displaying Changes.
Version control systems start with a base version of the document and then record changes you make each step of the way. You can think of it as a recording of your progress: you can rewind to start at the base document and play back each change you made, eventually arriving at your more recent version.
Once you think of changes as separate from the document itself, you can then think about “playing back” different sets of changes on the base document, ultimately resulting in different versions of that document. For example, two users can make independent sets of changes on the same document.
Unless multiple users make changes to the same section of the document - a conflict - you can incorporate two sets of changes into the same base document.
A version control system is a tool that keeps track of these changes for us, effectively creating different versions of our files. It allows us to decide which changes will be made to the next version (each record of these changes is called a commit), and keeps useful metadata about them. The complete history of commits for a particular project and their metadata make up a repository. Repositories can be kept in sync across different computers, facilitating collaboration among different people.
The Long History of Version Control Systems
Automated version control systems are nothing new. Tools like RCS, CVS, and Subversion have been around for decades (RCS dates from the early 1980s) and are used by many large companies. However, many of these are now considered legacy systems (i.e., outdated) due to various limitations in their capabilities. More modern systems, such as Git and Mercurial, are distributed, meaning that they do not need a centralized server to host the repository. These modern systems also include powerful merging tools that make it possible for multiple authors to work on the same files concurrently.
Paper Writing
Imagine you drafted an excellent paragraph for a paper you are writing, but later ruin it. How would you retrieve the excellent version of your conclusion? Is it even possible?
Imagine you have 5 co-authors. How would you manage the changes and comments they make to your paper? If you use LibreOffice Writer or Microsoft Word, what happens if you accept changes made using the Track Changes option? Do you have a history of those changes?
Recovering the excellent version is only possible if you created a copy of the old version of the paper.
Collaborative writing with traditional word processors is cumbersome. Either every collaborator has to work on a document sequentially (slowing down the process of writing), or you have to send out a version to all collaborators and manually merge their comments into your document. The ‘track changes’ or ‘record changes’ option can highlight changes for you and simplifies merging, but as soon as you accept changes you will lose their history. You will then no longer know who suggested that change, why it was suggested, or when it was merged into the rest of the document. Even online word processors like Google Docs or Microsoft Office Online do not fully resolve these problems.
Key Points
- Version control is like an unlimited ‘undo’.
- Version control also allows many people to work in parallel.
Content from Setting Up Git
Overview
Questions
- How do I get set up to use Git?
Objectives
- Configure Git the first time it is used on a computer.
- Explain the meaning of the --global configuration flag.
When we use Git on a new computer for the first time, we need to configure a few things. Below are some configurations we will set as we get started with Git:
- our name and email address,
- what our preferred text editor is,
- and that we want to use these settings globally (i.e. for every project).
On a command line, Git commands are written as git verb options, where verb is what we actually want to do and options is additional information which may be needed for the verb. So here is how Dracula sets up his new laptop:
BASH
$ git config --global user.name "Vlad Dracula"
$ git config --global user.email "vdracula@usgs.gov"
Please use your own name and email address instead of Dracula’s. This user name and email will be associated with your subsequent Git activity, which means that any changes pushed to GitHub, Bitbucket, GitLab or another Git host server after this lesson will include this information.
For this lesson, we will be interacting with GitLab and so the email address used should be your USGS email.
Line Endings
As with other keys, when you hit Enter or ↵ (or, on Macs, Return), your computer encodes this input as a character (or two). Different operating systems use different character(s) to represent the end of a line. Windows uses the combination of the carriage return and linefeed characters and Unix and Mac use only linefeed. These can cause otherwise identical files to look different to Git. The solution is to automatically strip the carriage return characters when you move files from Windows to the other systems and add them back when you move files in the other direction. You can read more about this issue in the Pro Git book.
You can change the way Git recognizes and encodes line endings using the core.autocrlf setting of git config. The following settings are recommended: on macOS and Linux, set core.autocrlf to input; on Windows, set it to true.
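A minimal sketch of the line-ending configuration, assuming the values recommended in the Pro Git book (input on macOS and Linux, true on Windows). The throwaway HOME directory is an assumption added here so that trying the commands cannot change your real settings; drop those two lines to configure your actual account.

```shell
# Assumption: sandbox HOME so this sketch does not touch your real ~/.gitconfig.
# Run it in a fresh terminal, since HOME stays changed for the session.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

# macOS and Linux: convert CRLF to LF when committing, leave files alone on checkout.
git config --global core.autocrlf input

# Windows (use this line instead): convert LF to CRLF on checkout and back on commit.
# git config --global core.autocrlf true

# Print the value that is now in effect.
git config --global core.autocrlf
```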
Git opens a text editor whenever it needs text from you, most often a commit message; you will also use an editor to resolve conflicts (discussed later). To set your favorite editor, choose one of the following configuration commands:
Editor | Configuration command |
---|---|
Atom | $ git config --global core.editor "atom --wait" |
nano | $ git config --global core.editor "nano -w" |
BBEdit (Mac, with command line tools) | $ git config --global core.editor "bbedit -w" |
Sublime Text (Mac) | $ git config --global core.editor "/Applications/Sublime\ Text.app/Contents/SharedSupport/bin/subl -n -w" |
Sublime Text (Win, 32-bit install) | $ git config --global core.editor "'c:/program files (x86)/sublime text 3/sublime_text.exe' -w" |
Sublime Text (Win, 64-bit install) | $ git config --global core.editor "'c:/program files/sublime text 3/sublime_text.exe' -w" |
Notepad (Win) | $ git config --global core.editor "c:/Windows/System32/notepad.exe" |
Notepad++ (Win, 32-bit install) | $ git config --global core.editor "'c:/program files (x86)/Notepad++/notepad++.exe' -multiInst -notabbar -nosession -noPlugin" |
Notepad++ (Win, 64-bit install) | $ git config --global core.editor "'c:/program files/Notepad++/notepad++.exe' -multiInst -notabbar -nosession -noPlugin" |
Kate (Linux) | $ git config --global core.editor "kate" |
Gedit (Linux) | $ git config --global core.editor "gedit --wait --new-window" |
Scratch (Linux) | $ git config --global core.editor "scratch-text-editor" |
Emacs | $ git config --global core.editor "emacs" |
Vim | $ git config --global core.editor "vim" |
VS Code | $ git config --global core.editor "code --wait" |
It is possible to reconfigure the text editor for Git whenever you want to change it.
Exiting Vim
Note that Vim is the default editor for many programs. If you haven’t used Vim before and wish to exit a session without saving your changes, press Esc then type :q! and hit Enter or ↵ (or, on Macs, Return). If you want to save your changes and quit, press Esc then type :wq and hit Enter or ↵ (or, on Macs, Return).
Git (2.28+) allows configuration of the name of the branch created when you initialize any new repository. Dracula decides to use that feature, through the init.defaultBranch setting, to set it to main so it matches the cloud service he will eventually use.
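The command for this step is not shown here; a minimal sketch, assuming the init.defaultBranch setting introduced in Git 2.28 is what Dracula uses. The sandboxed HOME is an assumption for safe experimentation only.

```shell
# Assumption: sandbox HOME so the sketch leaves your real configuration alone.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

# Ask Git to name the first branch of every new repository "main".
git config --global init.defaultBranch main

# Confirm the stored value.
git config --global init.defaultBranch
```

Older Git versions store the value but ignore it, so on those versions new repositories still start on master.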
Default Git branch naming
Source file changes are associated with a “branch.” For new learners in this lesson, it’s enough to know that branches exist, and this lesson uses one branch.
By default, Git will create a branch called master when you create a new repository with git init (as explained in the next Episode). The software development community has moved to adopt the term main instead.
In 2020, most Git code hosting services transitioned to using main as the default branch. As an example, any new repository that is opened in GitHub or the USGS GitLab defaults to main. However, Git has not yet made the same change. As a result, local repositories must be manually configured to have the same default branch name as most cloud services.
The five commands we just ran above only need to be run once: the flag --global tells Git to use the settings for every project, in your user account, on this computer.
Let’s review those settings and test our core.editor right away by opening the global configuration file with git config --global --edit.
Let’s close the file without making any additional changes. Remember, since typos in the config file will cause issues, it’s safer to view the configuration with git config --list.
And if necessary, change your configuration using the same commands to choose another editor or update your email address. This can be done as many times as you want.
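As a sketch of the read-only check, the listing below assumes Dracula's name and email from earlier and a sandboxed HOME (an assumption so the example is safe to run). The interactive variant is left as a comment because it opens your editor.

```shell
# Assumption: sandbox HOME so the sketch does not alter your real settings.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

git config --global user.name "Vlad Dracula"
git config --global user.email "vdracula@usgs.gov"

# View every setting Git currently sees without opening the config file.
git config --list

# Interactive alternative: open the global config file in core.editor.
# git config --global --edit
```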
Proxy
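Typically, your work in USGS will not require the use of a proxy. In the unusual case that your group requires it, you may also need to tell Git about the proxy by setting http.proxy with git config --global.
To disable the proxy, unset the same setting with git config --global --unset http.proxy.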
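A hedged sketch of both steps using Git's http.proxy setting. The proxy address is a hypothetical placeholder, not a real USGS value, and the sandboxed HOME is an assumption so the example cannot affect your account.

```shell
# Assumption: sandbox HOME so the sketch leaves your real configuration alone.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

proxy_url="http://proxy.example.org:8080"   # hypothetical placeholder address

# Tell Git to send HTTP(S) traffic through the proxy.
git config --global http.proxy "$proxy_url"
configured="$(git config --global http.proxy)"
echo "proxy set to: $configured"

# Disable the proxy again by removing the setting.
git config --global --unset http.proxy
```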
Git Help and Manual
Always remember that if you forget the subcommands or options of a git command, you can access the relevant list of options by typing git <command> -h, or access the corresponding Git manual by typing git <command> --help, e.g. git config -h and git config --help.
While viewing the manual, remember the : is a prompt waiting for commands and you can press Q to exit the manual.
More generally, you can get the list of available git commands and further resources of the Git manual by typing git help.
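The help commands described above might look like this for git config; the choice of config as the example subcommand is an assumption.

```shell
# Brief usage summary for one subcommand. Git exits with a non-zero status
# after printing it, which is expected, hence the "|| true".
git config -h || true

# Overview of the most common Git commands.
git help

# Full manual page for one subcommand (opens a pager; press Q to leave).
# git config --help
```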
There are many development environments that have built-in integrations with Git to streamline the most common Git operations. This lesson does not go into details on using these integrations, but here are some resources that you can explore on your own:
- RStudio: https://docs.posit.co/ide/user/ide/guide/tools/version-control.html
- Visual Studio Code: https://code.visualstudio.com/docs/sourcecontrol/overview
Key Points
- Use git config with the --global option to configure a user name, email address, editor, and other preferences once per machine.
Content from Creating a Repository
Overview
Questions
- Where does Git store information?
Objectives
- Create a local Git repository.
- Describe the purpose of the .git directory.
Once Git is configured, we can start using it.
We will continue with the story of Wolfman and Dracula who are modeling the co-occurrences of vampires and werewolves on Mars.
First, let us create a new directory in the Desktop folder for our work and then change the current working directory to the newly created one, using mkdir vampires-and-werewolves followed by cd vampires-and-werewolves.
Then we run git init to tell Git to make vampires-and-werewolves a repository -- a place where Git can store versions of our files.
It is important to note that git init will create a repository that can include subdirectories and their files; there is no need to create separate repositories nested within the vampires-and-werewolves repository, whether subdirectories are present from the beginning or added later. Also, note that the creation of the vampires-and-werewolves directory and its initialization as a repository are completely separate processes.
If we use ls to show the directory’s contents, it appears that nothing has changed.
But if we add the -a flag to show everything, we can see that Git has created a hidden directory within vampires-and-werewolves called .git:
OUTPUT
. .. .git
Git uses this special subdirectory to store all the information about the project, including the tracked files and sub-directories located within the project’s directory. If we ever delete the .git subdirectory, we will lose the project’s history.
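The steps so far can be sketched as follows. A temporary directory stands in for the Desktop folder; that substitution is an assumption made so the example is safe to run anywhere.

```shell
# Assumption: a scratch directory replaces ~/Desktop for this sketch.
cd "$(mktemp -d)"

mkdir vampires-and-werewolves   # create the project directory
cd vampires-and-werewolves      # make it the current working directory
git init                        # turn the directory into a Git repository

ls                              # shows nothing: there are no visible files yet
ls -a                           # shows the hidden .git directory Git just created
```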
Next, we will change the default branch to be called main. This might be the default branch depending on your settings and version of git. See the setup episode for more information on this change.
We can check that everything is set up correctly by asking Git to tell us the status of our project:
OUTPUT
On branch main
No commits yet
nothing to commit (create/copy files and use "git add" to track)
If you are using a different version of git, the exact wording of the output might be slightly different.
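The commands for renaming the branch and checking the status are not shown above. A sketch, assuming git checkout -b is the intended way to name the initial branch and using a scratch copy of the repository:

```shell
# Assumption: scratch repository so the sketch is safe to run anywhere.
cd "$(mktemp -d)" && git init -q vampires-and-werewolves && cd vampires-and-werewolves

# Name the (still empty) initial branch "main".
git checkout -b main

# Ask Git for the state of the project: branch name, commits, pending changes.
git status
```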
Places to Create Git Repositories
Along with tracking information about the vampires and werewolves modeling project on Mars (the project we have already created), Dracula would also like to track information about vampires and werewolves on various moons. Despite Wolfman’s concerns, Dracula creates a moons project inside his vampires-and-werewolves project with the following sequence of commands:
BASH
$ cd ~/Desktop # return to Desktop directory
$ cd vampires-and-werewolves # go into vampires-and-werewolves directory, which is already a Git repository
$ ls -a # ensure the .git subdirectory is still present in the vampires-and-werewolves directory
$ mkdir moons # make a subdirectory vampires-and-werewolves/moons
$ cd moons # go into moons subdirectory
$ git init # make the moons subdirectory a Git repository
$ ls -a # ensure the .git subdirectory is present indicating we have created a new Git repository
Is the git init command, run inside the moons subdirectory, required for tracking files stored in the moons subdirectory?
No. Dracula does not need to make the moons subdirectory a Git repository because the vampires-and-werewolves repository can track any files, sub-directories, and subdirectory files under the vampires-and-werewolves directory. Thus, in order to track all information about moons, Dracula only needed to add the moons subdirectory to the vampires-and-werewolves directory.
Additionally, Git repositories can interfere with each other if they are “nested”: the outer repository will try to version-control the inner repository. Therefore, it is best to create each new Git repository in a separate directory. To be sure that there is no conflicting repository in the directory, check the output of git status. If it looks like the following, you are good to go to create a new repository as shown above:
OUTPUT
fatal: Not a git repository (or any of the parent directories): .git
Correcting git init Mistakes
Wolfman explains to Dracula how a nested repository is redundant and may cause confusion down the road. Dracula would like to remove the nested repository. How can Dracula undo his last git init in the moons subdirectory?
Background
Removing files from a Git repository needs to be done with caution. But we have not learned yet how to tell Git to track a particular file; we will learn this in the next episode. Files that are not tracked by Git can easily be removed like any other “ordinary” files with rm filename.
Similarly a directory can be removed using rm -r dirname or rm -rf dirname. If the files or folder being removed in this fashion are tracked by Git, then their removal becomes another change that we will need to track, as we will see in the next episode.
Solution
Git keeps all of its files in the .git directory. To recover from this little mistake, Dracula can just remove the .git folder in the moons subdirectory by running rm -rf moons/.git from inside the vampires-and-werewolves directory.
But be careful! Running this command in the wrong directory will remove the entire Git history of a project you might want to keep. Therefore, always check your current directory using the command pwd.
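A sketch of the fix in a scratch copy of the project (the temporary directory is an assumption so nothing real is deleted). It recreates the nested repository first, then removes only the inner .git directory.

```shell
# Assumption: scratch repository standing in for ~/Desktop/vampires-and-werewolves.
cd "$(mktemp -d)" && git init -q vampires-and-werewolves && cd vampires-and-werewolves

# Recreate Dracula's mistake: a repository nested inside the project.
mkdir moons && git init -q moons

pwd                   # confirm we are in vampires-and-werewolves before deleting
rm -rf moons/.git     # remove only the nested repository's data
ls -a moons           # the moons directory remains, without a .git inside
```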
Key Points
- git init initializes a repository.
- Git stores all of its repository data in the .git directory.
Content from Tracking Changes
Overview
Questions
- How do I record changes in Git?
- How do I check the status of my version control repository?
- How do I record notes about what changes I made and why?
Objectives
- Go through the modify-add-commit cycle for one or more files.
- Explain where information is stored at each stage of that cycle.
- Distinguish between descriptive and non-descriptive commit messages.
First let us make sure we are still in the right directory. You should be in the vampires-and-werewolves directory.
Let us create a file called mars.txt that contains some notes about the Red Planet’s suitability for vampires and werewolves. We will use nano to edit the file; you can use whatever editor you like. In particular, this does not have to be the core.editor you set globally earlier. But remember, the bash command to create or edit a new file will depend on the editor you choose (it might not be nano). For a refresher on text editors, check out “Which Editor?” in The Unix Shell lesson.
Type the text below into the mars.txt file:
OUTPUT
Cold, dry, and everything is red, vampires' favorite color
Let us first verify that the file was properly created by running the list command (ls):
OUTPUT
mars.txt
mars.txt contains a single line, which we can see by running cat mars.txt:
OUTPUT
Cold, dry, and everything is red, vampires' favorite color
If we check the status of our project again, Git tells us that it has noticed the new file:
OUTPUT
On branch main
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
mars.txt
nothing added to commit but untracked files present (use "git add" to track)
The “untracked files” message means that there is a file in the directory that Git is not keeping track of. We can tell Git to track a file using git add (here, git add mars.txt) and then check with git status that the right thing happened:
OUTPUT
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: mars.txt
Git now knows that it is supposed to keep track of mars.txt, but it has not recorded these changes as a commit yet. To get it to do that, we need to run one more command, git commit, with the -m flag and a message:
OUTPUT
[main (root-commit) f22b25e] Start notes on Mars suitability for vampires and werewolves
1 file changed, 1 insertion(+)
create mode 100644 mars.txt
When we run git commit, Git takes everything we have told it to save by using git add and stores a copy permanently inside the special .git directory. This permanent copy is called a commit (or revision) and its short identifier is f22b25e. Your commit may have another identifier.
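The first pass through the modify-add-commit cycle can be sketched as below. The scratch directory, the repository-local identity, and the use of echo in place of nano are assumptions that keep the example self-contained.

```shell
# Assumption: scratch repository with a local identity so commits work anywhere.
cd "$(mktemp -d)" && git init -q vampires-and-werewolves && cd vampires-and-werewolves
git config user.name "Vlad Dracula" && git config user.email "vdracula@usgs.gov"

# Stand-in for typing the note in nano.
echo "Cold, dry, and everything is red, vampires' favorite color" > mars.txt

git status --short      # ?? mars.txt  (untracked)
git add mars.txt        # start tracking the file and stage its contents
git status --short      # A  mars.txt  (staged, not yet committed)
git commit -m "Start notes on Mars suitability for vampires and werewolves"
git log --oneline       # one commit; its short identifier will differ from f22b25e
```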
We use the -m flag (for “message”) to record a short, descriptive, and specific comment that will help us remember later on what we did and why. If we just run git commit without the -m option, Git will launch nano (or whatever other editor we configured as core.editor) so that we can write a longer message.
Good commit messages start with a brief (<50 characters) statement about the changes made in the commit. Generally, the message should complete the sentence that begins “If applied, this commit will”.
If we run git status now:
OUTPUT
On branch main
nothing to commit, working tree clean
it tells us everything is up to date. If we want to know what we have done recently, we can ask Git to show us the project’s history using git log:
OUTPUT
commit f22b25e3233b4645dabd0d81e651fe074bd8e73b
Author: Vlad Dracula <vdracula@usgs.gov>
Date: Thu Aug 22 09:51:46 2013 -0400
Start notes on Mars suitability for vampires and werewolves
git log lists all commits made to a repository in reverse chronological order. The listing for each commit includes the commit’s full identifier (which starts with the same characters as the short identifier printed by the git commit command earlier), the commit’s author, when it was created, and the log message Git was given when the commit was created.
Where Are My Changes?
If we run ls at this point, we will still see just one file called mars.txt. That is because Git saves information about files’ history in the special .git directory mentioned earlier so that our filesystem does not become cluttered (and so that we cannot accidentally edit or delete an old version).
Now suppose Dracula adds more information to the file. (Again, we will edit with nano and then cat the file to show its contents; you may use a different editor, and do not need to cat.)
OUTPUT
Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves
When we run git status now, it tells us that a file it already knows about has been modified:
OUTPUT
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: mars.txt
no changes added to commit (use "git add" and/or "git commit -a")
The last line is the key phrase: “no changes added to commit”. We have changed this file, but we have not told Git we will want to save those changes (which we do with git add) nor have we saved them (which we do with git commit). So let us do that now. It is good practice to always review our changes before saving them. We do this using git diff. This shows us the differences between the current state of the file and the most recently saved version:
OUTPUT
diff --git a/mars.txt b/mars.txt
index df0654a..315bf3a 100644
--- a/mars.txt
+++ b/mars.txt
@@ -1 +1,2 @@
Cold, dry, and everything is red, vampires' favorite color
+The two moons may be a problem for werewolves
The output is cryptic because it is actually a series of commands for tools like editors and patch telling them how to reconstruct one file given the other. If we break it down into pieces:
- The first line tells us that Git is producing output similar to the Unix diff command comparing the old and new versions of the file.
- The second line tells exactly which versions of the file Git is comparing; df0654a and 315bf3a are unique computer-generated labels for those versions.
- The third and fourth lines once again show the name of the file being changed.
- The remaining lines are the most interesting: they show us the actual differences and the lines on which they occur. In particular, the + marker in the first column shows where we added a line.
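To see such a diff for yourself, the sketch below rebuilds the situation in a scratch repository (an assumption, as is using echo in place of an editor) and then runs git diff on the uncommitted change.

```shell
# Assumption: scratch repository with a local identity so commits work anywhere.
cd "$(mktemp -d)" && git init -q vampires-and-werewolves && cd vampires-and-werewolves
git config user.name "Vlad Dracula" && git config user.email "vdracula@usgs.gov"

echo "Cold, dry, and everything is red, vampires' favorite color" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars suitability for vampires and werewolves"

# Add a second line without staging or committing it.
echo "The two moons may be a problem for werewolves" >> mars.txt

# Compare the working file with what Git has recorded; the new line is prefixed with +.
git diff
```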
After reviewing our change, it is time to commit it:
OUTPUT
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: mars.txt
no changes added to commit (use "git add" and/or "git commit -a")
Whoops: Git will not commit because we did not use git add first. Let us fix that:
OUTPUT
[main 34961b1] Add information about suitability of Mars for werewolves
1 file changed, 1 insertion(+)
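The refused commit and its fix can be reproduced with the sketch below, run in a scratch repository (an assumption so it is safe to try).

```shell
# Assumption: scratch repository with a local identity so commits work anywhere.
cd "$(mktemp -d)" && git init -q vampires-and-werewolves && cd vampires-and-werewolves
git config user.name "Vlad Dracula" && git config user.email "vdracula@usgs.gov"

echo "Cold, dry, and everything is red, vampires' favorite color" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars suitability for vampires and werewolves"
echo "The two moons may be a problem for werewolves" >> mars.txt

# Nothing is staged, so Git refuses to commit and exits with a non-zero status.
git commit -m "Add information about suitability of Mars for werewolves" \
  || echo "Nothing staged: no commit was made"

# Stage the change first, then commit.
git add mars.txt
git commit -m "Add information about suitability of Mars for werewolves"
git log --oneline
```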
Git insists that we add files to the set we want to commit before actually committing anything. This allows us to commit our changes in stages and capture changes in logical portions rather than only large batches. For example, suppose we are adding a few citations to relevant research to our thesis. We might want to commit those additions, and the corresponding bibliography entries, but not commit some of our work drafting the conclusion (which we have not finished yet).
To allow for this, Git has a special staging area where it keeps track of things that have been added to the current changeset but not yet committed.
Staging Area
If you think of Git as taking snapshots of changes over the life of a project, git add specifies what will go in a snapshot (putting things in the staging area), and git commit then actually takes the snapshot, and makes a permanent record of it (as a commit). If you do not have anything staged when you type git commit, Git will prompt you to use git commit -a or git commit --all, which is kind of like gathering everyone to take a group photo! However, it is almost always better to explicitly add things to the staging area, because you might commit changes you forgot you made. (Going back to the group photo simile, you might get an extra with incomplete makeup walking on the stage for the picture because you used -a!) Try to stage things manually, or you might find yourself searching for “git undo commit” more than you would like!
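As an illustration of staging only part of your work, the sketch below follows the thesis example: the finished citation is committed while the unfinished conclusion stays out of the snapshot. The thesis directory, file names, and commit messages are hypothetical.

```shell
# Assumption: scratch repository; "thesis", citations.txt and conclusion.txt are hypothetical.
cd "$(mktemp -d)" && git init -q thesis && cd thesis
git config user.name "Vlad Dracula" && git config user.email "vdracula@usgs.gov"

echo "References" > citations.txt
echo "Conclusion (draft)" > conclusion.txt
git add citations.txt conclusion.txt
git commit -q -m "Start thesis files"

# Two files change, but only one change is ready to be recorded.
echo "A newly cited paper" >> citations.txt
echo "Unfinished thought" >> conclusion.txt

git add citations.txt                       # stage only the finished work
git commit -q -m "Add citation for relevant research"
git status --short                          # conclusion.txt is still listed as modified
```

Running git commit -a here instead would have swept the unfinished conclusion into the same commit.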
Let us watch as our changes to a file move from our editor to the staging area and into long-term storage. First, we will add another line to the file:
OUTPUT
Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves
Mummies will appreciate the lack of humidity
OUTPUT
diff --git a/mars.txt b/mars.txt
index 315bf3a..b36abfd 100644
--- a/mars.txt
+++ b/mars.txt
@@ -1,2 +1,3 @@
Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves
+Mummies will appreciate the lack of humidity
So far, so good: we have added one line to the end of the file (shown with a + in the first column). Now let us put that change in the staging area with git add and see what git diff reports.
There is no output: as far as Git can tell, there is no difference between what it has been asked to save permanently and what is currently in the directory. However, if we run git diff --staged:
OUTPUT
diff --git a/mars.txt b/mars.txt
index 315bf3a..b36abfd 100644
--- a/mars.txt
+++ b/mars.txt
@@ -1,2 +1,3 @@
Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves
+Mummies will appreciate the lack of humidity
it shows us the difference between the last committed change and what is in the staging area. Let us save our changes:
OUTPUT
[main 005937f] Discuss suitability of Mars' climate for mummies
1 file changed, 1 insertion(+)
check our status:
OUTPUT
On branch main
nothing to commit, working tree clean
and look at the history of what we have done so far:
OUTPUT
commit 005937fbe2a98fb83f0ade869025dc2636b4dad5 (HEAD -> main)
Author: Vlad Dracula <vdracula@usgs.gov>
Date: Thu Aug 22 10:14:07 2013 -0400
Discuss suitability of Mars' climate for mummies
commit 34961b159c27df3b475cfe4415d94a6d1fcd064d
Author: Vlad Dracula <vdracula@usgs.gov>
Date: Thu Aug 22 10:07:21 2013 -0400
Add information about suitability of Mars for werewolves
commit f22b25e3233b4645dabd0d81e651fe074bd8e73b
Author: Vlad Dracula <vdracula@usgs.gov>
Date: Thu Aug 22 09:51:46 2013 -0400
Start notes on Mars suitability for vampires and werewolves
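The walk from editor to staging area to commit shown above can be sketched as follows. The scratch repository is an assumption, and the two earlier commits are collapsed into one for brevity.

```shell
# Assumption: scratch repository with a local identity so commits work anywhere.
cd "$(mktemp -d)" && git init -q vampires-and-werewolves && cd vampires-and-werewolves
git config user.name "Vlad Dracula" && git config user.email "vdracula@usgs.gov"

printf '%s\n' "Cold, dry, and everything is red, vampires' favorite color" \
  "The two moons may be a problem for werewolves" > mars.txt
git add mars.txt
git commit -q -m "Add notes on Mars"   # assumption: stands in for the first two commits

echo "Mummies will appreciate the lack of humidity" >> mars.txt

git diff            # working directory vs. staging area: shows the new line
git add mars.txt    # move the change into the staging area
git diff            # prints nothing: working directory and staging area now match
git diff --staged   # staging area vs. last commit: shows the new line again
git commit -m "Discuss suitability of Mars' climate for mummies"
```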
Word-based diffing
Sometimes, e.g. in the case of text documents, a line-wise diff is too coarse. That is where the --color-words option of git diff comes in very useful, as it highlights the changed words using colors.
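A small sketch of a word-level diff. The one-word edit (red to crimson) is a hypothetical change made up for the example; the --word-diff form is included because it marks changes with plain text, which is easier to read where colors are unavailable.

```shell
# Assumption: scratch repository with a local identity so commits work anywhere.
cd "$(mktemp -d)" && git init -q vampires-and-werewolves && cd vampires-and-werewolves
git config user.name "Vlad Dracula" && git config user.email "vdracula@usgs.gov"

echo "Cold, dry, and everything is red, vampires' favorite color" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars suitability for vampires and werewolves"

# Hypothetical edit that changes a single word on the line.
echo "Cold, dry, and everything is crimson, vampires' favorite color" > mars.txt

git diff --color-words   # in a terminal, colors only the words that changed
git diff --word-diff     # plain-text variant: [-removed-] and {+added+} markers
```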
Paging the Log
When the output of git log is too long to fit in your screen, git uses a program to split it into pages of the size of your screen. When this “pager” is called, you will notice that the last line in your screen is a :, instead of your usual prompt.
- To get out of the pager, press Q.
- To move to the next page, press Spacebar.
- To search for some_word in all pages, press / and type some_word. Navigate through matches pressing n.
Limit Log Size
To avoid having git log cover your entire terminal screen, you can limit the number of commits that Git lists by using -N, where N is the number of commits that you want to view. For example, if you only want information from the last commit you can use git log -1:
OUTPUT
commit 005937fbe2a98fb83f0ade869025dc2636b4dad5 (HEAD -> main)
Author: Vlad Dracula <vdracula@usgs.gov>
Date: Thu Aug 22 10:14:07 2013 -0400
Discuss suitability of Mars' climate for mummies
You can also reduce the quantity of information using the --oneline option:
OUTPUT
005937f (HEAD -> main) Discuss suitability of Mars' climate for mummies
34961b1 Add information about suitability of Mars for werewolves
f22b25e Start notes on Mars suitability for vampires and werewolves
You can also combine the --oneline option with others. One useful combination adds --graph to display the commit history as a text-based graph and to indicate which commits are associated with the current HEAD, the current branch main, or other Git references:
OUTPUT
* 005937f (HEAD -> main) Discuss suitability of Mars' climate for mummies
* 34961b1 Add information about suitability of Mars for werewolves
* f22b25e Start notes on Mars suitability for vampires and werewolves
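The three log variants above can be tried on a scratch repository with the same three commits (the temporary directory and local identity are assumptions; identifiers and dates will differ from the output shown).

```shell
# Assumption: scratch repository with a local identity so commits work anywhere.
cd "$(mktemp -d)" && git init -q vampires-and-werewolves && cd vampires-and-werewolves
git config user.name "Vlad Dracula" && git config user.email "vdracula@usgs.gov"

echo "Cold, dry, and everything is red, vampires' favorite color" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars suitability for vampires and werewolves"
echo "The two moons may be a problem for werewolves" >> mars.txt
git add mars.txt
git commit -q -m "Add information about suitability of Mars for werewolves"
echo "Mummies will appreciate the lack of humidity" >> mars.txt
git add mars.txt
git commit -q -m "Discuss suitability of Mars' climate for mummies"

git log -1                  # full entry for the most recent commit only
git log --oneline           # one line per commit: short identifier and message
git log --oneline --graph   # the same, drawn as a text-based graph
```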
Directories
Two important facts you should know about directories in Git.
- Git does not track directories on their own, only files within them. Try it for yourself:
Note, our newly created empty directory spaceships does not appear in the list of untracked files even if we explicitly add it (via git add) to our repository. This is the reason why you will sometimes see .gitkeep files in otherwise empty directories. Unlike .gitignore, these files are not special and their sole purpose is to populate a directory so that Git adds it to the repository. In fact, you can name such files anything you like.
- If you create a directory in your Git repository and populate it with files, you can add all files in the directory at once by running git add <directory-with-files>.
Try it for yourself:
BASH
$ touch spaceships/apollo-11 spaceships/sputnik-1
$ git status
$ git add spaceships
$ git status
Before moving on, we will commit these changes.
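Both facts about directories, plus the final commit, can be sketched in a scratch repository. The temporary directory and the commit message are assumptions made for the example.

```shell
# Assumption: scratch repository with a local identity so commits work anywhere.
cd "$(mktemp -d)" && git init -q vampires-and-werewolves && cd vampires-and-werewolves
git config user.name "Vlad Dracula" && git config user.email "vdracula@usgs.gov"

mkdir spaceships
git status --short        # prints nothing: an empty directory is invisible to Git
git add spaceships        # adds nothing, because there are no files to track

touch spaceships/apollo-11 spaceships/sputnik-1
git add spaceships        # now stages every file inside the directory
git status --short        # both files are listed as added

git commit -m "Add some initial thoughts on spaceships"   # hypothetical message
```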
To recap, when we want to add changes to our repository, we first need to add the changed files to the staging area (git add) and then commit the staged changes to the repository (git commit).
Choosing a Commit Message
Which of the following commit messages would be most appropriate for the last commit made to mars.txt?
- “Changes”
- “Added line ‘Mummies will appreciate the lack of humidity’ to mars.txt”
- “Discuss suitability of Mars’ climate for mummies”
Answer 1 is not descriptive enough, and the purpose of the commit is unclear; and answer 2 is redundant to using “git diff” to see what changed in this commit; but answer 3 is good: short, descriptive, and imperative.
Committing Changes to Git
Which command(s) below would save the changes of myfile.txt to my local Git repository?
- Would only create a commit if files have already been staged.
- Would try to create a new repository.
- Is correct: first add the file to the staging area, then commit.
- Would try to commit a file “my recent changes” with the message myfile.txt.
Committing Multiple Files
The staging area can hold changes from any number of files that you want to commit as a single snapshot.
- Add some text to mars.txt noting your decision to consider adding mummies to your model
- Create a new file mummies.txt with your initial thoughts about including co-occurrences of mummies in your model
- Add changes from both files to the staging area, and commit those changes.
The output below from cat mars.txt reflects only content added during this exercise. Your output may vary.
First we make our changes to the mars.txt and mummies.txt files:
OUTPUT
Maybe we should also consider including mummies in our model.
OUTPUT
Mummies often co-occur with vampires and werewolves in stories. We should definitely include mummies in our co-occurrence model.
Now you can add both files to the staging area. We can do that in one line with git add mars.txt mummies.txt, or with multiple commands, one git add per file.
Now the files are ready to commit. You can check that using git status. If you are ready to commit, use git commit with a descriptive message:
OUTPUT
[main cc127c2] Write plans to add mummies to model
2 files changed, 2 insertions(+)
create mode 100644 mummies.txt
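One possible solution, sketched in a scratch repository (an assumption, as is the single starting commit that stands in for the project's earlier history):

```shell
# Assumption: scratch repository with a local identity so commits work anywhere.
cd "$(mktemp -d)" && git init -q vampires-and-werewolves && cd vampires-and-werewolves
git config user.name "Vlad Dracula" && git config user.email "vdracula@usgs.gov"
echo "Cold, dry, and everything is red, vampires' favorite color" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars suitability for vampires and werewolves"

# Change one existing file and create one new file.
echo "Maybe we should also consider including mummies in our model." >> mars.txt
echo "Mummies often co-occur with vampires and werewolves in stories." > mummies.txt

git add mars.txt mummies.txt    # stage both files in one command
git status --short              # both changes are staged
git commit -m "Write plans to add mummies to model"
```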
bio Repository
- Create a new Git repository on your computer called bio.
- Write a three-line biography for yourself in a file called me.txt, commit your changes
- Modify one line, add a fourth line
- Display the differences between its updated state and its original state.
If needed, move out of the vampires-and-werewolves folder:
Create a new folder called bio and ‘move’ into it:
Initialize git:
Create your biography file me.txt using nano or another text editor. Once in place, add and commit it to the repository:
Modify the file as described (modify one line, add a fourth line). To display the differences between its updated state and its original state, use git diff:
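The steps above can be sketched as follows. This is one possible solution under stated assumptions: the repository is created in a scratch directory from mktemp, the biography text and commit message are invented placeholders, and printf replaces the interactive nano session so the script can run unattended.

```shell
set -e
cd "$(mktemp -d)"                          # assumption: scratch location
mkdir bio
cd bio
git init -q
git config user.name "Vlad Dracula"        # placeholder identity
git config user.email "vdracula@usgs.gov"

# Three-line biography (written with printf here instead of nano)
printf 'I am Vlad.\nI study Mars.\nI like the color red.\n' > me.txt
git add me.txt
git commit -q -m "Add biography file"

# Modify one line and add a fourth line
printf 'I am Vlad Dracula.\nI study Mars.\nI like the color red.\nI avoid garlic.\n' > me.txt

# Compare the working copy with the last commit
git diff me.txt
```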
Key Points
- git status shows the status of a repository.
- Files can be stored in a project’s working directory (which users see), the staging area (where the next commit is being built up) and the local repository (where commits are permanently recorded).
- git add puts files in the staging area.
- git commit saves the staged content as a new commit in the local repository.
- Write a commit message that accurately describes your changes.
Content from Exploring History
Last updated on 2025-05-22
Overview
Questions
- How can I identify old versions of files?
- How do I review my changes?
- How can I recover old versions of files?
Objectives
- Explain what the HEAD of a repository is and how to use it.
- Identify and use Git commit numbers.
- Compare various versions of tracked files.
- Restore old versions of files.
As we saw in the previous episode, we can refer to commits by their identifiers. You can refer to the most recent commit of the working directory by using the identifier HEAD.
We have been adding one line at a time to mars.txt, so it is easy to track our progress by looking at the file. Let us do that using HEAD. Before we start, let us make a change to mars.txt, adding yet another line.
OUTPUT
Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves
Mummies will appreciate the lack of humidity
Why are we talking about mummies?
Now, let us see what we get.
OUTPUT
diff --git a/mars.txt b/mars.txt
index b36abfd..0848c8d 100644
--- a/mars.txt
+++ b/mars.txt
@@ -1,3 +1,4 @@
Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves
Mummies will appreciate the lack of humidity
+Why are we talking about mummies?
which is the same as what you would get if you leave out HEAD (try it). The real benefit comes when you can refer to previous commits. We do that by adding ~1 (where “~” is “tilde”, pronounced [til-duh]) to refer to the commit one before HEAD.
If we want to see the differences between older commits we can use git diff again, but with the notation HEAD~1, HEAD~2, and so on, to refer to them:
OUTPUT
diff --git a/mars.txt b/mars.txt
index df0654a..b36abfd 100644
--- a/mars.txt
+++ b/mars.txt
@@ -1 +1,4 @@
Cold, dry, and everything is red, vampires' favorite color
+The two moons may be a problem for werewolves
+Mummies will appreciate the lack of humidity
+Why are we talking about mummies?
We could also use git show, which shows us what changes we made at an older commit as well as the commit message, rather than the differences between a commit and our working directory that we see by using git diff.
OUTPUT
commit f22b25e3233b4645dabd0d81e651fe074bd8e73b
Author: Vlad Dracula <vdracula@usgs.gov>
Date: Thu Aug 22 09:51:46 2013 -0400
Start notes on Mars suitability for vampires and werewolves
diff --git a/mars.txt b/mars.txt
new file mode 100644
index 0000000..df0654a
--- /dev/null
+++ b/mars.txt
@@ -0,0 +1 @@
+Cold, dry, and everything is red, vampires' favorite color
In this way, we can build up a chain of commits. The most recent end of the chain is referred to as HEAD; we can refer to previous commits using the ~ notation, so HEAD~1 means “the previous commit”, while HEAD~123 goes back 123 commits from where we are now.
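The HEAD and ~ notation can be tried out on a small history. The following is a minimal sketch, assuming a scratch repository with three commits; the second and third commit messages are placeholders, and the commit IDs on your machine will differ from those printed in this lesson.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vdracula@usgs.gov"

# Three commits, one line each
echo "Cold, dry, and everything is red, vampires' favorite color" > mars.txt
git add mars.txt && git commit -q -m "Start notes on Mars suitability for vampires and werewolves"
echo "The two moons may be a problem for werewolves" >> mars.txt
git add mars.txt && git commit -q -m "Add concerns about moons (placeholder message)"
echo "Mummies will appreciate the lack of humidity" >> mars.txt
git add mars.txt && git commit -q -m "Discuss suitability of Mars' climate for mummies"

# An uncommitted change in the working directory
echo "Why are we talking about mummies?" >> mars.txt

git diff HEAD mars.txt      # working copy vs. the most recent commit
git diff HEAD~2 mars.txt    # working copy vs. two commits before HEAD
git show HEAD~2 mars.txt    # message and changes of that older commit

# The full ID and its short prefix name the same commit as HEAD~2
first=$(git rev-parse HEAD~2)
short=$(git rev-parse --short HEAD~2)
git diff "$short" mars.txt
```

Here git rev-parse is used only to look up the identifiers automatically; in an interactive session you would copy them from git log instead.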
We can also refer to commits using those long strings of digits and letters that git log displays. These are unique IDs for the changes, and “unique” really does mean unique: every change to any set of files on any computer has a unique 40-character identifier. Our first commit was given the ID f22b25e3233b4645dabd0d81e651fe074bd8e73b, so let us try this:
OUTPUT
diff --git a/mars.txt b/mars.txt
index df0654a..93a3e13 100644
--- a/mars.txt
+++ b/mars.txt
@@ -1 +1,4 @@
Cold, dry, and everything is red, vampires' favorite color
+The two moons may be a problem for werewolves
+Mummies will appreciate the lack of humidity
+Why are we talking about mummies?
That is the right answer, but typing out random 40-character strings is annoying, so Git lets us use just the first few characters (typically seven for normal size projects):
OUTPUT
diff --git a/mars.txt b/mars.txt
index df0654a..93a3e13 100644
--- a/mars.txt
+++ b/mars.txt
@@ -1 +1,4 @@
Cold, dry, and everything is red, vampires' favorite color
+The two moons may be a problem for werewolves
+Mummies will appreciate the lack of humidity
+Why are we talking about mummies?
All right! So we can save changes to files and see what we have changed. Now, how can we restore older versions of things? Let us suppose we change our mind about the last update to mars.txt (questioning the topic of mummies).
git status now tells us that the file has been changed, but those changes have not been staged:
OUTPUT
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: mars.txt
no changes added to commit (use "git add" and/or "git commit -a")
We can put things back the way they were by using git checkout:
OUTPUT
Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves
Mummies will appreciate the lack of humidity
As you might guess from its name, git checkout checks out (i.e., restores) an old version of a file. In this case, we are telling Git that we want to recover the version of the file recorded in HEAD, which is the last saved commit. If we want to go back even further, we can use a commit identifier instead:
OUTPUT
Cold, dry, and everything is red, vampires' favorite color
OUTPUT
On branch main
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
modified: mars.txt
Notice that the changes are currently in the staging area. Again, we can put things back the way they were by using git checkout:
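The restore steps in this section can be sketched end to end. This is a hedged sketch in a scratch repository with two commits: the variable first plays the role of f22b25e, and the second commit message is a placeholder.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vdracula@usgs.gov"
echo "Cold, dry, and everything is red, vampires' favorite color" > mars.txt
git add mars.txt && git commit -q -m "Start notes on Mars suitability for vampires and werewolves"
echo "The two moons may be a problem for werewolves" >> mars.txt
git add mars.txt && git commit -q -m "Add a second line (placeholder message)"

first=$(git rev-parse --short HEAD~1)   # plays the role of f22b25e

# Discard an unstaged edit by restoring the version recorded in HEAD
echo "Why are we talking about mummies?" >> mars.txt
git checkout HEAD mars.txt

# Go back further: take mars.txt from the first commit
git checkout "$first" mars.txt
cat mars.txt                 # only the first line remains
git status --short           # "M  mars.txt": the old version is staged

# Put things back the way they were
git checkout HEAD mars.txt
```

Note that the file name is given in every git checkout call; it is what tells Git to restore a single file rather than switch the whole repository to another commit.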
Do Not Lose Your HEAD
Above we used git checkout f22b25e mars.txt to revert mars.txt to its state after the commit f22b25e. But be careful! The command checkout has other important functionalities and Git will misunderstand your intentions if you are not accurate with the typing. For example, if you forget mars.txt in the previous command, Git reports:
ERROR
Note: checking out 'f22b25e'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
HEAD is now at f22b25e Start notes on Mars suitability for vampires and werewolves
The “detached HEAD” is like “look, but do not touch” here, so you should not make any changes in this state. After investigating your repository’s past state, reattach your HEAD with git checkout main.
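Detaching and reattaching HEAD can be observed safely in a scratch repository. The sketch below assumes the branch is called main (set explicitly with git symbolic-ref so that it also works on Git versions with a different default name) and uses git symbolic-ref -q HEAD, which fails when HEAD does not point at a branch, to report the state.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the branch main on any Git version
git config user.name "Vlad Dracula"
git config user.email "vdracula@usgs.gov"
echo "Cold, dry, and everything is red, vampires' favorite color" > mars.txt
git add mars.txt && git commit -q -m "Start notes on Mars suitability for vampires and werewolves"
echo "The two moons may be a problem for werewolves" >> mars.txt
git add mars.txt && git commit -q -m "Add a second line (placeholder message)"

first=$(git rev-parse --short HEAD~1)

# Forgetting the file name checks out the whole commit and detaches HEAD
git checkout -q "$first"
if git symbolic-ref -q HEAD > /dev/null; then
    state="attached"
else
    state="detached"
fi
echo "HEAD is $state"

# Reattach HEAD to the branch
git checkout -q main
git symbolic-ref --short HEAD
```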
It is important to remember that we must use the commit number that identifies the state of the repository before the change we are trying to undo. A common mistake is to use the number of the commit in which we made the change we are trying to discard. In the example below, we want to retrieve the state from before the most recent commit (HEAD~1), which is commit f22b25e:
So, to put it all together, here is how Git works in cartoon form:
Simplifying the Common Case
If you read the output of git status carefully, you will see that it includes this hint:
OUTPUT
(use "git checkout -- <file>..." to discard changes in working directory)
As it says, git checkout without a version identifier restores files to the state saved in HEAD. The double dash -- is needed to separate the names of the files being recovered from the command itself: without it, Git would try to use the name of the file as the commit identifier.
The fact that files can be reverted one by one tends to change the way people organize their work. If everything is in one large document, it is hard (but not impossible) to undo changes to the introduction without also undoing changes made later to the conclusion. If the introduction and conclusion are stored in separate files, on the other hand, moving backward and forward in time becomes much easier.
Recovering Older Versions of a File
Jennifer has made changes to the Python script that she has been working on for weeks, and the modifications she made this morning “broke” the script and it no longer runs. She has spent about an hour trying to fix it, with no luck…
Luckily, she has been keeping track of her project’s versions using Git! Which commands below will let her recover the last committed version of her Python script called data_cruncher.py?
1. $ git checkout HEAD
2. $ git checkout HEAD data_cruncher.py
3. $ git checkout HEAD~1 data_cruncher.py
4. $ git checkout <unique ID of last commit> data_cruncher.py
5. Both 2 and 4
The answer is (5): Both 2 and 4.
The checkout command restores files from the repository, overwriting the files in your working directory. Answers 2 and 4 both restore the latest version in the repository of the file data_cruncher.py. Answer 2 uses HEAD to indicate the latest, whereas answer 4 uses the unique ID of the last commit, which is what HEAD means.
Answer 3 gets the version of data_cruncher.py from the commit before HEAD, which is NOT what we wanted.
Answer 1 does not recover the script. Without a filename, git checkout HEAD leaves modified files in the working directory untouched and only reports them, so data_cruncher.py stays broken. Be careful with this form: with an older commit identifier in place of HEAD and no filename, Git switches the whole working directory to that commit and, as discussed above, leaves you in a detached HEAD state, and you do not want to be there.
Reverting a Commit
Jennifer is collaborating with colleagues on her Python script. She realizes her last commit to the project’s repository contained an error, and wants to undo it. Jennifer wants to undo correctly so everyone in the project’s repository gets the correct change. The command git revert [erroneous commit ID] will create a new commit that reverses the erroneous commit.
The command git revert is different from git checkout [commit ID] because git checkout returns the files not yet committed within the local repository to a previous state, whereas git revert reverses changes committed to the local and project repositories.
Below are the right steps and explanations for Jennifer to use git revert. What is the missing command?
1. ________ # Look at the git history of the project to find the commit ID
2. Copy the ID (the first few characters of the ID, e.g. 0b1d055).
3. git revert [commit ID]
4. Type in the new commit message.
5. Save and close
The command git log lists project history with commit IDs.
The command git show HEAD shows changes made at the latest commit, and lists the commit ID; however, Jennifer should double-check it is the correct commit, and no one else has committed changes to the repository.
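Jennifer's steps can be sketched in a scratch repository. This is an illustrative sketch only: the contents of data_cruncher.py and both commit messages are invented, and --no-edit is used so that git revert accepts its default message instead of opening an editor.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Jennifer"                 # placeholder identity
git config user.email "jennifer@example.com"
echo "print('ok')" > data_cruncher.py
git add data_cruncher.py && git commit -q -m "Add working script"
echo "print(oops" >> data_cruncher.py
git add data_cruncher.py && git commit -q -m "Add erroneous change"

git log --oneline                    # find the ID of the erroneous commit
bad=$(git rev-parse --short HEAD)

# Create a new commit that reverses the erroneous one
git revert --no-edit "$bad"
git log --oneline                    # the history now holds three commits
```

The erroneous commit stays in the history; the revert adds a third commit on top of it, which is why collaborators can receive the fix through an ordinary pull.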
Understanding Workflow and History
What is the output of the last command in
BASH
$ cd vampires-and-werewolves
$ echo "Mummies are beautiful and full of love" > mummies.txt
$ git add mummies.txt
$ echo "Mummies are smelly and gross" >> mummies.txt
$ git commit -m "Comment on Mummy hygiene"
$ git checkout HEAD mummies.txt
$ cat mummies.txt #this will print the contents of mummies.txt to the screen
OUTPUT
Mummies are smelly and gross
OUTPUT
Mummies are beautiful and full of love
OUTPUT
Mummies are beautiful and full of love
Mummies are smelly and gross
OUTPUT
Error because you have changed mummies.txt without committing the changes
The answer is 2.
The command git add mummies.txt places the current version of mummies.txt into the staging area. The changes to the file from the second echo command are only applied to the working copy, not the version in the staging area.
So, when git commit -m "Comment on Mummy hygiene" is executed, the version of mummies.txt committed to the repository is the one from the staging area and has only one line.
At this time, the working copy still has the second line (and git status will show that the file is modified). However, git checkout HEAD mummies.txt replaces the working copy with the most recently committed version of mummies.txt.
So, cat mummies.txt will output
OUTPUT
Mummies are beautiful and full of love
Checking Understanding of git diff
Consider this command: git diff HEAD~9 mars.txt. What do you predict this command will do if you execute it? What happens when you do execute it? Why?
Try another command, git diff [ID] mars.txt, where [ID] is replaced with the unique identifier for your most recent commit. What do you think will happen, and what does happen?
Getting Rid of Staged Changes
git checkout can be used to restore a previous commit when unstaged changes have been made, but will it also work for changes that have been staged but not committed? Make a change to mars.txt, add that change using git add, then use git checkout to see if you can remove your change.
After adding a change, git checkout cannot be used directly. Let us look at the output of git status:
OUTPUT
On branch main
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
modified: mars.txt
Note that if you do not have the same output you may either have forgotten to change the file, or you have added it and committed it.
Using the command git checkout -- mars.txt now does not give an error, but it does not restore the file either. Git helpfully tells us that we need to use git reset first to unstage the file:
OUTPUT
Unstaged changes after reset:
M mars.txt
Now, git status gives us:
OUTPUT
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: mars.txt
no changes added to commit (use "git add" and/or "git commit -a")
This means we can now use git checkout to restore the file to the previous commit:
OUTPUT
On branch main
nothing to commit, working tree clean
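The unstage-then-restore sequence from this solution can be sketched as follows, assuming a scratch repository with a single commit; the added line of text is a placeholder.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vdracula@usgs.gov"
echo "Cold, dry, and everything is red, vampires' favorite color" > mars.txt
git add mars.txt && git commit -q -m "Start notes on Mars suitability for vampires and werewolves"

# Make a change and stage it
echo "Some staged change" >> mars.txt
git add mars.txt

git checkout -- mars.txt     # no error, but the staged change survives
grep -c "staged" mars.txt    # still 1

git reset -q HEAD mars.txt   # unstage the change
git checkout -- mars.txt     # now the working copy is restored
git status
```

The first git checkout has no visible effect because it restores the file from the staging area, which already contains the change; only after git reset does the staging area match the last commit again.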
Explore and Summarize Histories
Exploring history is an important part of Git, and often it is a challenge to find the right commit ID, especially if the commit is from several months ago.
Imagine the vampires-and-werewolves project has more than 50 files. You would like to find a commit that modifies some specific text in mars.txt. When you type git log, a very long list appears. How can you narrow down the search?
Recall that the git diff command allows us to explore one specific file, e.g., git diff mars.txt. We can apply a similar idea here.
Unfortunately some of these commit messages are very ambiguous, e.g., update files. How can you search through these files?
Both git diff and git log are very useful and they summarize a different part of the history for you. Is it possible to combine both? Let us try the following:
You should get a long list of output, and you should be able to see both commit messages and the difference between each commit.
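Restricting git log to one file, with and without the differences, can be sketched like this. The sketch assumes a scratch repository in which one commit touches only mummies.txt; the commit messages other than the first are placeholders.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vdracula@usgs.gov"
echo "Cold, dry, and everything is red, vampires' favorite color" > mars.txt
git add mars.txt && git commit -q -m "Start notes on Mars suitability for vampires and werewolves"
echo "Mummies often co-occur with vampires and werewolves in stories." > mummies.txt
git add mummies.txt && git commit -q -m "update files"
echo "The two moons may be a problem for werewolves" >> mars.txt
git add mars.txt && git commit -q -m "Add a second line (placeholder message)"

git log --oneline -- mars.txt   # only the commits that touched mars.txt
git log --patch -- mars.txt     # the same commits, each followed by its diff
```

The double dash separates the file name from the options, in the same way as with git checkout.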
Question: What does the following command do?
Key Points
- git diff displays differences between commits.
- git checkout recovers old versions of files.
Content from Ignoring Things
Last updated on 2025-05-22
Overview
Questions
- How can I tell Git to ignore files I don’t want to track?
Objectives
- Configure Git to ignore specific files.
- Explain why ignoring files can be useful.
What if we have files that we do not want Git to track for us, like backup files created by our editor or intermediate files created during data analysis? Let’s create a few dummy files:
and see what Git says:
OUTPUT
On branch main
Untracked files:
(use "git add <file>..." to include in what will be committed)
a.csv
b.csv
c.csv
results/
nothing added to commit but untracked files present (use "git add" to track)
Putting these files under version control would be a waste of disk space. What’s worse, having them all listed could distract us from changes that actually matter, so let’s tell Git to ignore them.
We do this by creating a file in the root directory of our project called .gitignore:
OUTPUT
*.csv
results/
These patterns tell Git to ignore any file whose name ends in .csv and everything in the results directory. (If any of these files were already being tracked, Git would continue to track them.)
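The effect of these two patterns can be checked in a scratch repository. This is a minimal sketch: the file names inside results are assumptions, and git check-ignore -v is used to show which rule matches each path.

```shell
set -e
cd "$(mktemp -d)"
git init -q
mkdir results
touch a.csv b.csv c.csv results/a.out results/b.out   # hypothetical dummy files

# Before: all four entries show up as untracked
git status --short

printf '*.csv\nresults/\n' > .gitignore

# After: only .gitignore itself is reported
git status --short

# Ask Git which rule ignores a given path
git check-ignore -v a.csv results/a.out
```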
Once we have created this file, the output of git status is much cleaner:
OUTPUT
On branch main
Untracked files:
(use "git add <file>..." to include in what will be committed)
.gitignore
nothing added to commit but untracked files present (use "git add" to track)
The only thing Git notices now is the newly-created .gitignore file. You might think we wouldn’t want to track it, but everyone we’re sharing our repository with will probably want to ignore the same things that we’re ignoring. Let’s add and commit .gitignore:
OUTPUT
On branch main
nothing to commit, working tree clean
As a bonus, using .gitignore helps us avoid accidentally adding files to the repository that we don’t want to track:
OUTPUT
The following paths are ignored by one of your .gitignore files:
a.csv
Use -f if you really want to add them.
If we really want to override our ignore settings, we can use git add -f to force Git to add something. For example, git add -f a.csv. We can also always see the status of ignored files if we want:
OUTPUT
On branch main
Ignored files:
(use "git add -f <file>..." to include in what will be committed)
a.csv
b.csv
c.csv
results/
nothing to commit, working tree clean
If you only want to ignore the contents of results/plots, you can change your .gitignore to ignore only the /plots/ subfolder by adding the following line to your .gitignore:
OUTPUT
results/plots/
This line will ensure only the contents of results/plots are ignored, and not the contents of results/data.
As with most programming issues, there are a few alternative ways that one may ensure this ignore rule is followed. The “Ignoring Nested Files: Variation” exercise has a slightly different directory structure that presents an alternative solution. Further, the discussion page has more detail on ignore rules.
Including Specific Files
How would you ignore all .csv files in your root directory except for final.csv? Hint: Find out what ! (the exclamation point operator) does.
You would add the following two lines to your .gitignore:
OUTPUT
# ignore all data files
*.csv
# except final.csv
!final.csv
The exclamation point operator will include a previously excluded entry.
Note also that because you’ve previously committed .csv files in this lesson they will not be ignored with this new rule. Only future additions of .csv files added to the root directory will be ignored.
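The exception rule can be verified with a short sketch in a scratch repository. The dummy file names are assumptions. The comments in the generated .gitignore sit on their own lines, because Git only treats a line that starts with # as a comment.

```shell
set -e
cd "$(mktemp -d)"
git init -q
touch a.csv b.csv final.csv

# Comments go on their own lines in .gitignore
printf '%s\n' '# ignore all data files' '*.csv' '# except final.csv' '!final.csv' > .gitignore

git status --short
git check-ignore -q a.csv && echo "a.csv is ignored"
git check-ignore -q final.csv || echo "final.csv is not ignored"
```

The order matters: the ! line has to come after the pattern it is meant to override.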
Ignoring Nested Files: Variation
Given a directory structure that looks similar to the earlier Nested Files exercise, but with a slightly different directory structure:
How would you ignore all of the contents in the results folder, but not results/data?
Hint: think a bit about how you created an exception with the ! operator before.
If you want to ignore the contents of results/ but not those of results/data/, you can change your .gitignore to ignore the contents of results folder, but create an exception for the contents of the results/data subfolder. Your .gitignore would look like this:
OUTPUT
# ignore everything in results folder
results/*
# do not ignore results/data/ contents
!results/data/
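A quick way to confirm this pair of rules is to build a small directory tree and ask Git what it still sees. The sketch below assumes a hypothetical layout (results/data, results/images, results/plots and a loose results/run.log); only the rule lines themselves come from the solution above.

```shell
set -e
cd "$(mktemp -d)"
git init -q
mkdir -p results/data results/images results/plots     # hypothetical layout
touch results/data/a.csv results/images/a.png results/plots/a.png results/run.log

printf '%s\n' 'results/*' '!results/data/' > .gitignore

# Everything directly under results is ignored...
git check-ignore -v results/plots/a.png results/run.log
# ...except the data folder, whose files are still reported as untracked
git status --short --untracked-files=all
```

Writing results/* rather than results/ is what makes the exception possible: Git cannot re-include a path when its parent directory itself is excluded.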
Ignoring all data Files in a Directory
Assuming you have an empty .gitignore file, and given a directory structure that looks like:
BASH
results/data/position/gps/a.csv
results/data/position/gps/b.csv
results/data/position/gps/c.csv
results/data/position/gps/info.txt
results/plots
What’s the shortest .gitignore rule you could write to ignore all .csv files in results/data/position/gps? Do not ignore the info.txt.
Appending results/data/position/gps/*.csv will match every file in results/data/position/gps that ends with .csv. The file results/data/position/gps/info.txt will not be ignored.
Ignoring all data Files in the repository
Let us assume you have many .csv files in different subdirectories of your repository. For example, you might have:
BASH
results/a.csv
data/experiment_1/b.csv
data/experiment_2/c.csv
data/experiment_2/variation_1/d.csv
How do you ignore all the .csv files, without explicitly listing the names of the corresponding folders?
In the .gitignore file, write:
OUTPUT
**/*.csv
This will ignore all the .csv files, regardless of their position in the directory tree. You can still include some specific exception with the exclamation point operator.
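The pattern can be tested against the directory layout from this exercise. This is a minimal sketch in a scratch repository; the files are empty placeholders.

```shell
set -e
cd "$(mktemp -d)"
git init -q
mkdir -p results data/experiment_1 data/experiment_2/variation_1
touch results/a.csv data/experiment_1/b.csv data/experiment_2/c.csv \
      data/experiment_2/variation_1/d.csv

echo '**/*.csv' > .gitignore

# Both a shallow and a deeply nested file match the rule
git check-ignore results/a.csv data/experiment_2/variation_1/d.csv

# Only .gitignore itself is left as untracked
git status --short --untracked-files=all
```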
The order of rules matters. The ! modifier will negate an entry from a previously defined ignore pattern. If a !*.csv entry follows *.csv in the .gitignore, it negates all of the previous .csv matches, so none of them will be ignored, and all .csv files will be tracked.
Log Files
You wrote a script that creates many intermediate log-files of the form log_01, log_02, log_03, etc. You want to keep them but you do not want to track them through git.
- Write one .gitignore entry that excludes files of the form log_01, log_02, etc.
- Test your “ignore pattern” by creating some dummy files of the form log_01, etc.
- You find that the file log_01 is very important after all, add it to the tracked files without changing the .gitignore again.
- Discuss with your neighbor what other types of files could reside in your directory that you do not want to track and thus would exclude via .gitignore.
- append either log_* or log* as a new entry in your .gitignore
- track log_01 using git add -f log_01
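The solution can be sketched as a script. It assumes a scratch repository and uses the log_* variant of the pattern.

```shell
set -e
cd "$(mktemp -d)"
git init -q
touch log_01 log_02 log_03       # dummy log files

echo 'log_*' > .gitignore
git status --short               # only .gitignore is listed

git add -f log_01                # force-add one ignored file
git status --short               # log_01 is staged, the other logs stay ignored
```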
Key Points
- The .gitignore file tells Git what files to ignore.
Content from Remotes in GitLab
Last updated on 2025-05-22
Overview
Questions
- How do I safely back up my work to a remote site?
- How do I share my changes with others on the web?
Objectives
- Explain what remote repositories are and why they are useful.
- Push to or pull from a remote repository.
Version control really comes into its own when we begin to collaborate with other people. We already have most of the machinery we need to do this; the only thing missing is to copy changes from one repository to another.
Systems like Git allow us to move work between any two repositories. In practice, though, it is easiest to use one copy as a central hub, and to keep it on the web rather than on someone’s laptop. Most programmers use hosting services like GitHub, Bitbucket or GitLab to hold those main copies.
Let us start by sharing the changes we have made to our current project with the world. To this end we are going to create a remote repository that will be linked to our local repository.
1. Create a remote repository
Log in to USGS GitLab, then click on the icon in the top right corner to create a new project called vampires-and-werewolves:

Select Create blank project

Name your project “vampires-and-werewolves”, select your username as the namespace, uncheck “Initialize repository with a README”, and then click Create project.
Note: Since this repository will be connected to a local repository, it needs to be empty. That is why “Initialize repository with a README” needs to be unchecked. See the “GitLab README files” exercise below for a full explanation of why the project needs to be empty.

As soon as the repository is created, GitLab displays a page with a URL and some information on how to configure your local repository:

This effectively does the following on GitLab’s servers:
If you remember back to the earlier episode where we added and committed our earlier work on mars.txt, we had a diagram of the local repository which looked like this:
Now that we have two repositories, we need a diagram like this:

Note that our local repository still contains our earlier work on mars.txt, but the remote repository on GitLab appears empty as it does not contain any files yet.
2. HTTPS Setup
Before Dracula can connect to a remote repository, he needs to set up a way for his computer to authenticate with GitLab so it knows it is him trying to connect to his remote repository.
We are going to set up an “Access token” that we can use to authenticate to GitLab.
In GitLab, click on your user icon and then Preferences.

Once you are in your User Settings, click on Access tokens. If you already have a personal access token and you have it saved, you do not need to follow these steps. If you do not have a personal access token, click on Add new token.

Add a token name that will be meaningful to you. If you are setting this up as your primary access token for GitLab, you will probably want to select all of the scopes. These scopes establish what you are able to do in GitLab with this personal access token. You may also want to delete the expiration date. If removed, GitLab will automatically set the expiration date to the maximum of one year from the day created. Then, click Create personal access token.

You will be presented with your new personal access token. Make sure you save it some place secure since you will not be able to access it again through GitLab.

See the GitLab Documentation for more information on personal access tokens.
When you start interacting with the remote from your computer, if you have not already saved your personal access token, you will be prompted to enter a username and password. The prompt may appear in your command prompt or you may get a pop up on your machine. Your username will be your email address and the password will be your personal access token. See the Password Manager spoiler below for more information on saving your personal access token and handling token expiration.
3. Connect local to remote repository
Now we connect the two repositories. We do this by making the GitLab repository a remote for the local repository. The home page of the repository on GitLab includes the URL string we need to identify it:

Click on the clipboard icon under ‘Clone with HTTPS’ to use the HTTPS protocol.
HTTPS allows you to communicate with GitLab using the HTTPS protocol. This approach tends to be a little simpler and allows you to use a Personal Access Token (similar to a password) to authenticate. You can use the same Personal Access Token across multiple machines.
SSH is considered slightly more secure and requires setting up a public and a private key. There is a little more overhead to using SSH over HTTPS, especially if working on more than one machine, which is why we teach the HTTPS method in this Lesson. That being said, it is not too hard to configure your account to use SSH and the instructions are available at https://docs.gitlab.com/ee/user/ssh.html.
With the URL copied from the browser, go into the local vampires-and-werewolves repository, and run this command:
Make sure to use the URL for your repository rather than Vlad’s: the only difference should be your username instead of vdracula.
origin is a local name used to refer to the remote repository. It could be called anything, but origin is a convention that is often used by default in git and GitLab, so it is helpful to stick with this unless there is a reason not to.
We can check that the command has worked by running git remote -v:
OUTPUT
origin https://code.usgs.gov/vdracula/vampires-and-werewolves.git (fetch)
origin https://code.usgs.gov/vdracula/vampires-and-werewolves.git (push)
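Adding and listing a remote can be practiced without network access. In the sketch below a local bare repository stands in for the empty GitLab project; that stand-in and the scratch paths are assumptions, and with GitLab you would pass the HTTPS URL copied from the project page to git remote add instead.

```shell
set -e
work=$(mktemp -d)
# Assumption: a local bare repository stands in for the empty GitLab project
git init -q --bare "$work/remote.git"
git init -q "$work/vampires-and-werewolves"
cd "$work/vampires-and-werewolves"

# With GitLab, the copied HTTPS URL goes where the local path is used here
git remote add origin "$work/remote.git"

# Lists the fetch and push URLs recorded under the name origin
git remote -v
```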
We will discuss remotes in more detail in a future episode, while talking about how they might be used for collaboration.
4. Push local changes to a remote
Now that authentication is set up, we can return to the remote. This command will push the changes from our local repository to the repository on GitLab:
Since Dracula set up a personal access token, it will prompt him for it. If you have already saved your personal access token in Git, it may not prompt for a password.
OUTPUT
Enumerating objects: 16, done.
Counting objects: 100% (16/16), done.
Delta compression using up to 8 threads.
Compressing objects: 100% (11/11), done.
Writing objects: 100% (16/16), 1.45 KiB | 372.00 KiB/s, done.
Total 16 (delta 2), reused 0 (delta 0)
remote: Resolving deltas: 100% (2/2), done.
To https://code.usgs.gov/vdracula/vampires-and-werewolves.git
* [new branch] main -> main
Proxy
If the network you are connected to uses a proxy, there is a chance that your last command failed with “Could not resolve hostname” as the error message. To solve this issue, you need to tell Git about the proxy:
BASH
$ git config --global http.proxy http://user:password@proxy.url
$ git config --global https.proxy https://user:password@proxy.url
When you connect to another network that does not use a proxy, you will need to tell Git to disable the proxy using:
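Removing the proxy settings again can be sketched with the --unset option of git config. To avoid touching your real configuration, this sketch points HOME at a scratch directory first, which is an assumption made for the demonstration only; on your own machine you would run just the two --unset lines.

```shell
set -e
# Assumption: a scratch HOME keeps your real global Git configuration untouched
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

# The proxy settings from the example above
git config --global http.proxy http://user:password@proxy.url
git config --global https.proxy https://user:password@proxy.url

# Remove both settings again
git config --global --unset http.proxy
git config --global --unset https.proxy

git config --global --get http.proxy || echo "http.proxy is not set"
```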
If your operating system has a password manager configured, git push will try to use it when it needs your username and password. For example, this is the default behavior for Git Bash on Windows. If you want to type your username and password at the terminal instead of using a password manager, type unset SSH_ASKPASS in the terminal, before you run git push. Despite the name, Git uses SSH_ASKPASS for all credential entry, so you may want to unset SSH_ASKPASS whether you are using Git via SSH or https.
You may also want to add unset SSH_ASKPASS at the end of your ~/.bashrc to make Git default to using the terminal for usernames and passwords.
If your personal access token was saved in your password manager and it expires, you will need to generate a new personal access token and open the password manager to delete the saved credential. Then, Git will prompt you for the new password on your next git push.
Our local and remote repositories are now in this state:

The ‘-u’ Flag
You may see a -u option used with git push in some documentation. This option is synonymous with the --set-upstream-to option for the git branch command, and is used to associate the current branch with a remote branch so that the git pull command can be used without any arguments. To do this, simply use git push -u origin main once the remote has been set up.
We can pull changes from the remote repository to the local one as well:
OUTPUT
From https://code.usgs.gov/vdracula/vampires-and-werewolves
* branch main -> FETCH_HEAD
Already up-to-date.
Pulling has no effect in this case because the two repositories are already synchronized. If someone else had pushed some changes to the repository on GitLab, though, this command would download them to our local repository.
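Pushing and pulling can be rehearsed locally before trying it against GitLab. This is a minimal sketch under stated assumptions: a local bare repository stands in for the GitLab project, the branch is named main explicitly, and no authentication prompt appears because no network is involved.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/remote.git"           # stand-in for GitLab (assumption)
git init -q "$work/vampires-and-werewolves"
cd "$work/vampires-and-werewolves"
git symbolic-ref HEAD refs/heads/main           # name the branch main on any Git version
git config user.name "Vlad Dracula"
git config user.email "vdracula@usgs.gov"
echo "Cold, dry, and everything is red, vampires' favorite color" > mars.txt
git add mars.txt && git commit -q -m "Start notes on Mars suitability for vampires and werewolves"
git remote add origin "$work/remote.git"

git push -u origin main     # copy local commits to the remote; -u records the upstream
git pull                    # no arguments needed once the upstream is set

# Shows which remote branch main is associated with
git rev-parse --abbrev-ref 'main@{upstream}'
```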
GitLab GUI
Browse to your vampires-and-werewolves
repository on
Gitlab. On the right side menu under Project Information, find and click
on the text that says “XX commits” (where “XX” is some number). Hover
over, and click on, the two buttons to the right of each commit.
Additionally, click into each commit. What information can you
gather/explore from these buttons and views? How would you get that same
information in the shell?
The left-most button (with the picture of a clipboard) copies the full identifier of the commit to the clipboard. In the shell, git log will show you the full commit identifier for each commit.
The right-most button lets you view all of the files in the repository at the time of that commit. To do this in the shell, we would need to check out the repository at that particular time. We can do this with git checkout ID where ID is the identifier of the commit we want to look at. If we do this, we need to remember to put the repository back to the right state afterwards!
When you click on the commit name, you will see all of the changes that were made in that particular commit. Green shaded lines indicate additions and red ones removals. In the shell we can do the same thing with git diff. In particular, git diff ID1..ID2 where ID1 and ID2 are commit identifiers (e.g., git diff a3bf1e5..041e637) will show the differences between those two commits.
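The shell equivalents mentioned in this solution can be tried in a scratch repository. This is a sketch under assumptions: the two commits, their messages, and the author identity are made up for illustration, and the commit identifiers are looked up with git rev-parse rather than typed by hand.

```bash
set -eu
export GIT_AUTHOR_NAME="Vlad Dracula" GIT_AUTHOR_EMAIL="vlad@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt && git commit -q -m "Start notes on Mars"
echo "The two moons may be a problem for werewolves" >> mars.txt
git commit -q -a -m "Add concerns about the moons"

# Full commit identifiers, newest first (what the clipboard button copies).
git log --format='%H %s'

id1=$(git rev-parse HEAD~1)   # older commit
id2=$(git rev-parse HEAD)     # newer commit

# Line-by-line changes between the two commits (the commit view in GitLab).
changes=$(git diff "$id1..$id2")
echo "$changes"

# Files as they were at the older commit (the browse-files button),
# followed by a return to the tip of main.
git checkout -q "$id1"
old_lines=$(wc -l < mars.txt)
git switch -q main
```

The last command matters: after git checkout ID the repository is in a detached state, and git switch main puts it back.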
Uploading files directly in GitLab browser
GitLab also allows you to skip the command line and upload files directly to your repository without having to leave the browser. When you are in Code -> Repository, you can click the + button in the toolbar at the top of the file tree, then click Upload File under This directory.
GitLab Timestamp
Go to the repo you just created on GitLab and check the timestamps of the files. How does GitLab record times, and why?
GitLab displays timestamps in a human-readable relative format (e.g., “22 hours ago” or “three weeks ago”). However, if you hover over the timestamp, you can see the exact time at which the last change to the file occurred.
Push vs. Commit
In this episode, we introduced the “git push” command. How is “git push” different from “git commit”?
When we push changes, we are interacting with a remote repository to update it with the changes we have made locally (often this corresponds to sharing the changes we have made with others). Commit only updates your local repository.
GitLab README files
In this episode we learned about creating a remote repository on GitLab, but when you initialized your GitLab repo, you did not add a README.md file. If you had, what do you think would have happened when you tried to link your local and remote repositories?
In this case, the pull would fail because the two repositories have unrelated histories. When GitLab creates a README.md file, it performs a commit in the remote repository. When you try to pull the remote repository to your local repository, Git detects that they have histories that do not share a common origin and refuses to merge.
OUTPUT
warning: no common commits
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.
From https://code.usgs.gov/vdracula/vampires-and-werewolves
* branch main -> FETCH_HEAD
* [new branch] main -> origin/main
fatal: refusing to merge unrelated histories
You can force Git to merge the two repositories with the option --allow-unrelated-histories. Be careful when you use this option and carefully examine the contents of local and remote repositories before merging.
OUTPUT
From https://code.usgs.gov/vdracula/vampires-and-werewolves
* branch main -> FETCH_HEAD
Merge made by the 'recursive' strategy.
README.md | 1 +
1 file changed, 1 insertion(+)
create mode 100644 README.md
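The refusal and the forced merge described above can be reproduced locally. This is a sketch only, assuming a bare repository on disk plays the role of GitLab and that its single commit adds README.md; --no-rebase and --no-edit are added so the script runs unattended on recent Git versions.

```bash
set -eu
export GIT_AUTHOR_NAME="Vlad Dracula" GIT_AUTHOR_EMAIL="vlad@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)

# A "remote" whose only commit adds README.md, as GitLab would create it.
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/main
echo "# vampires-and-werewolves" > README.md
git add README.md && git commit -q -m "Initial commit"
git clone -q --bare "$work/seed" "$work/remote.git"

# An independent local repository with its own first commit.
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/main
echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt && git commit -q -m "Start notes on Mars"
git remote add origin "$work/remote.git"

# Without the option, the pull is refused.
if git pull --no-rebase --no-edit origin main 2>/dev/null; then
  refused=no
else
  refused=yes
fi
echo "plain pull refused: $refused"

# With the option, Git creates a merge commit joining both histories.
git pull -q --no-rebase --no-edit --allow-unrelated-histories origin main
ls
```

After the second pull the working directory contains both mars.txt and README.md.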
Key Points
- A local Git repository can be connected to one or more remote repositories.
- Use the HTTPS protocol to connect to remote repositories.
- git push copies changes from a local repository to a remote repository.
- git pull copies changes from a remote repository to a local repository.
Content from Branching and Merging
Last updated on 2025-05-22
Overview
Questions
- What are branches in Git and why should I use them?
- How do I merge a branch back into my main branch?
Objectives
- Explain why you would want to use a branching workflow, even when you are the only person working on your project.
- Create a branch within a Git repository.
- Create a merge request and merge a branch into a main branch.
Git Branches
A Git branch is a version of the repository where you can make and review changes before updating the clean, trusted content of the repository. A branch is a safe place to test things out without impacting your main branch. You are free to make mistakes and have the flexibility to fix them within a branch.
Create a branch
- Open Git Bash (Windows) or Terminal (macOS) and navigate to your local repository. Once you are in your project, the current branch will be specified (usually main).
- Let us create a new branch:
  - Execute git switch -c my-test-branch
  - The -c flag is what tells Git to create a new branch
  - This will create a new branch with the name you specified that is otherwise an identical copy of the branch you just created it from (in this case, main) and switch you over to the new branch
- To switch between branches, execute: git switch <branch-name>
  - We can switch back to the main branch with git switch main
  - Switching branches will automatically load all of the files on that branch into your computer’s project file directory
Callout
git switch and git restore were introduced in 2019 to separate out the functionality of git checkout, which confused many people by doing too many things. git switch <branch-name> can be used interchangeably with git checkout <branch-name>, but the command-line options can be slightly different. If you are switching to an existing branch, then the two would look the same:
However, if you want to create a new branch, they differ:
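The two comparisons in this callout can be sketched in a scratch repository. The branch names branch-from-checkout and branch-from-switch are hypothetical and exist only to show which command created which branch.

```bash
set -eu
export GIT_AUTHOR_NAME="Vlad Dracula" GIT_AUTHOR_EMAIL="vlad@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt && git commit -q -m "Start notes on Mars"

# Creating a new branch: the flags differ (-b for checkout, -c for switch).
git checkout -q -b branch-from-checkout
git switch -q main
git switch -q -c branch-from-switch

# Switching to an existing branch: both commands take just the branch name.
git checkout -q main
git switch -q branch-from-checkout

current=$(git rev-parse --abbrev-ref HEAD)
git branch --list
```

Both creation commands leave you on the new branch, so the only practical difference here is the flag.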
Create a branch on your own
- Create a new branch on your own called 1-my-first-issue
- Switch back to main
- Switch back to 1-my-first-issue
git switch -c 1-my-first-issue
git switch main
git switch 1-my-first-issue
GitLab issues are a common way of tracking the work that needs to be done on a project. A common branch naming convention is to use the issue number and a short description of what you are doing as the branch name (e.g., <issue number>-<what-you-are-doing>), similar to what you did in this exercise. Another common naming convention is to use lower-spear-case for your branch names.
Make updates to code
This is when you do your work. Create your scripts, organize files/folders, etc. Do all your work in the repository with the correct branch checked out.
Important Note
Repositories should not contain any sensitive information, including personally identifiable information, usernames, passwords, or full file paths. While file paths may not be as obviously sensitive as other examples, they are frequently included in scripts. It is worth mentioning that full file paths also decrease portability of scripts to other users!
Let us edit the mars.txt file, again.
Type the text below into the mars.txt file after the last line:
OUTPUT
Two vampires and three werewolves were spotted on Mars.
Add the file to the staging area:
Commit the changes:
Push the changes to remote:
The -u flag is shorthand for --set-upstream, which sets the default remote branch for the current local branch. Prior to this push, the remote repository was not aware of the local branch, and the local branch did not have any connection to the remote. Moving forward, this sets the remote-local association for any future git push or git pull attempts.
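The stage, commit, and push steps described above might look like the following. This is a sketch under assumptions: the branch is taken to be my-test-branch from earlier in this episode, the commit message is illustrative, and a bare repository on disk stands in for GitLab.

```bash
set -eu
export GIT_AUTHOR_NAME="Vlad Dracula" GIT_AUTHOR_EMAIL="vlad@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
git init -q --bare "$work/remote.git"   # stand-in for the GitLab repository
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/main
echo "Why are we talking about mummies?" > mars.txt
git add mars.txt && git commit -q -m "Start notes on Mars"
git remote add origin "$work/remote.git"
git push -q origin main

# Work happens on the branch, not on main.
git switch -q -c my-test-branch
echo "Two vampires and three werewolves were spotted on Mars." >> mars.txt

git add mars.txt                                    # stage
git commit -q -m "Add sighting report to mars.txt"  # commit (message is illustrative)
git push -q -u origin my-test-branch                # push and set the upstream

tracking=$(git rev-parse --abbrev-ref '@{upstream}')
echo "my-test-branch tracks: $tracking"
```

Because of -u, later pushes and pulls from this branch can be written as plain git push and git pull.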
Git Merge Requests
Merge requests allow for peer code review before merging new code into a branch (usually the main branch).
Creating Merge Requests
There are many ways to create a merge request in GitLab. See GitLab’s Creating merge requests to see them all.
When you push a new branch into GitLab, GitLab will add a banner message about the push and provide a convenient Create merge request button.

If you use this method to create the merge request, you will not need to specify the source and target branches.
Add a succinct title and description. The description can follow this basic format:
- Describe why this merge request exists
- Explain what was changed
- Explain how the change addresses the issue
- Provide information on how the reviewer can test your code
Select an Assignee (This is the person who owns the merge request but is not responsible for reviewing it) and a Reviewer.
Click Create merge request.
Merging Merge Requests
Once a merge request has been created, you can see an overview, the commits that were made, and all of the line-by-line changes that were made to the content.

After all of the changes have been reviewed, the Reviewer can click Approve and the Assignee can click the Merge button to merge the updates into the main branch.
Challenge
- Review your merge request. Can you see the changes that were made? How might you add a comment to a specific line of code?
- Merge your changes into main. Are you able to see the updated file in your main branch?
Key Points
- A branching workflow enables you to keep your main repository clean and allows for mistakes, fixes, and reviews before content is merged into main.
Content from Collaborating
Last updated on 2025-05-22
Overview
Questions
- How can I use version control to collaborate with other people?
Objectives
- Clone a remote repository.
- Collaborate by pushing to a common repository.
- Describe the basic collaborative workflow.
For the next step, get into pairs. One person will be the “Owner” and the other will be the “Collaborator”. The goal is for the Collaborator to add changes into the Owner’s repository. We will switch roles at the end, so both people will play Owner and Collaborator.
Practicing By Yourself
If you are working through this lesson on your own, you can carry on by opening a second terminal window. This window will represent your partner, working on another computer. You will not need to give anyone access on GitLab, because both ‘partners’ are you.
Update Repository Permissions
The Owner needs to give the Collaborator access. In your repository page on GitLab, click the Manage menu on the left, select Members, and click Invite members. Enter your partner’s username or email address in the search box, select a role (either Developer or Maintainer), and click Invite.

Clone the Repository
Once the Collaborator has access to the repository, they need to download a copy of the Owner’s repository to their machine. This is called “cloning a repo”.
The Collaborator does not want to overwrite their own version of vampires-and-werewolves.git, and so needs to clone the Owner’s repository to a different location than their own repository with the same name. (This is a weird case…you would not normally have two versions of the same Git repo on your local machine.)
To clone the Owner’s repo into their Desktop folder, the Collaborator can copy the repository URL from the repository homepage by clicking Code and Clone with HTTPS.

HTTPS allows you to communicate with GitLab using the HTTPS protocol. This approach tends to be a little simpler and allows you to use a Personal Access Token (similar to a password) to authenticate. You can use the same Personal Access Token across multiple machines.
SSH is considered slightly more secure and requires setting up a public and a private key. There is a little more overhead to using SSH over HTTPS, especially if working on more than one machine, which is why we teach the HTTPS method in this Lesson. SSH also requires being on the internal USGS network (including GlobalProtect) and will not work for external collaborators. That being said, it is not too hard to configure your account to use SSH and the instructions are available at https://docs.gitlab.com/ee/user/ssh.html.
Then, open bash and enter the following (replacing https://code.usgs.gov/vdracula/vampires-and-werewolves.git with the URL that was just copied):
BASH
$ git clone https://code.usgs.gov/vdracula/vampires-and-werewolves.git ~/Desktop/vdracula-vampires-and-werewolves
Replace ‘vdracula’ with the Owner’s username.
If you choose to clone without the clone path (~/Desktop/vdracula-vampires-and-werewolves) specified at the end, you will clone inside your own vampires-and-werewolves folder! Make sure to navigate to the Desktop folder first.
Create a New Branch and Make Changes
The Collaborator can now make a change in their clone of the Owner’s repository, exactly the same way as we have been doing before:
BASH
$ cd ~/Desktop/vdracula-vampires-and-werewolves
$ git switch -c pluto-branch
$ nano pluto.txt
$ cat pluto.txt
OUTPUT
It is so a planet!
The Importance of Branches
Using branches in Git becomes even more important when you begin collaborating with others. Branches can help you avoid conflicts and allow others to review your code before merging it with the main branch where it could potentially introduce bugs and conflicts with the work of others on your team. You can also ‘protect’ the default (e.g., main) branch to prevent developers from pushing changes directly to it. If the default branch is protected, the developers must push to a separate branch and then create a merge request to add their changes to the default branch. This workflow ensures that changes to the default branch get reviewed and approved. Learn more about GitLab protected branches in the GitLab Documentation.
Stage, Commit, and Push Changes
OUTPUT
1 file changed, 1 insertion(+)
create mode 100644 pluto.txt
Then push the change to the Owner’s repository on GitLab:
OUTPUT
Enumerating objects: 4, done.
Counting objects: 4, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 306 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://code.usgs.gov/vdracula/vampires-and-werewolves.git
9272da5..29aba7c main -> main
Note that we did not have to create a remote called origin: Git uses this name by default when we clone a repository. (This is why origin was a sensible choice earlier when we were setting up remotes by hand.)
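That the remote named origin appears automatically can be confirmed right after cloning. The sketch below assumes a bare repository on disk in place of the Owner's GitLab project; the paths are illustrative.

```bash
set -eu
export GIT_AUTHOR_NAME="Vlad Dracula" GIT_AUTHOR_EMAIL="vlad@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/main
echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt && git commit -q -m "Start notes on Mars"
git clone -q --bare "$work/seed" "$work/owner.git"   # stand-in for the Owner's repository

# Clone it the way the Collaborator would, then list the configured remotes.
git clone -q "$work/owner.git" "$work/collaborator"
cd "$work/collaborator"
git remote -v

remote_name=$(git remote)
remote_url=$(git remote get-url origin)
echo "remote $remote_name points at $remote_url"
```

No git remote add was needed: the clone already knows where it came from.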
Take a look at the Owner’s repository on GitLab again, and you should be able to see the new branch and commit made by the Collaborator. You may need to refresh your browser to see the new commit.
Create and Comment on a Merge Request
Collaborator: Create a merge request that will merge pluto-branch with main. Set the Owner as the Reviewer.
Owner: Add a comment to the line that was added in pluto.txt. Then, approve and merge the merge request.
Collaborator: Review Branching and Merging Episode “Creating Merge Requests” for a reminder of how to create a merge request in GitLab.
Owner: With GitLab, it is possible to comment on the diff of a merge request. Go to the Changes tab within the merge request. Hover over the line of code to comment and a blue comment icon appears. Click to open a comment window.
Pull Merged Changes to Local Repositories
Once the new code has been merged to the main branch, both the Collaborator and Owner should pull the changes to their local repositories.
To download the changes from GitLab, switch to the main branch and run git pull origin main.
Now the three repositories (Owner’s local, Collaborator’s local, and Owner’s on GitLab) are back in sync.
A Basic Collaborative Workflow
In practice, it is good to be sure that you have an updated version of the repository you are collaborating on, so you should git pull before making your changes. The basic collaborative workflow would be:
- update your local repo with git pull origin main,
- create a feature branch with git switch -c <branch-name>,
- make your changes and stage them with git add,
- commit your changes with git commit -m,
- upload the changes to GitLab with git push -u origin <branch-name>,
- create a merge request in GitLab, and
- merge once the feature branch has been reviewed and approved.
It is better to make many commits with smaller changes rather than one commit with massive changes: small commits are easier to read and review.
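The command-line part of this workflow can be sketched end to end. Assumptions: a bare repository on disk replaces GitLab, and the branch name add-pluto-notes and the commit message are hypothetical. The merge request steps happen in the GitLab web interface and are only noted in a comment.

```bash
set -eu
export GIT_AUTHOR_NAME="Wolfman" GIT_AUTHOR_EMAIL="wolfman@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/main
echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt && git commit -q -m "Start notes on Mars"
git clone -q --bare "$work/seed" "$work/remote.git"   # stand-in for GitLab
git clone -q "$work/remote.git" "$work/local"
cd "$work/local"

git pull -q origin main                    # 1. update the local repository
git switch -q -c add-pluto-notes           # 2. create a feature branch
echo "It is so a planet!" > pluto.txt      # 3. make changes...
git add pluto.txt                          #    ...and stage them
git commit -q -m "Add notes about Pluto"   # 4. commit
git push -q -u origin add-pluto-notes      # 5. upload the branch
# 6-7. Create, review, and merge the merge request in the GitLab web interface.

pushed=$(git ls-remote --heads origin add-pluto-notes | wc -l)
echo "branches pushed: $pushed"
```

Keeping each pass through steps 3 and 4 small gives reviewers short, readable commits.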
Switch Roles and Repeat
Switch roles and repeat the whole process.
Review Changes
The Owner pushed commits to the repository’s main branch without giving any information to the Collaborator. How can the Collaborator find out what has changed on the command line? And on GitLab?
On the command line, the Collaborator can use git fetch origin main to get the remote changes into the local repository, but without merging them. Then by running git diff main origin/main the Collaborator will see the changes output in the terminal.
On GitLab, the Collaborator can go to the repository and click on “Code” -> “Commits” to view the most recent commits pushed to the repository.
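The command-line answer can be tried with two local clones. This is a sketch assuming a bare repository on disk as the shared remote; the added line and commit message are illustrative.

```bash
set -eu
export GIT_AUTHOR_NAME="Vlad Dracula" GIT_AUTHOR_EMAIL="vlad@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/main
echo "Cold, dry, and everything is red" > mars.txt
git add mars.txt && git commit -q -m "Start notes on Mars"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/owner"
git clone -q "$work/remote.git" "$work/collaborator"

# The Owner pushes a new commit straight to main.
cd "$work/owner"
echo "The two moons may be a problem for werewolves" >> mars.txt
git commit -q -a -m "Add concerns about the moons"
git push -q origin main

# The Collaborator downloads the commit without merging it...
cd "$work/collaborator"
git fetch -q origin main

# ...and compares the local main with the fetched origin/main.
incoming=$(git diff main origin/main)
echo "$incoming"
git log --oneline main..origin/main
```

The local main branch is untouched by the fetch; only origin/main has moved.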
Key Points
- git clone copies a remote repository to create a local repository with a remote called origin automatically set up.
- Branches are an important part of collaborating with others in Git repositories.
- Ensure that you establish a collaborative workflow for your project team to use.
Content from Conflicts
Last updated on 2025-05-22
Overview
Questions
- What do I do when my changes conflict with someone else’s?
Objectives
- Explain what conflicts are and when they can occur.
- Resolve conflicts resulting from a merge.
As soon as people can work in parallel, they may end up introducing changes that conflict with one another. This will even happen with a single person: if we are working on a piece of software on both our laptop and a server in the lab, we could make different changes to each copy. Version control helps us manage these conflicts by giving us tools to resolve overlapping changes.
To see how we can resolve conflicts, we must first create one. The file mars.txt currently looks like this in both partners’ copies of our vampires-and-werewolves repository:
OUTPUT
Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves
Mummies will appreciate the lack of humidity
Why are we talking about mummies?
Two vampires and three werewolves were spotted on Mars.
Let us add a line to the collaborator’s copy only:
OUTPUT
Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves
Mummies will appreciate the lack of humidity
Why are we talking about mummies?
Two vampires and three werewolves were spotted on Mars.
This line added to Wolfman's copy
and then push the change to GitLab:
OUTPUT
[main 5ae9631] Add a line in our home copy
1 file changed, 1 insertion(+)
OUTPUT
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 331 bytes | 331.00 KiB/s, done.
Total 3 (delta 2), reused 0 (delta 0)
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://code.usgs.gov/vdracula/vampires-and-werewolves.git
29aba7c..dabb4c8 main -> main
Now let us have the owner make a different change to their copy without updating from GitLab:
OUTPUT
Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves
Mummies will appreciate the lack of humidity
Why are we talking about mummies?
Two vampires and three werewolves were spotted on Mars.
We added a different line in the other copy
We can commit the change locally:
OUTPUT
[main 07ebc69] Add a line in my copy
1 file changed, 1 insertion(+)
but Git will not let us push it to GitLab:
OUTPUT
To https://code.usgs.gov/vdracula/vampires-and-werewolves.git
! [rejected] main -> main (fetch first)
error: failed to push some refs to 'https://code.usgs.gov/vdracula/vampires-and-werewolves.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
Git rejects the push because it detects that the remote repository has new updates that have not been incorporated into the local branch. What we have to do is pull the changes from GitLab, merge them into the copy we are currently working in, and then push that. Let us start by pulling:
OUTPUT
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (1/1), done.
remote: Total 3 (delta 2), reused 3 (delta 2), pack-reused 0
Unpacking objects: 100% (3/3), done.
From https://code.usgs.gov/vdracula/vampires-and-werewolves
* branch main -> FETCH_HEAD
29aba7c..dabb4c8 main -> origin/main
Auto-merging mars.txt
CONFLICT (content): Merge conflict in mars.txt
Automatic merge failed; fix conflicts and then commit the result.
The git pull command updates the local repository to include those changes already included in the remote repository. After the changes from the remote branch have been fetched, Git detects that changes made to the local copy overlap with those made to the remote repository, and therefore refuses to merge the two versions automatically, to stop us from trampling on our previous work. The conflict is marked in the affected file:
OUTPUT
Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves
Mummies will appreciate the lack of humidity
Why are we talking about mummies?
Two vampires and three werewolves were spotted on Mars.
<<<<<<< HEAD
We added a different line in the other copy
=======
This line added to Wolfman's copy
>>>>>>> dabb4c8c450e8475aee9b14b4383acc99f42af1d
Our change is preceded by <<<<<<< HEAD. Git has then inserted ======= as a separator between the conflicting changes and marked the end of the content downloaded from GitLab with >>>>>>>. (The string of letters and digits after that marker identifies the commit we have just downloaded.)
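The conflict and its markers can be reproduced with two local clones. This is a sketch, assuming a bare repository on disk as the shared remote; --no-rebase is added so that recent Git versions merge rather than ask how to reconcile the branches.

```bash
set -eu
export GIT_AUTHOR_NAME="Vlad Dracula" GIT_AUTHOR_EMAIL="vlad@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/main
echo "Two vampires and three werewolves were spotted on Mars." > mars.txt
git add mars.txt && git commit -q -m "Start notes on Mars"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/owner"
git clone -q "$work/remote.git" "$work/collaborator"

# The Collaborator adds a line and pushes first.
cd "$work/collaborator"
echo "This line added to Wolfman's copy" >> mars.txt
git commit -q -a -m "Add a line in our home copy"
git push -q origin main

# The Owner adds a different line in the same place, then pulls.
cd "$work/owner"
echo "We added a different line in the other copy" >> mars.txt
git commit -q -a -m "Add a line in my copy"
git pull --no-rebase origin main >/dev/null 2>&1 || echo "pull stopped with a conflict"

# The file now contains both versions, separated by conflict markers.
cat mars.txt
markers=$(grep -c -E '^(<<<<<<<|=======|>>>>>>>)' mars.txt)
echo "marker lines: $markers"
```

The three marker lines delimit the local version (above the separator) and the downloaded version (below it).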
It is now up to us to edit this file to remove these markers and reconcile the changes. We can do anything we want: keep the change made in the local repository, keep the change made in the remote repository, write something new to replace both, or get rid of the change entirely. Let us replace both so that the file looks like this:
OUTPUT
Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves
Mummies will appreciate the lack of humidity
Why are we talking about mummies?
Two vampires and three werewolves were spotted on Mars.
We removed the conflict on this line
To finish merging, we add mars.txt to the changes being made by the merge and then commit:
OUTPUT
On branch main
All conflicts fixed but you are still merging.
(use "git commit" to conclude merge)
Changes to be committed:
modified: mars.txt
OUTPUT
[main 2abf2b1] Merge changes from GitLab
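The resolution steps above can be sketched in a single scratch repository. To keep the example short, two branches stand in for the two copies (an assumption; in the lesson the second version arrives through git pull), and the branch name wolfman is hypothetical.

```bash
set -eu
export GIT_AUTHOR_NAME="Vlad Dracula" GIT_AUTHOR_EMAIL="vlad@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
echo "Two vampires and three werewolves were spotted on Mars." > mars.txt
git add mars.txt && git commit -q -m "Start notes on Mars"

# Two branches change the same place in the file.
git switch -q -c wolfman
echo "This line added to Wolfman's copy" >> mars.txt
git commit -q -a -m "Add a line in our home copy"
git switch -q main
echo "We added a different line in the other copy" >> mars.txt
git commit -q -a -m "Add a line in my copy"
git merge wolfman >/dev/null 2>&1 || true   # stops with a conflict in mars.txt

# Resolve by rewriting the file without markers, then stage and commit.
printf '%s\n' \
  "Two vampires and three werewolves were spotted on Mars." \
  "We removed the conflict on this line" > mars.txt
git add mars.txt
git status --short
git commit -q -m "Merge changes from GitLab"

# A merge commit lists its own identifier followed by two parents.
parents=$(git rev-list --parents -n 1 HEAD | wc -w)
echo "identifiers on the merge commit line: $parents"
```

Staging the edited file is what tells Git the conflict is resolved; the commit then concludes the merge.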
Now we can push our changes to GitLab:
OUTPUT
Enumerating objects: 10, done.
Counting objects: 100% (10/10), done.
Delta compression using up to 8 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 645 bytes | 645.00 KiB/s, done.
Total 6 (delta 4), reused 0 (delta 0)
remote: Resolving deltas: 100% (4/4), completed with 2 local objects.
To https://code.usgs.gov/vdracula/vampires-and-werewolves.git
dabb4c8..2abf2b1 main -> main
Git keeps track of what we have merged with what, so we do not have to fix things by hand again when the collaborator who made the first change pulls again:
OUTPUT
remote: Enumerating objects: 10, done.
remote: Counting objects: 100% (10/10), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 6 (delta 4), reused 6 (delta 4), pack-reused 0
Unpacking objects: 100% (6/6), done.
From https://code.usgs.gov/vdracula/vampires-and-werewolves
* branch main -> FETCH_HEAD
dabb4c8..2abf2b1 main -> origin/main
Updating dabb4c8..2abf2b1
Fast-forward
mars.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
We get the merged file:
OUTPUT
Cold, dry, and everything is red, vampires' favorite color
The two moons may be a problem for werewolves
Mummies will appreciate the lack of humidity
Why are we talking about mummies?
Two vampires and three werewolves were spotted on Mars.
We removed the conflict on this line
We do not need to merge again because Git knows someone has already done that.
Git’s ability to resolve conflicts is very useful, but conflict resolution costs time and effort, and can introduce errors if conflicts are not resolved correctly. If you find yourself resolving a lot of conflicts in a project, consider these technical approaches to reducing them:
- Pull from origin more frequently, especially before starting new work
- Use topic branches to segregate work, merging to main when complete
- Make smaller, more atomic commits
- Push your work when it is done and encourage your team to do the same to reduce work in progress and, by extension, the chance of having conflicts
- Where logically appropriate, break large files into smaller ones so that it is less likely that two authors will alter the same file simultaneously
Conflicts can also be minimized with project management strategies:
- Clarify who is responsible for what areas with your collaborators
- Discuss what order tasks should be carried out in with your collaborators so that tasks expected to change the same lines will not be worked on simultaneously
- If the conflicts are stylistic churn (e.g., tabs vs. spaces), establish a governing project convention and, if necessary, use code style tools (e.g., black (Python), lintr (R), etc.) to enforce it
Solving Conflicts that You Create
Clone the repository created by your instructor. Add a new file to it, and modify an existing file (your instructor will tell you which one). When asked by your instructor, pull their changes from the repository to create a conflict, then resolve it.
Conflicts on Non-textual files
What does Git do when there is a conflict in an image or some other non-textual file that is stored in version control?
Let us try it. Suppose Dracula takes a picture of the Martian surface and calls it mars.jpg.
If you do not have an image file of Mars available, you can create a dummy binary file like this:
OUTPUT
-rw-r--r-- 1 vlad 57095 1.0K Mar 8 20:24 mars.jpg
ls shows us that this created a 1-kilobyte file. It is full of random bytes read from the special file, /dev/urandom.
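One way to create such a dummy file is sketched below, assuming a Unix-like system that provides /dev/urandom; the scratch directory is illustrative.

```bash
set -eu
cd "$(mktemp -d)"

# Read 1024 random bytes from /dev/urandom into a stand-in image file.
head -c 1024 /dev/urandom > mars.jpg

# List the file with a human-readable size.
ls -lh mars.jpg

size=$(wc -c < mars.jpg)
echo "bytes written: $size"
```

The file is not a valid image, but Git treats it as binary content, which is all the exercise needs.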
Now, suppose Dracula adds mars.jpg to his repository:
OUTPUT
[main 8e4115c] Add picture of Martian surface
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 mars.jpg
Suppose that Wolfman has added a similar picture in the meantime. His is a picture of the Martian sky, but it is also called mars.jpg. When Dracula tries to push, he gets a familiar message:
OUTPUT
To https://code.usgs.gov/vdracula/vampires-and-werewolves.git
! [rejected] main -> main (fetch first)
error: failed to push some refs to 'https://code.usgs.gov/vdracula/vampires-and-werewolves.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
We have learned that we must pull first and resolve any conflicts:
When there is a conflict on an image or other binary file, Git prints a message like this:
OUTPUT
$ git pull origin main
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From https://code.usgs.gov/vdracula/vampires-and-werewolves.git
* branch main -> FETCH_HEAD
6a67967..439dc8c main -> origin/main
warning: Cannot merge binary files: mars.jpg (HEAD vs. 439dc8c08869c342438f6dc4a2b615b05b93c76e)
Auto-merging mars.jpg
CONFLICT (add/add): Merge conflict in mars.jpg
Automatic merge failed; fix conflicts and then commit the result.
The conflict message here is mostly the same as it was for mars.txt, but there is one key additional line:
OUTPUT
warning: Cannot merge binary files: mars.jpg (HEAD vs. 439dc8c08869c342438f6dc4a2b615b05b93c76e)
Git cannot automatically insert conflict markers into an image as it does for text files. So, instead of editing the image file, we must check out the version we want to keep. Then we can add and commit this version.
On the key line above, Git has conveniently given us commit identifiers for the two versions of mars.jpg. Our version is HEAD, and Wolfman’s version is the commit beginning 439dc8c0.
If we want to use our version, we can use git checkout:
BASH
$ git checkout HEAD mars.jpg
$ git add mars.jpg
$ git commit -m "Use image of surface instead of sky"
OUTPUT
[main 21032c3] Use image of surface instead of sky
If instead we want to use Wolfman’s version, we can use git checkout with Wolfman’s commit identifier, 439dc8c0:
BASH
$ git checkout 439dc8c0 mars.jpg
$ git add mars.jpg
$ git commit -m "Use image of sky instead of surface"
OUTPUT
[main da21b34] Use image of sky instead of surface
We can also keep both images. The catch is that we cannot keep them under the same name. But, we can check out each version in succession and rename it, then add the renamed versions. First, check out each image and rename it:
BASH
$ git checkout HEAD mars.jpg
$ git mv mars.jpg mars-surface.jpg
$ git checkout 439dc8c0 mars.jpg
$ mv mars.jpg mars-sky.jpg
Then, remove the old mars.jpg and add the two new files:
BASH
$ git rm mars.jpg
$ git add mars-surface.jpg
$ git add mars-sky.jpg
$ git commit -m "Use two images: surface and sky"
OUTPUT
[main 94ae08c] Use two images: surface and sky
2 files changed, 0 insertions(+), 0 deletions(-)
create mode 100644 mars-sky.jpg
rename mars.jpg => mars-surface.jpg (100%)
Now both images of Mars are checked into the repository, and mars.jpg no longer exists.
A Typical Work Session
You sit down at your computer to work on a shared project that is tracked in a remote Git repository. During your work session, you take the following actions, but not in this order:
- Make changes by appending the number 100 to a text file numbers.txt
- Update remote repository to match the local repository
- Celebrate your success with some fancy beverage(s)
- Update local repository to match the remote repository
- Stage changes to be committed
- Commit changes to the local repository
In what order should you perform these actions to minimize the chances of conflicts? Put the actions above in order in the action column of the table below. When you have the order right, see if you can write the corresponding commands in the command column. A few steps are populated to get you started.
order | action | command |
---|---|---|
1 | | |
2 | | echo 100 >> numbers.txt |
3 | | |
4 | | |
5 | | |
6 | Celebrate! | AFK |

order | action | command |
---|---|---|
1 | Update local | git pull origin main |
2 | Make changes | echo 100 >> numbers.txt |
3 | Stage changes | git add numbers.txt |
4 | Commit changes | git commit -m "Add 100 to numbers.txt" |
5 | Update remote | git push origin main |
6 | Celebrate! | AFK |
Key Points
- Conflicts occur when two or more people change the same lines of the same file.
- The version control system does not allow people to overwrite each other’s changes blindly, but highlights conflicts so that they can be resolved.
Content from Open Science
Last updated on 2025-05-22
Overview
Questions
- What is open science?
- How is open science valuable?
- How can version control help me make my work more open?
Objectives
- Define open science and be able to list attributes or processes that make a research project open.
- Explain why open science is valuable.
- Explain how a version control system can be leveraged as an electronic lab notebook for computational work.
In 2023, the U.S. government declared a Year of Open Science and defined open science for federal agencies:
“Open Science is the principle and practice of making research products and processes available to all, while respecting diverse cultures, maintaining security and privacy, and fostering collaborations, reproducibility, and equity.”
But what does this mean in practice? NASA is one agency leading the way in developing a culture of open science with their Transform to Open Science (TOPS) program, including the publication of an Open Science 101 curriculum. Here at USGS, we can practice open science by releasing scientific code with Git version control via a USGS software information product.
Check Out How USGS Celebrated The Year Of Open Science!
Check out the USGS Year of Open Science webpage to learn about the Community for Data Integration’s (CDI) ‘Open Data for Open Science’ workshop and other USGS open science stories.
Let us take a step back. How is open science valuable and how does publishing your code make your research more open?
Making Code Citable
All USGS software information products are citable with a unique Digital Object Identifier (DOI). You will learn how to create the citation and DOI in the later episode on Citation.
Unless your methods are restricted to a single mathematical operation, it is very difficult to make your research fully reproducible without the code used to analyze and generate results. Sharing the analysis code can significantly increase the reproducibility of published papers (Ince et al. 2012, Laurinavichyute et al. 2022). Additionally, open science practices can lead to more citations, potential collaborators, and funding opportunities (McKiernan et al. 2016). This open model accelerates discovery: the more open work is, the more widely it is cited and re-used (Piwowar et al. 2007).
Researchers are also exploring how the FAIR (Findable, Accessible, Interoperable, and Reusable) data standards can apply to research software. Check out the FAIR Principles for Research Software to learn more.
Are you worried that your code is too messy to share? Fear not: here is an open letter from a professional software engineer telling you that it is good enough. In fact, “if your code is good enough to do the job, then it is good enough to release”.
Is My Work Reproducible?
When analysis is conducted using scientific code, domain and code reviews can help to determine reproducibility (and therefore the accuracy and validity) of the results. You will learn more about these types of reviews in a later episode.
However, people who want to work this way may have some questions about how to approach publishing the code. This is one of the (many) reasons we teach version control. When used diligently, version control with Git acts as a shareable electronic lab notebook for computational work:
- The conceptual stages of your work are documented, including who did what and when. Every step is stamped with an identifier (the commit ID) that is for most intents and purposes unique.
- You can tie documentation of rationale, ideas, and other intellectual work directly to the changes that spring from them.
- You can refer to what you used in your research to obtain your computational results in a way that is unique and recoverable.
- With a version control system such as Git, the entire history of the repository is easy to archive for perpetuity.
Challenge: Is There An Advantage To Publishing Scientific Code Using Version Control Software?
Publishing your scientific code as a Git repository is more open or valuable than publishing it as part of a data release. TRUE or FALSE?
True. The advantages of publishing your scripts in a Git repository include:
- Publishing the history of changes. This keeps a record of what methods were explored, prior versions and approaches, and what did not work well.
- Keeping track of who authored what. Tracking helps authors receive credit for the work accomplished.
- Providing an easy way to correct errors or make updates as new information becomes available.
- Simplifying how others can access and use your code. Anyone can clone your repository and immediately start using your code.
Key Points
- Open scientific work is more useful and more highly cited than closed
- Publishing code is a critical part of making science reproducible
- If your code is good enough to produce scientific results, then it is good enough to publish
Content from Policy
Last updated on 2025-05-22
Overview
Questions
- What is an official USGS software information product?
- When am I required to release my software as an official USGS software information product?
- When may I release my software as an official USGS software information product?
Objectives
- Identify the difference between a software project and an official USGS software information product.
- Explain requirements for releasing software as an official USGS software information product.
- Identify the policy hierarchy relationship among federal, agency, and USGS authorities.
Source code consists of computer commands written in a programming language and meant to be read by people. As such, source code is a higher-level representation of computer commands and, therefore, must be assembled, interpreted, or compiled before a computer can execute it as a program.
Example
The above is an example of a file called “hello.cpp” that contains source code written in the C++ programming language. While the source code is relatively easily understood by a human, a computer is not able to execute this file directly.
BASH
$ ./hello.cpp
./hello.cpp: line 3: syntax error near unexpected token `('
./hello.cpp: line 3: `int main() {'
Instead this file must be compiled to an executable using a command similar to the following:
The resulting file, “hello”, contains binary machine code that can be executed by the computer.
While C++ is an explicitly compiled language, other languages are more subtle and may leverage just-in-time compilation or interpretation of the source code. In these more subtle languages there may not be an explicit compilation command. Only the source code file exists, and it is quietly compiled or interpreted behind the scenes upon execution.
Examples of these more subtle languages include Python or Shell scripts.
Source code developed by, or on behalf of, the U.S. Geological Survey that is, or is intended to be, publicly accessible must be stored within a Git repository on the USGS Git Hosting Platform. This Git repository may have multiple branches, tags, and commits. There may also be issue trackers, build artifacts, milestones, etc. Taken together, the activities and artifacts related to the prior, ongoing, or upcoming development of source code are considered a “software project”.
Official Software Information Product
When a software project reaches some level of maturity (e.g., results are used to support a published manuscript), it must be released as an “official USGS software information product”. While software projects may not be cited by other official USGS information product types (e.g., data releases, journal articles, etc.), official USGS software information products are citable. Wanting to cite a software project is one circumstance that requires the author to release it as an official USGS software information product; local policies (e.g., science center or equivalent organizational unit) may define additional criteria that require such a release.
An official USGS software information product reflects a point-in-time snapshot of a software project’s source code and relevant artifacts. This snapshot must be reviewed and receive appropriate approval to be made public. This snapshot is typically created using a Git tag in the repository and an associated GitLab release.
Open Source Software Project Development
A software project may be developed publicly as an open-source software project provided the project complies with all governing policies. Subject to limited exceptions, current policy requires that a software project be made open source when specific criteria are met, for example:
- The project, or results thereof, are deemed sufficient to be used by the current or future research project(s)
- A project that was contracted through a service contract vehicle is accepted by the federal contracting authority to satisfy contract requirements
- The source code in the project is no longer considered truly exploratory or disposable in nature
- The library or application produced by the software project is used by USGS or other federal staff on a regular, recurring basis
- The library or application produces actionable information at scales and timeframes relevant to decision makers
When developing an open-source project, all contributions to the project must receive, at minimum, an administrative security review before the contribution is integrated into the project repository. Depending on the project, this review provides an opportunity to complete other types of review as well, e.g., technical code review.
Branching vs Forking Workflows
In this course we describe a branching workflow. This workflow is simpler for individual developers to understand when getting started with Git. However, a forking workflow may be better suited for open source project development.
Current policy requires all contributions to open source projects be reviewed before they are integrated with the public project. Since all branches in the public project are themselves public, there is no way to use a branching workflow and comply with current policy. Under a branching workflow, the author must determine a method for sharing and reviewing their code prior to pushing their changes to GitLab; this may be fragile or lead to unversioned changes.
Conversely, a forking workflow enables reviews to occur by way of merge requests from the internal fork repository location to the public upstream repository location prior to integrating said contributions with the public project repository. In this way open source development may continue collaboratively while adhering to current policy requirements.
Governing policies (see below) determine the requirements for both open development practices and release of official USGS software information products, which are determined at each of Federal, Agency (Department of the Interior), Bureau (U.S. Geological Survey), and local (e.g., science center or equivalent organizational unit) levels.
In general, the policies are structured in a hierarchy such that higher level policy (e.g., federal policy) provides generalized guidance and lower level policy provides increasing specificity and clarity. Lower level policy may not supersede or conflict with higher level policy.
Challenge: Identifying relevant policies
Select every source of policies that govern the release of official USGS software information products.
A. Federal
B. Departmental
C. Bureau
D. Local
Policies are known to exist for each of (A), (B), and (C) sources. Science Centers and offices may implement additional policies with which you must comply.
Requirements
The following are required to release a software project as an official USGS software information product.
- Proper license, disclaimer, and metadata (code.json file)
- Appropriate review(s) and approval as defined by current policy
- An approved Information Product Data System (IPDS) record
- A Git tag within the project corresponding to the official USGS software information product and a GitLab release corresponding to the Git tag
- A digital object identifier (DOI)
Other episodes in this lesson detail procedures to satisfy each of the preceding requirements.
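As a rough pre-release self-check, the file-based requirements above can be tested with a few lines of Python. This is only a sketch under stated assumptions: the file names LICENSE.md, DISCLAIMER.md, and code.json come from this lesson, while the function name `missing_release_files` is hypothetical. The check says nothing about reviews, the IPDS record, the Git tag, or the DOI.

```python
from pathlib import Path

# File names taken from this lesson; the check itself is illustrative only.
REQUIRED_FILES = ("LICENSE.md", "DISCLAIMER.md", "code.json")


def missing_release_files(repo_root):
    """Return the required files that are absent from the top level of repo_root."""
    root = Path(repo_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_release_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required files are present.")
```

Run from the top level of a repository, the script lists whichever of the three files it cannot find. An empty list means only that the files exist, not that their contents are appropriate.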
Key Points
Software may be publicly accessible as an open source software project and/or as an official USGS software information product.
While both a project and product may be public, only the official USGS software information product is citable by other publications.
Governing policies cascade from Federal to local levels. Check with your supervisor to ensure compliance with all local policies.
Content from Licensing
Last updated on 2025-05-22
Overview
Questions
- What licensing information should I include with my work?
Objectives
- Explain why adding licensing information to a repository is important.
- Choose a proper license.
Under U.S. copyright law, copyright protection automatically arises in original creative works that are fixed in any tangible medium of expression (e.g., a written work on paper, an audio/visual recording on tape, a sculptured work out of marble). However, an original work of the U.S. Government is not eligible for copyright protection in the United States (17 USC 105a). This restriction means that, as USGS employees, any original work that we create in the course of our official duties and responsibilities is automatically in the public domain.
DOI solicitor note 1
Depending on the jurisdiction, the U.S. Government may have foreign copyright protections in U.S. Government work. Further, 17 USC 105a does not prevent the U.S. Government from owning copyright (e.g., if a USGS contractor creates an original creative work under an agreement, copyright arises in the work to the contractor, and the USGS may obtain ownership of the copyright through a contract).
Instead, all software developed by the USGS should include a LICENSE.md to notify the public of the copyright status of the software. Why add a LICENSE.md at all? If we do not include a license clarifying that we have waived the copyright - thus making this fact explicit - the uncertainty (is there a copyright, is there not?) in the mind of a potential user could inhibit potential usage of said work, thus reducing its impact and value. When someone reuses a creative work without a license, the author of that work could sue for copyright infringement. A license solves this problem by explicitly granting rights to others (the licensees) that they would otherwise not have (or not know that they have).
What licenses have I already accepted?
Many of the software tools we use on a daily basis (including in this workshop) are released as open-source software. Pick a project on GitHub from the list below, or one of your own choosing. Find its license (usually in a file called LICENSE or COPYING) and talk about how it restricts your use of the software.
- Git\(^1\), the source-code management tool
- CPython\(^1\), the standard implementation of the Python language
- Jupyter\(^1\), the project behind web-based Python notebooks
- R software\(^1\), read-only mirror of the R software source code
Both R software and Git use the GNU General Public License, which is one of the most commonly used series of software licenses for free and open-source software. One way in which it differs from the CC0 Public Domain license (more detail on that below) is that it specifies all derivative work must be distributed under the same or equivalent license terms, which is important for keeping open software open. In other words, an open-source license such as the GNU GPL series of licenses differs greatly from the CC0 in that the former places certain restrictions on the use, copying, and redistribution of the software, while the latter places no restrictions whatsoever.
What rights are being granted under which conditions differs, often only slightly, from one license to another. The Creative Commons Public Domain Dedication (CC0) is the most commonly used ‘license’ at USGS (currently CC0 1.0\(^1\)). It assumes the software is either completely original or uses only other software that is also under the CC0 license. This license places the work as completely as possible in the public domain so that it is free for others to build upon, enhance, or reuse. It should work for most USGS software, assuming that it was developed solely by federal employees and does not include any software developed by others that is not publicly dedicated. The text for this license is included in a callout box below. You can add a LICENSE.md file to the root of your project repository and paste in the text below.
DOI solicitor note 2
If the USGS wishes to release software originally created by a federal contractor, it may either:
- require the contractor to release the software under a CC0 public domain dedication, or
- require the contractor to assign all intellectual property rights, title, and interest in the software to USGS and then release the software under a CC0 public domain dedication.
For contractor positions that work closely alongside federal positions, please refer to the Contracting Officer for questions concerning how code sharing is addressed in the contract.
Note that you can also use this Copyright Dedication Agreement to formally place materials in the public domain.
When your product includes work under copyright:
If your code includes code developed by others, you will need to consider the license of the code developed by others. Any code that is used in USGS software products must be used with permission from the copyright holder or in accordance with the license, and should be marked as such. Do not assume that you can release such code under a CC0 public domain dedication.
If you have any questions or concerns regarding which license to use, please reach out to the DOI solicitor’s office for guidance.
CC0 1.0 license text
MARKDOWN
# License
Unless otherwise noted, this project work is in the public domain in the United
States because it is a work of the United States Geological Survey, an agency
of the United States Department of Interior. For more information, see the
official USGS copyright policy at
https://www.usgs.gov/information-policies-and-instructions/copyrights-and-credits
Additionally, the USGS waives all copyright and related rights in the work
worldwide through the CC0 1.0 Universal public domain dedication.
## CC0 1.0 Universal Summary
This is a human-readable summary of the
[Legal Code (read the full text)][1].
### No Copyright
The person or entity who associated a work with this deed has dedicated the
work to the public domain by waiving all of his or her rights to the work
worldwide under copyright law, including all related and neighboring rights,
to the extent allowed by law.
You can copy, modify, distribute and perform the work, even for commercial
purposes, all without asking permission.
### Other Information
In no way are the patent or trademark rights of any person affected by CC0,
nor are the rights that other persons may have in the work or in how the
work is used, such as publicity or privacy rights.
Unless expressly stated otherwise, the person who associated a work with
this deed makes no warranties about the work, and disclaims liability for
all uses of the work, to the fullest extent permitted by applicable law.
When using or citing the work, you should not imply endorsement by the
author or the affirmer.
[1]: https://creativecommons.org/publicdomain/zero/1.0/legalcode
Key Points
- A LICENSE file is often used in a repository to indicate how the contents of the repo may be used by others.
- USGS software products require a LICENSE.md file in the project root of your repository.
- Non-derivative USGS software products can use the CC0 1.0 license.
- If you need a different license, consult the solicitor’s office to determine the appropriate license.
1: non-Federal link
Content from Citation
Last updated on 2025-05-22
Overview
Questions
- How do I create a digital object identifier (DOI)?
- How can I make my work easy to cite?
Objectives
- Learn how to create a digital object identifier (DOI).
- Make your work easy to cite.
All USGS Software Information Products are required to have a digital object identifier (DOI) assigned to them. A DOI is a persistent identifier tied to a unique object that you specify. USGS uses the Asset Identifier Service to reserve and manage DOIs for software products. You can reserve a DOI by providing the Title of your Software Product and the USGS Science Center or Program responsible for the software project. Remember not to activate/publish the DOI to DataCite until you have received official approval to release the software product. You will learn more about activating/publishing the DOI in a later episode.
Once you have the DOI, you can write a suggested citation by including the reserved DOI that you receive from the Asset Identifier Service in your citation as a full URL (e.g., https://doi.org/10.5066/xxxxxxxx). For example:
Dracula, V. and Wolfman, L.T., 2024, Vampires and Werewolves, version 1.0.0: U.S. Geological Survey software release, https://doi.org/10.5066/xxxxxxxx.
You can place the suggested citation in the README.md file in your root directory, making it easy to find.
Try adding a CITATION.md file
Although not required for a USGS Software Information Product, you may want to consider adding a CITATION.md file that describes how to reference or cite your project. You can include a plain text version of the citation that’s easy to copy and paste as well as a BibTeX entry.
Here’s an example of what Dracula would write in his CITATION.md file:
To reference the Vampires and Werewolves software product in a publication, you can cite:
Dracula, V. and Wolfman, L.T., 2024, Vampires and Werewolves, version 1.0.0: U.S. Geological Survey software release, https://doi.org/10.5066/xxxxxxxx.
@software{dracula-vampires-werewolves-2024,
  author = {Dracula, Vlad and Wolfman, L.T.},
  title = {Vampires and Werewolves},
  version = {1.0.0},
  year = {2024},
  doi = {10.5066/xxxxxxxx}
}
The second part of that documentation is a BibTeX entry, which can be ingested by some bibliography software. If there is an associated publication, you can add that to CITATION.md too.
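If you maintain several releases, you may prefer to generate both forms of the citation from one set of values so they cannot drift apart. The sketch below is one possible approach, not an official USGS tool: the function names and the `meta` dictionary layout are hypothetical, and the plain-text pattern simply follows the suggested citation shown earlier in this episode.

```python
def suggested_citation(meta):
    """Build a plain-text citation following the pattern used in this episode."""
    # Joining with "and" matches the two-author example; other author
    # counts may call for a different style.
    authors = " and ".join(meta["authors"])
    return (
        f"{authors}, {meta['year']}, {meta['title']}, version {meta['version']}: "
        f"U.S. Geological Survey software release, https://doi.org/{meta['doi']}."
    )


def bibtex_entry(key, meta):
    """Build a BibTeX @software entry from the same values."""
    fields = {
        "author": " and ".join(meta["authors"]),
        "title": meta["title"],
        "version": meta["version"],
        "year": str(meta["year"]),
        "doi": meta["doi"],
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@software{{{key},\n{body}\n}}"


# Example values copied from the citation in this episode.
meta = {
    "authors": ["Dracula, V.", "Wolfman, L.T."],
    "year": 2024,
    "title": "Vampires and Werewolves",
    "version": "1.0.0",
    "doi": "10.5066/xxxxxxxx",
}

print(suggested_citation(meta))
print(bibtex_entry("dracula-vampires-werewolves-2024", meta))
```

Keeping the title, version, and DOI in one place means a new release only requires changing `meta` before regenerating the text for CITATION.md.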
To explore this topic in more detail, check out the Software Sustainability Institute blog or the FORCE11 Software Citation Group’s citation principles.
Key Points
- Create a DOI for your software information product.
- Add a suggested citation to your repository.
Content from Commonly Included Files
Last updated on 2025-05-22
Overview
Questions
- What are some files which are usually included in USGS software projects, and what should their content be?
Objectives
- Draw awareness to common “boilerplate” files which are usually included in software products.
- Provide usable examples of each of these.
Disclaimers
All USGS software information products must contain appropriate disclaimers. This is unique among the files discussed here. While the others are strongly recommended, they are not required by Fundamental Science Practices (FSP).
The location of the disclaimer must be given as part of the code.json metadata which accompanies USGS software information products (see episode Creating Metadata).
The disclaimer used for open-source software projects must be different from the one used for official USGS software information products.
Provisional disclaimers
The provisional disclaimer must remain in any branch or tag which does not represent an official USGS software information product. The official disclaimer may only be used in tags (or temporarily in release-candidate branches working towards a tag) that represent Official USGS Software Information Products.
For more information on this, see the Reviews for Authors lesson on preparing the release branch and the Publishing lesson on managing tags.
Open-source software projects
For an open-source software project, appropriate content for the DISCLAIMER.md may be found in section 11 of the FSP Guidance on Disclaimer Statements Allowed in USGS Science Information Products:
Official USGS Software Information Product
For an official USGS software information product, appropriate content for the DISCLAIMER.md may be found in section 5 of the FSP Guidance on Disclaimer Statements Allowed in USGS Science Information Products:
Readme
When included, a README.md file will be rendered to text on the GitLab page of its project. This file should give a human-readable description of the project. It can also contain details about how to use the project. It should also give pointers to other relevant information about the project, when that information is contained in other files. For example, there might be a “Contributing to this project” section which points to a CONTRIBUTING.md file, or an R library’s README.md might point users to that library’s vignettes.
For some examples of effective README files in USGS projects, see
- the dataRetrieval R package (also available as a nicely rendered GitLab Pages site)
- the ISIS3 software
- the EGRET R package (also available as a nicely rendered GitLab Pages site)
Contributing
If you are willing to accept contributions from outside your team, you can include a CONTRIBUTING.md which explains your project’s policies and procedures for doing so. An example is below.
Customize the example for your project
Before using the example below, you need to replace [1] and [2] with the appropriate URLs from your project repository (the issues page and the license). The fork and merge request links in the example point to GitLab documentation; if your project is on GitHub, replace them with the equivalent GitHub links.
MARKDOWN
Contributing
============
Contributions are welcome from the community. Questions can be asked on the
[issues page][1]. Before creating a new issue, please take a moment to
search and make sure a similar issue does not already exist. If one does
exist, you can comment (most simply even with just a `:+1:`) to show your
support for that issue.
If you have direct contributions you would like considered for incorporation
into the project you can [fork this
repository](https://docs.gitlab.com/ee/user/project/repository/forking_workflow.html#create-a-fork)
and [submit a merge
request](https://docs.gitlab.com/ee/user/project/merge_requests/creating_merge_requests.html#when-you-work-in-a-fork)
for review. Please note that all contributions will be considered public domain
(see [license][2] for details).
[1]: Replace this text with the URL for your project's issues page
[2]: Replace this text with the URL for your project's license
Code of Conduct
This is a file, typically called CODE_OF_CONDUCT.md, that describes expected conduct from users contributing to the project. At a minimum this file must specify that all contributions to the project must abide by the USGS Code of Scientific Conduct. It is also appropriate for it to include further language specifying expectations for contributors’ behavior as part of the project’s community. A suitable example of such a file’s contents follows:
MARKDOWN
# Contributor Code of Conduct
All contributions to- and interactions surrounding- this project will abide
by the [USGS Code of Scientific Conduct][1].
[1]: https://www2.usgs.gov/fsp/fsp_code_of_scientific_conduct.asp
We are committed to making participation in this project a harassment-free
experience for everyone, regardless of level of experience, gender, gender
identity and expression, sexual orientation, disability, personal
appearance, body size, race, ethnicity, age, or religion.
Examples of unacceptable behavior by participants include the use of sexual
language or imagery, derogatory comments or personal attacks, trolling,
public or private harassment, insults, or other unprofessional conduct.
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct. Project maintainers who do
not follow the Code of Conduct may be removed from the project team.
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by opening an issue or contacting one or more of the project
maintainers.
This Code of Conduct is adapted from the Contributor Covenant, version 1.0.0
Which file must be included in all USGS software repositories?
There are conventions for files included in software repositories that explain the purpose of the repository or how its team works. Many of these are recommended but optional. One, however, is mandatory. Which file is mandatory, and why?
The DISCLAIMER.md is mandatory in published USGS software repositories, because it is required by FSP.
Key Points
- USGS software products typically contain “boilerplate” files.
- Some of these files, like the DISCLAIMER.md, are mandatory and must be included in all USGS software products. Others are optional.
- Examples of these files may be found in existing projects, or on this page.
Content from Creating Metadata
Last updated on 2025-05-22
Overview
Questions
- What is a code.json file?
- How do you create a code.json file?
- What are the required fields in the code.json file for a USGS software project?
Objectives
- Explain what a code.json file is and how it is used.
- Create a code.json file with the minimum required fields for a USGS software project and software information products.
- Validate a code.json file.
- Update a code.json file for a new version of the software.
Introduction
Metadata are descriptive elements in a standardized format that are necessary for identification, discovery, access, and use of information products such as software and data. Metadata answer fundamental questions such as who, what, when, where, why, and how.
Metadata for a software project are stored and maintained in a file called code.json located at the top level of the project repository in GitLab. This code.json file is in JavaScript Object Notation (JSON) format. The code.json file provides basic information about the project and official software information products and will be aggregated with the information from other Department of the Interior software projects to form the Departmental Enterprise Code Inventory, which is required by the Federal Source Code Policy. The code.json file is required for software information products but may be created for projects without official software information products.
JSON Overview
The JSON data format allows machine-to-machine communication with structured text. JSON is language agnostic.
JSON Syntax:
- Use key/value pairs
  - keys are strings, indicated by double quotes
  - values can be:
    - strings ("Vlad Dracula"),
    - numbers (1.5),
    - objects ({"key": "value", "key2": "value2"}),
    - arrays ([lists]),
    - boolean (true/false), or null
  - separate keys from values with a colon
    - Format: "key": "value"
    - Example: "name": "Vlad Dracula"
- Separate key/value pairs with commas:
Generally, a JSON file will contain an object or an array. If it is an object, it will start and end with curly brackets {}. If it is an array, it will start and end with square brackets [].
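To see how these value types look once a program reads them, here is a minimal sketch using Python's standard `json` module. The JSON text is a made-up illustration (its keys are not part of any USGS schema); it simply exercises each value type from the syntax list.

```python
import json

# Hypothetical JSON text exercising each value type from the syntax list.
TEXT = """
{
  "name": "Vlad Dracula",
  "number": 1.5,
  "object": {"key": "value", "key2": "value2"},
  "array": ["first", "second"],
  "boolean": true,
  "nothing": null
}
"""

data = json.loads(TEXT)  # the top level is an object, so this is a dict

# Show which Python type each JSON value was converted to.
for key, value in data.items():
    print(f"{key}: {type(value).__name__}")
```

Objects become dictionaries, arrays become lists, and `true`, `false`, and `null` become `True`, `False`, and `None`, which is part of why JSON is described as language agnostic: each language maps the same text onto its own native types.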
Let us create a JSON file in our GitLab project space with the filename hello-world.json.

In the web browser, add the following content to hello-world.json:
Notice that in our example, our JSON represents an object since it starts with curly brackets. Also, notice that the GitLab web editor provides some highlighting and indentation assistance similar to what a desktop editor might provide.
You can use a JSON Validator like JSON Formatter & Validator to format and check your JSON.
Let us try adding a trailing comma in our JSON and validating it:
The JSON Formatter & Validator will tell you what it found wrong and attempt to fix it for you:
OUTPUT
Info: Removed trailing comma.
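If you would rather check a file locally than paste it into a website, Python's standard `json` module can serve as a strict validator. The sketch below is an assumption-laden illustration: the helper name `check_json` and the sample strings are hypothetical. Unlike the online tool, it does not repair anything; it only reports where parsing failed.

```python
import json


def check_json(text):
    """Return (True, None) if text parses as JSON, else (False, error message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, None


good = '{"name": "Vlad Dracula"}'
bad = '{"name": "Vlad Dracula",}'  # trailing comma after the last pair

print(check_json(good))
print(check_json(bad))
```

The exact wording of the error message varies between Python versions, so treat it as a hint about the location of the problem rather than a stable string.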
Metadata Template
USGS provides a code.json template (see below) to help you get started writing project metadata. Notice that its top-level element is an array, which is designated by the square brackets.
JSON
[
  {
    "name": "REPOSITORY_NAME",
    "organization": "U.S. Geological Survey",
    "description": "REPOSITORY_DESCRIPTION",
    "version": "RELEASE_VERSION",
    "status": "RELEASE_STATUS",
    "permissions": {
      "usageType": "openSource",
      "licenses": [
        {
          "name": "Public Domain, CC0-1.0",
          "URL": "https://code.usgs.gov/GROUP_HIERARCHY/REPOSITORY_NAME/-/raw/RELEASE_VERSION/LICENSE.md"
        }
      ]
    },
    "homepageURL": "https://code.usgs.gov/GROUP_HIERARCHY/REPOSITORY_NAME",
    "downloadURL": "https://code.usgs.gov/GROUP_HIERARCHY/REPOSITORY_NAME/-/archive/RELEASE_VERSION/REPOSITORY_NAME-RELEASE_VERSION.zip",
    "disclaimerURL": "https://code.usgs.gov/GROUP_HIERARCHY/REPOSITORY_NAME/-/raw/RELEASE_VERSION/DISCLAIMER.md",
    "repositoryURL": "https://code.usgs.gov/GROUP_HIERARCHY/REPOSITORY_NAME.git",
    "vcs": "git",
    "laborHours": 0,
    "tags": [
      "TOPIC_TAG_1",
      "TOPIC_TAG_2"
    ],
    "languages": [
      "PROGRAMMING_LANG_1",
      "PROGRAMMING_LANG_2"
    ],
    "contact": {
      "name": "REPOSITORY_ADMINISTRATOR_NAME",
      "email": "REPOSITORY_ADMINISTRATOR_EMAIL"
    },
    "date": {
      "metadataLastUpdated": "YYYY-MM-DD"
    }
  }
]
Create a Metadata File in GitLab
Create a new code.json
file at the top level of your
GitLab repository:

Paste the template JSON into the file, add a commit message, and
click Commit changes
:

Add Project-Specific Information
Now, edit the code.json
file to include project-specific
information. While viewing the code.json
file in GitLab,
click Edit
and Edit single file
:

Replace the ALL_CAPS placeholders with meaningful values for the project. For the purposes of this exercise, the project includes code for modeling the co-occurrence of Vampires and Werewolves on Mars. The project team is actively developing the code. Eventually, they will release a USGS software information product in the public domain. This particular metadata object will document the entire project as opposed to a single product, so use “main” as the version. The project uses machine learning / artificial intelligence techniques and the code is written in Python.
GROUP_HIERARCHY is the group name under which your project is nested
in GitLab. The GROUP_HIERARCHY may be one level if you are working out
of a personal space (e.g., vdracula
) or it may be a nested
hierarchy (e.g., ecosystems/FRESC
).
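When replacing placeholders by hand, it is easy to miss one inside a long URL. The sketch below is one possible way to list leftover ALL_CAPS placeholders in parsed JSON; the find_placeholders helper, its regular expression, and the sample object are assumptions made for illustration and are not part of the USGS template tooling.

```python
import re

# Matches template-style names such as REPOSITORY_NAME or TOPIC_TAG_1,
# plus the literal date placeholder YYYY-MM-DD.
PLACEHOLDER = re.compile(r"\b[A-Z]+(?:_[A-Z0-9]+)+\b|YYYY-MM-DD")

def find_placeholders(node, path=""):
    """Recursively collect (path, placeholder) pairs from parsed JSON."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            found += find_placeholders(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found += find_placeholders(value, f"{path}[{index}]")
    elif isinstance(node, str):
        found += [(path, match) for match in PLACEHOLDER.findall(node)]
    return found

# Hypothetical, partially edited metadata object
sample = [
    {
        "name": "REPOSITORY_NAME",
        "version": "main",
        "tags": ["TOPIC_TAG_1", "mars"],
        "date": {"metadataLastUpdated": "YYYY-MM-DD"},
    }
]

leftovers = find_placeholders(sample)
for location, placeholder in leftovers:
    print(f"{location}: {placeholder}")
```

An empty result suggests that every placeholder matching this pattern has been replaced; it does not confirm that the replacement values are correct.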
Below are the field definitions for code.json
and
examples of how the template can be updated:
- name: Should be a short, human-readable name for the project. This should match the value provided when creating the project in GitLab. The best practice is to use lowercase words with hyphens separating them.
- organization: Must always be "U.S. Geological Survey"; casing and punctuation are important. No updates are needed to the template.
- description: This may be a longer description of the project. It should be no more than 1-2 sentences. Verbose descriptions may exist in the README.md file.
- version: This should be a semantic version number for the product (e.g., 1.0.0) or the DEFAULT_BRANCH name (e.g., main or master) depending on whether the metadata object is referencing the project or an information product. The version number should not include a leading v (i.e., v1.0.0) or other identifier. A Git branch (release candidate branch) must exist with the same name (e.g., 1.0.0) during the review process. Upon publication, the version branch is converted to a tag. (We will discuss more about release tags in a future episode.)
- status: Must be one of the enumerated values listed below. There are no official definitions for these terms in code.gov; however, Wikipedia provides some good definitions, which are paraphrased below.
  - Ideation: planning phase of a software project.
  - Development: work on software project prior to formal testing.
  - Alpha: initial testing phase, often done within the project team or organization.
  - Beta: feature complete testing phase that follows Alpha testing, often available to users outside project team or organization.
  - Release Candidate: a Beta version with the potential to be ready for production. In USGS, a release candidate would be going through formal review and approval.
  - Production: the product has passed all stages of testing. In USGS, a production release has been reviewed and approved.
  - Archival: a version of the software that is no longer supported.
- permissions
  - usageType: One of the following enumerated values, which describes the usage permissions for the release:
    - openSource: Open source
    - governmentWideReuse: Government-wide reuse
    - exemptByLaw: The sharing of the source code is restricted by law or regulation, including, but not limited to, patent or intellectual property law, the Export Asset Regulations, the International Traffic in Arms Regulation, and the Federal laws and regulations governing classified information
    - exemptByNationalSecurity: The sharing of the source code would create an identifiable risk to the detriment of national security, confidentiality of Government information, or individual privacy
    - exemptByAgencySystem: The sharing of the source code would create an identifiable risk to the stability, security, or integrity of the agency’s systems or personnel
    - exemptByAgencyMission: The sharing of the source code would create an identifiable risk to agency mission, programs, or operations
    - exemptByCIO: The CIO believes it is in the national interest to exempt sharing the source code
    - exemptByPolicyDate: The release was created prior to the M-16-21 policy (August 8, 2016)
  - licenses
    - name: The name of the license under which the product is released (e.g., Public Domain, CC0-1.0). In most cases, the appropriate license for USGS products is Public Domain, CC0-1.0, but sometimes (e.g., when some of the code is from outside sources or collaborators) different licenses are required. For more information on selecting an appropriate license, see the Licensing episode in this lesson.
    - URL: A link to the LICENSE.md file stored in this project
      - Must reference the main or master branch (this will differ for an official product, which should point to the immutable tagged version)
      - Must use the raw variant of the file, which provides access to the plain text of the file and not the GitLab-formatted text. To get the raw variant of a file, click into the file, and click the Open raw button next to the Download button.
JSON
"permissions": {
"usageType": "openSource",
"licenses": [
{
"name": "Public Domain, CC0-1.0",
"URL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/LICENSE.md"
}
]
}
- homepageURL*: A link to the project homepage
  - May point to the project on GitLab, but will not include the .git extension
  - May point to a project home page elsewhere as long as it is publicly accessible (or soon-to-be publicly accessible, once you have gone through the release process) and in an approved location (e.g., usgs.gov webpage as opposed to a personal website)
- downloadURL: A link to download a ZIP archive of the project source code
  - Must point to the main or master branch (this will differ for an official product, which should point to the immutable tagged version)
  - In GitLab, you can get the download URL by selecting Code -> right click zip (under Download source code) -> Copy Link.
JSON
"downloadURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/archive/main/vampires-and-werewolves-main.zip"
- disclaimerURL: A link to the DISCLAIMER.md file stored in this project
  - Must use the raw variant of the file, which provides access to the plain text of the file and not the GitLab-formatted text
  - Must point to the main or master branch (this will differ for an official product, which should point to the immutable tagged version)
JSON
"disclaimerURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/DISCLAIMER.md"
- repositoryURL*: A link to this project on GitLab
  - Must include the .git extension
*Note: homepageURL and repositoryURL are different. repositoryURL should end with .git whereas the homepageURL should not.
- vcs: A lowercase string with the name of the version control system that is being used. For USGS, this will be git. No updates are needed to the template.
- laborHours: An estimate of total labor hours spent by your organization across the current version and all previous versions, including labor performed by federal employees and contractors. Labor hours are cumulative across all versions. Your best guess is fine. If not known, the recommendation is to use -1.
- tags: An array of topical/domain tags relevant to the project
  - Consider using the USGS Thesaurus or other controlled vocabularies to improve browse functionality in the code inventory.
  - These tags can be used to help people narrow down searches for software, so consider terms that will help direct potential users to your project.
  - If the project supports AI/ML research and development, this array must include the tag usg-artificial-intelligence. This tag is short for U.S. Government Artificial Intelligence (i.e., do not use “usgs-artificial-intelligence”).
- languages: An array of the programming languages used within this project (e.g., “Python”, “R”, “C++”). There is not a controlled vocabulary, so use your best judgement on how to represent the programming languages in your project.
- contact: Point of contact information for the software information product.
- date
  - metadataLastUpdated: An ISO datestamp (YYYY-MM-DD) of when the metadata item within the code.json file was last modified. Be sure to update this value whenever you modify any of the other key/value pairs for this metadata item. Note that you must use two digits for month and day (e.g., 2024-8-9 is not correct).
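Several of the rules above can be checked mechanically before submitting for review. The following is a minimal sketch, assuming a hypothetical check_release helper written for this lesson; it covers only a handful of the field rules listed above and is not the official validation tool.

```python
import re
from datetime import date

# Status values enumerated in the field definitions above
STATUSES = {
    "Ideation", "Development", "Alpha", "Beta",
    "Release Candidate", "Production", "Archival",
}

def check_release(release):
    """Return a list of problems found in one code.json metadata object."""
    problems = []
    if release.get("organization") != "U.S. Geological Survey":
        problems.append("organization must be exactly 'U.S. Geological Survey'")
    if release.get("status") not in STATUSES:
        problems.append("status is not one of the enumerated values")
    if re.match(r"v\d", str(release.get("version", ""))):
        problems.append("version must not include a leading v")
    if not str(release.get("repositoryURL", "")).endswith(".git"):
        problems.append("repositoryURL must end with .git")
    if str(release.get("homepageURL", "")).endswith(".git"):
        problems.append("homepageURL must not end with .git")
    stamp = str(release.get("date", {}).get("metadataLastUpdated", ""))
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", stamp):
        problems.append("metadataLastUpdated must be YYYY-MM-DD")
    else:
        try:
            date.fromisoformat(stamp)  # rejects impossible dates such as 2024-02-31
        except ValueError:
            problems.append("metadataLastUpdated is not a real calendar date")
    return problems

# Values taken from the exercise; "bad" introduces three deliberate mistakes
good = {
    "organization": "U.S. Geological Survey",
    "version": "main",
    "status": "Development",
    "homepageURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves",
    "repositoryURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves.git",
    "date": {"metadataLastUpdated": "2024-05-29"},
}
bad = dict(
    good,
    version="v1.0.0",
    homepageURL=good["repositoryURL"],
    date={"metadataLastUpdated": "2024-8-9"},
)

print(check_release(good))
for problem in check_release(bad):
    print(problem)
```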
Personal Space in GitLab
In the examples above, the URLs that we are generating reference Vlad Dracula’s or your own personal GitLab space. In reality, you cannot make a repository public that is located under a personal username. Instead, public repositories need to be located under a public group. The current recommendation is to have groups at the USGS Mission Area level (e.g., Ecosystems) and then subgroups at the USGS Science Center level. Project repositories will then be located within the Science Center subgroup. To avoid needing to rename all of your URLs, it is a best practice to start projects within these public groups and maintain more restrictive permissions at the project level.
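Because every URL in code.json is derived from the same three values (GROUP_HIERARCHY, REPOSITORY_NAME, and RELEASE_VERSION), moving a project to a different group changes all of them at once. The sketch below illustrates this by generating the URLs from the patterns in the template; the build_urls helper and the licenseURL key are names invented for this illustration (in code.json, the license URL lives under permissions.licenses).

```python
BASE = "https://code.usgs.gov"

def build_urls(group_hierarchy, repository, version):
    """Build the code.json URLs following the patterns in the template."""
    project = f"{BASE}/{group_hierarchy}/{repository}"
    return {
        "homepageURL": project,
        "repositoryURL": f"{project}.git",
        "downloadURL": f"{project}/-/archive/{version}/{repository}-{version}.zip",
        "disclaimerURL": f"{project}/-/raw/{version}/DISCLAIMER.md",
        # Hypothetical key name; stored as permissions.licenses[].URL in code.json
        "licenseURL": f"{project}/-/raw/{version}/LICENSE.md",
    }

personal = build_urls("vdracula", "vampires-and-werewolves", "main")
grouped = build_urls("ecosystems/FRESC", "vampires-and-werewolves", "main")

for key in personal:
    print(key, personal[key], grouped[key], sep="\n  ")
```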
This is what the full code.json
file should look like
after making the updates above:
JSON
[
{
"name": "vampires-and-werewolves",
"organization": "U.S. Geological Survey",
"description": "Code for modeling the co-occurrence of Vampires and Werewolves on Mars",
"version": "main",
"status": "Development",
"permissions": {
"usageType": "openSource",
"licenses": [
{
"name": "Public Domain, CC0-1.0",
"URL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/LICENSE.md"
}
]
},
"homepageURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves",
"downloadURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/archive/main/vampires-and-werewolves-main.zip",
"disclaimerURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/DISCLAIMER.md",
"repositoryURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves.git",
"vcs": "git",
"laborHours": 0,
"tags": [
"usg-artificial-intelligence",
"vampires",
"werewolves",
"mars"
],
"languages": [
"Python"
],
"contact": {
"name": "Vlad Dracula",
"email": "vdracula@usgs.gov"
},
"date": {
"metadataLastUpdated": "2024-05-29"
}
}
]
Challenge
Use JSON Formatter & Validator to format and check your JSON. What errors were present in your JSON? Note that this tool only validates against the JSON syntax and does not validate against the code.gov metadata schema.
Additional code.json
fields
Additional fields are also available. See the official
code.gov metadata schema for additional details. Note that you
should only add fields from the “releases” array within this schema. The
full code.gov metadata schema includes other fields that are necessary
for building the Enterprise Code Inventory, but those should not be
included in the individual project code.json
files. Fields
that are not documented in the official code.gov metadata schema
cannot be included in the code.json
files.
Updating Metadata for Initial Software Information Product
Remember that the top-level element in the code.json file is an array. This means it may contain more than one object for your project. The recommended practice is to order metadata objects with the DEFAULT_BRANCH (e.g., main) appearing first, followed by the most recently released version. For an initial software information product release, the array would contain two objects: one for main, followed by one for the released version (e.g., 1.0.0).
Metadata evolve over time. A common misconception is that the metadata in the main branch should describe only the main branch code and not any other branches. In fact, the metadata in the DEFAULT_BRANCH (e.g., main) should contain metadata for each version of the project (official or otherwise). The metadata in the tag associated with a specific version should contain metadata for that version and all preceding versions; in this way, it matches the metadata in the main branch at the time the version was created.
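The recommended ordering can be expressed as a sort. This is a small sketch under two assumptions that are not stated in the lesson: released versions use three-part numeric semantic versions, and older releases follow newer ones. The order_releases helper is hypothetical.

```python
def order_releases(releases, default_branch="main"):
    """Put the default branch first, then releases from newest to oldest."""
    def sort_key(release):
        version = release["version"]
        if version == default_branch:
            return (0, ())
        # Negate each numeric part so that larger versions sort first
        return (1, tuple(-int(part) for part in version.split(".")))
    return sorted(releases, key=sort_key)

# Hypothetical metadata objects reduced to the version field
unordered = [
    {"version": "1.2.0"},
    {"version": "main"},
    {"version": "1.10.0"},
    {"version": "1.0.0"},
]
ordered = [release["version"] for release in order_releases(unordered)]
print(ordered)
```

Comparing the numeric parts rather than the raw strings matters here: as plain text, 1.10.0 would sort before 1.2.0.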
Releasing an Initial Software Information Product
You are ready to release an initial version of your software information product. In the code.json file, copy the text for the main branch’s release object and paste it directly below in the code.json array (you will use the main branch release object as a type of template to make further changes). You will need to add a comma between the two objects after the closing } for the first object. In the second object, update the version field to 1.0.0 (or whatever version number you are using; it is not required to use 1.0.0) and the status field to Production. Additionally, update the URL fields in the second object to use that version number instead of main in the RELEASE_VERSION section of the URL. You will also need to update the laborHours and the metadataLastUpdated fields.
JSON
[
{
"name": "vampires-and-werewolves",
"organization": "U.S. Geological Survey",
"description": "Code for modeling the co-occurrence of Vampires and Werewolves on Mars",
"version": "main",
"status": "Development",
"permissions": {
"usageType": "openSource",
"licenses": [
{
"name": "Public Domain, CC0-1.0",
"URL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/LICENSE.md"
}
]
},
"homepageURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves",
"downloadURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/archive/main/vampires-and-werewolves-main.zip",
"disclaimerURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/DISCLAIMER.md",
"repositoryURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves.git",
"vcs": "git",
"laborHours": 200,
"tags": [
"usg-artificial-intelligence",
"vampires",
"werewolves",
"mars"
],
"languages": [
"Python"
],
"contact": {
"name": "Vlad Dracula",
"email": "vdracula@usgs.gov"
},
"date": {
"metadataLastUpdated": "2024-06-15"
}
},
{
"name": "vampires-and-werewolves",
"organization": "U.S. Geological Survey",
"description": "Code for modeling the co-occurrence of Vampires and Werewolves on Mars",
"version": "1.0.0",
"status": "Production",
"permissions": {
"usageType": "openSource",
"licenses": [
{
"name": "Public Domain, CC0-1.0",
"URL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/1.0.0/LICENSE.md"
}
]
},
"homepageURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves",
"downloadURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/archive/1.0.0/vampires-and-werewolves-1.0.0.zip",
"disclaimerURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/1.0.0/DISCLAIMER.md",
"repositoryURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves.git",
"vcs": "git",
"laborHours": 200,
"tags": [
"usg-artificial-intelligence",
"vampires",
"werewolves",
"mars"
],
"languages": [
"Python"
],
"contact": {
"name": "Vlad Dracula",
"email": "vdracula@usgs.gov"
},
"date": {
"metadataLastUpdated": "2024-06-15"
}
}
]
The version of the code.json
file that was created in
the exercise above will be included in the 1.0.0
branch,
once the branch is created, and ultimately the immutable tagged product,
as well as in the main
branch.
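The copy-and-edit steps in this exercise can also be sketched in code, which makes explicit which fields change between the main object and the release object. The make_release and retarget helpers below are hypothetical and assume the URL patterns used in this lesson; laborHours and metadataLastUpdated would still need to be updated by hand.

```python
import copy

def retarget(url, old, new):
    """Swap the version segment of a raw-file or archive URL."""
    return url.replace(f"/{old}/", f"/{new}/").replace(f"-{old}.zip", f"-{new}.zip")

def make_release(main_release, version, status="Production"):
    """Copy the main-branch metadata object and point it at a released version."""
    release = copy.deepcopy(main_release)  # leave the original object untouched
    old = main_release["version"]
    release["version"] = version
    release["status"] = status
    for license_entry in release["permissions"]["licenses"]:
        license_entry["URL"] = retarget(license_entry["URL"], old, version)
    for key in ("downloadURL", "disclaimerURL"):
        release[key] = retarget(release[key], old, version)
    return release

# Abbreviated main-branch object using values from the exercise
main_release = {
    "version": "main",
    "status": "Development",
    "permissions": {
        "licenses": [
            {"URL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/LICENSE.md"}
        ]
    },
    "downloadURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/archive/main/vampires-and-werewolves-main.zip",
    "disclaimerURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/DISCLAIMER.md",
}

release = make_release(main_release, "1.0.0")
metadata = [main_release, release]  # main first, then the new release
print(release["downloadURL"])
```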
Note about Status Field
There are no set rules for what status needs to be assigned to a
given version or branch of a project. The goal is to do the best to
communicate to users how thoroughly particular code has been tested,
reviewed, and approved, and how you might anticipate them using the
project and products. For example, if you have testing, reviews, and
approvals built into your development process such that the
main
branch is always the latest and greatest and should be
the go-to code to use, then the main
branch might be
labeled with a status of ‘Production’. If instead the content in the
main
branch is not formally approved until a release branch
is created, then the main
branch might maintain a status of
‘Development’ to encourage users to use the most recent formal
version.
Likewise, if a previous version of a product is still relevant and usable, it may continue to have a status label of ‘Production’. If, however, the newer version corrects some bugs and should be used instead of a previous version, then the previous version should have its status updated to ‘Archival’.
Key Points
- A code.json file is a file formatted in JavaScript Object Notation (JSON) and contains project metadata. The code.json file is saved at the top level of the project.
- USGS compiles all of the code.json files for public products in GitLab into an inventory that is required by Federal policy.
- You can use the code.json file template above to begin creating your project and product metadata with the required fields.
Content from Software Review for Reviewers
Last updated on 2025-05-22
Overview
Questions
- What are my responsibilities as a reviewer of software?
- How do I conduct a software review?
Objectives
- Explain the topics that need to be covered during review.
- Conduct a software review.
Software Review Overview
All USGS open-source software projects must undergo an administrative review. Official USGS software products must undergo two additional forms of review: Technical code review and scientific (domain) review. For an official overview, see Types of Software Review.
Administrative Review
The administrative reviewer’s duty is to make sure that the entire history of the project is free of potential security or privacy violations. There are several types of information which must not be present in code released to the public:
- Personally identifiable information (PII)
- Absolute file system paths
- Internal server host names or IP addresses
- Usernames or passwords
This review must be done for every commit in the released software. This can be a very onerous requirement if it is to be done all at once. For that reason, collaborative workflows where changes to the codebase go through merge requests are a very good way of making sure that the administrative review has been adequately done.
It is acceptable for team members to review each others’ contributions, even if they are both listed as authors of the software. Reviewing a merge request is done by people who are not authors on that specific code; they are only “authors” of the project generally. Therefore, mutual in-team reviews are a convenient way to comply with this requirement.
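Some of the items in the list above follow recognizable patterns, so a simple scan can help a reviewer decide where to look first. The sketch below is a rough heuristic with assumed regular expressions and a hypothetical scan_text helper; it will miss cases and raise false positives, and it does not replace reading each change.

```python
import re

# Assumed heuristics for three of the categories listed above
PATTERNS = {
    "IP address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "absolute path": re.compile(r"(?:/home/|/Users/|[A-Za-z]:\\)\S+"),
    "credential": re.compile(r"(?i)\b(?:password|passwd|pwd|username)\s*[:=]\s*\S+"),
}

def scan_text(text):
    """Return (line number, kind) for each line that matches a pattern."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, kind))
    return findings

# Hypothetical file contents used only for illustration
sample = "\n".join([
    'data_dir = "/home/vdracula/project/data"',
    'host = "10.0.0.12"',
    "password = hunter2",
    'version = "1.0.0"',
])

findings = scan_text(sample)
for number, kind in findings:
    print(f"line {number}: possible {kind}")
```

Note that a three-part version string such as 1.0.0 is not flagged, because the IP address pattern requires four numeric groups.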
Technical code review
The technical code review focuses on such concerns as adherence to coding standards and other measures of code quality. This review is required for all official USGS software products, but not for provisional products. Unlike administrative review, this does not need to be done for every commit of the released software.
Typical focuses in technical code review include
- checking for adherence to explicit coding standards, such as conventions for naming variables and functions
- ensuring that unit tests pass
- inspecting for vulnerabilities or bugs
Some of these areas of concern are amenable to automation. For instance, linter software can test for adherence to coding standards, calculate measures of code complexity, and identify common bug-prone patterns.
Scientific (domain) review
Scientific software requires a domain review as well. Like the technical code review, the domain review only needs to be done on the end product, not on individual commits. What constitutes an appropriate domain review will vary a great deal depending on the domain, but generally involves checking for scientific flaws or errors. It is similar to a peer review for a scientific publication: checking that the methods are applied correctly and are appropriate for the scientific question. You can leverage community resources, such as CDI or your local colleagues, for insight into scientific reviews in your domain.
How To Make The Review Streamlined
The person requesting the review may have already set up a way for you to do the review. For instance, if following the instructions in Review for Authors, they will have created a GitLab issue where you can conduct your review.
There are many possible ways to do reviews, including methods such as Word documents which do not involve git at all. But if the person requesting the review has not specified how to conduct it, one good way is to create your own GitLab issue. Using either the issue title or a label, make clear what kind of review you are doing and what version or tag of the software it pertains to. Now you can put your comments, requests for changes, or approval into this issue. This keeps everything tied into the code repository, rather than in email or Teams chats, where it could get lost.
Adding Comments to Your Review
Let us try adding a comment to a review issue as a reviewer and as an author.
Reviewer Role:
- Navigate to Code -> Commits
- Select release branch 1.0.0
- Copy commit SHA
- Start new comment in review issue: “Starting review as of commit [paste commit SHA]”
- Click Comment
- In the example exercise here, you start your review with the metadata file first and note that a date needs to be updated. Navigate to the code.json file in branch 1.0.0
- Right click on the line number next to ‘metadataLastUpdated’ and select Copy Link
- In a new comment, paste the link into the review issue and add a comment (e.g., “Make sure to update the metadataLastUpdated date before you submit for publication”)
- Select the dropdown next to Comment and choose Start thread and then click the button Start thread
Author Role:
For reconciling a review, you can create a branch to address all comments. In this exercise, we just have one comment to address:
- Create a feature branch to address the comment
- Open editor and make the change
- Stage, commit, and push the changes
BASH
git add code.json
git commit -m "Update metadataLastUpdated date per review comment"
git push -u origin review-recon-1.0.0
- Create merge request and merge to main in GitLab
- Rebase changes into your release candidate branch
BASH
git switch main
git pull --ff-only origin main
git switch 1.0.0
git rebase main
git push origin 1.0.0
- Copy commit SHA (Note: you should always copy the commit SHA after rebasing your release candidate branch)
- Reply to the comment in the review issue with the commit SHA for the
commit in which the comment was addressed
Key Points
- An administrative review is required for all open-source software projects
- A technical code review and a scientific domain review are required for official USGS software products
- There are many ways to conduct and document a software review. One way is by creating a GitLab issue with comments documenting the review
Content from Publishing
Last updated on 2025-05-22
Overview
Questions
- How do I get my Git repository published once it is ready and has been approved?
- What static objects should I create in Git for the final release?
- How do I make the DOI point to the correct Git object?
Objectives
- Create a ticket to request publication of your software.
- Create a tag and a “release” in your Git repo.
- Publish a DOI.
Publishing Software Overview
Congratulations! You have added all the required files, completed the software reviews, and are ready to publish the software in your Git repository!
Checklist
As a reminder, at this point you should have an appropriate version of the following files:
- LICENSE.md
- DISCLAIMER.md
- README.md
- code.json
- optional: CITATION.md
And completed the following tasks:
- Created a DOI
- Completed the review process
- Obtained IPDS approval
- Started a release candidate branch
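The file portion of this checklist is easy to verify automatically. Here is a minimal sketch that assumes the files sit at the top level of a local clone; the missing_files helper is hypothetical, and the temporary directory simply stands in for a repository.

```python
from pathlib import Path
import tempfile

# Required files from the checklist above (CITATION.md is optional)
REQUIRED = ("LICENSE.md", "DISCLAIMER.md", "README.md", "code.json")

def missing_files(repo_dir):
    """Return the required files that are absent from the top level of repo_dir."""
    root = Path(repo_dir)
    return [name for name in REQUIRED if not (root / name).is_file()]

# Stand-in repository containing only two of the required files
with tempfile.TemporaryDirectory() as tmp:
    for name in ("LICENSE.md", "README.md"):
        (Path(tmp) / name).write_text("placeholder\n")
    result = missing_files(tmp)

print(result)
```

The remaining checklist items (DOI, review, IPDS approval, release candidate branch) are process steps and still need to be confirmed by hand.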
Initiate a Request to Publish your Code
After double checking the above list and reviewing the software release checklist, navigate to the USGS GitLab Software Management repository using a link provided by your instructor. To initiate a request, open an issue on this project and in the fields seen below, add a descriptive title and select the “GitLab Official Release” template under the “Description” field:

Selecting “GitLab Official Release” will pre-populate the text box with a template that includes sections for you to fill out. If you have followed along so far, you should have all the information requested. Update the template text with information relevant to your request:
For fields requesting textual input, examples are provided between
backticks `e.g. example`
; replace the content between the
backticks with your answer.
Fields with a checkbox (a space between two square brackets) are
asking you to acknowledge or agree to associated text; replace the space
between the brackets with an x
to indicate you
agree/acknowledge.
Do not edit the /label
lines as they may delay
notification/processing of your request.
When done, click “Create Issue” at the bottom.
Callout
The template may ask for the username of the approving official. If
they have a GitLab username, you can tag them using the @
symbol: @vdracula
. If they do not, you can write their
email address instead: vdracula@usgs.gov
.
Once an administrator sees the issue, they run an automated final validation tool to provide feedback on errors. That is why you will see this note at the bottom of their message:

Often, the feedback concerns errors from the code.json file not containing the correct URLs. Reviewing the Creating Metadata lesson may help clarify how to fix these errors.
Discussion
As part of this course, you will not be submitting your Vampires and Werewolves project for publication, but you can take a look at current Git projects that have been submitted.
Navigate to the Software Management Issues page using the link provided by your instructor. Do you see any requests for publishing software? Click on a few and see how they filled out the template. What responses did they receive? What did they need to edit before publication? Do the requested edits align with what you’ve learned so far? How would you fix the errors?
Create a Git Tag
Once you have corrected all errors and received approval via the GitLab issue, your next step is to create a static Git tag and delete the release candidate branch. A tag is a human-readable name that points to a specific commit ID and does not change with subsequent updates or commits. Because of this stability, it is used for the official version of the software.
To create a tag, navigate to the left-hand menu and select “Tags” under “Code”, then click “New Tag”:

Then, fill out the tag information with the tag name as the version name, select the release candidate branch (which should have the same name), and write a brief description:

Create a Release from the Tag
On the next page, create a release from the tag. This release will be used to activate the DOI, i.e., the DOI will point to the release (not the tag, the release candidate branch, or the main branch).

Add a title for the release, which can be the same as the tag and version number. There is a Description box for any notes you may want to add, which you can edit at any point. For example, if you publish an updated version of the repository, you may want to come back and redirect users to the most up-to-date version.

Once done, click “Create Release”. Then use the URL to activate the DOI. The URL should be in this format:
https://code.usgs.gov/GROUP_HIERARCHY/REPOSITORY_NAME/-/releases/RELEASE-NAME
Now that you have the static tag and release, delete the release candidate branch by navigating to “Branches” under “Code” on the left-hand menu, then click the three vertical dots on the release candidate branch, and click “Delete branch”:

Why do we create a Release?
Why is it preferable to point the DOI to a release rather than a tag or branch? You cannot edit a tag or the release branch, but you can add notes or updates to a release. These notes may be useful to the user in the case that there is a more updated version or other information you wish to share.
Create a Git tag and associated release
Follow the above instructions to create a tag and release in your Vampires and Werewolves repository.
Activate your DOI
Use the USGS Asset Identifier Service to manage the DOI you created in the citation lesson. Before you can activate the DOI, you will need to include the creators, publication year, URL to the Release page, IPDS number, and related publication (if applicable). If you would like your scientific software information product to be displayed on the USGS website, you will also need to include a brief description of the product in the DOI. Once you have filled in this required information, the “Publish Approved Release to DataCite” button on the left-hand menu will become active and you will be able to click on it:

Note: As part of this course, we are not creating and publishing DOIs.
Disseminate in IPDS
Your last step, as with any USGS product, is to Disseminate the record in IPDS. Follow your Center’s policy on how to disseminate the product.
Key Points
- Submit a new issue as the first step in publishing a software product
- Create a static Git tag and associated release
- Activate your DOI using the URL of the Git release
Content from Continuing Your Project
Last updated on 2025-05-22
Overview
Questions
- How do I follow policy when developing an open-source project?
- When should I release updated versions of my project?
- How do I prepare my project for subsequent releases?
Objectives
- Continue development on your open-source project.
- Release subsequent versions of your open-source project.
In many cases, work must continue on a project after it becomes publicly accessible. This may be following an official USGS Software Information Product release, or following a more informal open-source release process. In any case, the USGS supports open-source project development with some conditions.
Continue Open-Source Project Development
When developing an open-source project, all modifications must receive, at minimum, one administrative security review before being incorporated into the open-source project. This review must ensure no sensitive or personally identifiable information is exposed by incorporating these changes.
There are workflows supporting this review process. Previously in this course, we introduced a branching workflow, which must be modified in order to align with policy during open-source development. One modified workflow that aligns with policy requirements is called a “Forking Workflow”.
Forking Workflow
With a forking workflow, each developer on the project creates a private personal copy, or fork, of the shared open-source (public) project. This fork is often referred to as the developer’s “origin” and the shared open-source project is often referred to as the “upstream”.
A forking workflow is also beneficial because it removes barriers to new collaborator contributions. Rather than needing to individually grant access to each potential collaborator, anyone can fork the open-source project and submit a merge request to contribute.
What is in a name?
The terms “origin” and “upstream” are conventions within the broader software development community for referencing the remote repository locations. These could be called anything, but following the convention improves shared understanding across development teams.
To view all your remote locations and their aliases using the command line, try running git remote -v.
The forking workflow is similar to the branching workflow except the branches are created within the developer’s origin and the merge requests are from the developer’s origin to the shared upstream repository. Let us see how this works.

In the diagram above we see an upstream and origin location within the USGS GitLab platform. Within the developer’s local workstation we see a local clone where the developer will work. A high-level overview of the workflow is as follows:
- Developer creates a personal fork called an origin
- Developer configures their local clone to use the fork as the origin and the shared project as the upstream
- Developer continues project development on branches within the local clone
- Developer pushes completed branches from their local clone to their origin
- Developer submits a merge request from the branch in their origin to the default branch in the upstream. A maintainer reviews and optionally merges the changes.
1. Create a fork
Creating a developer fork is a one-time process for each developer. The developer will fork the upstream repository to create their origin repository. This is completed within the GitLab interface by navigating to the upstream location and clicking the “Fork” button in the upper right area of the page.

It is important to click the “Fork” text and not the number to the right of the “Fork” text as these have different effects. On the next screen the developer must provide some information about their fork and then click the “Fork project” button near the bottom.

Primarily, the developer must “Select a namespace” where the fork will be created. Typically they would select their personal user namespace. It is uncommon to change the project name, project slug, or project description. Typically all branches should be included in the fork and the visibility can be either “Private” or “Internal”; however, “Public” will be disabled.
Visibility Matters
Personal forks are not allowed to be made publicly accessible. Only the shared upstream project location may be publicly accessible. However, when the fork has a more restrictive visibility than the upstream, GitLab often makes incorrect default assumptions when the developer subsequently creates merge requests. GitLab will assume the merge request is from the developer fork and to the developer fork, which is incorrect. For this reason, it is important to pay attention when creating the merge request later.
2. Configure local clone
The local clone may be configured in one of two different ways. If the developer had previously cloned the repository from what is now called the upstream, we can rename the existing remote to be called “upstream” and then add a new remote called “origin”. Alternatively, if the developer does not yet have a local clone of the project, they can clone their origin and add an “upstream”. The end result is the same.
The ORIGIN_URL and UPSTREAM_URL values may be copied from the GitLab web interface by navigating to the corresponding project page, selecting the “Code” drop down option and then clicking the copy icon for the “Clone with HTTPS” option.
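The two configurations described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the lesson's original commands: two local bare repositories stand in for the GitLab upstream and the fork so that everything runs without network access, and the directory names (existing-clone, fresh-clone) are made up for illustration. On a real project, ORIGIN_URL and UPSTREAM_URL would hold the HTTPS URLs copied from GitLab.

```shell
set -euo pipefail

# Stand-ins for the two GitLab projects (assumption: local bare repositories)
WORK="$(mktemp -d)"
git init --quiet --bare "$WORK/upstream.git"
git clone --quiet --bare "$WORK/upstream.git" "$WORK/origin.git" 2>/dev/null  # plays the fork
UPSTREAM_URL="$WORK/upstream.git"
ORIGIN_URL="$WORK/origin.git"

# Option A: a clone of what is now the upstream already exists
git clone --quiet "$UPSTREAM_URL" "$WORK/existing-clone" 2>/dev/null
cd "$WORK/existing-clone"
git remote rename origin upstream      # the old remote becomes "upstream"
git remote add origin "$ORIGIN_URL"    # the fork becomes "origin"
git remote -v

# Option B: no local clone yet, so clone the fork and add the upstream
git clone --quiet "$ORIGIN_URL" "$WORK/fresh-clone" 2>/dev/null
cd "$WORK/fresh-clone"
git remote add upstream "$UPSTREAM_URL"
git remote -v
```

Both clones end up with the same two remotes, which is why the choice between the options only depends on whether a local clone already exists.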

3. Continue project development
Within your local clone and personal origin, development continues following the branching workflow as described in the previous “Branching and Merging” episode. The developer creates different branches for each logical group of changes and commits them locally.
4. Push completed branches
When local development work is ready for integration, the developer pushes their local branch to their developer origin. If the developer previously set a tracking branch with git push -u (--set-upstream) or git branch --set-upstream-to as described in the “Branching and Merging” episode, it is important to reset the tracking branch now since the “origin” is pointing to a new location. More simply, you may always explicitly specify what is pushed to where using git push origin 1-my-first-issue.
Callout
In the above command, 1-my-first-issue is the name of the branch that is pushed and origin is the remote destination to which that branch is pushed.
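Resetting the tracking branch can be sketched as below. This is an illustration under assumptions rather than the lesson's own commands: local bare repositories (shared.git, fork.git) stand in for the upstream and the fork, and the commit identity is supplied inline so the sketch runs on any machine. The branch first tracks the shared project, as it would before the fork existed, and a second push with -u points it at the new origin.

```shell
set -euo pipefail

WORK="$(mktemp -d)"
git init --quiet --bare "$WORK/shared.git"   # stands in for the shared project
git init --quiet --bare "$WORK/fork.git"     # stands in for the developer fork

# Before forking: the clone's only remote, "origin", is the shared project
git clone --quiet "$WORK/shared.git" "$WORK/clone" 2>/dev/null
cd "$WORK/clone"
git checkout -q -b 1-my-first-issue
echo "work in progress" > notes.txt
git add notes.txt
git -c user.name="Example Developer" -c user.email="dev@example.com" \
    commit --quiet -m "Add notes"
git push --quiet -u origin 1-my-first-issue

# After forking: rename the old remote and add the fork as the new origin
git remote rename origin upstream
git remote add origin "$WORK/fork.git"
git rev-parse --abbrev-ref '1-my-first-issue@{upstream}'   # still tracks upstream

# Push to the fork and reset the tracking branch in one step
git push --quiet -u origin 1-my-first-issue
git rev-parse --abbrev-ref '1-my-first-issue@{upstream}'   # now tracks origin
```

Naming the remote and branch explicitly, as in git push origin 1-my-first-issue, sidesteps the tracking configuration entirely; the -u form is only needed if a bare git push should keep working afterwards.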
5. Integrate changes
The developer should open a merge request from the development branch in their origin repository to the upstream default branch (e.g., main). To do this, first navigate a web browser to the developer origin project page on USGS GitLab. Then, select “Code” and “Merge requests” from the navigation menu on the left. Next click the “New merge request” button.

On the next screen, select the correct “Source branch” and “Target branch” information and then click “Compare branches and continue”.

In the “Source branch”, the developer fork location should be selected in the first drop down box. This should be the default if opening a merge request from the developer fork project page. The second drop down box in this section does not default to anything and the desired development branch should be selected.
In the “Target branch” section, it is important that the correct upstream location is selected. In the screenshot, the “mlangseth” location is selected as the upstream. The default branch in the selected target location will be selected by default; this is typically correct but may be different for specific development teams.
Visibility (still) matters
If the visibility of the origin and upstream match, GitLab will select the correct values for the source and target repository locations. In general, this will not be the case when following this open-source continuing development guide, so you must carefully select the correct repository locations on this screen.
On the final screen, you are given the option to provide a custom merge request title, description, labels, assignments, etc. Complete these choices appropriately and click the “Create merge request” button at the bottom to create the final merge request.
This new merge request can now be reviewed, commented on, reconciled, and integrated in the same manner as was described in the previous “Branching and Merging” episode.
Subsequent Releases
Following some amount of development on the open source project, it may become appropriate and/or necessary to release a new version of the software project as a new official USGS software information product. The new version of the project is subject to the same review and approval requirements as if it were the first or only release of the project. A new Information Product Data System (IPDS) record, a new digital object identifier (DOI), and updated metadata (code.json) are all required.
Triggering a subsequent release
When may a subsequent version of the software project be released as a new official USGS software information product?
When must a subsequent version of the software project be released as a new official USGS software information product?
In general, the triggering criteria for a subsequent release of a software project as an official USGS software information product are the same as for the original release of the software project.
A subsequent version of the software development project may be released as a new official USGS software information product at the author’s discretion.
A subsequent version of the software development project must be released as a new official USGS software information product if this new version is desired to be cited and/or results thereof are intended to be used to support some other official USGS information product.
Preparing Metadata
For releasing subsequent software information products, modify the code.json file in the main branch. Update the status field for the previous version to Archival, if applicable. Multiple versions may be in Production at once.
Copy the text from the previously released object in the code.json and paste it between the main branch object and the previously released object (still within the array []). Add a comma after the closing brace (}) of the pasted object to separate it from the previous product.
Update the version, status, permissions.licenses.URL, downloadURL, disclaimerURL, and laborHours in this object to document the newest version. Additionally, update the metadataLastUpdated for any metadata objects that have been modified, including the metadata object for this newest version.
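A missing comma between objects is an easy mistake to make when pasting a new release object, so a quick syntax check after editing can help. The sketch below is one possible check, not part of the official release procedure. It assumes python3 is available on the PATH, the helper name check_code_json is invented for this example, and the code.json shown is heavily trimmed (real objects carry many more fields).

```shell
set -euo pipefail

WORK="$(mktemp -d)"
cd "$WORK"

# Trimmed, hypothetical code.json with the new 2.0.0 object pasted in
cat > code.json <<'EOF'
[
  { "version": "main", "status": "Development" },
  { "version": "2.0.0", "status": "Production" },
  { "version": "1.0.0", "status": "Archival" }
]
EOF

# Helper invented for this sketch. python3 -m json.tool exits non-zero
# when the file is not valid JSON, e.g. when a comma between objects is missing.
check_code_json() {
    python3 -m json.tool "$1" > /dev/null
}

if check_code_json code.json; then
    echo "code.json parses as JSON"
fi

# List the version of each object in file order to confirm the ordering
grep -o '"version": *"[^"]*"' code.json
```

This only checks syntax and ordering; it does not verify that the URLs or the status values are correct for the release.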
Remember from the Creating Metadata episode that the top-level element in a code.json file is an array. If a project has been under development for a long time, there may be multiple released versions. In this case, objects should be ordered with the DEFAULT_BRANCH (e.g., main) appearing first, followed by the most recently released version, and so on in reverse chronological order. For example:
JSON
[
{
// ... main (DEFAULT_BRANCH), status Development
},
{
// ... release 3.0.0, status Production
},
{
// ... release 2.0.0, status Archival
},
{
// ... release 1.0.0, status Archival
}
]
In the hypothetical example code.json file above, the release tag for version 1.0.0 would only include metadata for that product (in addition to the DEFAULT_BRANCH metadata), and that product would likely have a status of Production. Once you release version 2.0.0, three objects would exist in the array: first the DEFAULT_BRANCH metadata with a status of Development, then 2.0.0 with a status of Production, and finally 1.0.0 with a status of Archival. Because we never go back and edit released tags, you would not change the code.json file in the 1.0.0 tagged version, and it would still specify that version as Production.
In the main branch, however, the code.json file must be updated to include new software information products. The code.json file may include metadata objects marking other milestone tagged versions in addition to those associated with official USGS software information products.
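The difference between a tagged snapshot and the current branch can be seen with git show. The following is a minimal sketch in a throwaway local repository, with code.json objects trimmed to two fields and a commit helper invented for the example. It is meant to illustrate the behavior described above, not to reproduce the actual release steps.

```shell
set -euo pipefail

WORK="$(mktemp -d)"
git init --quiet "$WORK/repo"
cd "$WORK/repo"

# Helper invented for this sketch: commit with an inline identity
commit() {
    git -c user.name="Example Developer" -c user.email="dev@example.com" \
        commit --quiet -m "$1"
}

# Metadata as it stood when 1.0.0 was released
cat > code.json <<'EOF'
[
  { "version": "main", "status": "Development" },
  { "version": "1.0.0", "status": "Production" }
]
EOF
git add code.json
commit "Add metadata for release 1.0.0"
git tag 1.0.0

# Later: 2.0.0 is released and 1.0.0 becomes Archival on the current branch
cat > code.json <<'EOF'
[
  { "version": "main", "status": "Development" },
  { "version": "2.0.0", "status": "Production" },
  { "version": "1.0.0", "status": "Archival" }
]
EOF
git add code.json
commit "Add metadata for release 2.0.0"
git tag 2.0.0

# The 1.0.0 tag still shows 1.0.0 as Production; only later snapshots changed
git show 1.0.0:code.json
git show 2.0.0:code.json
```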
Update the code.json File for Subsequent Release
Update the code.json file within the main branch to prepare to release version 2.0.0. What fields did you need to update? How many objects are now in your JSON array? Did you need to change anything in the version 1.0.0 object? What about the main object?
JSON
[
{
"name": "vampires-and-werewolves",
"organization": "U.S. Geological Survey",
"description": "Code for modeling the co-occurrence of Vampires and Werewolves on Mars",
"version": "main",
"status": "Development",
"permissions": {
"usageType": "openSource",
"licenses": [
{
"name": "Public Domain, CC0-1.0",
"URL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/LICENSE.md"
}
]
},
"homepageURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves",
"downloadURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/archive/main/vampires-and-werewolves-main.zip",
"disclaimerURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/main/DISCLAIMER.md",
"repositoryURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves.git",
"vcs": "git",
"laborHours": 0,
"tags": [
"usg-artificial-intelligence",
"vampires",
"werewolves",
"mars"
],
"languages": [
"Python"
],
"contact": {
"name": "Vlad Dracula",
"email": "vdracula@usgs.gov"
},
"date": {
"metadataLastUpdated": "2024-06-15"
}
},
{
"name": "vampires-and-werewolves",
"organization": "U.S. Geological Survey",
"description": "Code for modeling the co-occurrence of Vampires and Werewolves on Mars",
"version": "2.0.0",
"status": "Production",
"permissions": {
"usageType": "openSource",
"licenses": [
{
"name": "Public Domain, CC0-1.0",
"URL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/2.0.0/LICENSE.md"
}
]
},
"homepageURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves",
"downloadURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/archive/2.0.0/vampires-and-werewolves-2.0.0.zip",
"disclaimerURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/2.0.0/DISCLAIMER.md",
"repositoryURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves.git",
"vcs": "git",
"laborHours": 300,
"tags": [
"usg-artificial-intelligence",
"vampires",
"werewolves",
"mars"
],
"languages": [
"Python"
],
"contact": {
"name": "Vlad Dracula",
"email": "vdracula@usgs.gov"
},
"date": {
"metadataLastUpdated": "2024-07-01"
}
},
{
"name": "vampires-and-werewolves",
"organization": "U.S. Geological Survey",
"description": "Code for modeling the co-occurrence of Vampires and Werewolves on Mars",
"version": "1.0.0",
"status": "Archival",
"permissions": {
"usageType": "openSource",
"licenses": [
{
"name": "Public Domain, CC0-1.0",
"URL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/1.0.0/LICENSE.md"
}
]
},
"homepageURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves",
"downloadURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/archive/1.0.0/vampires-and-werewolves-1.0.0.zip",
"disclaimerURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves/-/raw/1.0.0/DISCLAIMER.md",
"repositoryURL": "https://code.usgs.gov/vdracula/vampires-and-werewolves.git",
"vcs": "git",
"laborHours": 200,
"tags": [
"usg-artificial-intelligence",
"vampires",
"werewolves",
"mars"
],
"languages": [
"Python"
],
"contact": {
"name": "Vlad Dracula",
"email": "vdracula@usgs.gov"
},
"date": {
"metadataLastUpdated": "2024-07-01"
}
}
]
The 2.0.0 object was added between the main and 1.0.0 release objects. The following fields were updated for the 2.0.0 object: version, status, permissions.licenses.URL, downloadURL, disclaimerURL, metadataLastUpdated, and laborHours. There are now 3 objects in the code.json array. The status and the metadataLastUpdated fields were updated in the 1.0.0 object. Nothing was updated in the main object.
Key Points
- A good workflow can streamline open-source project development while ensuring compliance with governing policies
- While specific criteria necessitate releasing subsequent versions, this may also be done at the author’s discretion
- Subsequent versions are released in a manner very similar to the initial version
- The code.json file should be updated to include another object within the array that describes the new version