Instructor Notes

Using a software tool to manage the versions of your project files lets you focus on the more interesting and innovative aspects of your project.

  • Version control’s advantages
    • It is easy to set up
    • Every copy of a Git repository is a full backup of a project and its history
    • A few easy-to-remember commands are all you need for most day-to-day version control tasks
    • The GitLab hosting service provides a web-based platform for collaboration
  • Two main concepts
    • commit: a recorded set of changes in your project’s files
    • repository: the history of all your project’s commits
  • Why use GitLab?
    • Approved for USGS personnel to use
    • Non-USGS collaborators can be added to projects so we can work together
    • We have established practices for releasing approved software through USGS GitLab

Overall


Version control might be the most important topic we teach, but Git is also the most complicated tool. However, GitLab/GitHub presently dominate the open software repository landscape, so the time and effort required to teach Git fundamentals is justified and worthwhile.

Because of this complexity, the original Carpentries version of this course does not teach novice learners about many interesting topics, such as branching, hashes, and commit objects. The USGS project team that developed this USGS-specific resource decided that some of these topics, such as branching, are critical to developing open-source software successfully. Teaching more complex topics does make this course longer and a bit more complicated, but we believe that these skills are essential, especially for people who ultimately need to release their code to the public.

It is still important, though, to convince learners that version control is useful for researchers, whether or not they work in teams, because it is:

  • a better way to “undo” changes,
  • a better way to collaborate than mailing files back and forth, and
  • a better way to share code and other scientific work with the world.

Teaching Notes


  • You can “split” your shell so that recent commands remain in view using this script.

  • Make sure the network is working before starting this lesson.

  • Drawings are particularly useful in this lesson: if you have a whiteboard, use it!

  • If some learners are using Windows, there will inevitably be issues merging files with different line endings. (Even if everyone’s on some flavor of Unix, different editors may or may not add a newline to the last line of a file.) Take a moment to explain these issues, since learners will almost certainly trip over them again. If learners are running into line ending problems, GitHub has a page that helps with troubleshooting. Specifically, the section on refreshing a repository may be helpful if learners need to change the core.autocrlf setting after already having made one or more commits.

  • We do not use a Git GUI in these notes because we have not found one that installs easily and runs reliably on the three major operating systems, and because we want learners to understand what commands are being run. That said, instructors should demo a GUI on their desktop at some point during this lesson and point learners at this page.

  • Instructors should show learners graphical diff/merge tools like DiffMerge.

  • When appropriate, explain that we teach Git rather than CVS, Subversion, or Mercurial primarily because of GitLab/GitHub’s growing popularity: CVS and Subversion are now seen as legacy systems, and Mercurial is not nearly as widely used in the sciences right now.

  • Further resources:
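For the line-ending issues described in the Windows bullet above, the following is a minimal sketch you could demo. It assumes a throwaway temporary repository and a made-up file, notes.txt. It sets core.autocrlf for that repository only (learners would normally use --global, typically with true on Windows and input on macOS/Linux) and then uses git add --renormalize (Git 2.16 or later) to re-normalize a file that was already committed with CRLF endings.

```shell
# Throwaway repository; placeholder identity set for this repo only.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vlad@example.com"

# Commit a file that has Windows (CRLF) line endings.
printf 'line one\r\nline two\r\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes with CRLF endings"

# Change the setting AFTER the first commit (the situation GitHub's
# "refreshing a repository" advice covers). Learners would use --global.
git config core.autocrlf input

# Re-apply line-ending conversion to every tracked file, then commit.
git add --renormalize .
git commit -q -m "Normalize line endings"
```

The working copy still has CRLF endings; only the committed version is normalized to LF, which is what collaborators on other operating systems will check out.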

Automated Version Control


  • Ask, “Who uses ‘undo’ in their editor?” All say “Me”. ‘Undo’ is the simplest form of version control.

  • Give learners a five-minute overview of what version control does for them before diving into the watch-and-do practicals. Most of them will have tried to co-author papers by emailing files back and forth, or will have biked into the office only to realize that the USB key with last night’s work is still on the kitchen table. Instructors can also make jokes about directories with names like “final version”, “final version revised”, “final version with reviewer three’s corrections”, “really final version”, and “come on, this really has to be the last version” to motivate version control as a better way to collaborate and as a better way to back work up.

Setting Up Git


  • We suggest that instructors and students use nano as the text editor for this lesson because

    • it runs in all three major operating systems,
    • it runs inside the shell (switching windows can be confusing to students), and
    • it has shortcut help at the bottom of the window.

    Please point out to students during setup that they can and should use another text editor if they are already familiar with it.

  • When setting up Git, be very clear about what learners have to enter: it is common for them to enter the instructor’s details (e.g., email) instead of their own. Check at the end using git config --list.

  • When setting up the default branch name, learners with a Git version older than 2.28 (which lacks the init.defaultBranch setting) can rename the branch for the lesson using git branch -M main if the repository already has commits, or git checkout -b main if the repository is completely empty.
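A minimal sketch of both checks above, run in a throwaway repository so it does not touch anyone’s global configuration. The name, email, and file contents are placeholders; in class, learners would use git config --global with their own details.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q

# Placeholder identity, stored in this repository only.
git config user.name "Vlad Dracula"
git config user.email "vlad@example.com"

# List every setting and the file it came from; look for the instructor's email.
git --no-pager config --list --show-origin

# Git 2.28 or later can set the default for new repositories instead:
#   git config --global init.defaultBranch main
git --version

# Older Git, repository already has a commit: rename the current branch.
echo "Cold and dry, but everything is my favorite color" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars as a base"
git branch -M main
git rev-parse --abbrev-ref HEAD    # prints main
```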

Creating a Repository


  • When you run git status, Mac users may see a .DS_Store file listed as untracked. This is a hidden file that macOS creates in directories to store folder display settings.

  • The challenge “Places to create Git repositories” reinforces the idea that the .git folder contains the whole Git repo and that deleting this folder undoes a git init. It also gives learners a way to fix the common mistake of putting unwanted folders (like Desktop) under version control.

    Instead of removing the .git folder directly, you can choose to move it first to a safer directory and remove it from there:

    BASH

    $ mv .git temp_git
    $ rm -rf temp_git

    The challenge suggests that it is a bad idea to create a Git repo inside another repo. For more discussion on this topic, please see this issue.
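Before running git init, learners can ask Git whether they are already inside a repository, which helps avoid the nested-repo situation the challenge warns about. A minimal sketch, assuming a throwaway directory with hypothetical folders planets and planets/moons:

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q planets              # the "outer" repository
mkdir -p planets/moons
cd planets/moons

# Ask Git before creating a new repository here.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    echo "Already inside a repository rooted at: $(git rev-parse --show-toplevel)"
    echo "Do NOT run git init here."
else
    echo "Not inside a repository; git init is safe here."
fi
```

The same check is useful in the Collaborating episode, before learners clone a repository.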

Tracking Changes


  • It is important that learners do a full commit cycle by themselves (make changes, git diff, git add, and git commit). The “bio repository” challenge does that.

  • This is a good moment to show a diff with a graphical diff tool. If you skip it because you are short on time, show it once in GitLab.

  • One thing that may cause confusion is recovering old versions. If, instead of running git checkout f22b25e mars.txt, someone runs git checkout f22b25e, they end up in the “detached HEAD” state and confusion abounds. It is still possible to keep committing, but commands like git push origin main will not give easily comprehensible results later on. It also makes it look as though commits can be lost. To “re-attach” HEAD, use git checkout main.

  • This is a good moment to show a log within a Git GUI. If you skip it because you are short on time, show it once in GitLab.
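The full commit cycle and the detached HEAD pitfall above can both be demonstrated in a throwaway repository. This is a sketch with placeholder content: $first stands in for the f22b25e hash used in the lesson, and the branch name is captured in $branch because it may be main or master depending on the learner’s Git configuration.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vlad@example.com"

echo "Cold and dry, but everything is my favorite color" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars as a base"
first=$(git rev-parse --short HEAD)        # stands in for f22b25e
branch=$(git rev-parse --abbrev-ref HEAD)  # main or master

# One full cycle: change, inspect, stage, commit.
echo "The two moons may be a problem for Wolfman" >> mars.txt
git --no-pager diff                        # unstaged changes
git add mars.txt
git --no-pager diff --staged               # what will be committed
git commit -q -m "Add concerns about effects of Mars' moons on Wolfman"

# Restoring ONE file from an old commit keeps you on your branch...
git checkout "$first" mars.txt
git status --short                         # mars.txt staged with its old content
git checkout HEAD mars.txt                 # undo the restore

# ...but checking out the commit itself detaches HEAD.
git checkout -q "$first"
git rev-parse --abbrev-ref HEAD            # prints HEAD: detached
git checkout -q "$branch"                  # re-attach (git checkout main in the lesson)
```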

Ignoring Things


Remember that .gitignore uses glob patterns (wildcards such as * and **, plus a leading ! to re-include a file), not regular expressions, to ignore a particular set of files.
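A quick way to show how patterns behave is git check-ignore -v, which reports the .gitignore file and line that matched each path. A minimal sketch with made-up file names, including the .DS_Store file Mac users may see:

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
mkdir -p results/run1
touch a.dat b.dat keep.dat results/run1/out.csv .DS_Store

# Comments must be on their own lines in .gitignore.
cat > .gitignore <<'EOF'
# macOS Finder metadata
.DS_Store
# every .dat file...
*.dat
# ...except this one (! negates an earlier pattern)
!keep.dat
# everything under results/, at any depth
results/
EOF

# Which rule (file:line:pattern) ignores each path?
git check-ignore -v a.dat .DS_Store results/run1/out.csv

# Only .gitignore and keep.dat should be listed as untracked.
git status --short --untracked-files=all
```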

Remotes in GitLab


  • Make it clear that Git and GitLab are not the same thing: Git is an open-source version control tool, while GitLab is a platform (developed by the company of the same name) that hosts Git repositories on the web and provides a web interface for interacting with the repos it hosts.

  • It is very useful to draw a diagram showing the different repositories involved.

  • When pushing to a remote, Git’s output can vary slightly depending on the exact command learners run. The lesson shows the output of git push origin main. However, some learners may use the command GitLab suggests for pushing an existing repository, git push -u origin main, which produces slightly different output, including the line Branch main set up to track remote branch main from origin by rebasing.
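If the network is unreliable, the effect of git push -u can be sketched without GitLab by using a local bare repository as a stand-in remote (an assumption for demo purposes; in class the remote URL would be the GitLab project URL). After git push -u, the upstream of main is origin/main, which is what lets a later plain git pull work without naming the remote.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q --bare remote.git              # stand-in for the GitLab project
git init -q planets
cd planets
git config user.name "Vlad Dracula"
git config user.email "vlad@example.com"
echo "Cold and dry" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars"
git branch -M main

# In class: git remote add origin https://code.usgs.gov/vdracula/vampires-and-werewolves.git
git remote add origin "$demo/remote.git"
git push -u origin main                    # -u records origin/main as the upstream

# Confirm the tracking relationship.
git --no-pager branch -vv
git rev-parse --abbrev-ref 'main@{upstream}'   # prints origin/main
```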

Branching and Merging


  • This is a new episode added to introduce the slightly complex topic of branching and merging.

  • It may be worth going over the difference between git switch and git checkout since some people may only be familiar with checkout.

  • Emphasize the importance of not including sensitive information in code.
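git checkout does two jobs (switching branches and restoring files), and Git 2.23 introduced git switch and git restore to separate them. A minimal sketch in a throwaway repository, with hypothetical branch names, showing the equivalent spellings:

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vlad@example.com"
echo "Cold and dry" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars"
git branch -M main

# Create a branch and move onto it: two equivalent spellings.
git switch -c add-moons        # newer (Git 2.23+)
git switch main
git checkout -b add-pluto      # older, still works
git checkout main

# Move between existing branches.
git switch add-moons           # same as: git checkout add-moons
git --no-pager branch --list
```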

Collaborating


  • Decide in advance whether all the learners will work in one shared repository, or whether they will work in pairs (or other small groups) in separate repositories. The former is easier to set up; the latter runs more smoothly.

  • Role playing between two instructors can be effective when teaching the collaboration and conflict sections of the lesson. One instructor can play the role of the repository owner, while the second instructor plays the role of the collaborator. If possible, use two projectors so that both instructors’ screens can be seen. This shows students very clearly who does what.

  • It is also effective to pair up students during this lesson and assign one member of the pair to take the role of the owner and the other the role of the collaborator. In this setup, challenges can include asking the collaborator to make a change, commit it, and push the change to the remote repository so that the owner can then retrieve it, and vice-versa. The role playing between the instructors can get a bit “dramatic” in the conflicts part of the lesson if the instructors want to inject some humor into the room.

  • If you do not have two projectors, have two instructors at the front of the room. Each instructor does their piece of the collaboration demonstration on their own computer and then passes the projector cord back and forth with the other instructor when it is time for them to do the other part of the collaborative workflow. It takes less than 10 seconds for each switchover, so it does not interrupt the flow of the lesson. And of course it helps to give each of the instructors a different-colored hat, or put different-colored sticky notes on their foreheads.

  • If you are the only instructor, the best way to simulate two collaborators is to clone the repository twice on your Desktop under different names; for example, pretend one clone is your computer at work:

    BASH

    $ git clone https://code.usgs.gov/vdracula/vampires-and-werewolves.git vampires-and-werewolves-at-work

  • It is very common for learners to mistype the remote alias or the remote URL when adding a remote, which prevents them from pushing. You can diagnose this by running git remote -v and checking carefully for typos.

    • To fix a wrong alias, run git remote rename <old> <new>.
    • To fix a wrong URL, run git remote set-url <alias> <newurl>.

  • Before cloning the repo, make sure that nobody is inside another repo. The best way to ensure this is to move to the Desktop before cloning: cd && cd Desktop.

  • If both repos are on the Desktop, have learners clone their collaborator’s repo into a specific directory by passing a second argument:

    BASH

    $ git clone https://code.usgs.gov/vdracula/vampires-and-werewolves.git vdracula-vampires-and-werewolves

  • The most common mistake is that learners try to push before pulling. When they then pull, they may get a conflict.

  • Conflicts, some of them confusing, will start to arise. Reassure learners that conflicts are covered next.

  • Learners may see slightly different output from git push and git pull depending on their Git version and on whether they set an upstream with -u.
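The remote-typo fixes described earlier in this list can be rehearsed offline, since git remote add does not contact the server. A sketch with a deliberately misspelled alias (orgin) and URL (werewolfs), both hypothetical:

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q

# Two typical typos: a misspelled alias and a misspelled URL.
git remote add orgin https://code.usgs.gov/vdracula/vampires-and-werewolfs.git
git remote -v                     # inspect alias and URL carefully

git remote rename orgin origin    # fix the alias
git remote set-url origin https://code.usgs.gov/vdracula/vampires-and-werewolves.git
git remote -v                     # both lines now show the corrected URL
```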

Conflicts


  • Expect the learners to make mistakes. Expect yourself to make mistakes. This happens because it is late in the lesson and everyone is tired.

  • If you are the only instructor, the best way to create a conflict is:

    • Clone your repo in a different directory, pretending it is your computer at work: git clone https://code.usgs.gov/vdracula/vampires-and-werewolves.git vampires-and-werewolves-at-work.
    • At the office, you make a change, commit and push.
    • At your laptop repo, you (forget to pull and) make a change, commit and try to push.
    • Run git pull and show the conflict.
  • Learners usually forget to git add the file after fixing the conflict and just (try to) commit. You can diagnose this with git status.

  • Remember that you can discard one of the two parents of the merge:

    • discard the remote file, git checkout --ours conflicted_file.txt
    • discard the local file, git checkout --theirs conflicted_file.txt

    You still have to git add and git commit after this. This is particularly useful when working with binary files.

  • Keep in mind that depending on the Git version used, the outputs for git push and git pull can vary slightly.
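For a solo instructor, the conflict recipe above can also be rehearsed entirely offline. This sketch assumes a local bare repository standing in for the GitLab project and two clones named laptop and vampires-and-werewolves-at-work; the file contents are placeholders. It ends by keeping the remote version with git checkout --theirs (use --ours for the local version), followed by the git add and git commit that learners often forget.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q --bare remote.git                   # stand-in for the GitLab project

# "Laptop" clone: first commit, pushed to main.
git clone -q "$demo/remote.git" laptop
cd laptop
git config user.name "Vlad Dracula"
git config user.email "vlad@example.com"
echo "Cold and dry" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars"
git branch -M main
git push -q -u origin main

# "Work" clone under a different directory name: change, commit, push.
cd "$demo"
git clone -q -b main "$demo/remote.git" vampires-and-werewolves-at-work
cd vampires-and-werewolves-at-work
git config user.name "Vlad Dracula"
git config user.email "vlad@example.com"
echo "Changed at the office" >> mars.txt
git commit -q -am "Office change"
git push -q origin main

# Back on the laptop: forget to pull, change the same spot, commit.
cd "$demo/laptop"
echo "Changed on the laptop" >> mars.txt
git commit -q -am "Laptop change"
git push -q origin main || echo "Push rejected: pull first"

git pull --no-rebase origin main || true        # reports a CONFLICT in mars.txt
git status --short                              # UU mars.txt

# Keep the remote version, then stage and finish the merge.
git checkout --theirs mars.txt
git add mars.txt
git commit -q --no-edit
```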

Open Science


  • Some people may think that, because of our Fundamental Science Practices policies, USGS researchers cannot develop code in the open. We cover the USGS policy behind open-source development in the next episode, so there is no need to dive too deep into that topic here.

  • The challenge may bring up a discussion about whether it is okay to publish code within a data release. Try not to get sucked into this discussion. There are times when it may be appropriate to publish code within a data release, but there are many benefits to using Git for version control and publishing code in GitLab. Try to keep the discussion focused on those benefits.

Policy


  • Information conveyed in this episode reflects policy as it exists, or as it is anticipated to exist, at the time the lesson is initially made generally available within the USGS. A best effort will be made to keep this policy information up-to-date moving forward; however, current published policy will always supersede what is found in this episode. If anything seems to be out of date as you prepare to teach this episode, please make a note and submit an issue to the code repository.

  • When speaking, emphasize software proJECT versus software information proDUCT. It is important that these two words (project and product) are not treated as interchangeable.

  • Although we chose to keep things simpler by teaching only a branching workflow in this course, it is important to introduce the concept of forking workflows, especially for people who are interested in open-source development.

Licensing


  • We teach about licensing because questions about who owns what, or can use what, arise naturally once we start talking about using public services like GitLab to store files. Also, the discussion gives learners a chance to catch their breath after what is often a frustrating couple of hours.

  • The topic of licenses within USGS has been a very difficult one to tackle. The DOI Solicitor’s Office finally approved the use of CC0 1.0 for completely original software. That said, as an instructor for this course, you cannot give legal advice on which licenses people should use for their projects. Any such questions need to be directed to the DOI Solicitor’s Office.

Citation


  • The topic of software citation is a bit more complex than citing manuscripts and data, but we try to keep it relatively simple in this episode.

Commonly Included Files


  • Besides code.json and LICENSE.md, DISCLAIMER.md is the only other file required by Fundamental Science Practices; however, we encourage the use of the other files described in this episode.

Creating Metadata


  • The code.json is one of the more confusing aspects of releasing software at USGS. There is a lot packed into this episode, including an introduction to JSON and understanding the code.json fields. Make sure you give yourself plenty of time for this episode. We know this can be challenging since there has already been a lot of content delivered and this is near the end of the course.

  • We decided to use the GitLab editor instead of a desktop editor to simplify things where we could.

  • This episode really pulls together a lot of the software release concepts discussed in previous episodes (e.g., licenses, disclaimers, branches).

Software Review for Authors


  • The next two episodes differ from the rest in that this episode covers concepts (e.g., reconciling review) that may happen chronologically after concepts taught later (e.g., reviewing a software product). We chose to organize these concepts by audience to limit the number of disparate episodes needed and to give people a one-stop place to find information about review based on their role in the review process.

  • In the “Create a Review Issue” challenge, learners will use the review_template.md file to create their own issue. The goal in the second part of the challenge is to get them to think critically about the types of things that reviewers should be looking for when completing a review.

Software Review for Reviewers


  • This is the only episode written from a perspective other than the author’s. We felt it was important to include, though, since it may fall to the author to explain to their reviewers how to complete a review (not many people have conducted a software review). This episode gives authors a foundation in what to expect from a review as well as a resource to share with their reviewers.

Submitting a Software Product for Publication


  • This episode builds on everything done up to this point, so if anything changes with required files or tasks, it will need to be updated here as well.

Branching and Merging


Instructor Note

If you have time, you can go into detail about other merge options like deleting the source branch or squashing commits.



Policy


Evolving Policy

Information conveyed in this episode reflects policy as it exists, or as it is anticipated to exist, at the time the lesson is initially made generally available within the USGS.

A best effort will be made to keep this policy information up-to-date moving forward; however, current published policy will always supersede what is found in this episode.



Verbal Emphasis

When speaking, emphasize software proJECT versus software information proDUCT. It is important that these two words (project and product) are not treated as interchangeable. Also, note that a software information product is sometimes called a USGS code release, but that is not its formal name.



Licensing


A note explaining the difference between public domain and CC0

Including a public domain dedication statement informs the worldwide public that:

  1. the work is a U.S. Government work and is in the U.S. public domain, and
  2. the USGS, which may own copyright in one or more countries outside the United States, wishes to place the work into the public domain worldwide.

Including CC0 accomplishes item 2. It is a bit like saying something in English (public domain) and then translating it into other languages (CC0) so that everyone understands.



Creating Metadata


Instructor Note

When discussing JSON syntax, verbally note which data types need quotes (strings, including object keys) and which do not (numbers, booleans, and null).

Demonstrate JSON syntax in a text editor such as Visual Studio Code or RStudio. Point out that the editor recognizes the .json extension and provides syntax highlighting and indentation assistance.
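For the JSON syntax demonstration, a small file like the following can show each data type. The field names and values are hypothetical and are not the full USGS code.json template. python3 -m json.tool (from Python’s standard library) is used here only as a convenient syntax checker, assuming Python 3 is installed.

```shell
demo=$(mktemp -d)
cd "$demo"

# Hypothetical fields for illustration only; use the course's code.json template.
cat > example.json <<'EOF'
{
  "name": "vampires-and-werewolves",
  "version": "1.0.0",
  "laborHours": 40,
  "openSourceProject": true,
  "homepageURL": null,
  "tags": ["vampires", "werewolves"],
  "contact": {
    "email": "vdracula@example.com"
  }
}
EOF
python3 -m json.tool example.json > /dev/null && echo "example.json is valid JSON"

# A common mistake: an unquoted string value.
printf '{ "name": vampires }\n' > broken.json
python3 -m json.tool broken.json || echo "broken.json: strings need double quotes"
```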



Instructor Note

Mention that you are going to use the GitLab IDE to create and edit a code.json file. Learners may use whatever editor they are comfortable with; however, you will not provide assistance with custom editors.



Instructor Note

Check in with learners to make sure they were able to create the new file and add the template text.



Software Review for Authors


Screenshot may vary

The screenshot below will look different if an issue has already been created in the project. It is likely worth commenting on this so people are not confused.



Software Review for Reviewers


Optional Partner Work

The following activity could lend itself to working with a partner where one person acts as the reviewer on another person’s repository. If permissions were not already established earlier and time is limited, each participant can play the role of reviewer and author on their own repository.



Publishing


Links to Request Publication via a Git Issue

Publishing USGS GitLab repositories is managed via the USGS Cloud Hosting Solutions (CHS) Git repository on Software Management. Put the link to create an issue requesting publication of a scientific software information product in the chat and/or share your screen showing how to navigate to this site.



Instructor Note

You can show the AIS Beta interface at https://www1-beta.usgs.gov/identifiers/ if you want to demo how to use the DOI creation tool.



Continuing Your Project


Instructor Note

Note that the screenshot shows the interface when there are not currently any other open merge requests. When other open merge requests exist, the UI is slightly different and the button is in the upper right region of the screen.



Instructor Note

The option to “Squash” commits may be useful if the changes proposed in the merge request introduced any potentially sensitive or personally identifiable information. In this case, a commit may be added to remove this information, and by “squashing” during the merge, the intermediate commit with the potentially sensitive information will not appear in the target branch’s history.

Squashing is a powerful tool, but it should be used with caution to avoid future merge conflicts.
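To show what squashing does to history, here is a local sketch using git merge --squash, treated here as the command-line analogue of the GitLab option (an assumption for demo purposes; it does not show what GitLab itself retains). The branch name, file, and fake password are made up. The intermediate commit disappears from main’s history but still exists on the source branch until that branch is deleted.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vlad@example.com"
echo "Cold and dry" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars"
git branch -M main

# Feature branch: one commit leaks a (fake) secret, the next removes it.
git checkout -q -b add-config
echo "password=not-a-real-secret" > config.txt
git add config.txt
git commit -q -m "Add config"
echo "password=<read from environment>" > config.txt
git commit -q -am "Remove password from config"

# Squash-merge: main gets ONE new commit with only the final state.
git checkout -q main
git merge --squash add-config
git commit -q -m "Add config (squashed)"

git --no-pager log --oneline main
git log -p main | grep -c "not-a-real-secret"        # 0: not in main's history
git log -p add-config | grep -c "not-a-real-secret"  # still on the source branch
```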



Instructor Note

The following challenge is a throwback to the policy episode, and the corresponding solution links learners back to the previous episode for reference.