Summary and Schedule
This lesson will help USGS researchers and software developers learn how to do the following:
- Create a Git repository for tracking version-controlled software.
- Collaborate with others using USGS GitLab.
- Create a USGS-compliant scientific software information product with source code, metadata, disclaimers, license, and citation.
- Publish a scientific software information product.
Lesson Narrative
Dracula is a researcher at the U.S. Geological Survey. He is working with Wolfman, a researcher at Euphoric State University, on a project to model the co-occurrences of vampires and werewolves on Mars. They want to be able to work on the code at the same time, but they have run into problems doing this in the past. If they take turns, each one will spend a lot of time waiting for the other to finish, but if they work on their own copies and email changes back and forth, things will be lost, overwritten, or duplicated.
A colleague suggests using version control to manage their work. Version control is better than mailing files back and forth:
Nothing that is committed to version control is ever lost, unless you work really, really hard at losing it. Since all old versions of files are saved, it is always possible to go back in time to see exactly who wrote what on a particular day, or what version of a program was used to generate a particular set of results.
Because we have this record of who made what changes and when, we know whom to ask if we have questions later on and, if needed, can revert to a previous version, much like the “undo” feature in an editor.
When several people collaborate on the same project, it is possible to accidentally overlook or overwrite someone’s changes. The version control system automatically notifies users whenever there is a conflict between one person’s work and another’s.
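The history and “undo” behavior described in these points can be sketched with a few Git commands. This is a minimal sketch run in a throwaway temporary directory; the file counts.txt, its contents, and the identity “Dracula <dracula@example.com>” are hypothetical and used only for illustration. It records two commits, lists who changed what and when, and then undoes the second change with a new commit rather than erasing history.

```bash
# Work in a throwaway directory so nothing else is affected.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init --quiet

# Hypothetical identity for this demo only: "git -c name=value" applies
# the setting to a single command without changing your configuration.
git_as_dracula() { git -c user.name="Dracula" -c user.email="dracula@example.com" "$@"; }

# First version of a (hypothetical) data file.
echo "vampires: 10" > counts.txt
git add counts.txt
git_as_dracula commit --quiet -m "Record initial vampire count"

# A mistaken edit that is also committed.
echo "vampires: 0" > counts.txt
git_as_dracula commit --quiet -am "Accidentally zero the vampire count"

# The record of who made which change, and when.
git log --format='%h %an %ad %s' --date=short

# Undo the last commit by adding a new commit that reverses it;
# the mistaken version stays in the history.
git_as_dracula revert --no-edit HEAD
cat counts.txt
```

Because the revert is itself a commit, the history still shows both the mistake and its correction, which is what makes it possible to see exactly what happened later on. The temporary directory can be deleted afterward.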
Teams are not the only ones to benefit from version control: lone researchers can benefit immensely. Keeping a record of what was changed, when, and why is extremely useful for all researchers if they ever need to come back to the project later on (e.g., a year later, when memory has faded).
Version control is the lab notebook of the digital world: it is what professionals use to keep track of what they have done and to collaborate with other people. Every large software development project relies on it, and most programmers use it for their small jobs as well. And it is not just for software: books, papers, small data sets, and anything that changes over time or needs to be shared can and should be stored in a version control system.
Prerequisites
In this lesson, we use Git from the Unix shell. Some previous experience with the shell is helpful but not mandatory.
| Start time | Episode | Questions |
|---|---|---|
| | Setup Instructions | Download files required for the lesson |
| 00h 00m | 1. Automated Version Control | What is version control and why should I use it? |
| 00h 05m | 2. Setting Up Git | How do I get set up to use Git? |
| 00h 10m | 3. Creating a Repository | Where does Git store information? |
| 00h 20m | 4. Tracking Changes | How do I record changes in Git? How do I check the status of my version control repository? How do I record notes about what changes I made and why? |
| 00h 40m | 5. Exploring History | How can I identify old versions of files? How do I review my changes? How can I recover old versions of files? |
| 01h 05m | 6. Ignoring Things | How can I tell Git to ignore files I don’t want to track? |
| 01h 10m | 7. Remotes in GitLab | How do I safely back up my work to a remote site? How do I share my changes with others on the web? |
| 01h 55m | 8. Branching and Merging | What are branches in Git and why should I use them? How do I merge a branch back into my main branch? |
| 02h 20m | 9. Collaborating | How can I use version control to collaborate with other people? |
| 03h 05m | 10. Conflicts | What do I do when my changes conflict with someone else’s? |
| 03h 20m | 11. Open Science | What is open science? How is open science valuable? How can version control help me make my work more open? |
| 03h 36m | 12. Policy | What is an official USGS software information product? When am I required to release my software as an official USGS software information product? When may I release my software as an official USGS software information product? |
| 04h 00m | 13. Licensing | What licensing information should I include with my work? |
| 04h 10m | 14. Citation | How do I create a digital object identifier (DOI)? How can I make my work easy to cite? |
| 04h 15m | 15. Commonly Included Files | What are some files which are usually included in USGS software projects, and what should their content be? |
| 04h 25m | 16. Creating Metadata | What is a code.json file? How do you create a code.json file? What are the required fields in the code.json file for a USGS software project? |
| 05h 25m | 17. Software Review for Authors | How do I prepare my code for a software review? What information do I need to provide to my reviewer(s)? How should I reconcile reviewer comments? How can I document the review to meet Fundamental Science Practices requirements? |
| 05h 55m | 18. Software Review for Reviewers | What are my responsibilities as a reviewer of software? How do I conduct a software review? |
| 06h 25m | 19. Publishing | How do I get my Git repository published once it is ready and has been approved? What static objects should I create in Git for the final release? How do I make the DOI point to the correct Git object? |
| 06h 55m | 20. Continuing Your Project | How do I follow policy when developing an open-source project? When should I release updated versions of my project? How do I prepare my project for subsequent releases? |
| 07h 40m | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Installing Git
This lesson will use Git and Bash.
Git is a version control system that allows you to track who made changes to what, and when those changes were made. Git also makes it easy to update a shared or public copy of your code hosted on GitLab.
Bash is a commonly used shell (command-line interface) that lets you do many tasks more quickly than you could through a graphical interface.
Follow the instructions below to ensure you have the proper software installed on your computer.
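Once the software is installed, you can confirm from a terminal that both tools are available. This is a minimal check, assuming you run it in Git Bash on Windows or a terminal on macOS or Linux; the exact version numbers printed will vary by system.

```bash
# Print the installed Git version, for example "git version 2.x.y".
# A "command not found" error means Git is missing or not on your PATH.
git --version

# Print the first line of Bash's version information,
# for example "GNU bash, version 5.x ...".
bash --version | head -n 1
```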
GitLab Account
By default, all USGS personnel with Active Directory credentials have a GitLab account provisioned automatically the first time they access https://code.usgs.gov. To sign in, go to https://code.usgs.gov, click the “Sign In” button in the top-right corner, and then click the “USGS Login” button.
Preparing Your Working Directory
We will do our work in the Desktop folder, so make sure you change your working directory to it with:
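A minimal sketch, assuming you are in a Bash shell where “~” expands to your home directory and your Desktop folder sits directly inside it (the default on most systems):

```bash
# Change into the Desktop folder inside your home directory ("~").
# If the folder is not there, print a hint instead of failing silently.
cd ~/Desktop || echo "Could not find ~/Desktop; check where your Desktop folder is." >&2

# Print the working directory to confirm where you are.
pwd
```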
If your Desktop is backed up by OneDrive, change your working directory to it with:
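A minimal sketch, assuming (from the Tab-completion hint in the note that follows) that your synced folder is named “OneDrive - DOI” and sits in your home directory; if your folder name differs, use Tab completion to find the exact name.

```bash
# Assumption: OneDrive syncs your Desktop to "OneDrive - DOI/Desktop"
# in your home directory. Each backslash escapes a space in the folder name.
cd ~/OneDrive\ -\ DOI/Desktop || echo "Could not find that folder; use Tab completion to find its exact name." >&2

# Print the working directory to confirm where you are.
pwd
```

Putting the path in double quotes, as in cd "$HOME/OneDrive - DOI/Desktop", is an equivalent way to handle the spaces.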
Note: You can start typing OneDrive and then press Tab to autocomplete through “DOI/”. Then start typing Desktop and press Tab to autocomplete.