Publishing data and code

How to publish and archive research software or source code

  • Choose an open license. Code is not really “open” without an open license. Common choices include MIT, FreeBSD (BSD 2-clause) and GPLv3. The MIT license is often recommended for its simplicity. See Quick guide to software licensing for a scientist programmer, and also the brief memo on open licensing of scientific material.

  • Add a version number, so that the exact version of the code used in a publication can be identified.

  • Document the experiments. At a minimum, list where the data is located (preferably link to it directly in the code) and which scripts should be run to replicate the experiments. Some people share analyses as electronic notebooks (R Markdown or Jupyter), which combine the code with its outputs, such as automatically generated figures. Add a README.

  • Long-term preservation. GitHub is not guaranteed to continue, and it is maintained by a private company. It is recommended to make a git “release” and archive it in Zenodo with a DOI. The DOI then provides permanent access to the exact code used in the publication, and the zip file remains available in the academic Zenodo service. If you want, you can also create a DOI for the repository as a whole. Note that the repository and the exact version at a given time are different things: the repository keeps changing, while a release is fixed. See this and this2 for details.

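To make a version number match a git release, one common convention (an assumption here, not something the guide prescribes) is semantic versioning, MAJOR.MINOR.PATCH, with tags named like v1.2.0. The helper functions release_tag and bump below are hypothetical and only sketch that convention.

```python
import re

# Plain MAJOR.MINOR.PATCH version strings, e.g. "1.2.0".
_SEMVER = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def release_tag(version: str) -> str:
    """Turn a version string into a git tag name such as 'v1.2.0'."""
    if not _SEMVER.fullmatch(version):
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return "v" + version


def bump(version: str, part: str = "patch") -> str:
    """Return the next version, incrementing 'major', 'minor' or 'patch'."""
    match = _SEMVER.fullmatch(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = map(int, match.groups())
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")
```

The tag itself is then created and published with `git tag -a v1.2.0 -m "Version used in the paper"` and `git push origin v1.2.0`. If the repository is connected to Zenodo's GitHub integration, publishing a GitHub release from that tag is what gets archived with a DOI.
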
Tips on data science project git repo organization