This month we have a bumper collection of top tips from our Research Software Engineers (RSEs)! They are passing on a selection of good-practice tips that they advise all researchers who develop or write software to consider adopting.
When an RSE from Research IT joins a research group to help with software development, we often see several practices that differ from the way we work. Here are a few of them, in no particular order, apart from number 1, which is arguably the most important. We hope you will investigate these tips even if you are not currently working with us!
1. Lack of proper version control of source code and data
Version control is key to managing software development and keeping it under control. Many researchers think they only need it when collaborating with other teams, but it is just as valid and important for a single researcher or developer. Git has become the de facto standard. It’s not the easiest system in the world to use, but the effort of learning it will pay off many times over. Research IT also run training courses in its use.
2. Set(s) of test data with known good results
Re-running your code on these data and comparing the output with the known results is vital for catching regressions whenever the source code is modified.
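As a minimal sketch of this tip in Python (the post does not prescribe a language), the example below pairs input data with results that were checked independently and re-runs them after every change. The `analyse` function and the test cases are hypothetical stand-ins for your own code and reference data; in a real project the reference data would usually live in files kept under version control.

```python
import math


def analyse(values):
    """Hypothetical analysis step under test: returns the mean and
    population variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return {"mean": mean, "variance": variance}


# Known-good test cases: input data paired with results that were checked
# independently (here, by hand). Hypothetical examples only.
KNOWN_GOOD_CASES = [
    ([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0], {"mean": 5.0, "variance": 4.0}),
    ([1.0, 1.0, 1.0], {"mean": 1.0, "variance": 0.0}),
]


def check_regressions(tolerance=1e-9):
    """Re-run every known-good case and return a list of mismatches.

    Floating-point results are compared with a tolerance rather than
    exact equality, since harmless code changes can alter the last digits."""
    failures = []
    for inputs, expected in KNOWN_GOOD_CASES:
        actual = analyse(inputs)
        for key, want in expected.items():
            got = actual[key]
            if not math.isclose(got, want, rel_tol=tolerance, abs_tol=tolerance):
                failures.append((inputs, key, want, got))
    return failures


if __name__ == "__main__":
    problems = check_regressions()
    if problems:
        for inputs, key, want, got in problems:
            print(f"REGRESSION in {key}: expected {want}, got {got} for {inputs}")
        raise SystemExit(1)  # non-zero exit lets scripts and CI detect failure
    print("All known-good cases pass.")
```

Test frameworks such as `pytest` or the standard-library `unittest` offer the same idea with better reporting; the plain version above just shows the principle.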
3. Code and scripts should take appropriate parameters as either arguments or from a config file (or both)
Languages like R and MATLAB make it easy to change parameter values by editing the source and re-running it. However, code that has to be edited before each run cannot be fully automated, for example as one step in a workflow with other components. Seek help from us if you don’t know how to do this in your programming language of choice. If you do use a config file, take care not to leave passwords or similar credentials in it, especially if you are using an external repository (e.g. GitHub).
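One possible way to do this in Python is sketched here, assuming an INI-style config file read with the standard-library `configparser` module and command-line options parsed with `argparse`. The parameter names (`input`, `iterations`, `threshold`) and the `[analysis]` section name are hypothetical; command-line values take priority over the config file, which in turn overrides built-in defaults.

```python
import argparse
import configparser

# Built-in defaults, used when neither the config file nor the command line
# sets a value. These parameter names are hypothetical examples.
DEFAULTS = {"input": "data.csv", "iterations": 100, "threshold": 0.5}


def load_settings(argv=None):
    """Combine defaults, an optional INI config file and command-line
    arguments, in increasing order of priority.

    argv=None makes argparse read sys.argv, as a normal script would."""
    parser = argparse.ArgumentParser(description="Run the analysis.")
    parser.add_argument("--config", help="path to an INI config file")
    parser.add_argument("--input", help="input data file")
    parser.add_argument("--iterations", type=int, help="number of iterations")
    parser.add_argument("--threshold", type=float, help="cut-off value")
    args = parser.parse_args(argv)

    settings = dict(DEFAULTS)

    if args.config:
        config = configparser.ConfigParser()
        # read() silently skips missing files, so check that something loaded.
        if not config.read(args.config):
            parser.error(f"could not read config file {args.config!r}")
        if config.has_section("analysis"):
            section = config["analysis"]
            settings["input"] = section.get("input", fallback=settings["input"])
            settings["iterations"] = section.getint(
                "iterations", fallback=settings["iterations"])
            settings["threshold"] = section.getfloat(
                "threshold", fallback=settings["threshold"])

    # Command-line values override anything from the config file.
    for key in DEFAULTS:
        value = getattr(args, key)
        if value is not None:
            settings[key] = value
    return settings
```

For example, `load_settings(["--config", "run1.ini", "--iterations", "500"])` would take the input file and threshold from a hypothetical `run1.ini` (where set there) but run 500 iterations. Because the code never needs editing between runs, it can be driven entirely by a workflow tool.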
4. Lack of a README file
Related to tip 1 above, the version control repository should contain a README file that explains how to install the code and run a test case. It is a truism that code that is hard to install, perhaps because of its many dependencies, is used less than it might be. Tools like “Make” can automate re-building, installation and testing, making them simpler and more reliable.
5. Lack of some kind of Requirements Specification for each of the components in your system
That is: what (not how) must the software accomplish to be a valid, acceptable product or research output? Without one, how do you, and later your user base, know that what you have built (or are building) is relevant and wanted?
6. Lack of error/exception checking and handling
When developing research software, it is easy to assume nothing will go wrong. After all, you control where the input test data come from. In reality, every sub-component or unit may receive inputs outside its domain, and/or will call other units (possibly from libraries) that may raise errors. Any such error needs to be reported to the user in language they understand rather than the language of the computer (e.g. not “segmentation fault” or “access failure”).
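The following Python sketch illustrates the idea, assuming a hypothetical analysis that reads one positive number per line from a text file. Low-level exceptions (a missing file, a line that is not a number, a value outside the expected domain) are caught and re-raised as a hypothetical `InputDataError` whose message names the file, line or value involved; `main` then turns that into a plain-language message and a non-zero exit status.

```python
import sys


class InputDataError(Exception):
    """Raised when input data cannot be used; the message is meant for users."""


def read_measurements(path):
    """Read one number per line from a text file, skipping blank lines.

    Low-level errors are translated into messages that name the file and
    line involved, instead of exposing a raw traceback."""
    try:
        with open(path, encoding="utf-8") as handle:
            lines = handle.readlines()
    except FileNotFoundError:
        raise InputDataError(
            f"The input file '{path}' does not exist. Please check the path.") from None
    except PermissionError:
        raise InputDataError(
            f"You do not have permission to read '{path}'.") from None

    values = []
    for line_number, text in enumerate(lines, start=1):
        text = text.strip()
        if not text:
            continue
        try:
            values.append(float(text))
        except ValueError:
            raise InputDataError(
                f"Line {line_number} of '{path}' should contain a number "
                f"but contains '{text}'.") from None
    return values


def check_positive(values):
    """Check the input is in this hypothetical analysis's domain:
    a non-empty list of strictly positive values."""
    if not values:
        raise InputDataError("The input file contains no measurements.")
    for index, value in enumerate(values, start=1):
        if value <= 0:
            raise InputDataError(
                f"Measurement {index} is {value}, but all measurements must be positive.")
    return values


def main(path):
    """Run the checks and return an exit status: 0 on success, 1 on bad input."""
    try:
        values = check_positive(read_measurements(path))
    except InputDataError as error:
        # Report the problem in plain language on stderr.
        print(f"Error: {error}", file=sys.stderr)
        return 1
    print(f"Read {len(values)} measurements.")
    return 0
```

A script would typically end with `sys.exit(main(path))`, so that a workflow running it can detect the failure from the exit status while the user sees a readable explanation.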
7. Comments in the source code
Contrary to what you sometimes hear, good source code is not “self-documenting”. At a minimum, every unit (function or similar) should have a comment at its head explaining what it is designed to achieve and how it does so. (Simple example: “Sorts the input array using the library mergesort function.”)
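As an illustration in Python, the hypothetical `median` function here carries a header comment (a docstring, in Python’s convention) that states both what the function does and how, in the spirit of the sorting example above. Inline comments are kept for the one step that is not obvious.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers.

    How: sorts a copy of the input with the built-in sorted() (a stable,
    merge-based sort), then takes the middle element, or the mean of the two
    middle elements when the length is even. The input is not modified.
    """
    if not values:
        raise ValueError("median() needs at least one value")
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    # Even length: there is no single middle value, so average the two central ones.
    return (ordered[middle - 1] + ordered[middle]) / 2
```

For example, `median([3, 1, 2])` gives `2` and `median([4, 1, 3, 2])` gives `2.5`. Python’s standard library already provides `statistics.median`; the point here is the header comment, not the function.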
Congratulations if you already do much of the above! If you have any questions about these tips, please comment below or get in touch with us.