Reproducibility 2

Chen Shen, Gaetano Mangiapia, Andrew McCluskey, Francesco Carla, Artur Glavic, Gemma Guest

–> How can we show our users the importance of reproducibility?

–> Andrew talked through the agenda

Artur was surprised that this was a question. He thought surely all scientists would be convinced by this.

Andrew’s motivation arose during his PhD, when he created reproducible research workflows.

AN gave an example of why reproducibility is important: crystallographers have CIF files that other people can use to describe the structure of chemically/physically interesting samples.

Chen asked where the interest in reproducibility lies: on the instrument, or in data analysis? Different people have different responsibilities.

AN gave two examples of why reproducibility is important:

  • wanting to reuse analysis code from previous papers for new work (polymer brush work that has been published across several papers, but the code hasn’t been provided)
  • wanting to do a more advanced analysis than was provided in a published paper.

Gemma - is creating a reproducible workflow too hard? How do we make it easier? What tools?

AG - curated list of common issues that one has to look out for when making research reproducible.

Discussion on whether a carrot or a stick approach is needed for reproducibility. Neither, suggested Artur (not sure why).

AndrewN suggested small steps that slowly make things better; the minimum is to submit reduced datasets with the paper.
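One way the minimum step of submitting reduced datasets could be made checkable is a checksum manifest shipped alongside the files, so a reader can confirm they hold the same data that went into the paper. The sketch below is only an illustration under stated assumptions: the function names, the `*.dat` pattern and the `MANIFEST.json` file name are hypothetical and were not agreed in the meeting.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir, pattern="*.dat"):
    """Map each matching file name in data_dir to its size and checksum."""
    manifest = {}
    # The "*.dat" pattern is an assumption; use whatever the reduced files are called.
    for path in sorted(Path(data_dir).glob(pattern)):
        manifest[path.name] = {
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        }
    return manifest


def write_manifest(data_dir, out_name="MANIFEST.json", pattern="*.dat"):
    """Write the manifest next to the data (hypothetical file name) and return it."""
    manifest = build_manifest(data_dir, pattern)
    out_path = Path(data_dir) / out_name
    out_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest
```

Calling `write_manifest("reduced_data")` on a folder of reduced files would leave a `MANIFEST.json` beside them that can be deposited with the paper; only the Python standard library is needed.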

*ACTION? Guidance that we write for how to achieve reproducibility. Put on GitHub repo?

  • AG - list of examples where things went right or wrong.
  • AN - concepts are important in such a paper.
  • AMC - back them up with concrete examples.
  • AN - *submit data with paper.
  • GG - common pitfalls.

  • AN - students/users need a higher level of computer skills for this kind of reproducibility work (e.g. snakemake, compiling LaTeX, using refnx).
  • AMC - skill level is increasing; future PhD students are developing those skills.
  • AG - perhaps you don’t need command line skills to achieve reproducibility?
  • CS - can one achieve reproducibility with easy-to-use tools?

AMC - have an event similar to reprohack, where users could submit their work to a competition/workshop to see how reproducible it was.

AG - put examples in repro repo (doing that already)

**ACTION - find examples from the literature where the papers should be reproducible, then try to reproduce the analysis. How close do you come to what the users achieved/concluded? Put anonymised results in a paper.
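"How close do you come" could be put on a numerical footing by comparing each reproduced fit parameter with the published one in units of their combined uncertainty. The sketch below is one possible way to do that, not an agreed method: the `Parameter` class, the `compare` function, the 2-sigma threshold and the example parameter names are all hypothetical, and it assumes both sets of uncertainties are independent one-standard-deviation values.

```python
from dataclasses import dataclass


@dataclass
class Parameter:
    name: str
    value: float
    uncertainty: float  # one standard deviation, same units as value


def compare(published, reproduced, threshold=2.0):
    """Return {name: (z, agrees)} for parameters present in both lists.

    z is the absolute difference divided by the combined uncertainty
    (uncertainties added in quadrature, assuming they are independent).
    The default threshold of 2 is an arbitrary illustrative choice.
    """
    reproduced_by_name = {p.name: p for p in reproduced}
    report = {}
    for pub in published:
        rep = reproduced_by_name.get(pub.name)
        if rep is None:
            continue  # parameter was not reproduced; nothing to compare
        combined = (pub.uncertainty ** 2 + rep.uncertainty ** 2) ** 0.5
        if combined == 0:
            z = 0.0 if pub.value == rep.value else float("inf")
        else:
            z = abs(pub.value - rep.value) / combined
        report[pub.name] = (z, z <= threshold)
    return report


# Made-up numbers purely to show the call; not taken from any paper.
published = [Parameter("thickness", 100.0, 3.0), Parameter("roughness", 5.0, 0.3)]
reproduced = [Parameter("thickness", 104.0, 4.0), Parameter("roughness", 7.0, 0.4)]
report = compare(published, reproduced)
```

Reporting only the z values and the agree/disagree flags, without paper identifiers, would fit the aim of publishing anonymised results.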

CS - Data has to be reproducible. The instrument has to be reproducible; there are a lot of steps in reduction that can make the output variable.

AG - reproducibility should encompass acquisition + reduction + analysis, but stop at the point where one has to interpret an SLD profile.