Reproducibility 3a Minutes
Introductions from everyone
Chair: Andrew Nelson (AN) Scribe: Andrew McCluskey (ARM)
Members: Thomas Saerbeck (TS) Gemma Guest (GG) Gaetano Mangiapia (GM) Jean Francois Moulin (JFM) Alex Koutsioumpas (AK)
AN outlined the direction:
- way forward that is implementable
- how to evangelise?
- how to help users?
- what do we expect wrt peer review?
In the previous meeting, reproducible samples were discussed before moving on to the reproducibility of a journal article.
Reproducibility of analysis is becoming a requirement for publication. FAIR data etc.
What is necessary for “reproducible” analysis:
- raw data?
- history of reduction?
- reduced data?
- analysis mechanism (script/notebook)?
- documentation of analysis?
- how is resolution smearing implemented?
- better reduction algorithm documentation
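The items above could be tracked as a simple checklist. The following is a minimal sketch only: the class name, field names, and file names are hypothetical and were not agreed in the meeting.

```python
from dataclasses import dataclass, asdict


@dataclass
class ReproducibilityRecord:
    """Hypothetical checklist mirroring the items listed above.

    Each field holds a location (path, DOI, URL) or a short description;
    an empty string means the item has not been provided.
    """
    raw_data: str = ""
    reduction_history: str = ""
    reduced_data: str = ""
    analysis_script: str = ""
    analysis_documentation: str = ""
    resolution_smearing: str = ""
    reduction_documentation: str = ""

    def missing(self):
        """Return the names of items that are still empty, in list order."""
        return [name for name, value in asdict(self).items() if not value]


# Illustrative file names only.
record = ReproducibilityRecord(reduced_data="reduced.dat",
                               analysis_script="fit.py")
print(record.missing())
```

An author or referee could run `record.missing()` to see which items have not yet been supplied.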
AN has presented the bioSAXS publication guidelines in Acta Cryst D.
- more important for biological scattering
- but does have requirements for data analysis (including that data should be submitted)
What are the steps for our community:
- our users might not be as computationally literate as we would like
- list is good (but exhaustive)
- touches on “proper data analysis”
- we should agree to start on a list of items that a user/author should include to “reproduce” the data
- from SLD to reflectivity, including formalism
- institute should provide reduction information
- user/author should provide analytical information
- analysis with a notebook/script: computers are good at reproduction
- but this is more effort
- should exact steps be the BASE?
- not EXACT steps but referenced
- methods include instrument/reduction/analysis package
- the references give this information
- you need exact parameters to reproduce analysis
- we should provide a guideline document for the information users should include in a paper
- version of software should be included
- Mantid versions both algorithms and software
- this is stored in the NeXus analysis
- difficult to do in a single project
- saving to ASCII obviously does not do this, and becomes the instrument scientist's problem
- journals are moving to force versioning from users and better documentation
- a standard list of parameters for users would be useful for publication and dealing with this from journals.
- are reduced data files a minimum?
- what does that help?
- if I don’t trust the analysis, I can redo it myself
- test their conclusions
- you don’t need the reduced data
- the referee should be checking this
- how do you interpret data?
- chasing up false interpretations is a dodgy aspect
- the interpretation is the SLD profile
- fit doesn’t match data OR SLD doesn’t match interpretation
- in the bioSAXS community
- users need to upload data to publish
- to upload data some requirements are necessary
- we can create a repository to store data
- do we need a format for reporting a resulting SLD?
- a PDB for reflectivity
- For analysis to be reproducible you need reduced data
- reduced data is pretty small
- this is easy and cheap (we should do it!)
- what do we mean by including data?
- “the data is available @ x”
- data is on an institutional repo
- every facility has storage for x years
- when you publish data from a beamline, the institute should provide data access
- but we need reduced data, because going from raw to reduced is very hard
- file format could have reduction info
- file formats discussed data policy, and Europe is consistent
- not sure about US
- include raw, meta and reduced data
- handled carefully
- restricted access
- do we want to rewrite data policy?
- not just institute data policy, also journal policy
- what is the lowest bar we would accept for reproducibility?
- mechanistic analysis?
- an SLD graph is not easily digitised
- a table is better at least in SI
- data should be shown with a fit
- general description should include assumptions and formalisms
- not dissimilar to bioSAXS
- is script too high a bar?
- nice option but software dependent?
- if the best practice is possible they should do it
- important to have concrete steps
- start working on a set of principles/document which outlines this
- in future this can become a paper
- needs the whole community
- we need better software tools that capture the information in a GUI
- the GUI should be the front end to a script
- Mantid is a bit hard to decipher, documentation is vital
- start a MD file to work towards a “paper”
- encourage all to contribute
- how else can we move this forward?
- document on github is one thing
- teaching starts with users
- there are implementations where Mantid produces a script from the GUI
- this is given to the users
- the analysis software does not strictly do that
- let's tell the analysis code people to do this
- it is possible to do, refnx has this
- refl1d possibly does something similar
- we need to educate software engineers too
- is there proprietary software that can mess with things?
- if it is not open it is not worthwhile
- RasCAL uses MATLAB, which is questionable but passable
- I use closed code
- include calculations in the paper
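
Two suggestions from the discussion, giving the SLD profile as a table rather than only a graph and recording software versions, could be combined in a small export step. The sketch below is an assumption, not an agreed format: the function names, the header layout, and the units are hypothetical, and the numbers are illustrative placeholders rather than results.

```python
import io
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or a note if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def write_sld_table(z, sld, packages, stream):
    """Write a tab-separated SLD table with a version header.

    Header lines start with '#': the Python version, then one line per
    named package, then the column titles (units are an assumption).
    """
    stream.write(f"# python {platform.python_version()}\n")
    for name in packages:
        stream.write(f"# {name} {package_version(name)}\n")
    stream.write("# z (Angstrom)\tSLD (1e-6 Angstrom^-2)\n")
    for zi, si in zip(z, sld):
        stream.write(f"{zi:.3f}\t{si:.4f}\n")


# Illustrative numbers only; "refnx" is just an example package name.
buffer = io.StringIO()
write_sld_table([0.0, 10.0, 20.0], [2.07, 3.47, 6.36], ["refnx"], buffer)
table_text = buffer.getvalue()
print(table_text)
```

A table like this could go in the SI alongside the fitted data, so a reader does not have to digitise a figure.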