Material for the Annual General Assembly 2021.

structure of the text data file

  • header with meta data
  • column description
  • data set
  • [sub-header]
  • [data set]

header with meta data

Wrapped YAML

  • each line of the header starts with # (hash and space)
  • it is structured due to YAML or JSON rules
  • all non-YAML entries start with an additional hash #

the 1st line

The first line contains information about

  • the general content;
  • the orso file format version (and level of strictness) used;
  • the encoding;
  • a link to us.
# # ORSO reflectivity data file | 0.1 standard | YAML encoding | https://www.reflectometry.org/

meta data

What follows are several blocks with meta data, where the dictionary is not yet defined. Thus here only the format, but not the content is mandatory.

The essential categories deal with

  • the origin of the file
  • the origin and meta information of the data
  • the processing steps performed so far

fuhrer categories might be added, e.g.

  • sample description (in- or output of analysis program)
  • simulation / fitting history
  • instrumental details for an improved resolution function

Here we concentrate on the essential ones:

origin / ownership of the file

Information about the one who created this file.

clear text:

created
by Jochen Stahn
from Paul Scherrer Institut
on 2020-04-06T13:21:18
on the computer lnsa17.psi.ch

suggestion for the implementation (= the dictionary)

# creator:
#     name        : Jochen Stahn
#     affiliation : Paul Scherrer Institut (PSI)
#     time        : 2020-04-06T13:21:18
#     computer    : lnsa17.psi.ch

or if automatically generated

# creator:
#     description : automated output for neutron reflectometer Amor at PSI
#     time        : 2020-04-06T13:21:18
#     computer    : lnsa17.psi.ch

further entries might be

#     contact     : e-mail / URL

origin and ownership of the data

Here goes all the information about the ownership of the raw data, and where and how they were obtained.

Made-up example:

# data source:
#     owner:
#         name: T. Proposer
#         affiliation: The Institute (TI)
#         contact: t_proposer@institute.org
#     experiment:
#         facility: Paul Scherrer Institut, SINQ
#         ID: 2020 0304
#         date: 2021-05-12 - 2021-05-15
#         title: Generation of input for formatting purposes
#         instrument: Amor
#         probe: neutrons
#     sample:
#         name: Ni1000
#         description:
#             - amb: air
#             - layer: {material: Ni, thickness: 100 nm}
#             - sub: Si
#     measurement:
#         scheme: energy-dispersive
#         instrument_settings:
#             polarisation: +
#             wavelength: 
#                  unit: angstrom
#                  min: 3.0
#                  max: 12.0
#                  resolution:
#                      type: proportional
#                      value: 0.022 # Delta lambda / lambda 
#             incident_angle:
#                  unit: deg
#                  value: 0.4
#                  resolution:
#                      type: constant
#                      unit: deg
#                      value: 0.01
#         data_files:
#             - file     : amor2020n001925.hdf
#               created  : 2020-02-03T14:27:45
#             - file     : amor2020n001926.hdf
#               created  : 2020-02-03T14:37:15

or for different angles:

#             incident_angle:
#                  unit: deg
#                  resolution:
#                      type: constant
#                      unit: deg
#                      value: 0.01
#         data_files:
#             - file     : amor2020n001925.hdf
#               created  : 2020-02-03T14:27:45
#               incident_angle:
#                   value: 0.4
#             - file     : amor2020n001926.hdf
#               created  : 2020-02-03T14:37:15
#               incident_angle:
#                   value: 1.5

history of processing

This section should provide the possibility to reproduce the data reduction process. I.e. to recreate this file (except for the creator part) from the raw data.

It also mentions the corrections performed so far. There might be a reference to a well-defined algorithm in the future.

# reduction:
#     software: eos.py
#     call : eos.py -Y 2020 -n 1925-1927 -y 9,55 ni1000 -O -0.2 -r 1064 -s 1 -i -a 0.005 -e
#     comment:
#         corrections performed by normalisation to measurement on reference sample
#     corrections:
#          - footprint
#          - incident intensity
#          - detector efficiency

column description

The last part of the header is the description of the columns to follow.

# columns:
#     - {name: Qz, unit: 1/angstrom, dimension: WW transfer}
#     - {name: R, dimension: reflectivity}
#     - {name: sR, dimension: error-reflectivity}
#     - {name: sQz, unit: 1/angstrom, dimension: resolution-WW transfer}

This is a compact notation for

# columns:
#     - name: Qz
#       unit: 1/angstrom
#       dimension: WW transfer
#     - name: R
#       dimension: reflectivity
#     - name: sR
#       dimension: error-reflectivity
#     - name: sQz
#       unit: 1/angstrom
#       dimension: resolution-WW transfer

In case there are multiple data sets (see below) it might be necessary to provide an identifier also for the first data set. Thus there can be the optional line

# data set: <identifier>

Optionally, the last line might be a short-notation description

# #         Qz             RQz              sR              sQ

where the second # means that this is a comment line not to be processed.

data set

Each data set consists of a rectangular array of numbers. All entries have the same data type (the default is float). There is no leading space.

1.03563296e-02  3.88100068e+00  4.33909068e+00  5.17816478e-05
1.06717294e-02  1.16430511e+01  8.89252719e+00  5.33586471e-05
...

Suggestion to define the size of each column according to one of the following rules (with increasing rigidity):

  • Each column can have its individual length, but the descriptive header must fit.
  • All columns must have the same length
  • All columns are 16 spaces wide, i.e. % 16f, % 16g or % 16e
  • All columns have the format % 16e

multiple data sets

In case there are several data sets in one file (e.g. for different spin states) the following rules apply:

empty line

This is optional. An empty line is recognized by gnuplot as a separator for 3 dimensional data sets.

separator

If more than one data set is provided, they are separated by a line starting with

# data_set: <identifier>

where the identifier is either an unique name or a number. The default numbering of data sets starts with 0, the first additional one thus gets number 1 and so on.

overwrite meta data

Below the separator line, meta data might be added. These overwrite the meta data supplied in the main header (i.e. data set 2 does not know anything about the changes made for data set 1).

# data_source:
#     measurement:
#         polarisation: -
#     input_files:
#         data_files:
#             - file     : amor2020n001930.hdf
#               created  : 2020-02-03T15:27:45
# reduction:
#     call : eos.py -Y 2020 -n 1930 -y 9,55 ni1000 -O -0.2 -r 1064 -s 1 -i -a 0.005 -e

repetition of short-version column description

optional

# #         Qz             RQz              sR              sQ

next data set

of the same format (number, format and description of columns) as data set 0

1.03563296e-02  1.08100068e+00  4.33909068e+00  5.17816478e-05
1.06717294e-02  1.06430511e+01  8.89252719e+00  5.33586471e-05
...

example

all of the above mentioned lines without comments.

text_example.ort

# # ORSO reflectivity data file | 0.1 standard | YAML encoding | https://www.reflectometry.org/
# creator:
#     name        : G. User
#     affiliation : PSI
#     time        : 2020-04-06T13:21:18
#     computer    : lnsa17.psi.ch
# data source:
#     owner:
#         name: T. Proposer
#         affiliation: The Institute (TI)
#         contact: t_proposer@institute.org
#     experiment:
#         facility: Paul Scherrer Institut, SINQ
#         ID: 2020 0304
#         date: 2021-05-12 - 2021-05-15
#         title: Generation of input for formatting purposes
#         instrument: Amor
#         probe: neutrons
#     sample:
#         name: Ni1000
#         description:
#             - amb: air
#             - layer: {material: Ni, thickness: 100 nm}
#             - sub: Si
#     measurement:
#         scheme: angle- and energy-dispersive
#         instrument_settings:
#             sample_rotation:
#                  alias: mu
#                  unit: deg
#                  value: 0.7
#             detector_rotation:
#                  alias: mu
#                  unit: deg
#                  value: 1.4
#             incident_angle:
#                  unit: deg
#                  min: 0.4
#                  max: 1.0
#                  resolution:
#                      type: constant
#                      unit: deg
#                      value: 0.01
#             wavelength:
#                  unit: angstrom
#                  min: 3.0
#                  max: 12.5
#                  resolution:
#                      type: proportional
#                      value: 0.022 # Delta lambda / lambda 
#             polarisation: +
#         data_files:
#             - file     : amor2020n001925.hdf
#               created  : 2020-02-03T14:27:45
#             - file     : amor2020n001926.hdf
#               created  : 2020-02-03T14:37:15
#             - file     : amor2020n001927.hdf
#               created  : 2020-02-03T14:27:02
#         references:
#             - file     : amor2020n001064.hdf
#               created  : 2020-02-02T15:38:17
# reduction:
#     software: eos.py
#     call : eos.py -Y 2020 -n 1925-1927 -y 9,55 ni1000 -O -0.2 -r 1064 -s 1 -i -a 0.005 -e
#     comment:
#         corrections performed by normalisation to measurement on reference sample
#     corrections:
#          - footprint
#          - incident intensity
#          - detector efficiency
# columns:
#     - {name: Qz, unit: 1/angstrom, dimension: WW transfer}
#     - {name: R, dimension: reflectivity}
#     - {name: sR, dimension: error-reflectivity}
#     - {name: sQz, unit: 1/angstrom, dimension: resolution-WW transfer}
# data set: spin_up
# #         Qz             RQz              sR              sQ
1.03563296e-02  3.88100068e+00  4.33909068e+00  5.17816478e-05
1.06717294e-02  1.16430511e+01  8.89252719e+00  5.33586471e-05
...

# data_set: spin_dn
# data_source:
#     measurement:
#         instrument_settings:
#             polarisation: -
#     input_files:
#         data_files:
#             - file     : amor2020n001930.hdf
#               created  : 2020-02-03T15:27:45
# #         Qz             RQz              sR              sQ
1.03563296e-02  1.08100068e+00  4.33909068e+00  5.17816478e-05
1.06717294e-02  1.06430511e+01  8.89252719e+00  5.33586471e-05
...

A much reduced header:

# # ORSO reflectivity data file | 0.1 standard | YAML encoding | https://www.reflectometry.org/
# creator:
#     name        : G. User
#     affiliation : PSI
#     time        : 2020-04-06T13:21:18
#     computer    : lnsa17.psi.ch
# data source:
#     owner:
#         name: T. Proposer
#         affiliation: The Institute (TI)
#     experiment:
#         facility: Paul Scherrer Institut, SINQ
#         ID: 2020 0304
#         date: 2021-05-12 - 2021-05-15
#         title: Generation of input for formatting purposes
#         instrument: Amor
#         probe: neutrons
#     sample:
#         name: Ni1000
#     measurement:
#         scheme: angle- and energy-dispersive
#         instrument_settings:
#             incident_angle:
#                  unit: deg
#                  min: 0.4
#                  max: 1.0
#             wavelength: 
#                  unit: angstrom
#                  min: 3.0 
#                  max: 12.5
#             polarisation: +
#         data_files:
#             - file     : amor2020n001925.hdf
#               created  : 2020-02-03T14:27:45
#             - file     : amor2020n001926.hdf
#               created  : 2020-02-03T14:37:15
#             - file     : amor2020n001927.hdf
#               created  : 2020-02-03T14:27:02
#         references:
#             - file     : amor2020n001064.hdf
#               created  : 2020-02-02T15:38:17
# reduction:
#     software: eos.py
#     call : eos.py -Y 2020 -n 1925-1927 -y 9,55 ni1000 -O -0.2 -r 1064 -s 1 -i -a 0.005 -e
# columns:
#     - {name: Qz, unit: 1/angstrom, dimension: WW transfer}
#     - {name: R, dimension: reflectivity}
#     - {name: sR, dimension: error-reflectivity}
#     - {name: sQz, unit: 1/angstrom, dimension: resolution-WW transfer}
# data set: spin_up
# #         Qz             RQz              sR              sQ
1.03563296e-02  3.88100068e+00  4.33909068e+00  5.17816478e-05
1.06717294e-02  1.16430511e+01  8.89252719e+00  5.33586471e-05
...

# data_set: spin_dn
# data_source:
#     measurement:
#         instrument_settings:
#             polarisation: -
#     input_files:
#         data_files:
#             - file     : amor2020n001930.hdf
#               created  : 2020-02-03T15:27:45
# #         Qz             RQz              sR              sQ
1.03563296e-02  1.08100068e+00  4.33909068e+00  5.17816478e-05
1.06717294e-02  1.06430511e+01  8.89252719e+00  5.33586471e-05
...

from example towards dictionary

legend:
1_ mandatory (according to the principles)
2_ mandatory if applicable (e.g. proposal ID for large scale facilities)
3_ recommended
4_ optional
_1 easy to realize
_2 probably needs some programming
_3 probably needs some manual work (operator’s name on the lab x-ray machine)

11# # ORSO reflectivity data file | 0.1 standard | YAML encoding | https://www.reflectometry.org/  
13# creator:            this identifies the person or 
                        program who created this file
13#    name:         
13#    affiliation:   
11#    time:            date and time of file creation,
                        format: yyyy-mm-ddThh:mm:ss 
11#    computer:   
12# data source:        This information should be 
                        provided in the raw data file
                        If not, one has to find ways to
                        provide it.
12#    owner:           This referes to the actual owner
                        of the data set, i.e. the main
                        proposer or the person doing the 
                        measurement on a lab reflectometer
12#       name:
12#       affiliation:
12#   experiment:  
22#       facility:
22#       ID:           proposal ID
12#       date:         format: yyyy-mm-dd
22#       title:        proposal or project title
12#       instrument:   
12#       probe:        neutrons or x-rays
12#   sample:  
12#       name:         
32#       composition:  free text notes on the nominal
                        composition of the sample
12#   measurement:  
22#       scheme:       one of
                        angle- and energy-dispersive 
                        angle-dispersive
                        energy-dispersive 
12#       instrument_settings:  
22#           incident_angle:  
                  unit: rad / deg  
                  min:  
                  max:   
22#           wavelength:
                  unit: nm / angstrom  
                  min: 
                  max: 
              polarisation: one of + / - / ++ / +- / -+ / --  
11#       data_files:  
11#           - file:   file name or identifier (doi)
12#             created: yyyy-mm-ddThh:mm:ss  
21#       references:  
21#           - file:   file name or identifier (doi)
22#             created: yyyy-mm-ddThh:mm:ss
22# reduction:  
23#      software:      name and version of the reduction software
23#      call:          command line call or similar
11# columns:            the 4 leading columns must follow
                        the scheme for "I(x)":
                          x   with units
                          I   with units if applicable
                          sigma of I   with units (if applicable)
                          sigma of resolution of x   with units 
                        Further columns can be of any type,
                        content and order. But always with
                        descriptioon and units.
11#     - name: Qz
11#       unit:         1/angstrom / 1/nm
31#       dimension: WW transfer
11#     - name: R
31#       dimension: reflectivity
11#     - name: sR 
31#       dimension: error-reflectivity
11#     - name: sQz
11#       unit:         1/angstrom / 1/nm
31#       dimension: resolution-WW transfer
41#  data set:          optional, to provide a name / 
                        identifier for a multy data set
                        file 
41#  #         Qz             RQz              sR              sQ
1.03563296e-02  3.88100068e+00  4.33909068e+00  5.17816478e-05
1.06717294e-02  1.16430511e+01  8.89252719e+00  5.33586471e-05
...
                       rectangular matrix
                       all entries of the same data tyle