Formal aspects of the ORSO ASCII representation of data sets

On Monday, 2021-03-22, some ORSO members met online to discuss, and if possible agree on, a number of tasks concerning formal aspects of the ASCII representation of the ORSO reflectivity data set.

This working group included Artur Glavic, Brian Maranville, Andrew McCluskey, Max Skoda and Jochen Stahn.

In the following, the tasks are listed together with the most important arguments and, if applicable, the decision we made.

Structure of the file

The ASCII (or text) representation of the data has a wrapped YAML structure (or the analogue in JSON): the metadata are organised according to YAML rules in the header, but each header line is prefixed with # (including the space, to be compliant with numpy.savetxt).
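The wrapping described above can be sketched in a few lines of Python. This is only an illustration, not an agreed implementation: the function names wrap_header and unwrap_header are hypothetical, and the default prefix mirrors the default comments='# ' of numpy.savetxt.

```python
def wrap_header(header_text: str, prefix: str = "# ") -> str:
    """Prefix every header line so that it reads as a comment line."""
    return "\n".join((prefix + line).rstrip() for line in header_text.splitlines())


def unwrap_header(file_text: str, prefix: str = "# ") -> str:
    """Collect the leading comment lines and strip the prefix again."""
    lines = []
    for line in file_text.splitlines():
        if not line.startswith("#"):
            break  # the first data row ends the header
        # An empty header line is stored as a bare "#", so handle both cases.
        lines.append(line[len(prefix):] if line.startswith(prefix) else line[1:])
    return "\n".join(lines)


# Example header using the magnitude/unit layout from these notes.
header = "wavelength:\n    magnitude: 5.0\n    unit: angstrom"
wrapped = wrap_header(header)
restored = unwrap_header(wrapped + "\n0.01 1.0\n")
```

The restored text can then be handed to any standard YAML reader.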

General organisation of the header

Based on the discussion of where to put the units (see next topic) we decided to stick to the YAML or JSON rules. This allows for simpler data reading and writing because standard routines can be used.

The drawback is that the header gets a bit less human readable, but we gain much more flexibility to add further entries.

The actual format (YAML, JSON) is stated in the first header line. Artur will make a suggestion for this.

Where to put units

The question was how to balance human and computer readability:

#    <quantity>:  <magnitude> <unit>

(compact, very close to the real-world formula) versus

#    <quantity>:
#         magnitude: <magnitude>
#         unit: <unit>

(fits standard, allows for extensions like resolution:, min:, …)

The discussion took quite a while, and we agreed several times on one or the other. The latter, strict version finally won because it greatly simplifies the programming of writers and readers.
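To illustrate why the strict version is easy to program, here is a minimal sketch of a writer for nested entries. The function name to_header_lines and the four-space indentation step are assumptions made for this example; omega and deg are taken from elsewhere in these notes.

```python
def to_header_lines(mapping: dict, indent: int = 0, step: int = 4) -> list:
    """Render a nested dict as '# '-prefixed, indented key: value lines."""
    lines = []
    pad = " " * indent
    for key, value in mapping.items():
        if isinstance(value, dict):
            # A nested mapping opens a new level of the hierarchy.
            lines.append(f"# {pad}{key}:")
            lines.extend(to_header_lines(value, indent + step, step))
        else:
            lines.append(f"# {pad}{key}: {value}")
    return lines


entry = {"omega": {"magnitude": 0.43, "unit": "deg"}}
header_lines = to_header_lines(entry)
```

Adding a further key such as resolution: or min: to the inner dict needs no change to the writer, which is the flexibility argued for above.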

How to write units

The discussion was about how to write the units and whether several variants would be allowed. As long as SI units are used things seem straightforward, but associated units or those with non-ASCII spelling are not.

We agreed on

- only one version per unit;
- Aangstroem is spelled angstrom, since a capital A would collide with Ampere (which might show up in battery measurements);
- only ASCII symbols are to be used;
- reciprocal units are written e.g. as 1/angstrom;
- exponents are marked with **, e.g. 1/angstrom**2 for SLD;
- the base units are rad, deg, m, mm, nm, … , angstrom and s.

Unresolved is the usage of $\mu$ as in micrometre.
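The rules agreed on can be checked mechanically. The following is a rough sketch under stated assumptions: the name is_valid_unit is hypothetical, KNOWN_UNITS holds only the units named explicitly above (the list is open-ended), and the accepted grammar (an optional 1/ followed by one unit and an optional ** exponent) is a simplification that does not cover compound units.

```python
import re

# Only the units spelled out in the notes; the real list is longer.
KNOWN_UNITS = {"rad", "deg", "m", "mm", "nm", "angstrom", "s"}

# Optional "1/" for reciprocal units, a lowercase name, optional "**<int>".
_UNIT_PATTERN = re.compile(r"(?:1/)?([a-z]+)(?:\*\*[0-9]+)?")


def is_valid_unit(text: str) -> bool:
    """Return True if text follows the simplified unit rules sketched here."""
    if not text.isascii():
        return False  # only ASCII symbols are allowed
    match = _UNIT_PATTERN.fullmatch(text)
    return match is not None and match.group(1) in KNOWN_UNITS


checks = {unit: is_valid_unit(unit) for unit in ("1/angstrom**2", "A", "nm")}
```

A capital A is rejected here simply because only lowercase names are accepted, which matches the decision to spell out angstrom.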

Comments

The question was how we should include comments in the header.

We agreed on 2 types:

A comment to be used by some software (to be forwarded or displayed) starts with the keyword comment:, which can be used on any level of the hierarchy.

example:
```
comment: |
  Attention! These data have been collected without the frame overlap mirror
  and contain thus some highly structured background.
```
Everything after a # in the header, other than the initial # at the start of the line (^#), will be ignored. This allows writing comments in the header that any software will ignore.

This might be useful
- to mark a hand-made alteration, e.g. when the automatically entered value was wrong:
```
#     #omega: -.23
#     # omega zero-position was not reset prior to measurement
#     omega: 0.43
```
- to enhance readability, e.g. an additional column description line of the type
```
# #      Qz       RQz      sRqz     sQ        lambda       theta
```
Please comment on this.
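A reader could apply the second comment type line by line, as in the sketch below. The function name strip_header_comment is hypothetical, and the approach is deliberately naive: it would also cut a # that appears inside a quoted string or inside a comment: | block, so a real reader would need to treat those cases separately.

```python
def strip_header_comment(line: str) -> str:
    """Drop the leading '# ' and everything after any further '#'."""
    if not line.startswith("#"):
        raise ValueError("not a header line")
    body = line[1:]
    if body.startswith(" "):
        body = body[1:]  # remove the single space that follows the initial #
    content, _, _ = body.partition("#")
    return content.rstrip()


kept = strip_header_comment("# omega: 0.43 # omega zero-position was not reset")
dropped = strip_header_comment("#     #omega: -.23")
```

Lines that consist only of such a comment come back empty and can be skipped before the header is parsed.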

Column description

The column description in Jochen’s example contained the key-value pair column: <number>, which was not approved by most of the participants. We agreed to use an unnumbered list of the following type instead:

# columns:
#     - {quantity: Qz, unit: 1/angstrom}
#     - {quantity: RQz, unit: 1}
#     - {quantity: sR, unit: 1}
#     - {quantity: sQ, unit: 1/angstrom, comment: 'sigma of Gaussian resolution function'}
#     - {quantity: lambda, unit: angstrom}

The keyword quantity: might be changed. Suggestions?
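A writer for such a column list might look like the following sketch. The function name format_columns is an assumption for illustration, the key quantity follows the current proposal and may change, and values containing commas, colons or quotes would need proper YAML quoting, which this sketch does not attempt.

```python
INDENT = " " * 4


def format_columns(columns: list) -> list:
    """Render column descriptions as an unnumbered list of flow mappings."""
    lines = ["# columns:"]
    for column in columns:
        # Note the space after each colon; YAML needs it to see a key.
        items = ", ".join(f"{key}: {value}" for key, value in column.items())
        lines.append(f"# {INDENT}- {{{items}}}")
    return lines


columns = [
    {"quantity": "Qz", "unit": "1/angstrom"},
    {"quantity": "RQz", "unit": 1},
]
column_lines = format_columns(columns)
```

The position of an entry in the list gives the column number, so no explicit index is written.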

Steepness of hierarchies

We chose between the hierarchical approach:

# wavelength:
#     value:           
#     min:             
#     max:             
#     resolution:
#         type: <constant|proportional|...>
#         sigma:
#     unit:

which allows for a compact notation:

# wavelength: {min: 4, max: 12, resolution: {type: constant, sigma: 0.1}, unit: nm}

and the flat approach:

# wavelength:                        
# wavelength_min:                    
# wavelength_max:                    
# wavelength_resolution_constant:    
# wavelength_resolution_proportional:
# wavelength_unit:                   

The hierarchical one won easily.
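One practical point in favour of the hierarchy is that flat names can be derived from it mechanically, while the reverse needs a naming convention. The sketch below shows such a derivation; the function name flatten is hypothetical, and the generated keys are similar to, but not identical with, the hand-written flat names listed above.

```python
def flatten(mapping: dict, parent: str = "") -> dict:
    """Join nested keys with underscores to obtain a flat mapping."""
    flat = {}
    for key, value in mapping.items():
        name = f"{parent}_{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # descend one level
        else:
            flat[name] = value
    return flat


wavelength = {
    "wavelength": {
        "min": 4,
        "max": 12,
        "resolution": {"type": "constant", "sigma": 0.1},
        "unit": "nm",
    }
}
flat = flatten(wavelength)
```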

Multiple data sets in one file

The question was whether it should be possible to put several data sets in one file. A typical example is a set of related data with different spin states, but series of measurements or combined angle- and wavelength-dependent data might also be sorted this way.

We soon agreed to allow for multiple data sets with the following rules:

For one data set, nothing special needs to be written to the header.
Each additional data set has to start with the line
```
# data_set: <identifier>
```
where <identifier> might be a unique name or number. The default is that the first additional data set starts with number 1.
For each additional data set the header of data set 0 applies unless explicitly overwritten.
It is possible to overwrite the header information for any additional data set by inserting the corresponding lines with leading # in between the separator and the data columns.
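A reader following these rules could split a file as sketched below. The function name split_data_sets and the returned dict layout are assumptions for this example. The sketch skips empty lines, which is still under discussion, and it does not merge the header of data set 0 into the later ones; it only collects the overriding lines.

```python
def split_data_sets(text: str) -> list:
    """Split file text into blocks with identifier, header lines and rows."""
    # Data set 0 has no separator line, so it is opened implicitly.
    blocks = [{"identifier": 0, "header": [], "rows": []}]
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # assumption: optional empty lines are tolerated
        if stripped.startswith("# data_set:"):
            identifier = stripped.split(":", 1)[1].strip()
            blocks.append({"identifier": identifier, "header": [], "rows": []})
        elif stripped.startswith("#"):
            blocks[-1]["header"].append(line)
        else:
            blocks[-1]["rows"].append([float(x) for x in stripped.split()])
    return blocks


text = (
    "# omega: 0.43\n"
    "0.01 1.0\n"
    "0.02 0.9\n"
    "# data_set: 1\n"
    "# omega: 0.86\n"
    "0.03 0.5\n"
)
blocks = split_data_sets(text)
```

Header lines found after a separator belong to that data set only and would overwrite the corresponding entries inherited from data set 0.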

To be discussed:

All additional data sets can be automatically named using the identifier, but there is none for data set 0.
gnuplot wants an empty line as a separator for 3-dimensional plotting. Is there a problem with allowing an optional empty line?

Naming for spin states

After a very short discussion (all versions will be problematic) we agreed on

# spin_state: + for spin up polarisation, no spin analysis (or vice versa)
# spin_state: - for spin down polarisation, no spin analysis (or vice versa)
# spin_state: ++ for spin up polarisation, spin up analysis
# spin_state: +- for spin up polarisation, spin down analysis
# spin_state: -- for spin down polarisation, spin down analysis
# spin_state: -+ for spin down polarisation, spin up analysis
To be discussed: symbol for no spin polarisation.
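In software the agreed symbols reduce to a small lookup table, as in this sketch. The names SPIN_STATES and parse_spin_state are hypothetical. For the single-sign states the sketch assigns the sign to the polarisation and leaves the analysis as None, although the notes allow the reverse reading as well; unpolarised data are not covered because no symbol has been agreed on.

```python
# (polarisation, analysis); None means the element is not used.
SPIN_STATES = {
    "+": ("up", None),
    "-": ("down", None),
    "++": ("up", "up"),
    "+-": ("up", "down"),
    "--": ("down", "down"),
    "-+": ("down", "up"),
}


def parse_spin_state(header_line: str) -> tuple:
    """Translate a '# spin_state: <symbol>' line into a pair of states."""
    key, _, value = header_line.lstrip("# ").partition(":")
    if key.strip() != "spin_state":
        raise ValueError("not a spin_state line")
    return SPIN_STATES[value.strip()]


state = parse_spin_state("# spin_state: +-")
```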

File extensions

Based on Artur’s suggestions for file extensions we selected:

.ort = open reflectometry text for ASCII or UTF files
.orb = open reflectometry binary for binary files
.orm = open reflectometry model for model description

Artur checked that these extensions are not in use elsewhere.

ORSO file format - meeting on 2021-03-22