# Preparing experiment input files

The first step in using Reveal Chromatography to model proteins is to load in a user’s experimental data. Completing this step fully and accurately is critical to successfully using Reveal Chromatography for modeling.

Reveal Chromatography includes a template input file (in Excel format) that shows users how experimental data must be formatted to be loaded into Reveal. Users must modify this template to reflect the details of their own study, including its experiments and protein. Within this file, users can incorporate both continuous data (such as a file from an AKTA system) and fraction data.

## The Excel template

The templated spreadsheet that is shipped with Reveal Chromatography is an Excel file that contains two sections.

The top portion of this file contains a description of the elements involved in the user’s study: the general system, column, load, and buffer or buffers used in the study’s experiments. These elements often refer to elements that exist within the User Data (e.g., chemicals, products, column types, etc.) but describe how they are used specifically in this study (a specific amount of a given chemical, a specific bed height in a column, etc.). Upon loading, these elements will show up in the study data as building blocks of the study’s experiments and simulations.

The second portion of the file describes the experimental processes (method, collection criteria, fraction data, and continuous data). In this portion of the file, users should describe each of their experiments in a separate Excel column, beginning with column D.

To begin loading data into Reveal Chromatography, open the Help menu and select Show sample input files.

Then open the file Example_Gradient_Elution_Study.xlsx.

It is recommended that users immediately make a copy of the template and save it to their home directory, both to avoid altering the original template and to make the input file easier to find. If needed, users can also download the sample input file again at any time here.

## Entering data

It is critical that users enter data into this spreadsheet in the most complete and accurate manner possible. This section of the user manual will provide guidance on how to modify the input file and when relevant, will provide information on how the parameters requested will be used by Reveal Chromatography.

Column A contains the section names for the input file. Column B contains the name of the quantities, parameters, and information that are being requested. Column C contains guidelines for how to enter the requested information (list of choices, unit, etc.). Columns D and onward provide space to enter the actual values that Reveal Chromatography will load.
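As an illustration of this column layout, the sketch below groups rows of such a sheet by section. It works on plain Python lists rather than on a real Excel file, and it assumes that a section name appears in column A only on the first row of each section; both the row contents and the `parse_rows` helper are hypothetical and are not part of Reveal Chromatography.

```python
def parse_rows(rows):
    """Group rows by section name (column A).

    Each row is a list with at least three cells:
    [section (A), field name (B), guideline (C), value (D), value (E), ...].
    Assumption: the section name is given only on the first row of a section.
    """
    sections = {}
    current = None
    for row in rows:
        section, field, guideline, *values = row
        if section:
            current = section
            sections.setdefault(current, {})
        if field and current is not None:
            sections[current][field] = {"guideline": guideline, "values": values}
    return sections


# Hypothetical rows mimicking columns A, B, C, then D and onward.
rows = [
    ["General study information", "Study Name", "Free text", "My study"],
    [None, "Product Name", "Pick list from User Data", "PROD000"],
    ["Method information", "Method and Experiment Name", "Free text", "Run_1", "Run_2"],
]

parsed = parse_rows(rows)
print(parsed["Method information"]["Method and Experiment Name"]["values"])
```

Keeping the guideline from column C next to the values makes it easy to check later that each entry matches what the template asks for.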

There are four types of input that a user will be asked to provide:

1. Free text,
2. Text to select from a list of options (in column C),
3. Unitted values (the unit will be listed in column C),
4. Text to select from the User Data entries.

Users may always refer to the legend at the top of the input spreadsheet to remind themselves of what a field requires.

### Free text

If a field is specified as free text, it means that any text may be entered in those fields. These fields have no highlighting. If the field is designated as optional, no text is required, though it is suggested that users fill out the input file completely.

### Pick from list

Some fields in the template are described as “Pick list from User Data”. These fields are highlighted in orange. As the name implies, the text entered in this field must correspond to an option that already exists in the User Data. For example, the input file uses PROD000 as its target product. As such, this product must exist in the User Data prior to loading the input file. PROD000 is shipped with the software, which can be confirmed by expanding the list of known products (see The User Data browser if you are unfamiliar with the User Data browser).

It is assumed that many of these elements will be used in multiple different studies. To avoid burdening users with the need to define these elements in each input file, they are instead defined once in the User Data, and may be referenced at any point after their creation.

Note

If the option for a certain Pick-list field does not already exist, users will need to create it in Reveal Chromatography prior to loading the experimental input file. To do so, return to the application and right-click on the folder in the user data browser where a new element is to be made, and select Create new. More details on this process can be found in Contributing new data elements.

### Unitted values

Some fields require users to input a numerical (unitted) value. When this is the case, the field is highlighted in green. Units of measurement are currently fixed, and changing units of measurement is not yet supported by Reveal Chromatography.

### Text option from a list of options

Other fields require the user to specify text from a list of options that is provided to them. When this is the case, these fields are highlighted in yellow, and the list of text input options is specified in column C.

## The contents of the input file

The input file for Reveal Chromatography is broken into ten sections:

### General study information

This section of the spreadsheet is where users may record general information about their study. Much of the data there is used solely for administrative purposes, except the Product Name field, which must correspond to an existing product within the User Data.

Note

More detailed instructions on creating new products are available in Contributing a new product.

The “Study Name” field should contain a readable name for the study. The “Study ID” field is meant to be unique across all studies. It may be generated by the user’s lab or an Electronic Notebook system. The “Study Type” field should specify what kind of study the user is performing (e.g., model calibration, pulse injection, varying pH, etc.). “Study Purpose”, “Site”, and “Experimentalist” are descriptive fields that are also used largely for organizational purposes to allow search/filter tools to search through studies. Finally, “Column Placement” typically refers to where, in the overall sequence of unit operations, or chromatography steps, the current study resides.

### System information

This portion of the spreadsheet allows a user to define the type of system they are using for their chromatography (e.g., the type of AKTA machine being used). It is important to correctly define the system being used, as different systems will have different tubing and hardware configurations, minimum/maximum flow rates, flow path cell length, etc. These differences in system specifications will affect other data specifications (for example, hold-up volumes).

The “System Type” field is where the user can specify the type of AKTA machine that was used, and must be one of the system types listed in the User Data. The “System Name” field is another administrative field where users may name the specific machine that was used (if there are multiple of the same type), while the “System ID” field may be set by an ELN or lab system.

Cells regarding hold-up volumes are critical. Hold-up volumes affect experimental data by delaying the elution time. However, these volumes are absent when building a simulation, so they must be specified precisely in the spreadsheet or users risk being unable to align simulations with experiments. “Hold Up Volume - Pump to injection loop” refers to the volume from the start of the gradient conditions to the injection valve. This field will often vary depending on the method. “Hold Up Volume - injection Loop to Column” is the volume from the injection valve to the column top. It consists of the volume from the injection valve to the column valve plus the volume from the column valve to the column inlet line. Together, these two fields account for the hold-up volume upstream of the column. “Hold Up Volume - Column to Abs Detector” refers to the volume from the column bottom to the UV monitor. It consists of the volume contained in the column outlet line to the column valve plus the volume from the column valve to the UV monitor.
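To give a feel for why these volumes matter, the following sketch estimates the time delay introduced by a set of hold-up volumes at a given flow rate. It assumes volumes in mL and a volumetric flow rate in mL/min, which may differ from the units requested in column C of the template. The function name is hypothetical, and this is not a description of how Reveal Chromatography applies the correction internally.

```python
def holdup_delay_min(holdup_volumes_ml, flow_rate_ml_min):
    """Time (in minutes) needed to flush the given hold-up volumes.

    Assumed units: volumes in mL, flow rate in mL/min.
    """
    if flow_rate_ml_min <= 0:
        raise ValueError("flow rate must be positive")
    return sum(holdup_volumes_ml) / flow_rate_ml_min


# Hypothetical values: injection loop to column, then column to detector.
delay = holdup_delay_min([0.5, 0.25], flow_rate_ml_min=1.5)
print(f"Signal delayed by about {delay:.2f} min")
```

Which volumes belong in the sum depends on the signal being aligned, so the list passed in should reflect the actual flow path between the event and the detector.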

The “Absorbance Detector Path Length” refers to the distance that the detector light must travel for a specific system or detector. This field allows Reveal Chromatography to make data across different systems comparable, as it is used to convert all continuous and fraction data from Absorbance Units (AU) to Absorbance Units per centimeter of detector path (AU/cm), the unit that Reveal Chromatography uses to display all absorbance data.
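The conversion described above amounts to dividing each reading by the detector path length. A minimal sketch, assuming the path length is expressed in centimeters (the function name is illustrative only):

```python
def normalize_absorbance(absorbance_au, path_length_cm):
    """Convert absorbance readings from AU to AU/cm."""
    if path_length_cm <= 0:
        raise ValueError("path length must be positive")
    return [value / path_length_cm for value in absorbance_au]


# Hypothetical readings from a detector with a 2 cm path length.
print(normalize_absorbance([1.0, 2.0], path_length_cm=2.0))
```

After this normalization, readings taken on detectors with different path lengths can be compared on the same scale.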

### Column information

Specifications regarding both the column and the resin being used in the study are to be filled out in this section.

The “Packed Column Lot ID” field is an organizational field that allows users to track the column about which they are entering data. The “Resin Type” field allows users to specify the type of resin in the column, and the “Resin Lot ID” field allows users to define the specific lot from which the resin was drawn. The “Column Model” field allows users to specify a particular type of column and its corresponding specs. “Column Description” is a free text field to be used at the user’s discretion.

“Packed Column Bed Height” refers to the height of the resin once the column has been filled and packed. “Compression Factor” refers to the settled volume of the resin divided by the volume of the resin once it is packed in the column (therefore always greater than 1).

“HETP” (Height Equivalent to a Theoretical Plate) is a measure of the separation power of a column, determined by dividing the bed height by the number of “plates” in the column. For a given column, HETP is a way to evaluate the column packing efficiency, by setting an acceptance criterion on HETP for each pack (<0.01 cm, for example).

“Asymmetry” is another measure of the quality of column packing and is associated with HETP. Ideal asymmetry is 1.0, with typical values falling in the 0.7 to 1.4 range.
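The compression factor and HETP definitions above are both simple ratios. The sketch below computes them and checks HETP against the example acceptance limit of 0.01 cm; the function names and the numeric inputs are hypothetical.

```python
def compression_factor(settled_volume, packed_volume):
    """Settled resin volume divided by packed resin volume (same units)."""
    return settled_volume / packed_volume


def hetp_cm(bed_height_cm, plate_count):
    """Height equivalent to a theoretical plate: bed height / number of plates."""
    return bed_height_cm / plate_count


# Hypothetical column: 115 mL of settled resin packed into 100 mL,
# 20 cm bed height, 4000 theoretical plates.
cf = compression_factor(115.0, 100.0)
hetp = hetp_cm(20.0, 4000)
print(f"compression factor = {cf:.2f}, HETP = {hetp:.4f} cm")
print("pack accepted" if hetp < 0.01 else "pack rejected")
```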

### Load information

The “Load information” portion of the spreadsheet is used to specify details about the composition of one or more loads. Because experiments may involve different loads, it is possible to define different loads, each in their own Excel column, as long as they have distinct names (the example input file contains two loads).

The load’s assay concentration section should contain a concentration for each assay result and therefore, the assay result names in the B column must match the assays listed in the User Data’s product definition. Finally, the chemical components of the load used in the load matrix must also be defined in the User Data.

“Load Name”, “Load Source”, and “Load Lot ID” are all administrative fields meant to help users track the details of their loads, their identity, and their origin. Specific physical and chemical properties of the load should be entered in the fields associated with load product concentration, density, conductivity, pH, and temperature (in degrees Celsius).

Lastly, the load matrix section should be populated with the details of the specific chemical component(s) that make up a solution contained within a load.

### Buffer information

Information about the different buffers used in the different method steps of the experiments can be specified in this section. Precisely tracking buffer compositions and volumes is another critical requirement for modeling processes successfully. An accurate description of the ion content during the chromatography process is essential, because it allows the model equations to use the correct ion content.

As in the Load information section, a separate column must be used for each buffer. The sample input file contains three columns, though more or fewer may be specified by adding columns or clearing the contents of existing columns. Again, chemicals used in the solid and liquid additions of the buffer must have corresponding entries in the User Data.

“Buffer Name”, “Buffer Source”, “Buffer Description”, and “Buffer Lot ID” are all free text fields that allow users to describe and track the nature and origin of the buffer being described. Physical and chemical properties of each buffer will be used by Reveal to characterize the properties of the method step using that buffer. Some of these properties (density and volume) are used to compute the exact ion composition that will flow in the column. Others (temperature and pH) have an impact on the binding process and can be modeled.

Finally, the buffer composition is split between its solid and liquid components. Users may add or subtract rows to accommodate more or fewer chemicals.

### Method information

The method information section is where one can start describing the experiments contained in the study. Since an experiment is characterized by its method, the method’s name also gives its name to the experiment that will be built around it.

Multiple specifications must be made within the method information section of the sample input file. Each method is written in its own Excel column and should have a unique name. Within each method, each step requires its own unique name. The steps do not need to have the same name across all experiments, though users will see that this is the case in the sample input file. Again, columns may be added or cleared from the file as needed to accommodate as many or as few methods/experiments as the user desires.

Much of the information in this portion of the input file will need to match one of the options listed in column C (fields highlighted in yellow). Additionally, some fields will need to match information that was previously specified in the same input file (i.e., buffer names).

“Method and Experiment Name” is a free text field where a user can uniquely name both the method and the experiment built from it. The “Run Type” field allows users to select which of six types of methods (or runs) they are describing in that column.

Below these fields, users specify the succession of method steps, each defined by a name, a type, and the characteristics of the solution(s) the step adds to the column.

Each “Step Name” field is free text, but the “Step Type” field requires users to enter one of the known types. The “Buffer Name” field must be filled with a name that matches one of the buffer names specified in the Buffer information section. The volume and flow rate of each step must also be specified by the user. They are used to calculate the duration and pH of the step, to build the ion composition during that step, and to simulate the conductivity and absorbance during that phase.
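As a rough illustration of how step durations follow from volume and flow rate, the sketch below builds a timeline for a sequence of steps. It assumes volumes in mL and volumetric flow rates in mL/min; the template may request other units (see column C), in which case a conversion would be needed first. The step names and the `step_schedule` helper are hypothetical.

```python
def step_schedule(steps):
    """Return (name, start_min, stop_min) for each step, in order.

    steps: list of (name, volume_ml, flow_rate_ml_min) tuples.
    Each step lasts volume / flow rate and starts when the previous one ends.
    """
    schedule = []
    start = 0.0
    for name, volume_ml, flow_rate_ml_min in steps:
        if flow_rate_ml_min <= 0:
            raise ValueError(f"step {name!r}: flow rate must be positive")
        stop = start + volume_ml / flow_rate_ml_min
        schedule.append((name, start, stop))
        start = stop
    return schedule


# Hypothetical two-step method.
for name, start, stop in step_schedule([("Load", 10.0, 2.0), ("Elution", 30.0, 3.0)]):
    print(f"{name}: {start:.1f} to {stop:.1f} min")
```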

### Collection criteria information

Experiments may involve the collection and analysis of a pool. The collection criteria are used to compute pool properties, which in turn make it possible to compute the performance of the experiment and of the simulations built from it.

“Do collect?” refers to whether or not a user has collected information about a pool. If a user chooses “No”, the rest of the fields may be left blank or even erased.

Otherwise, the user should specify the step in the method during which the pool is created (typically an elution step or the load when a “flow-through” type experiment is specified).

“Start Collect Type” asks users to define whether their collection begins at a certain fixed absorbance or at a percent of the tallest peak’s maximum absorbance. This same logic applies to the “Stop Collect Type” field.

The “Start Collect Target” and “Stop Collect Target” fields are used to provide the unitted value that corresponds with the collect type specified above. When choosing fixed absorbance, users should enter a target value in AU/cm. When choosing percent peak maximum, users should enter a percentage value of the peak at which they would like to start or stop collection.

When percent peak maximum is chosen and the elution produces a single, large peak, a given start collection target value is reached twice: once while ascending to the peak and once while descending from it. Two more parameters therefore allow users to specify which crossing triggers the start or stop of collection: “Start Collect While” and “Stop Collect While”, which can take the values Ascending and Descending.

Assuming the user is trying to collect the component responsible for the main elution peak, the “Start Collect While” field is often set to Ascending, and the “Stop Collect While” field is generally set to Descending. In other more complex situations, users may wish to exclude the main peak component(s), and both parameters may take the same value (both Ascending or both Descending if the component to collect peaks before or after the main peak respectively).

Note

This assumes that the elution step generates essentially one peak. In complex multi-peak situations, users should contact the Reveal team to devise and implement a custom collection strategy.
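The start and stop logic described in this section can be sketched as a search for threshold crossings in the absorbance trace. The example below is a simplified illustration for a single-peak trace, not Reveal Chromatography's implementation. The option strings "Fixed Absorbance" and "Percent Peak Maximum" are assumed spellings (the exact choices are listed in column C of the template), and the code returns the first sampled time past each crossing rather than interpolating.

```python
def first_crossing(values, target, direction, start_index=1):
    """Index of the first sample where values crosses target in the given direction."""
    for i in range(max(start_index, 1), len(values)):
        previous, current = values[i - 1], values[i]
        if direction == "Ascending" and previous < target <= current:
            return i
        if direction == "Descending" and previous > target >= current:
            return i
    return None


def threshold(collect_type, target, peak_max):
    """Translate a collect type and target into an absorbance threshold."""
    if collect_type == "Percent Peak Maximum":
        return peak_max * target / 100.0
    return target  # "Fixed Absorbance": target is already in AU/cm


def collection_window(times, absorbance, start, stop):
    """start and stop are (collect_type, target, while_direction) tuples."""
    peak_max = max(absorbance)
    start_i = first_crossing(absorbance, threshold(start[0], start[1], peak_max), start[2])
    if start_i is None:
        return None
    stop_i = first_crossing(
        absorbance, threshold(stop[0], stop[1], peak_max), stop[2], start_i + 1
    )
    if stop_i is None:
        return None
    return times[start_i], times[stop_i]


# Hypothetical single-peak trace (time in minutes, absorbance in AU/cm).
times = [0, 1, 2, 3, 4, 5, 6]
absorbance = [0, 2, 6, 10, 6, 2, 0]
window = collection_window(
    times,
    absorbance,
    start=("Percent Peak Maximum", 50, "Ascending"),
    stop=("Percent Peak Maximum", 50, "Descending"),
)
print(window)
```

The stop search begins after the start index, which reflects the assumption that collection cannot stop before it has started.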

Note

Upon loading, the collection criteria information is collected together with the method information and can be viewed or modified in the method’s view.

### Performance parameters

If there is a collection of a pool in a given experiment (see Collection criteria information), this optional section allows a user to enter the observed performance parameters of the collected pool. This allows users to compare their simulated performances to experimental ones.

The “Measured?” field asks users to specify if they did or did not measure certain performance parameters. If a user enters “No” for this field, the remaining fields in the performance parameters section may be left blank or even deleted.

“Pool volume” requires users to enter the volume of the pool produced, in column volumes (CV). “Step Yield” refers to the percentage of mass recovered in the pool compared to the mass that was loaded into the column. “Pool Concentration” is the mass of product in the pool divided by the pool volume. Conductivity and pH of the pool should also be specified in the appropriate fields. They are included for completeness and are not currently used by Reveal Chromatography.
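The definitions of step yield and pool concentration above can be written out as follows. This is a sketch with hypothetical names and values: it assumes masses in grams and a known column volume in liters, so that a pool volume given in CV can be converted to liters.

```python
def pool_performance(pool_mass_g, loaded_mass_g, pool_volume_cv, column_volume_l):
    """Compute step yield (%) and pool concentration (g/L) from pool measurements."""
    pool_volume_l = pool_volume_cv * column_volume_l  # convert CV to liters
    return {
        "step_yield_pct": 100.0 * pool_mass_g / loaded_mass_g,
        "pool_concentration_g_per_l": pool_mass_g / pool_volume_l,
    }


# Hypothetical experiment: 4 g recovered out of 5 g loaded, pool of 2 CV on a 0.5 L column.
performance = pool_performance(4.0, 5.0, pool_volume_cv=2.0, column_volume_l=0.5)
print(performance)
```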

The assay results for each product component in the pool are used to compute the pool purity for each component. This can again be used by Reveal Chromatography to compare simulations to experimental results, and validate simulation outputs. The component names listed there must match those defined in the User Data and those specified in the Load information.

### Fraction data

Fraction tables contain data that allows Reveal Chromatography to translate analytical information into a time series of product component concentrations. This data is optional but needed to use the optimizer (see Parameter optimizers) for products with multiple components, as the optimizer uses that information to compare simulations to experiments on a per-component basis.

The first cell allows the user to specify whether fractions have been collected. If “Yes” is selected, the second cell is the name of the Excel tab that contains the fraction data for that experiment.

To enter fraction data, select the Run 1 Fracs tab at the bottom of the spreadsheet, change the list of product components to match the product definition, and then manually enter all fraction data that has been measured by the analytical team.

To specify fraction data for additional runs/experiments, create a copy of the Run 1 Fracs tab. A new sheet will be created. Rename this sheet as appropriate (e.g., Run 2 Fracs).

Fraction data currently must be entered in the units specified in the column headings of the table (i.e., minutes, g/L, and %). Also, times for fraction data must use the same time origin as continuous data and the method description.
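To illustrate how a fraction table could be turned into per-component concentration series, the sketch below assumes that each fraction row holds a time in minutes, a total product concentration in g/L, and a percentage for each product component. This layout, the dictionary keys, and the component names are assumptions made for the example and should be checked against the Run 1 Fracs tab of the template.

```python
def component_concentrations(fractions, components):
    """Return {component: [(time_min, concentration_g_per_l), ...]}.

    Each fraction is a dict with keys "time_min", "total_g_per_l" and
    "percent" (a dict mapping component name to its percentage).
    """
    series = {name: [] for name in components}
    for row in fractions:
        for name in components:
            concentration = row["total_g_per_l"] * row["percent"][name] / 100.0
            series[name].append((row["time_min"], concentration))
    return series


# Hypothetical fractions for a product with two components.
fractions = [
    {"time_min": 30.0, "total_g_per_l": 2.0, "percent": {"Main": 75.0, "Acidic": 25.0}},
    {"time_min": 32.0, "total_g_per_l": 4.0, "percent": {"Main": 50.0, "Acidic": 50.0}},
]
series = component_concentrations(fractions, ["Main", "Acidic"])
print(series["Main"])
```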

As many fractions as possible should be collected, particularly across the elution peak, as this information is essential to help the optimizer calibrate models for each component accurately.

### Continuous data

Continuous data for all experiments, exported from chromatography systems such as AKTA machines, can be specified in the final section of the input file. Continuous data file paths are expected to be specified relative to the location of the input Excel spreadsheet. This means that if a user’s files are all in the same folder, only the file name needs to be specified.

The data contained in the continuous data file should have the time axis specified in minutes, and should include time series for absorbance, pH, temperature, and conductivity. The continuous data time origin should be the same as that of the method description and the fraction data. See the template Example_AKTA_Data.asc, which was exported from an AKTA explorer.

If no continuous data is available for a given experiment, the user should specify N/A in that field.
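The path convention above can be sketched with Python's standard `pathlib` module: a relative entry is resolved against the folder containing the spreadsheet, and an N/A entry means there is no continuous data. The helper name and the `studies/my_study.xlsx` location are hypothetical.

```python
from pathlib import Path


def resolve_continuous_data(input_file, cell_value):
    """Return the continuous data path for one experiment, or None for N/A."""
    if cell_value is None or str(cell_value).strip().upper() == "N/A":
        return None
    # Relative paths are interpreted from the folder holding the spreadsheet.
    return Path(input_file).parent / str(cell_value).strip()


print(resolve_continuous_data("studies/my_study.xlsx", "Example_AKTA_Data.asc"))
print(resolve_continuous_data("studies/my_study.xlsx", "N/A"))
```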