Using Reveal Chromatography’s API

This document is a quick tutorial on using the kromatography package to write custom scripts and explore data in a custom way. You are expected to follow along inside an IPython console that is kromatography aware. See Reveal’s Python console to launch it from the Reveal Chromatography interface.

Note that in this document, example inputs are preceded by >>> to symbolize the Python prompt. The >>> itself is not to be typed in: it is there to distinguish inputs to be typed by the user from the outputs returned by the console.

Getting started

A good first step is to import the top-level package and check the version you are running:

>>> from kromatography import version_info
>>> print(version_info)
version: 0.7.2, build: 1

It may also be good to clarify the versions of a few other key dependencies, just to make sure we know which documentation to look things up in:

>>> from numpy import __version__
>>> __version__
'1.11.3'
>>> from pandas import __version__
>>> __version__
'0.19.2'
>>> from traits import __version__
>>> __version__
'4.5.0'
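
The same check can be done in one pass, without overwriting the __version__ name at each import. The following is a minimal sketch using only the standard library; the helper name collect_versions is hypothetical and not part of kromatography:

```python
import importlib


def collect_versions(package_names):
    """Return a {package name: version string} mapping.

    Hypothetical helper for illustration, not part of kromatography.
    """
    versions = {}
    for name in package_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The package cannot be imported in this console.
            versions[name] = "not installed"
        else:
            # Some packages do not expose a __version__ attribute.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in sorted(collect_versions(["numpy", "pandas", "traits"]).items()):
    print("%s: %s" % (name, version))
```

The output depends on what is installed in the console the sketch is run from.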

Load some data from an experimental study file

Assuming that you have copied the tutorial data to your Desktop folder, let’s try to load it in a script, and explore its experimental data. First we need to move to the Desktop directory:

>>> %cd
>>> %cd Desktop

Then we can use the kromatography.io subpackage to load the study and list its experiments:

>>> from kromatography.io.api import load_study_from_excel
>>> study = load_study_from_excel("Example_Gradient_Elution_Study.xlsx")
>>> print([exp.name for exp in study.experiments])
['Run_3', 'Run_2', 'Run_1']

If Example_Gradient_Elution_Study.xlsx is in a different location, the above load_study_from_excel command can be modified to provide the full path to the file.
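
As a sketch of the full-path approach, the path can be built with the standard library rather than typed by hand. The helper name desktop_path is hypothetical, and the sketch assumes the Desktop folder sits directly under the user’s home directory, which may not hold on every system:

```python
import os


def desktop_path(filename):
    """Build the path to a file in the current user's Desktop folder.

    Hypothetical helper; assumes Desktop is directly under the home folder.
    """
    return os.path.join(os.path.expanduser("~"), "Desktop", filename)


study_path = desktop_path("Example_Gradient_Elution_Study.xlsx")
print(study_path)
# The resulting string can then be passed to the loader shown above:
# study = load_study_from_excel(study_path)
```

With a full path, the %cd commands are no longer needed before loading.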

Beyond the experiments’ names, one can explore any of their properties. For example, we can make sure all experiments are about the Prod000 product:

>>> [exp.product.name for exp in study.experiments]
['Prod000', 'Prod000', 'Prod000']

Or we can check the properties of the resins used in the column:

>>> exp0 = study.experiments[0]
>>> exp0.column.resin.print_traits()
_unique_keys:          ('name', 'lot_id', 'resin_type', 'type_id')
average_bead_diameter: UnitScalar(90.0, units='1e-06*m')
<snip>

The output of print_traits() is the list of attributes (traits) of the Resin object and their values. That method is available on pretty much any of the classes defined in the kromatography package. The reader is encouraged to leverage the introspection capability of the Python console to explore what an Experiment is made of, and what the values of a few parameters of interest are.

Load a project file and view one simulation’s data

Loading a study from a saved project file is equally simple. If no example .chrom file is at hand, open the tutorial data folder using the Help menu, and copy the example chrom file to your desktop. Then, after moving to the desktop location as before, load the file:

>>> from kromatography.io.api import load_study_from_project_file
>>> study = load_study_from_project_file("Example_Gradient_Elution_Project.chrom")
>>> print([sim.name for sim in study.simulations])

The last command prints the list of names of all the simulations present in the study. Depending on the state of the project file that was loaded, the output may differ.

Let’s dive into one of these simulations deeper, to look at its components and its chromatogram data. First, we will grab the first simulation (note that Python starts counting at 0) and look at its complete content:

>>> sim0 = study.simulations[0]
>>> sim0.print_traits()

As before, the output of print_traits() is the list of attributes (traits) of the object, here the Simulation, and their values. We see from that list that any simulation has two attributes that are relevant to what we want to do: binding_model and transport_model. Let’s dive into the binding model, to see for example the values of the SMA nu parameter for each component:

>>> sim0.binding_model.print_traits()
>>> sim0.binding_model.sma_nu
>>> # Let's print these values together with their component names:
>>> zip(sim0.binding_model.component_names, sim0.binding_model.sma_nu)

Let’s do the same with the pore diffusions in the transport model:

>>> sim0.transport_model.print_traits()
>>> sim0.transport_model.pore_diffusion
>>> # Let's print these values together with their component names:
>>> zip(sim0.transport_model.component_names, sim0.transport_model.pore_diffusion)
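
Note that under Python 3, zip returns a lazy iterator rather than a list, so the console would only display a zip object. The sketch below shows one way to get a readable pairing in either Python version. The helper pair_by_component is hypothetical, and the stand-in values are the ones the default SMA model displays later in this tutorial, not the values of the loaded project:

```python
# Stand-in data: component names and SMA nu values of the default SMA model.
component_names = ["component0", "component1", "component2", "component3"]
sma_nu = [0.0, 5.0, 5.0, 5.0]


def pair_by_component(names, values):
    """Map each component name to its parameter value (hypothetical helper)."""
    if len(names) != len(values):
        raise ValueError("Expected one value per component name.")
    # dict() consumes the zip iterator, so this works in Python 2 and 3.
    return dict(zip(names, values))


nu_by_component = pair_by_component(component_names, sma_nu)
for name in component_names:
    print("%s: %s" % (name, nu_by_component[name]))
```

With a real simulation, the two arguments would be sim0.binding_model.component_names and sim0.binding_model.sma_nu.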

Finally, let’s plot the simulation data, which is contained in the simulation’s output, using Matplotlib, an interactive plotting library:

>>> from kromatography.plotting.api import plot_chromatogram
>>> plot_chromatogram?
Signature: plot_chromatogram(expt, sim)
Docstring:
Build maplotlib plot to display experimental chromatogram and
<snip>
>>> ref_exp = study.search_experiment_by_name("Run_1")
>>> plot_chromatogram(ref_exp, sim0)

This should generate a plot of the simulation’s UV chromatogram as well as the simulated cation concentration together with the experimental data the simulation was built from.

Explore all products available in the User Data

Let’s say that you would like to programmatically explore the products you currently have in your datasource, without having to click through them all, for example to find the names of all products of type Globular. Let’s first load the User Data (internally called a datasource):

>>> from kromatography.utils.api import load_default_user_datasource
>>> default_user_data = load_default_user_datasource()
>>> print(default_user_data)
(<kromatography.model.data_source.SimpleDataSource object at 0x11c2b0e30>,
 'C:\Users\<snip>_2016-12-15-08-28-07.chromds')

That function returns a tuple, the first element of which is the actual DataSource object which contains the User Data. Let’s grab it and list all products available, and then all product names available:

>>> datasource = default_user_data[0]
>>> datasource.products  # Not very enlightening...
[<kromatography.model.product.Product at 0x11c483c50>, <snip>]
>>> [prod.name for prod in datasource.products]
['Prod000', <snip>]

If we want to see the names and types of all products, we can ask for both together:

>>> [(prod.name, prod.product_type) for prod in datasource.products]
[('Prod000', 'Globular'), <snip>]

To answer the original question then, we can build a list of the product names only if the product_type is Globular:

>>> [prod.name for prod in datasource.products if prod.product_type == 'Globular']
['Prod000', <snip>]
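
To answer the same question for every product type at once, the names can be grouped by type. This is a minimal sketch with stand-in data: the Product namedtuple replaces the real kromatography Product objects, and every entry other than Prod000 is hypothetical:

```python
from collections import defaultdict, namedtuple

# Stand-in for kromatography Product objects: only the two attributes
# used in this section are modeled.
Product = namedtuple("Product", ["name", "product_type"])

products = [
    Product("Prod000", "Globular"),
    Product("Hypothetical_A", "Fab"),       # hypothetical entry
    Product("Hypothetical_B", "Globular"),  # hypothetical entry
]


def names_by_type(products):
    """Group product names by their product_type (hypothetical helper)."""
    grouped = defaultdict(list)
    for prod in products:
        grouped[prod.product_type].append(prod.name)
    return dict(grouped)


grouped = names_by_type(products)
print(grouped["Globular"])
```

With real data, datasource.products would be passed to the helper instead of the stand-in list.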

Build and run a new simulation and save it in a study

Build the simulation

Like in the above examples, let’s load a study from the tutorial data:

>>> from kromatography.io.api import load_study_from_excel
>>> study = load_study_from_excel("Example_Gradient_Elution_Study.xlsx")
>>> print([exp.name for exp in study.experiments])
['Run_3', 'Run_2', 'Run_1']

We can build a new simulation from an experiment, as done in the UI, by using the dedicated utility:

>>> from kromatography.model.factories.api import build_simulation_from_experiment
>>> build_simulation_from_experiment?
Signature: build_simulation_from_experiment(experiment, binding_model=None, transport_model=None, name='', fstep='Load', lstep='Strip', initial_buffer=None, lazy_loading=False)
Docstring:
Build a Simulation object given an experiment and some models.
<snip>
>>> exp = study.search_experiment_by_name("Run_1")
>>> sim = build_simulation_from_experiment(exp)

Since no specific binding and transport models are passed to create the simulation, it is created with default models. That may not be what one wants, but it might be easier for some users to modify them after the fact. For example:

>>> sim.binding_model.sma_lambda
646.0
>>> sim.binding_model.sma_lambda = 492

Once the model has been modified, one can check that the change was applied, or explore what other parameters can be modified, using the print_traits() method:

>>> sim.binding_model.print_traits()
_cadet_input_keys: ['sma_lambda', 'sma_nu', 'is_..._sigma', 'sma_kd', 'sma_ka']
_unique_keys:      ('target_product', 'name')
component_names:   ['component0', 'component1', 'component2', 'component3']
editable:          True
is_kinetic:        0
metadata:          {'name': 'Default SMA model'}
model_type:        'STERIC MASS ACTION'
name:              'Default SMA model'
num_comp:          4
sma_ka:            array([ 0.   ,  0.001,  0.001,  0.001])
sma_kd:            array([ 0.,  1.,  1.,  1.])
sma_lambda:        492.0
sma_nu:            array([ 0.,  5.,  5.,  5.])
sma_sigma:         array([ 0.,  5.,  5.,  5.])
target_product:    ''
type_id:           'BindingModel'
unique_id:         {'target_product': '', 'name': 'Default SMA model'}
uuid:              UUID('0d0c7513-58a6-4138-b8ea-4f7a29f06b6a')

Run CADET on the simulation

Once the simulation is ready, we can run it by creating a job manager, and passing it to the run method of the simulation object:

>>> from kromatography.model.factories.api import create_start_job_manager
>>> job_mgr = create_start_job_manager()
>>> sim.run(job_mgr, wait=True)

Note

Avoid creating multiple job managers, since doing so can overload the operating system. The same job manager should be used for any number of run calls: each new simulation to run is added to the job manager’s queue of work items. For more details, please review the documentation for the kromatography.ets_future.encore.simple_async_job_manager.SimpleAsyncJobManager class.

Assuming that we don’t need to run any other simulation (for now), we can clean the resources used by the job manager by calling its shutdown() method:

>>> job_mgr.shutdown()
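
When several simulations are run from a script, one way to make sure the shared job manager is always shut down, even if a run raises an exception, is a try/finally block. This is a sketch only: run_all is a hypothetical helper, and the two Fake classes are stand-ins that mimic just the run() and shutdown() calls used above so that the pattern can be tried without CADET:

```python
def run_all(simulations, job_mgr):
    """Run every simulation with one shared job manager, then shut it down.

    Hypothetical helper: it only relies on sim.run(job_mgr, wait=True)
    and job_mgr.shutdown(), as used in this tutorial.
    """
    try:
        for sim in simulations:
            sim.run(job_mgr, wait=True)
    finally:
        # Executed whether or not a run failed.
        job_mgr.shutdown()


class FakeJobManager(object):
    """Stand-in for the real job manager (illustration only)."""

    def __init__(self):
        self.is_shut_down = False

    def shutdown(self):
        self.is_shut_down = True


class FakeSimulation(object):
    """Stand-in for a Simulation (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.has_run = False

    def run(self, job_manager, wait=False):
        self.has_run = True


fake_mgr = FakeJobManager()
fake_sims = [FakeSimulation("Sim 0"), FakeSimulation("Sim 1")]
run_all(fake_sims, fake_mgr)
print(all(sim.has_run for sim in fake_sims), fake_mgr.is_shut_down)
```

Since the helper shuts the manager down at the end, it suits the case where no further runs are planned; otherwise the shutdown() call should be kept for the end of the session.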

Some advanced users may prefer to have Reveal just create the CADET input file, and to run CADET separately rather than launching the CADET run from Reveal. For these users, the CADET file creation is exposed as a method of the Simulation class, which returns the path to the resulting CADET file:

>>> sim.create_cadet_input_file()
'C:\Users\jrocher\AppData\Roaming\Reveal\Chromatography\cadet_input_files\669a2667-9cc7-4945-84c9-e64cb6a50bf8_cadet.h5'

Add the simulation to study and plot its data

Now that the simulation has been run, we can add it to the study, and save the study to a new project file:

>>> study.simulations.append(sim)
>>> from kromatography.io.api import save_study_to_project_file
>>> save_study_to_project_file("test.chrom", study)

Note that if the only reason to save the study to disk is to load it in the UI (for example to visualize the new simulation’s chromatogram), one can instead create a new application window around the new study:

>>> from kromatography.utils.api import launch_app_for_study
>>> launch_app_for_study(study)

Finally, one may want to plot the simulation’s data manually using Matplotlib. To do so, one can either use an existing API function:

>>> from kromatography.plotting.api import plot_chromatogram
>>> plot_chromatogram(exp, sim)

or drill into the simulation’s data to find the numpy arrays containing the desired timeseries:

>>> x = sim.output.continuous_data['Acidic_1_Sim'].x_data
>>> y = sim.output.continuous_data['Acidic_1_Sim'].y_data
>>> %pylab
>>> plot(x, y)
>>> ylabel("Absorbance (AU/cm)")
>>> xlabel("Time (minutes)")

This should create a plotting window with the simulation plotted inside. That plot can be modified with any number of Matplotlib commands to make it suitable for the user’s need, including calling plot() multiple times to superimpose other timeseries. More details can be found in the Matplotlib documentation.

Build and run a simulation grid and plot performances

Let’s assume that we have calibrated a model, and that we would like to programmatically build a simulation grid, run it and plot its performances (yield, purities, pool volume, ...). Let’s first load a simulation and build a simulation grid around it:

>>> %cd Desktop
>>> from kromatography.io.api import load_study_from_project_file
>>> study = load_study_from_project_file("Example_Gradient_Elution_Project.chrom")
>>> sim = study.search_simulation_by_name("Calibrated")

>>> from kromatography.compute.factories.api import build_simulation_grid
>>> param1 = "column.bed_height_actual"
>>> param2 = "method.method_steps[0].volume"
>>> grid = build_simulation_grid(sim, [param1, param2])

The simulation grid built with that command explores a couple of operational parameters that might affect the performances of the process. Such an exploration is a typical analysis done once a model has been calibrated, as part of process characterization and process optimization.

The parameter names chosen, column.bed_height_actual and method.method_steps[0].volume, are respectively the bed height and the load volume. To scan other parameters, you can pick any parameter that a simulation contains. The list of parameters can be viewed in the UI by launching the Parameter Explorer and selecting a parameter name. Parameters can also be discovered interactively in the Python console, using introspection (aka the TAB key) or the print_traits() method:

>>> sim.method.print_traits()
<snip>
method_steps: [<kromatography.model.meth...ep object at 0x130969890>]
<snip>
>>> sim.method.method_steps
[<kromatography.model.method_step.MethodStep at 0x13095f890>,
 <kromatography.model.method_step.MethodStep at 0x13095f8f0>,
 <kromatography.model.method_step.MethodStep at 0x13095fa10>,
 <kromatography.model.method_step.MethodStep at 0x130969890>]
>>> sim.method.method_steps[0].print_traits()
<snip>
volume: UnitScalar(4.6913580246913575, units='1.0')
<snip>
>>> print(sim.method.method_steps[0].volume)
4.69135802469

In conclusion, "method.method_steps[0].volume" is a valid path to a (unitted) scalar, and is therefore a parameter that can be scanned in a simulation grid. The same can be done with the bed height:

>>> sim.column.bed_height_actual
UnitScalar(26.01, units='0.01*m')
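
To check a candidate parameter path without typing it attribute by attribute, the path string can be walked programmatically. The sketch below is a hypothetical helper, not the mechanism Reveal uses internally. It supports dotted names with integer indices only, and it is exercised on stand-in objects holding plain floats instead of UnitScalar values:

```python
import re
from types import SimpleNamespace

# One path element: an attribute name followed by optional [int] indices.
_ELEMENT = re.compile(r"([A-Za-z_]\w*)((?:\[\d+\])*)")


def resolve_path(root, path):
    """Follow a path such as 'method.method_steps[0].volume' from root.

    Hypothetical helper for illustration, not part of kromatography.
    """
    value = root
    for part in path.split("."):
        match = _ELEMENT.fullmatch(part)
        if match is None:
            raise ValueError("Unsupported path element: %r" % part)
        name, indices = match.groups()
        value = getattr(value, name)
        for index in re.findall(r"\[(\d+)\]", indices):
            value = value[int(index)]
    return value


# Stand-in for a simulation, with rounded values from this tutorial.
fake_sim = SimpleNamespace(
    column=SimpleNamespace(bed_height_actual=26.01),
    method=SimpleNamespace(method_steps=[SimpleNamespace(volume=4.69)]),
)

print(resolve_path(fake_sim, "column.bed_height_actual"))
print(resolve_path(fake_sim, "method.method_steps[0].volume"))
```

A path that does not exist raises an AttributeError or IndexError, which is a quick way to catch a typo before building a grid.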

Now that the simulation grid has been created, we can check some of its attributes:

>>> grid.run_status
'Created'
>>> grid.percent_run
'0.00 %'
>>> grid.size
400
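
Since a grid holds one simulation per combination of scanned values, its size is the product of the number of values scanned for each parameter. The sketch below illustrates this with itertools.product. The values are hypothetical, and how build_simulation_grid chooses the values it scans is not covered here. A size of 400 is consistent with, for instance, 20 values for each of the two parameters, but the actual split is not stated in this tutorial:

```python
from itertools import product

# Hypothetical scanned values, for illustration only.
bed_heights = [19.5, 22.75, 26.0]      # 3 values
load_volumes = [3.5, 4.1, 4.7, 5.3]    # 4 values

# One (bed height, load volume) pair per simulation in the grid.
combinations = list(product(bed_heights, load_volumes))

print(len(combinations))  # 3 * 4 = 12
print(combinations[:2])   # the last parameter varies fastest
```

Adding a third scanned parameter multiplies the size again, so grids grow quickly with the number of parameters.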

Now, to run the simulation grid, we can just invoke its run method:

>>> grid.run?
<snip>

As can be seen from the help of the run() method on the grid object, a JobManager needs to be created and initialized. That can be done using a dedicated utility and passing that job manager to the run method:

>>> from kromatography.model.factories.api import create_start_job_manager
>>> job_mgr = create_start_job_manager()
>>> grid.run(job_mgr, wait=True)

Note

Avoid creating multiple job managers, since doing so can overload the operating system. The same job manager can be used for any number of run calls: each new simulation to run is added to the job manager’s queue of work items. For more details, please review the documentation for the kromatography.ets_future.encore.simple_async_job_manager.SimpleAsyncJobManager class.

The parameter wait=True in the run() call will make the Python console wait until the grid has finished running to return and accept another command. Once it does, we can check that the grid has run completely:

>>> grid.run_status
'Finished running'
>>> grid.percent_run
'100.00 %'

Once a SimulationGroup has run, its data is contained in its group_data attribute. For each combination of scanned parameters, it lists the name of the simulation created and run, the value of each scanned parameter, and the value of each predicted performance: yield, purities for each component, pool volume, and pool concentration:

>>> grid.group_data
   Simulation name column.bed_height_actual method.method_steps[0].volume  \
0            Sim 0                  19.5075                 3.51851851852
1            Sim 1                  19.5075                  4.1049382716
<snip>

The object is a regular pandas.DataFrame, which is described in the Pandas documentation. To view each performance as a function of both scanned parameters, that table of data needs to be pivoted into a separate grid for each performance, so that it can be plotted as a heatmap using a tool such as Matplotlib. A plotting utility already implements this though, so no need to sweat (and we will smooth the data to try and pick up trends more easily):

>>> from kromatography.plotting.api import plot_sim_group_performances
>>> plot_sim_group_performances(grid, param1, param2, smooth='gauss')

This last command should trigger 6 interactive plots to come up, one for each performance contained in the grid data. The plots can be zoomed in, explored, saved to a file, ... Other plotting tools are readily available even to Python non-experts. For example, one can build a box-plot to show the distribution of values for all performances by running:

>>> grid.group_data.boxplot()

(Note that for the plot to be displayed, the Python console must be initialized by running %matplotlib qt. This only needs to be done once.) That reflects how powerful and full featured this pandas.DataFrame is and the interested reader is encouraged to go read about the popular Pandas library.
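
For readers who would rather build the per-performance grids themselves, the pivot step mentioned earlier in this section can be sketched with pandas. The table below is a small hypothetical stand-in for grid.group_data: the two parameter column names come from this tutorial, while the "yield" column name and all the numbers are assumptions made for illustration:

```python
import pandas as pd

param1 = "column.bed_height_actual"
param2 = "method.method_steps[0].volume"
perf = "yield"  # assumed column name, for illustration only

# Hypothetical stand-in for grid.group_data (2 x 2 grid of simulations).
group_data = pd.DataFrame({
    "Simulation name": ["Sim 0", "Sim 1", "Sim 2", "Sim 3"],
    param1: [19.5, 19.5, 26.0, 26.0],
    param2: [3.5, 4.1, 3.5, 4.1],
    perf: [0.90, 0.92, 0.94, 0.96],
})

# One row per bed height, one column per load volume, one cell per simulation.
yield_grid = group_data.pivot(index=param1, columns=param2, values=perf)
print(yield_grid)
```

The resulting grid is itself a DataFrame, and its values attribute is a 2D array that can be handed to a Matplotlib heatmap function such as imshow.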

Want more examples?

If these API explorations are useful, please contact us and let us know what additional examples you would like to see.