# kromatography.compute package

## kromatography.compute.brute_force_binding_model_optimizer module

Brute force implementation of a binding model optimizer.

class kromatography.compute.brute_force_binding_model_optimizer.BruteForce2StepBindingModelOptimizer(**traits)[source]

2 step brute force (grid search based) optimizer to optimize binding model parameters.

Optimization strategy contains 2 steps:

1. Assume the same binding parameters for all simulations, and explore a wide grid of values, to try and find decent values for each component.
2. Then scan a grid of values around these starting points for each component. Since there are interactions between peaks, the result depends on the order in which components are scanned.

That leads to 2 optimizer steps for a 1-component product and N+1 optimizer steps for an N-component product.
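The two-step strategy can be sketched in plain Python as below. This is only an illustration and does not use the kromatography API: `toy_cost`, the component names, the grid bounds and the helper functions are all hypothetical stand-ins for running simulation grids and computing their cost against an experiment.

```python
import math


def log_grid(low, high, num):
    """Return `num` log-spaced values between low and high (inclusive)."""
    step = (math.log10(high) - math.log10(low)) / (num - 1)
    return [10 ** (math.log10(low) + i * step) for i in range(num)]


def linear_grid(center, half_width, num):
    """Return `num` evenly spaced values centered on `center`."""
    step = 2 * half_width / (num - 1)
    return [center - half_width + i * step for i in range(num)]


def toy_cost(params):
    """Hypothetical cost: squared distance to a made-up optimum per component."""
    optimum = {"comp_A": 2.0, "comp_B": 5.0}
    return sum((params[c] - optimum[c]) ** 2 for c in optimum)


components = ["comp_A", "comp_B"]

# Step 0 (constant scan): the same value is applied to every component.
coarse = log_grid(0.1, 100.0, 7)
best_const = min(coarse, key=lambda v: toy_cost({c: v for c in components}))
best = {c: best_const for c in components}

# Steps 1..N (refining): scan around the current best, one component at a time.
# The result depends on the order in which the components are visited.
for comp in components:
    candidates = linear_grid(best[comp], half_width=0.5 * best[comp], num=11)
    best[comp] = min(candidates, key=lambda v: toy_cost({**best, comp: v}))

num_steps = 1 + len(components)  # N + 1 steps for an N-component product
```

With 2 components, the loop produces 1 constant step plus 2 refining steps, which matches the N+1 count stated above.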

do_refine = Bool(True)

Switch to allow/disallow the refinement steps after the constant scan

refining_step_spacing = Enum('Best', 'Linear', 'Log')

Spacing for step 1 and up (which refine component parameters)

refining_step_num_values = Int(DEFAULT_REFINING_GRID_NUM_VALUES)

Size of scanning grid for refining step

refining_factor = Range(value=DEFAULT_REFINING_GRID_SIZE_FACTOR, low=1, high=100)

Control length of refined grid in % of previous grid’s spacing size

optimal_models = List(Instance(BindingModel))

Best binding models globally

type = Constant(BRUTE_FORCE_2STEP_OPTIMIZER_TYPE)

Type of optimizer

create_steps(**constant_step_traits)[source]

Create all optimization steps: constant scan and refining steps.

update_step(step_idx=1, from_step=0)[source]

Initialize a step’s simulation group(s) from an already run step.

update_starting_point_simulations(step_idx=1, from_step=0)[source]

Build the center point simulations from the optimal model for each component at the previous step.

build_parameter_list(step_idx=1, from_step=0)[source]

Build the list of scanned parameters from the list of scanned parameters at the previous step. They should scan the binding model parameters for the ith product component.

setup_run_refining_steps()[source]

Update & run one of the refining steps once constant step has run.

update_optimal_simulation_map(obj, attr_name, old, new)[source]

Collect best num_optimal_simulations simulations with lowest costs.

## kromatography.compute.brute_force_binding_model_optimizer_step module

Driver class to build the optimal binding model given a (set of) target experiment(s) and a transport model.

class kromatography.compute.brute_force_binding_model_optimizer_step.BruteForceBindingModelOptimizerStep(**traits)[source]

Driver to build the optimal SMA binding model given an experiment and a transport model using the brute force approach.

optimal_model_for_comp = Property(Dict, depends_on='optimal_simulation_for_comp[]')

Dict mapping each component to best binding models to minimize cost

optimizer_step_type = Constant(BRUTE_FORCE_OPTIMIZER_STEP_TYPE)

Type of the optimizer step

## kromatography.compute.brute_force_optimizer module

Class for brute force optimizers.

class kromatography.compute.brute_force_optimizer.BruteForceOptimizer(**traits)[source]

An optimizer class stringing simulation grid based steps to find 1 or more optimal parameters to fit an experiment or a set of experiments.

type = Constant(GRID_BASED_OPTIMIZER_TYPE)

Optimizer type

steps = List(Instance(BruteForceOptimizerStep))

Succession of steps to complete the optimization process

cadet_request = Event

Event to request the solver to execute the optimizer, listened to by app

size_run = Property(Int, depends_on='steps.size_run')

percent_run = Property(Str, depends_on='size_run')

Percentage of the optimizer that has already run

create_steps(param_list)[source]

Create all optimization steps: constant scan and refining steps.

run(job_manager, wait=False)[source]

Run of the optimizer: run first step and set run attributes.

wait()[source]

Wait until all currently known optimization steps have finished running.

run_step(step_idx, wait=False)[source]

Run the optimization step with index step_idx.

recompute_costs_for_weights(new_weights)[source]

Propagate a change in desired cost function weights to steps.

## kromatography.compute.brute_force_optimizer_step module

Driver class and supporting utilities to build optimal simulations given a (set of) experiments, using the brute force approach of minimizing 1 or more simulation grids.

class kromatography.compute.brute_force_optimizer_step.BruteForceOptimizerStep(**traits)[source]

Optimize a set of simulation parameters to model the provided experiment using the grid search (brute force) approach.

If sim_group_max_size is 0, the step creates 1 simulation grid around a simulation built to model each target experiment. If sim_group_max_size is a positive integer, all simulations for a target experiment are split into groups of size less than or equal to sim_group_max_size.
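The grouping rule can be illustrated with a small helper. This is a sketch that assumes simulations are split in their existing order; `split_into_groups` is a hypothetical name, not a function of this module.

```python
def split_into_groups(simulations, sim_group_max_size=0):
    """Split a list of simulations into groups of at most sim_group_max_size.

    A max size of 0 means "no limit": everything goes into a single group.
    """
    if sim_group_max_size < 0:
        raise ValueError("sim_group_max_size must be 0 or a positive integer")
    if sim_group_max_size == 0:
        return [list(simulations)]
    return [
        list(simulations[i:i + sim_group_max_size])
        for i in range(0, len(simulations), sim_group_max_size)
    ]


sims = ["sim_%d" % i for i in range(7)]
one_group = split_into_groups(sims)         # a single group of 7
small_groups = split_into_groups(sims, 3)   # groups of 3, 3 and 1
```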

When a simulation grid is fully run, the cost of each simulation to the corresponding target experiment is computed using the cost function attribute. The cost data from each simulation grid is stored in the group_cost_data dict and combined into the step’s cost_data once the simulation names are stripped.

optimizer_step_type = Constant(OPTIMIZER_STEP_TYPE)

Type of the optimizer step

parameter_list = List(ParameterScanDescription)

List of parameter objects to scan

scanned_param_names = Property(List(Str), depends_on='parameter_list[]')

List of parameter names to scan

simulation_groups = List(Instance(SimulationGroup))

List of simulation groups, scanning desired parameters, 1 per target exp

group_cost_functions = Dict(Str, Callable)

Cost function to minimize, one per simulation group

sim_group_max_size = Int

Maximum size for each of the simulation groups in the step

size_run = Property(Int, depends_on='simulation_groups.size_run')

percent_run = Property(Str, depends_on='size_run')

Percentage of the optimizer that has already run

cost_agg_func = Enum('sum', 'mean')

Aggregation method to combine costs for all components & all experiments

optimal_simulation_for_comp = Dict

Dict mapping each component to a list of the best simulations

run(job_manager, wait=False)[source]

Run optimization step by running all simulation groups it contains.

wait()[source]

Wait for currently known simulation groups to finish running.

initialize_sim_group()[source]

Initialize simulation groups with one based on self attribute.

Depending on sim_group_max_size, there may be multiple simulation groups to target a given experiment.

recompute_costs_for_weights(new_weights)[source]

Assume new weights for all cost functions.

Also recompute costs for all groups if they have already been computed.

compute_costs(sim_group, cost_function=None)[source]

Compute the costs of one of the SimulationGroups of the step.

Also cache the cost_function for each sim_group, so that costs can be recomputed if weights are changed.

Parameters:

- sim_group (SimulationGroup) – Group for which to compute costs.
- cost_function (Callable [OPTIONAL]) – Target cost function to use to compute costs. Optional: if a cost_function_type has been provided at step creation, and this is None, a cost_function will be created.

update_cost_data_dict(group, cost_data, skip_aggregate=False)[source]

Collect all cost_function cost data for all sim groups.

Also aggregates all into the step’s cost_data if the step has finished running. The step’s cost data will aggregate data from all simulation groups, sum/average it over all components, and display the scanned parameter values alongside the aggregate cost.

invalidate_group_cost_data()[source]

Past cost_data are invalid. Delete them.

aggregate_cost_data()[source]

Aggregate cost data over all target experiments.

The step’s cost data will aggregate data from all simulation groups, sum/average it over all components, and display the scanned parameter values alongside the aggregate cost.
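A rough sketch of this aggregation, using plain dictionaries instead of the DataFrames the step actually stores. The nested layout of `group_cost_data` shown here (group name, then scanned parameter values, then per-component cost) and the function name are assumptions made for illustration only.

```python
from statistics import mean


def aggregate_costs(group_cost_data, cost_agg_func="sum"):
    """Combine per-group, per-component costs into one cost per parameter set."""
    agg = {"sum": sum, "mean": mean}[cost_agg_func]
    collected = {}
    for costs_by_params in group_cost_data.values():
        for params, comp_costs in costs_by_params.items():
            # Pool the costs of all components and all groups for these values.
            collected.setdefault(params, []).extend(comp_costs.values())
    return {params: agg(costs) for params, costs in collected.items()}


# Keys of the inner dicts are the scanned parameter values, e.g. (ka, nu).
group_cost_data = {
    "exp1": {(1.0, 5.0): {"A": 2.0, "B": 4.0}, (2.0, 5.0): {"A": 1.0, "B": 1.0}},
    "exp2": {(1.0, 5.0): {"A": 3.0, "B": 3.0}, (2.0, 5.0): {"A": 2.0, "B": 4.0}},
}
summed = aggregate_costs(group_cost_data, "sum")
averaged = aggregate_costs(group_cost_data, "mean")
```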

append_param_values(costs_df, simulations)[source]

Evaluate parameters for provided sims and reset as cost DF index.

update_optimal_simulation_for_comp()[source]

Extract the best simulation for each product component.

get_optimal_sims(exp_name, num_sims)[source]

Collect optimal num_sims simulations matching specific experiment.

optimize_costs(sim_group, attr_name, group_has_run)[source]
kromatography.compute.brute_force_optimizer_step.is_repeating_array(val)[source]
kromatography.compute.brute_force_optimizer_step.is_squeezable(val)[source]

## kromatography.compute.constant_binding_model_optimizer_step module

Example of a brute force BindingModelOptimizerStep that scans ka, nu and optionally sigma, applying the same binding model parameters to all product components.

class kromatography.compute.constant_binding_model_optimizer_step.ConstantBruteForceBindingModelOptimizerStep(**traits)[source]

Optimizer step that scans in a brute force fashion the binding model parameter space, applying the same values to all product components.

scan_type = Enum(['Log', 'Linear'])

Scan type, applied to all parameters.

scan_num_values = Int(5)

Number of values to select along each scanning dimension

ka_low_high = Tuple(DEFAULT_KA_LOW_HIGH)

Range of scanning for SMA Ka

nu_low_high = Tuple(DEFAULT_NU_LOW_HIGH)

Range of scanning for SMA Nu

sigma_low_high = Tuple(DEFAULT_SIGMA_LOW_HIGH)

Range of scanning for SMA sigma

scan_ka = Bool(True)

Scan the SMA Ka parameter?

scan_nu = Bool(True)

Scan the SMA Nu parameter?

scan_sigma = Bool

Scan the SMA sigma parameter?

component_names = Property(List, depends_on='target_experiments')

List of all components in the target product. Not to be confused with target_components, which is the subset used for cost computation

## kromatography.compute.cost_function_calcs module

Supporting calculation functions for binding model parameter optimization.

kromatography.compute.cost_function_calcs.calc_trailing_slope(x_data, y_data, low_trigger_fraction=0.2, high_trigger_fraction=0.8)[source]

Returns slope on the back side of the peak between two points.

First point is the first value below the high trigger (starting search from the peak). Second point is the first value below the low trigger (starting from the first point). The slope is (y2-y1)/(x2-x1).

Uses linear interpolation to estimate where the data crosses the actual trigger value, to reduce sensitivity to low resolution of fraction data. Includes some protection for non-ideal data that drops straight to a zero slope (making interpolation hard).
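The procedure described above can be sketched as follows. This is not the library's implementation: it assumes plain Python lists, a positive peak and trigger fractions below 1, and it omits the protection for non-ideal data. The names `crossing_x` and `trailing_slope_sketch` are hypothetical.

```python
def crossing_x(x0, y0, x1, y1, trigger):
    """Linearly interpolate the x where segment (x0, y0)-(x1, y1) hits trigger."""
    return x0 + (trigger - y0) * (x1 - x0) / (y1 - y0)


def trailing_slope_sketch(x_data, y_data, low_trigger_fraction=0.2,
                          high_trigger_fraction=0.8):
    """Slope of the back side of the peak between the two trigger levels."""
    y_max = max(y_data)
    peak_idx = y_data.index(y_max)  # first occurrence of the maximum
    high = high_trigger_fraction * y_max
    low = low_trigger_fraction * y_max

    def first_below(start, trigger):
        for i in range(start, len(y_data)):
            if y_data[i] < trigger:
                return i
        raise ValueError("data never drops below the trigger")

    i_high = first_below(peak_idx, high)
    i_low = first_below(i_high, low)
    # Interpolate between the last sample at/above each trigger and the
    # first sample below it, to locate the actual crossing times.
    x1 = crossing_x(x_data[i_high - 1], y_data[i_high - 1],
                    x_data[i_high], y_data[i_high], high)
    x2 = crossing_x(x_data[i_low - 1], y_data[i_low - 1],
                    x_data[i_low], y_data[i_low], low)
    return (low - high) / (x2 - x1)


times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
signal = [0.0, 5.0, 10.0, 7.5, 5.0, 2.5, 0.0]
slope = trailing_slope_sketch(times, signal)
```

Because the crossing points are interpolated, the coarse sampling of the example data does not bias the slope of its linear back side.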

Parameters:

- y_data (array) – Data we’re computing the trailing slope for.
- x_data (array) – Time values along which y_data is provided.
- low_trigger_fraction (float) – Fraction of the data max above which to compute the trailing slope.
- high_trigger_fraction (float) – Fraction of the data max below which to compute the trailing slope.

kromatography.compute.cost_function_calcs.calc_peak_center_of_mass(x_data, y_data)[source]

Calculates the location of the center of mass for peak: integral(x*y*dx)/integral(y*dx).

kromatography.compute.cost_function_calcs.calc_peak_timing(x_data, y_data)[source]

Returns the x value corresponding to y’s maximum. If the peak is flat, the first instance where the maximum is reached will be returned.
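The two peak-location metrics above can be illustrated in plain Python. This sketch assumes list inputs and uses the trapezoidal rule for the integrals, which is an assumption about the integration scheme; the function names are hypothetical.

```python
def trapezoid(x, y):
    """Integrate y over x with the trapezoidal rule."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))


def peak_center_of_mass_sketch(x_data, y_data):
    """integral(x*y*dx) / integral(y*dx)."""
    xy = [x * y for x, y in zip(x_data, y_data)]
    return trapezoid(x_data, xy) / trapezoid(x_data, y_data)


def peak_timing_sketch(x_data, y_data):
    """x value at y's maximum; list.index returns the first occurrence."""
    return x_data[y_data.index(max(y_data))]


center = peak_center_of_mass_sketch([0, 1, 2, 3, 4], [0, 1, 2, 1, 0])
flat_peak_time = peak_timing_sketch([0, 1, 2, 3], [0, 3, 3, 1])
```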

kromatography.compute.cost_function_calcs.find_index_of_first_value_below(y_data, start_index, step, trigger)[source]

Given a starting location in an array, returns index of first value less than the trigger.

Parameters:

- y_data (array) – Data we’re searching.
- start_index (int) – Starting point within y_data.
- step (int) – Indicates direction to search (sign), and step size (abs value).
- trigger (float) – Value we’re testing against.
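A minimal sketch of such a directional search. Returning `None` when no value is found is an assumption of this example, and `first_index_below` is a hypothetical name.

```python
def first_index_below(y_data, start_index, step, trigger):
    """Walk y_data from start_index in increments of step.

    Return the index of the first value strictly below trigger,
    or None if the walk leaves the array first.
    """
    if step == 0:
        raise ValueError("step must be non-zero")
    idx = start_index
    while 0 <= idx < len(y_data):
        if y_data[idx] < trigger:
            return idx
        idx += step
    return None


y = [0, 2, 5, 9, 6, 3, 1]
forward = first_index_below(y, 3, 1, 4)     # search towards the end
backward = first_index_below(y, 3, -1, 4)   # search towards the start
missing = first_index_below(y, 3, 2, 0.5)   # steps of 2, nothing found
```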

## kromatography.compute.cost_functions module

Default cost function for optimizing parameters of a simulation to match an experiment or a set of experiments taken under the SAME conditions.

class kromatography.compute.cost_functions.CostFunction[source]

Bases: traits.has_traits.HasStrictTraits

class kromatography.compute.cost_functions.CostFunction0[source]

Default cost “function” for optimizing a simulation against a (set of) experiment(s).

The cost is defined by a linear combination of the peak location, peak height and peak shape (trailing slope). To use this class, create an instance, optionally specifying the target experiments and weights. Then call it like a function, providing the list of simulations to compute the cost for. The target experiments and weights can also be provided at call time.

Note: The list of experiments provided will be used in the following way: all metrics will be collected for all these experiments, but these metrics will be AVERAGED, in effect treating these experiments as 1. To compute the distance between multiple sets of simulations/experiments, this function must be called multiple times.
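To make the averaging and the linear combination concrete, here is a small standalone sketch. The source only states that the cost is a linear combination of the three metrics. Measuring each metric's mismatch as a relative absolute difference is an assumption of this example, as are the function names and the numbers.

```python
def average_metrics(metrics_per_experiment):
    """Average each metric over experiments, treating them as one target."""
    n = len(metrics_per_experiment)
    return [sum(values) / n for values in zip(*metrics_per_experiment)]


def weighted_cost(sim_metrics, target_metrics, weights):
    """Linear combination of per-metric mismatches (assumed relative errors)."""
    return sum(w * abs(s - t) / abs(t)
               for w, s, t in zip(weights, sim_metrics, target_metrics))


# Each tuple holds (peak_time, peak_height, peak_slope) for one experiment.
exp_metrics = [(10.0, 2.0, -0.5), (12.0, 2.0, -0.25)]
target = average_metrics(exp_metrics)

# The simulation matches the peak time and slope but has half the peak height.
cost = weighted_cost((11.0, 1.0, -0.375), target, weights=(1.0, 2.0, 3.0))
```

Only the peak height term contributes here, so the cost is the height weight times the 50% relative error.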

Example

Assuming that exp is an Experiment and s1 and s2 are Simulation instances:

>>> weights = np.array([1, 2, 3])
>>> func = CostFunction0(target_experiments=[exp],
...                      weights=weights)
>>> func([s1, s2])
Product_1
Simulation Name
Sim 0_Constant Binding Group  123.252989
Sim 0_Constant Binding Group  123.252989

use_uv_for_cost = Bool(False)

Allowed to use UV continuous data to compute cost? Useful for pure protein and no fraction data. Ignored if fraction data is present.

weights = Array(dtype='float64')

Relative importance of peak_time, peak_height and peak_slope, respectively.

cost_data = Instance(pd.DataFrame)

Output computed costs

peak_time_weight = Float(DEFAULT_PEAK_TIME_WEIGHT)

Weight (relative importance) of the peak time location

peak_height_weight = Float(DEFAULT_PEAK_HEIGHT_WEIGHT)

Weight (relative importance) of the height of the peak

peak_slope_weight = Float(DEFAULT_PEAK_SLOPE_WEIGHT)

Weight (relative importance) of the peak’s back slope

target_experiments = List(Instance(Experiment))

List of experiments to combine to compute the distance of a simulation

target_components = List(Str)

Components whose peaks we are trying to fit

cached_exp_data = Array

Metrics/data for the current target experiments

cached_sim_data = Array

Metrics/data for the current target simulations

cached_simulations = List(Simulation)

List of the currently analyzed simulations

collect_experiment_targets()[source]

Collect target metrics for each experiment and each component.

FIXME: for now, the targets that the costs are computed against are the average of the targets over all experiments.

collect_sim_data(simulation_list)[source]

Collect simulation metrics to compute costs from.

collect_metrics(base_experiments, data_collector)[source]

Collect metrics for a list of simulations or experiments.

compute_costs(weights=None)[source]

Compute the costs for each simulation and each component compared to the average target value across all experiments.

Parameters:

- observed_data (Array) – Array of the metrics for all the components and all the simulations.
- targets (list(Experiment)) – List of experiments to compare the simulations to.
- simulation_names (List(str)) – List of simulation names used to populate the index of the returned dataframe.
- weights (1D-array (optional)) – Array of weights, one for each of the elements of a cost.

Returns: DataFrame containing the cost for each simulation and each product component.

package_costs(costs)[source]

Package the 2D array of costs into a dataframe with labels.

The index is the list of simulations. The columns are the target components for which the costs are computed.

get_expt_data(results, component)[source]

Extract x and y data for provided experiment and component name.

get_sim_data(results, component)[source]

Extract x and y data for provided simulation and component name.

rebuild_weights()[source]

## kromatography.compute.experiment_optimizer module

Base class for binding model optimizers.

class kromatography.compute.experiment_optimizer.ExperimentOptimizer[source]

Bases: app_common.model_tools.data_element.DataElement

A base optimizer class stringing optimization steps to find the optimal simulations to fit an experiment or a set of experiments.

name = Str('New optimizer')

Optimizer name

type = 'ExperimentOptimizer'

Optimizer type

target_experiments = List(Instance(Experiment))

List of experiments to simultaneously minimize the distance to

target_experiment_names = Property(List(Str), depends_on='target_experiments')

List of experiment names to minimize the distance to

target_components = List(Str)

Target product components to compute the cost for

starting_point_simulations = List(Instance(Simulation))

Initial starting points for the optimizer, one for each target exp

cost_function_type = Str('Position/height/Back-Slope')

Type of target cost function to minimize

use_uv_for_cost = Bool(False)

Allowed to use UV continuous data to compute cost? Useful for pure protein and no fraction data.

run_start = Float

Timestamp of starting to run

run_stop = Float

Timestamp of the end of the run

steps = List(Instance(ExperimentOptimizerStep))

Succession of steps to complete the optimization process

num_steps = Property(Int, depends_on='steps[]')

Number of steps

size = Property(Int, depends_on='steps[]')

Number of simulations used during optimization

has_run = Bool

Have all simulation groups of all optimizer steps run?

status = Enum([MULTI_SIM_RUNNER_CREATED, MULTI_SIM_RUNNER_RUNNING, MULTI_SIM_RUNNER_FINISHED])

Status of the optimizer as a string

scanned_param_names = Property(List(Str), depends_on='steps[]')

Parameters that are optimized over

num_optimal_simulations = Int(DEFAULT_NUM_OPTIMAL_SIMULATIONS)

Number of models to collect as optimal models

optimal_simulations = List(Instance(Simulation))

Best simulations globally

optimal_simulation_map = Dict

Best simulations, grouped by target experiment

cost_data = Instance(pd.DataFrame)

Collected averaged costs for each combination of the scanned parameters

cost_data_cols = Property(List)

Ordered list of columns for the cost_data DF

run(job_manager, **kwargs)[source]
run_step(**kwargs)[source]
update_optimal_simulation_map(obj, attr_name, old, new)[source]

Collect best num_optimal_simulations simulations with lowest costs.
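Selecting the lowest-cost entries can be sketched with the standard library. The mapping from simulation name to a single aggregated cost is an assumed simplification of the cost data, and `best_simulations` is a hypothetical helper.

```python
import heapq


def best_simulations(costs, num_optimal_simulations):
    """Return the names of the simulations with the lowest costs, best first."""
    return heapq.nsmallest(num_optimal_simulations, costs, key=costs.get)


costs = {"sim_0": 3.2, "sim_1": 0.7, "sim_2": 1.5, "sim_3": 9.0}
top_two = best_simulations(costs, 2)
```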

update_optimal_simulations()[source]

Interleave the best simulations for each target experiment.
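One way to interleave per-experiment lists is shown below. It is a sketch that assumes the map holds one best-first list per target experiment; the helper name and the sample data are hypothetical.

```python
from itertools import chain, zip_longest


def interleave_best(optimal_simulation_map):
    """Take the best of each experiment, then the second best of each, etc."""
    missing = object()  # sentinel for experiments with fewer simulations
    rounds = zip_longest(*optimal_simulation_map.values(), fillvalue=missing)
    return [sim for sim in chain.from_iterable(rounds) if sim is not missing]


sim_map = {"exp1": ["a1", "a2", "a3"], "exp2": ["b1", "b2"]}
interleaved = interleave_best(sim_map)
```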

rebuild_cost_data()[source]
update_has_run()[source]

## kromatography.compute.experiment_optimizer_step module

Driver class and supporting utilities to build the optimal simulations given an experiment and a transport model.

class kromatography.compute.experiment_optimizer_step.ExperimentOptimizerStep(**traits)[source]

Bases: app_common.model_tools.data_element.DataElement

Driver to build an optimal simulation to match (an) experiment(s).

TODO: add the ability to do a smart run, and use initial results from simulation runs to trim the parameter space.

optimizer_step_type = 'Experiment optimizer step'

Type of the optimizer step

target_experiments = List(Instance(Experiment))

Target experiments to simultaneously minimize the distance to

target_components = List(Str)

List of all target component names we are trying to fit simulations to

starting_point_simulations = List(Instance(Simulation))

Initial starting points for the optimizer, one for each target exp

cost_function_type = Str('Position/height/Back-Slope')

Type of cost function to minimize

use_uv_for_cost = Bool(False)

Allowed to use UV continuous data to compute cost? Useful for pure protein and no fraction data.

has_run = Bool

Set when optimizer has run, before cost data is updated.

status = Enum([MULTI_SIM_RUNNER_CREATED, MULTI_SIM_RUNNER_RUNNING, MULTI_SIM_RUNNER_FINISHED])

Status of the optimizer as a string

data_updated = Event

Event emitted once output data has been aggregated once run is finished

cost_data = Instance(pd.DataFrame)

Series mapping all simulations to their costs for each component

cost_func_kw = Dict

Keyword arguments for the cost function creation

run(job_manager, wait=False)[source]

Build and run a SimulationGroup around each center simulation.

assert_all_experiments_valid()[source]
assert_all_exp_have_output()[source]
assert_all_exp_for_same_product()[source]

## kromatography.compute.experiment_performance_parameters module

Functions to compute the performance parameters and data for experiments.

kromatography.compute.experiment_performance_parameters.compute_strip_fraction(experiment)[source]

Estimate the fraction of product eluting at/after the strip step.

This fraction is computed from the ratio of the loaded product (with specified load concentration and load volume) to the product eluting between the start of the load and start of the strip. This is computed by integrating the chromatogram before the strip, applying the extinction coefficient from the component with the highest fraction.
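As an illustration of the ratio described above, the sketch below assumes the strip fraction is the share of the loaded mass that was not recovered before the strip. The function name, the plain-float arguments (g/L, L and g) and the formula itself are assumptions, not the library's code.

```python
import math


def strip_fraction_percent(load_concentration, load_volume,
                           mass_eluted_before_strip):
    """Percentage of the loaded mass not accounted for before the strip."""
    loaded_mass = load_concentration * load_volume
    if loaded_mass <= 0:
        return math.nan  # unable to compute
    return 100.0 * (1.0 - mass_eluted_before_strip / loaded_mass)


# 5 g/L loaded over 0.2 L gives 1 g; 0.9 g is seen eluting before the strip.
strip_pct = strip_fraction_percent(5.0, 0.2, 0.9)
```

A negative result would mean more product was integrated before the strip than was loaded, which points at one of the data discrepancies discussed in this section.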

This makes it possible to detect potential experimental data discrepancies between:

1. the chromatogram, and in particular, its shape in the Strip part,
2. the product’s extinction coefficient,
3. the mass of product loaded, or in other words, the load concentration.

Parameters:

- experiment (Experiment) – Experiment object we are computing the strip fraction for.

Returns: Percentage of the loaded product that is found to elute during the strip step. Value set to nan if unable to compute.

Return type: UnitScalar

kromatography.compute.experiment_performance_parameters.compute_mass_from_abs_data(absorb_data, ext_coeff, experim, t_start=None, t_stop=None, t_start_idx=None, t_stop_idx=None)[source]

Compute total mass of a product component between start and stop times.

The total mass is computed by integrating the specified chromatogram, between t_start and t_stop and using the specified extinction coefficient and flow rate at each time.
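The integration can be sketched with plain floats. This example assumes the Beer-Lambert relation (concentration = absorbance / (extinction coefficient × path length)) and trapezoidal integration. The units in the comments and the function name are assumptions, and the real function works with XYData and UnitScalar objects.

```python
def mass_from_absorbance(times, absorbances, flow_rates, ext_coeff,
                         path_length):
    """Integrate concentration * flow rate over time.

    Assumed units: times in min, absorbances in AU, flow_rates in L/min,
    ext_coeff in AU*L/(g*cm), path_length in cm. The result is in grams.
    """
    # Mass flow in g/min at each time point.
    mass_rate = [a / (ext_coeff * path_length) * q
                 for a, q in zip(absorbances, flow_rates)]
    return sum(0.5 * (mass_rate[i] + mass_rate[i + 1])
               * (times[i + 1] - times[i])
               for i in range(len(times) - 1))


# Constant 1 AU signal for 2 minutes at 0.01 L/min.
mass = mass_from_absorbance(
    times=[0.0, 1.0, 2.0],
    absorbances=[1.0, 1.0, 1.0],
    flow_rates=[0.01, 0.01, 0.01],
    ext_coeff=2.0,
    path_length=1.0,
)
```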

Parameters:

- absorb_data (XYData) – Data (fraction or continuous) to integrate to compute the contained mass.
- ext_coeff (UnitScalar) – Extinction coefficient to use to convert the absorbance to a product concentration.
- experim (Experiment) – Experiment from which to extract the method (and therefore flow rate) information and the system’s path length.
- t_start (UnitScalar) – Time at which to start integrating, in minutes. Leave as None to use the t_start_idx to specify the time range to integrate.
- t_stop (UnitScalar) – Time at which to stop integrating, in minutes. Leave as None to use the t_stop_idx to specify the time range to integrate.
- t_start_idx (Int or None) – Index in the x_data to start integrating at (inclusive).
- t_stop_idx (Int or None) – Index in the x_data to stop integrating at (exclusive). Leave as None to go all the way to the end.

Returns: Product mass, in grams, estimated to elute between t_start and t_stop.

Return type: UnitScalar

kromatography.compute.experiment_performance_parameters.build_flow_rate_array(times, experiment, to_unit='liter/minute')[source]

Build array of flow rates in liter/min at each time of ‘times’ array.

Parameters:

- times (numpy.array) – Array of chromatogram times at which to extract the flow rates.
- experiment (Experiment) – Experiment to extract the flow rates from.
- to_unit (str) – Unit of the output.

Returns: Array of flow rates at the times of the times array.

Return type: UnitArray

kromatography.compute.experiment_performance_parameters.get_most_contributing_component(experim, exclude=None)[source]

Returns the component with the largest fraction in the experiment data.

kromatography.compute.experiment_performance_parameters.calculate_experiment_performance_data(exp)[source]

Calculates the performance data based on continuous_data. Performance data includes the start and stop times from the start and stop collection criteria, the pool’s concentrations, volume and purities, and the yield. Used to build the output of a simulation once the solver has run.

Parameters:

- exp (Experiment) – Experiment object we are computing the performance for. Used to collect information about the product it models and the collection criteria.

Returns: None if no collection criteria was specified in the simulation’s method.

Return type: PerformanceData or None

kromatography.compute.experiment_performance_parameters.calculate_exp_component_concentrations(exp, fraction_data, flow_rate, pool_volume, start_collect, stop_collect)[source]

Calculate the concentration of each component in the pool in g/L from experimentally measured fraction data. The mass of a component in the pool is its fraction at various times, times the product concentration integrated over duration of the pooling process. The pool component concentration is the

## kromatography.compute.performance_parameters module

kromatography.compute.performance_parameters.calculate_performance_data(sim, continuous_data)[source]

Calculates the performance results based on continuous_data.

Parameters:

- sim (Simulation) – Simulation object we are computing the performance for. Used to collect information about the product it models and the collection criteria.
- continuous_data (dict) – Dictionary of continuous data that was simulated by CADET.

Returns: None if no collection criteria was specified in the simulation’s method.

Return type: PerformanceData or None

kromatography.compute.performance_parameters.calculate_start_stop_collect(absorb_data, col_criteria, step_start, step_stop)[source]

Find the times and array indices corresponding to start & stop collect.

Parameters:

- absorb_data (XYData) – Data container for the time and measurement values for absorbance data.
- col_criteria (CollectionCriteria) – Pool collection criteria describing when to start and stop collecting.
- step_start (UnitScalar) – Start time of the step creating the pool, in minutes.
- step_stop (UnitScalar) – Stop time of the step creating the pool, in minutes.

Returns: Time to start collecting at and corresponding index in the arrays and time to stop collecting at and corresponding index.

Return type: UnitScalar, int, UnitScalar, int

kromatography.compute.performance_parameters.calculate_pool_volume(start_collect_time, stop_collect_time, flow_rate, column)[source]

Calculates pool volume in CVs.

kromatography.compute.performance_parameters.calculate_pool_concentration(comp_concentrations)[source]

Calculates total pool concentration in g/liter by summing individual component concentrations.

kromatography.compute.performance_parameters.calculate_step_yield(pool_concentration, pool_volume, load_step)[source]

Calculates protein yield, in percent of the mass of protein loaded (at load step).
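The pool volume, pool concentration and step yield computed by the three functions above can be chained as in the sketch below. It uses plain floats instead of UnitScalar, Column or MethodStep objects, and the function names, argument lists and units are hypothetical.

```python
def pool_volume_cv(start_collect_time, stop_collect_time, flow_rate,
                   column_volume):
    """Pool volume in column volumes (CV): collected volume / column volume."""
    collected = (stop_collect_time - start_collect_time) * flow_rate
    return collected / column_volume


def pool_concentration(comp_concentrations):
    """Total pool concentration: sum of the component concentrations (g/L)."""
    return sum(comp_concentrations)


def step_yield_percent(pool_conc, pool_volume_liters, load_conc,
                       load_volume_liters):
    """Mass in the pool as a percentage of the mass loaded."""
    return 100.0 * pool_conc * pool_volume_liters / (load_conc * load_volume_liters)


column_volume = 1.0  # liters
volume_cv = pool_volume_cv(10.0, 14.0, flow_rate=0.5,
                           column_volume=column_volume)
conc = pool_concentration([1.0, 0.5, 0.5])
# Convert the pool volume back to liters before computing the mass.
step_yield = step_yield_percent(conc, volume_cv * column_volume,
                                load_conc=5.0, load_volume_liters=1.0)
```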

Parameters:

- pool_concentration (UnitScalar) – Concentration of the pool extracted.
- pool_volume (UnitScalar) – Volume of the pool.
- load_step (MethodStep) – Step describing the load of the protein.

kromatography.compute.performance_parameters.calculate_component_concentrations(product, comp_absorb_data, start_collect_idx, stop_collect_idx)[source]

Calculate the concentration of each component in the pool in g/L.

The volume of a component is defined as the integral of its absorbance data between the start and the stop collect.

## kromatography.compute.strip_fraction_calculator module

class kromatography.compute.strip_fraction_calculator.StripFractionCalculator[source]

Bases: traits.has_traits.HasStrictTraits

Class to estimate the strip fraction expected from experimental data.

It is designed with many intermediate

experim = Instance(Experiment)

Experiment from which data & parameters are drawn to estimate strip mass

strip_mass_fraction = Instance(UnitScalar)

Result quantity

loaded_volume = Instance(UnitScalar)

load_concentration = Instance(UnitScalar)

loaded_mass = Instance(UnitScalar)

Mass of product loaded in the column

integrated_mass_before_strip = Instance(UnitScalar)

Mass recovered before the strip

most_contributing_comp = Instance(ProductComponent)

Component from which to guess the average product extinction coefficient

product_ext_coeff = Instance(UnitScalar)

Average product extinction coefficient

reset = Button('Reload experiment')

Event to trigger a reload of the experiment and reset all quantities

load_start = Property(Instance(UnitScalar), depends_on='experim.method_step_boundary_times')

Load step start time: proxy for experim.method_step_boundary_times

strip_start = Property(Instance(UnitScalar), depends_on='experim.method_step_boundary_times')

Strip step start time: proxy for experim.method_step_boundary_times

load_characteristics_changed()[source]
recompute_integrated_mass()[source]
recompute_mass_fraction()[source]