emodel_generalisation.mcmc

Module to sample electrical model parameter space with MCMC.

Functions

bin_data(p1, p2, f[, n, mode, _min1, _max1, ...])

Bin data to make heatmap.

dpoint2pointcloud(X, i, metric)

Return the distance from the ith point in a Euclidean point cloud to the rest of the points.

filter_features(df)

Filter redundant features to reduce number of IT computations.

filter_local_min(df[, dist_split, emodel_index])

Filter emodel by distance to a given emodel, for local minima filtering.

get_2d_correlations(df[, x_col, y_col, ...])

Get 2d correlations.

get_greedy_perm(X[, n_perm, dist_matrix, metric])

Compute a furthest point sampling permutation of a set of points.

get_mean_sd(efeatures, feat)

Get experimental mean and sd.

load_chains(run_df[, base_path, ...])

Load chains from the main run_df file, where the first row contains the initial condition.

plot_MI(MI[, with_cluster])

Plot MI matrix.

plot_autocorrelation(df[, n_emodels, filename])

Autocorrelation plot, adapted from pandas.plotting.autocorrelation_plot.

plot_best_corr(df, cor, x_col, y_col, filename)

Plot only highest correlated tuples.

plot_corner(df[, feature, filename, n_bins, ...])

Make a corner plot which consists of scatter plots of all pairs.

plot_cost(df, split[, filename])

Plot histogram of costs.

plot_cost_convergence(df[, filename])

Plot the value of the cost of each chain as a function of iteration.

plot_feature_correlations(df, split[, ...])

Plot feature correlations.

plot_feature_distributions(df, emodel, ...)

Plot feature distributions.

plot_full_cost_convergence(df_burnin, df[, ...])

Plot the value of the cost of each chain as a function of iteration.

plot_parameter_distributions(df, split[, ...])

Plot parameter distributions below and above a split.

plot_reduced_feature_distributions(df, ...)

Plot feature distributions of some features with violin plots.

plot_score_distributions(df, split[, filename])

Plot score below and above a split.

plot_selected_emodels(df, emodel_ids, dists)

Plot histogram of emodel distances, and clustered distance matrix with selected emodels.

plot_step_size(run_df[, filename])

Plot the step size of the first chain in normalized parameter space.

plot_top_corner(df, out_path)

Plot subcorner of top correlations.

run_several_chains([n_chains, n_steps, ...])

Main function to call to run several chains in parallel.

save_selected_emodels(df, emodel_ids[, ...])

Create a final.json file with selected emodels.

select_emodels(df[, threshold, method])

Select emodels far away from each other in parameter space.

Classes

MarkovChain([n_steps, result_df_path, ...])

Class to set up and run a Markov chain on emodel parameter space.

class emodel_generalisation.mcmc.MarkovChain(n_steps=100, result_df_path='result.csv', temperature=1.0, proposal_params=None, emodel=None, access_point=None, stochasticity=False, mcmc_type='metropolis_hastings', weights=None, seed=42, frozen_params=None, random_initial_parameters=True, mcmc_log_file='mcmc_log.txt', cost_type='max', resume=False)

Bases: object

Class to set up and run a Markov chain on emodel parameter space.

get_random_parameters()

Get random parameters to initialise a chain.

run(depth=0)

Run the MCMC.
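
For orientation, the default mcmc_type='metropolis_hastings' refers to the Metropolis-Hastings algorithm. The sketch below is a generic, standalone illustration of that algorithm and not this class's implementation. The function name, the Gaussian proposal, the step size and the acceptance rule exp(-Δcost / temperature) are all assumptions made for the example.

```python
import numpy as np


def metropolis_hastings(cost, x0, n_steps=100, temperature=1.0, step=0.1, seed=42):
    """Generic Metropolis-Hastings walk over a cost landscape (illustrative only)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    c = cost(x)
    chain = [(x.copy(), c)]
    for _ in range(n_steps):
        # Symmetric Gaussian proposal (assumed; the real proposal may differ).
        proposal = x + rng.normal(scale=step, size=x.shape)
        c_new = cost(proposal)
        # Accept with probability min(1, exp(-(c_new - c) / temperature)).
        if np.log(rng.uniform()) < -(c_new - c) / temperature:
            x, c = proposal, c_new
        chain.append((x.copy(), c))
    return chain


# Toy quadratic cost standing in for an emodel cost function.
chain = metropolis_hastings(lambda p: float(np.sum(p**2)), x0=[1.0, -1.0], n_steps=200)
```

A higher temperature makes uphill moves more likely to be accepted, so the chain explores more widely around low-cost regions.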

emodel_generalisation.mcmc.bin_data(p1, p2, f, n=20, mode='mean', _min1=-1.0, _max1=1.0, _min2=-1.0, _max2=1.0)

Bin data to make heatmap.
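
As a rough illustration of what binning for a heatmap with mode='mean' could look like, the sketch below averages f over a regular n × n grid in (p1, p2). It is an assumption-based example and not the function's actual code. The name bin_mean_2d and the use of np.histogram2d are choices made here.

```python
import numpy as np


def bin_mean_2d(p1, p2, f, n=20, lo=-1.0, hi=1.0):
    """Mean of f in each cell of an n x n grid over [lo, hi] x [lo, hi]."""
    edges = np.linspace(lo, hi, n + 1)
    # Sum of f per cell, then number of samples per cell.
    sums, _, _ = np.histogram2d(p1, p2, bins=[edges, edges], weights=f)
    counts, _, _ = np.histogram2d(p1, p2, bins=[edges, edges])
    with np.errstate(invalid="ignore", divide="ignore"):
        # Empty cells are reported as NaN rather than zero.
        return np.where(counts > 0, sums / counts, np.nan)


heat = bin_mean_2d([-0.5, -0.5, 0.5], [-0.5, -0.5, 0.5], [1.0, 3.0, 5.0], n=2)
```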

emodel_generalisation.mcmc.dpoint2pointcloud(X, i, metric)

Return the distance from the ith point in a Euclidean point cloud to the rest of the points.

Copied from ripser.py.

Parameters:
  • X (ndarray (n_samples, n_features)) – A numpy array of data

  • i (int) – The index of the point from which to return all distances

  • metric (string or callable) – The metric to use when calculating distance between instances in a feature array
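
The quantity returned can be sketched with plain NumPy for the Euclidean case. This is an illustration only: the helper name is hypothetical and the metric argument of the real function is not reproduced.

```python
import numpy as np


def point_to_cloud_distances(X, i):
    """Euclidean distances from row i of X to every row of X (including itself)."""
    X = np.asarray(X, dtype=float)
    return np.linalg.norm(X - X[i], axis=1)


X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
d = point_to_cloud_distances(X, 1)
```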

emodel_generalisation.mcmc.filter_features(df)

Filter redundant features to reduce number of IT computations.

emodel_generalisation.mcmc.filter_local_min(df, dist_split=3, emodel_index=0)

Filter emodel by distance to a given emodel, for local minima filtering.

emodel_generalisation.mcmc.get_2d_correlations(df, x_col='normalized_parameters', y_col='normalized_parameters', mi_max=1, feature=None, tpe='MI')

Get 2d correlations.

emodel_generalisation.mcmc.get_greedy_perm(X, n_perm=None, dist_matrix=False, metric='euclidean')

Compute a furthest point sampling permutation of a set of points.

Copied from ripser.py.

Parameters:
  • X (ndarray (n_samples, n_features)) – A numpy array of either data or distance matrix

  • dist_matrix (bool) – Indicator that X is a distance matrix, if not we compute distances in X using the chosen metric.

  • n_perm (int) – Number of points to take in the permutation

  • metric (string or callable) – The metric to use when calculating distance between instances in a feature array

Returns:

  • idx_perm (ndarray(n_perm)) – Indices of points in the greedy permutation

  • lambdas (ndarray(n_perm)) – Covering radii at different points

  • dperm2all (ndarray(n_perm, n_samples)) – Distances from points in the greedy permutation to points in the original point set
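
Furthest point sampling repeatedly picks the point that is furthest from everything selected so far. The sketch below is a minimal standalone version for Euclidean data. It returns arrays shaped like the three documented return values, but starting from index 0 and the exact definition of the covering radii are assumptions of this example, not a statement about the library code.

```python
import numpy as np


def greedy_permutation(X, n_perm=None):
    """Furthest point sampling on Euclidean data, starting from point 0."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    n_perm = n if n_perm is None else n_perm
    idx_perm = np.zeros(n_perm, dtype=int)
    lambdas = np.zeros(n_perm)
    dperm2all = np.zeros((n_perm, n))
    # ds[j] is the distance from point j to its nearest selected point.
    ds = np.linalg.norm(X - X[0], axis=1)
    dperm2all[0] = ds
    for k in range(1, n_perm):
        idx = int(np.argmax(ds))          # furthest point from the selection
        idx_perm[k] = idx
        lambdas[k - 1] = ds[idx]          # covering radius before adding idx
        dperm2all[k] = np.linalg.norm(X - X[idx], axis=1)
        ds = np.minimum(ds, dperm2all[k])
    lambdas[-1] = ds.max()
    return idx_perm, lambdas, dperm2all


X = np.array([[0.0], [1.0], [10.0], [11.0]])
idx_perm, lambdas, dperm2all = greedy_permutation(X, n_perm=3)
```

On this toy input the second selected point is the far end of the line (index 3), which is the behaviour that makes the method useful for picking well-separated samples.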

emodel_generalisation.mcmc.get_mean_sd(efeatures, feat)

Get experimental mean and sd.

emodel_generalisation.mcmc.load_chains(run_df, base_path='.', with_single_origin=False, n_chains=None)

Load chains from the main run_df file, where the first row contains the initial condition.

If run_df is a path to a .csv file, it is loaded first.

emodel_generalisation.mcmc.plot_MI(MI, with_cluster=False)

Plot MI matrix.

emodel_generalisation.mcmc.plot_autocorrelation(df, n_emodels=10, filename='autocorrelation.pdf')

Autocorrelation plot, adapted from pandas.plotting.autocorrelation_plot.
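
The quantity drawn by pandas.plotting.autocorrelation_plot is the sample autocorrelation r(h) = c(h) / c(0), with c(h) the lag-h autocovariance. A minimal NumPy sketch of that estimator is given below. The function name is hypothetical, and how this module adapts the pandas plot is not shown here.

```python
import numpy as np


def autocorrelation(series, max_lag):
    """Sample autocorrelation r(h) = c(h) / c(0) for h = 0..max_lag."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    centered = x - x.mean()
    c0 = np.sum(centered**2) / n
    return np.array(
        [np.sum(centered[: n - h] * centered[h:]) / n / c0 for h in range(max_lag + 1)]
    )


acf = autocorrelation([1.0, -1.0, 1.0, -1.0], max_lag=2)
```

For an MCMC chain, a slowly decaying autocorrelation indicates that consecutive samples are strongly dependent and that the chain may need thinning or more steps.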

emodel_generalisation.mcmc.plot_best_corr(df, cor, x_col, y_col, filename, sd=5)

Plot only highest correlated tuples.

emodel_generalisation.mcmc.plot_corner(df, feature=None, filename='corner.pdf', n_bins=20, cmap='gnuplot', normalize=False, highlights=None, sort_params=True, with_pearson=False)

Make a corner plot which consists of scatter plots of all pairs.

Parameters:
  • feature (str) – name of feature for coloring heatmap

  • filename (str) – name of figure for corner plot

emodel_generalisation.mcmc.plot_cost(df, split, filename='figures/costs.pdf')

Plot histogram of costs.

emodel_generalisation.mcmc.plot_cost_convergence(df, filename='cost_convergence.png')

Plot the value of the cost of each chain as a function of iteration.

emodel_generalisation.mcmc.plot_feature_correlations(df, split, pearson_thresh=0.6, figure_name='feature_corrs.pdf')

Plot feature correlations.

emodel_generalisation.mcmc.plot_feature_distributions(df, emodel, access_point, filename='feature_distributions.pdf', log_scale=True)

Plot feature distributions.

emodel_generalisation.mcmc.plot_full_cost_convergence(df_burnin, df, clip=10, filename='cost_convergence.pdf')

Plot the value of the cost of each chain as a function of iteration.

emodel_generalisation.mcmc.plot_parameter_distributions(df, split, filename='figures/parameter_distributions.pdf')

Plot parameter distributions below and above a split.

emodel_generalisation.mcmc.plot_reduced_feature_distributions(df, emodel, access_point, features=None, filename='reduced_feature_distributions.pdf')

Plot feature distributions of some features with violin plots.

emodel_generalisation.mcmc.plot_score_distributions(df, split, filename='figures/score_distributions.pdf')

Plot score below and above a split.

emodel_generalisation.mcmc.plot_selected_emodels(df, emodel_ids, dists, threshold=4)

Plot histogram of emodel distances, and clustered distance matrix with selected emodels.

emodel_generalisation.mcmc.plot_step_size(run_df, filename='mcmc_stepsize.pdf')

Plot the step size of the first chain in normalized parameter space.

emodel_generalisation.mcmc.plot_top_corner(df, out_path)

Plot subcorner of top correlations.

emodel_generalisation.mcmc.run_several_chains(n_chains=50, n_steps=100, results_df_path='chains', run_df_path='run_df.csv', temperature=1.0, proposal_params=None, access_point=None, emodel_dir=None, recipes_path=None, final_path=None, legacy_dir_structure=True, emodel=None, with_seeds=False, stochasticity=False, mcmc_type='metropolis_hastings', parallel_lib='multiprocessing', frozen_params=None, random_initial_parameters=True, mcmc_log_file='mcmc_log.txt', weights=None, chain_df=None)

Main function to call to run several chains in parallel.

emodel_generalisation.mcmc.save_selected_emodels(df, emodel_ids, emodel='cADpyr_L5TPC', final_path='selected_final.json')

Create a final.json file with selected emodels.

emodel_generalisation.mcmc.select_emodels(df, threshold=4.0, method='local_min')

Select emodels far away from each other in parameter space.

If method == 'local_min', we look for local minima with radius = threshold. If method == 'distance', we naively search for points at a minimum radius of threshold.
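
One possible reading of the 'distance' method is a naive greedy pass that keeps a point only if it lies at least threshold away from every point already kept. The sketch below illustrates that idea on raw coordinates. It is an assumption-based example with a hypothetical function name, and it does not operate on the df layout expected by select_emodels.

```python
import numpy as np


def select_by_distance(points, threshold=4.0):
    """Greedily keep indices whose points are >= threshold from all kept points."""
    points = np.asarray(points, dtype=float)
    selected = []
    for i, p in enumerate(points):
        if all(np.linalg.norm(p - points[j]) >= threshold for j in selected):
            selected.append(i)
    return selected


pts = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [5.0, 1.0], [0.0, 9.0]])
ids = select_by_distance(pts)
```

The result depends on the order of the points, which is one reason such a search is described as naive.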