emodel_generalisation.mcmc

Module to sample electrical model parameter space with MCMC.

Functions

`bin_data`(p1, p2, f[, n, mode, _min1, _max1, ...])	Bin data to make heatmap.
`dpoint2pointcloud`(X, i, metric)	Return the distance from the ith point in a Euclidean point cloud to the rest of the points
`filter_features`(df)	Filter redundant features to reduce number of IT computations.
`filter_local_min`(df[, dist_split, emodel_index])	Filter emodel by distance to a given emodel, for local minima filtering.
`get_2d_correlations`(df[, x_col, y_col, ...])	Get 2d correlations.
`get_greedy_perm`(X[, n_perm, dist_matrix, metric])	Compute a furthest point sampling permutation of a set of points
`get_mean_sd`(efeatures, feat)	Get experimental mean and sd.
`load_chains`(run_df[, base_path, ...])	Load chains from main run_df file where the first row contains initial condition.
`plot_MI`(MI[, with_cluster])	Plot MI matrix.
`plot_autocorrelation`(df[, n_emodels, filename])	Autocorrelation plot, adapted from pandas.plotting.autocorrelation_plot.
`plot_best_corr`(df, cor, x_col, y_col, filename)	Plot only highest correlated tuples.
`plot_corner`(df[, feature, filename, n_bins, ...])	Make a corner plot which consists of scatter plots of all pairs.
`plot_cost`(df, split[, filename])	Plot histogram of costs.
`plot_cost_convergence`(df[, filename])	Plot the value of the cost of each chain as a function of iteration.
`plot_feature_correlations`(df, split[, ...])	Plot feature correlations.
`plot_feature_distributions`(df, emodel, ...)	Plot feature distributions.
`plot_full_cost_convergence`(df_burnin, df[, ...])	Plot the value of the cost of each chain as a function of iteration.
`plot_parameter_distributions`(df, split[, ...])	Plot parameter distributions below and above a split.
`plot_reduced_feature_distributions`(df, ...)	Plot feature distributions of some features with violin plots.
`plot_score_distributions`(df, split[, filename])	Plot score below and above a split.
`plot_selected_emodels`(df, emodel_ids, dists)	Plot histogram of emodel distances, and clustered distance matrix with selected emodels.
`plot_step_size`(run_df[, filename])	Plot the step size of the first chain in normalized parameter space.
`plot_top_corner`(df, out_path)	Plot subcorner of top correlations.
`run_several_chains`([n_chains, n_steps, ...])	Main function to call to run several chains in parallel.
`save_selected_emodels`(df, emodel_ids[, ...])	Create a final.json file with selected emodels.
`select_emodels`(df[, threshold, method])	Select emodels far away from each other in parameter space.

Classes

MarkovChain([n_steps, result_df_path, ...])

Class to set up and run a Markov chain on emodel parameter space.

class emodel_generalisation.mcmc.MarkovChain(n_steps=100, result_df_path='result.csv', temperature=1.0, proposal_params=None, emodel=None, access_point=None, stochasticity=False, mcmc_type='metropolis_hastings', weights=None, seed=42, frozen_params=None, random_initial_parameters=True, mcmc_log_file='mcmc_log.txt', cost_type='max', resume=False)

Bases: object

Class to set up and run a Markov chain on emodel parameter space.

get_random_parameters(): Get random parameters to initialise a chain.

run(depth=0): Run the MCMC.
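
For orientation, the following is a minimal sketch of a Metropolis-Hastings chain with a temperature, the scheme named by the default mcmc_type='metropolis_hastings'. It is not the library's implementation: the helper name metropolis_hastings, the one-dimensional parameter, the Gaussian proposal, and the acceptance rule exp(-(new_cost - old_cost) / temperature) are all assumptions made for illustration.

```python
import math
import random


def metropolis_hastings(cost, x0, n_steps=100, temperature=1.0, step=0.1, seed=42):
    """Toy Metropolis-Hastings chain on one parameter (hypothetical helper)."""
    rng = random.Random(seed)
    x, c = x0, cost(x0)
    chain = [(x, c)]
    for _ in range(n_steps):
        # Symmetric Gaussian proposal centred on the current point (assumed).
        x_new = x + rng.gauss(0.0, step)
        c_new = cost(x_new)
        # Assumed acceptance rule: always accept a lower cost, otherwise accept
        # with probability exp(-(c_new - c) / temperature).
        if c_new <= c or rng.random() < math.exp(-(c_new - c) / temperature):
            x, c = x_new, c_new
        chain.append((x, c))
    return chain


# Toy quadratic cost with its minimum at x = 1.
chain = metropolis_hastings(lambda x: (x - 1.0) ** 2, x0=0.0, n_steps=200)
```

In this sketch a higher temperature makes uphill moves more likely to be accepted, so the chain explores more widely around low-cost regions.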

emodel_generalisation.mcmc.bin_data(p1, p2, f, n=20, mode='mean', _min1=-1.0, _max1=1.0, _min2=-1.0, _max2=1.0): Bin data to make heatmap.
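
As an illustration of what binning data for a heatmap can look like, here is a small sketch that averages a value f over an n by n grid of (p1, p2) bins. The helper name bin_mean_2d, the shared [lo, hi] range for both axes, and the use of None for empty bins are assumptions of this sketch, not the behaviour of bin_data.

```python
def bin_mean_2d(p1, p2, f, n=20, lo=-1.0, hi=1.0):
    """Average f over an n x n grid of (p1, p2) bins (hypothetical helper)."""
    sums = [[0.0] * n for _ in range(n)]
    counts = [[0] * n for _ in range(n)]
    width = (hi - lo) / n
    for a, b, value in zip(p1, p2, f):
        if not (lo <= a <= hi and lo <= b <= hi):
            continue  # ignore points outside the binned range
        # Points lying exactly on the upper edge go into the last bin.
        i = min(int((a - lo) / width), n - 1)
        j = min(int((b - lo) / width), n - 1)
        sums[i][j] += value
        counts[i][j] += 1
    # Empty bins are reported as None so they can be masked when plotting.
    return [
        [sums[i][j] / counts[i][j] if counts[i][j] else None for j in range(n)]
        for i in range(n)
    ]


heatmap = bin_mean_2d([-0.5, -0.4, 0.5], [-0.5, -0.6, 0.5], [1.0, 3.0, 10.0], n=2)
```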

emodel_generalisation.mcmc.dpoint2pointcloud(X, i, metric)

Return the distance from the ith point in a Euclidean point cloud to the rest of the points

Copied from ripser.py

Parameters:

X (ndarray (n_samples, n_features)) – A numpy array of data
i (int) – The index of the point from which to return all distances
metric (string or callable) – The metric to use when calculating distance between instances in a feature array
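
A minimal sketch of the idea for the Euclidean case only, using the standard library. The helper name distances_from_point is hypothetical, and the sketch assumes the result includes the zero distance from point i to itself.

```python
import math


def distances_from_point(X, i):
    """Euclidean distance from row i of X to every row of X (hypothetical helper)."""
    return [math.dist(X[i], row) for row in X]


cloud = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]
dists = distances_from_point(cloud, 0)
```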

emodel_generalisation.mcmc.filter_features(df): Filter redundant features to reduce number of IT computations.

emodel_generalisation.mcmc.filter_local_min(df, dist_split=3, emodel_index=0): Filter emodel by distance to a given emodel, for local minima filtering.

emodel_generalisation.mcmc.get_2d_correlations(df, x_col='normalized_parameters', y_col='normalized_parameters', mi_max=1, feature=None, tpe='MI'): Get 2d correlations.

emodel_generalisation.mcmc.get_greedy_perm(X, n_perm=None, dist_matrix=False, metric='euclidean')

Compute a furthest point sampling permutation of a set of points

Copied from ripser.py

Parameters:

X (ndarray (n_samples, n_features)) – A numpy array of either data or distance matrix
dist_matrix (bool) – Indicator that X is a distance matrix, if not we compute distances in X using the chosen metric.
n_perm (int) – Number of points to take in the permutation
metric (string or callable) – The metric to use when calculating distance between instances in a feature array

Returns:

idx_perm (ndarray(n_perm)) – Indices of points in the greedy permutation
lambdas (ndarray(n_perm)) – Covering radii at different points
dperm2all (ndarray(n_perm, n_samples)) – Distances from points in the greedy permutation to points in the original point set
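
Furthest point sampling can be sketched in a few lines of standard-library Python. The version below handles only Euclidean distances, always starts from point 0, and returns only the permutation indices and covering radii; the helper name greedy_permutation and these simplifications are assumptions of the sketch.

```python
import math


def greedy_permutation(X, n_perm=None):
    """Furthest point sampling on a Euclidean point cloud (hypothetical helper)."""
    n = len(X)
    n_perm = n if n_perm is None else n_perm
    idx_perm = [0]  # start from the first point
    # ds[j] is the distance from point j to the closest selected point.
    ds = [math.dist(X[0], row) for row in X]
    lambdas = []
    for _ in range(1, n_perm):
        # Pick the point furthest from everything selected so far.
        idx = max(range(n), key=ds.__getitem__)
        lambdas.append(ds[idx])
        idx_perm.append(idx)
        ds = [min(d, math.dist(X[idx], row)) for d, row in zip(ds, X)]
    # Covering radius left after the last selected point.
    lambdas.append(max(ds))
    return idx_perm, lambdas


points = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [5.0, 0.0]]
idx_perm, lambdas = greedy_permutation(points)
```

Each covering radius is the largest distance from any point to its nearest selected point at that stage, so the radii never increase along the permutation.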

emodel_generalisation.mcmc.get_mean_sd(efeatures, feat): Get experimental mean and sd.

emodel_generalisation.mcmc.load_chains(run_df, base_path='.', with_single_origin=False, n_chains=None)

Load chains from main run_df file where the first row contains initial condition.

If run_df is a path to a .csv file, it is loaded first.

emodel_generalisation.mcmc.plot_MI(MI, with_cluster=False): Plot MI matrix.

emodel_generalisation.mcmc.plot_autocorrelation(df, n_emodels=10, filename='autocorrelation.pdf'): Autocorrelation plot, adapted from pandas.plotting.autocorrelation_plot.
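
The quantity drawn by pandas.plotting.autocorrelation_plot is the sample autocorrelation of a series as a function of lag. The sketch below computes it with the standard library; the helper name autocorrelation is hypothetical, and which chain quantity plot_autocorrelation actually uses is not stated here.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a sequence at a given lag (hypothetical helper)."""
    n = len(series)
    mean = sum(series) / n
    # Lag-0 autocovariance, used as the normalisation.
    c0 = sum((x - mean) ** 2 for x in series) / n
    # Autocovariance at the requested lag, normalised by n rather than n - lag.
    ch = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    ) / n
    return ch / c0


alternating = [1.0, -1.0, 1.0, -1.0]
r1 = autocorrelation(alternating, 1)
```

For a well-mixed chain the autocorrelation should decay towards zero as the lag grows; slow decay suggests that successive samples are strongly correlated.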

emodel_generalisation.mcmc.plot_best_corr(df, cor, x_col, y_col, filename, sd=5): Plot only highest correlated tuples.

emodel_generalisation.mcmc.plot_corner(df, feature=None, filename='corner.pdf', n_bins=20, cmap='gnuplot', normalize=False, highlights=None, sort_params=True, with_pearson=False)

Make a corner plot which consists of scatter plots of all pairs.

Parameters:

feature (str) – name of feature for coloring heatmap
filename (str) – name of figure for corner plot

emodel_generalisation.mcmc.plot_cost(df, split, filename='figures/costs.pdf'): Plot histogram of costs.

emodel_generalisation.mcmc.plot_cost_convergence(df, filename='cost_convergence.png'): Plot the value of the cost of each chain as a function of iteration.

emodel_generalisation.mcmc.plot_feature_correlations(df, split, pearson_thresh=0.6, figure_name='feature_corrs.pdf'): Plot feature correlations.

emodel_generalisation.mcmc.plot_feature_distributions(df, emodel, access_point, filename='feature_distributions.pdf', log_scale=True): Plot feature distributions.

emodel_generalisation.mcmc.plot_full_cost_convergence(df_burnin, df, clip=10, filename='cost_convergence.pdf'): Plot the value of the cost of each chain as a function of iteration.

emodel_generalisation.mcmc.plot_parameter_distributions(df, split, filename='figures/parameter_distributions.pdf'): Plot parameter distributions below and above a split.

emodel_generalisation.mcmc.plot_reduced_feature_distributions(df, emodel, access_point, features=None, filename='reduced_feature_distributions.pdf'): Plot feature distributions of some features with violin plots.

emodel_generalisation.mcmc.plot_score_distributions(df, split, filename='figures/score_distributions.pdf'): Plot score below and above a split.

emodel_generalisation.mcmc.plot_selected_emodels(df, emodel_ids, dists, threshold=4): Plot histogram of emodel distances, and clustered distance matrix with selected emodels.

emodel_generalisation.mcmc.plot_step_size(run_df, filename='mcmc_stepsize.pdf'): Plot the step size of the first chain in normalized parameter space.

emodel_generalisation.mcmc.plot_top_corner(df, out_path): Plot subcorner of top correlations.

emodel_generalisation.mcmc.run_several_chains(n_chains=50, n_steps=100, results_df_path='chains', run_df_path='run_df.csv', temperature=1.0, proposal_params=None, access_point=None, emodel_dir=None, recipes_path=None, final_path=None, legacy_dir_structure=True, emodel=None, with_seeds=False, stochasticity=False, mcmc_type='metropolis_hastings', parallel_lib='multiprocessing', frozen_params=None, random_initial_parameters=True, mcmc_log_file='mcmc_log.txt', weights=None, chain_df=None): Main function to call to run several chains in parallel.

emodel_generalisation.mcmc.save_selected_emodels(df, emodel_ids, emodel='cADpyr_L5TPC', final_path='selected_final.json'): Create a final.json file with selected emodels.

emodel_generalisation.mcmc.select_emodels(df, threshold=4.0, method='local_min')

Select emodels far away from each other in parameter space.

If method == 'local_min', we look for local minima with radius = threshold.
If method == 'distance', we naively search for points at minimum threshold radius.
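
One possible reading of the distance-based selection is a greedy pass that keeps a point only if it lies at least threshold away from every point already kept. The sketch below illustrates that reading with Euclidean distances; the helper name select_by_distance and the visiting order are assumptions, and the library's method may differ.

```python
import math


def select_by_distance(points, threshold=4.0):
    """Greedily keep points at least `threshold` apart (hypothetical helper)."""
    selected = []
    for i, point in enumerate(points):
        # Keep the point only if it is far enough from every kept point.
        if all(math.dist(point, points[j]) >= threshold for j in selected):
            selected.append(i)
    return selected


params = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [5.0, 3.0], [10.0, 0.0]]
kept = select_by_distance(params, threshold=4.0)
```

The result of such a greedy pass depends on the order in which points are visited, so sorting by cost beforehand would favour low-cost emodels.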