emodel_generalisation.mcmc
Module to sample electrical model parameter space with MCMC.
Functions
| bin_data | Bin data to make a heatmap. |
| dpoint2pointcloud | Return the distance from the i-th point in a Euclidean point cloud to the rest of the points. |
| filter_features | Filter redundant features to reduce the number of IT computations. |
| filter_local_min | Filter emodels by distance to a given emodel, for local minima filtering. |
| get_2d_correlations | Get 2D correlations. |
| get_greedy_perm | Compute a furthest point sampling permutation of a set of points. |
| get_mean_sd | Get experimental mean and sd. |
| load_chains | Load chains from the main run_df file, where the first row contains the initial condition. |
| plot_MI | Plot MI matrix. |
| plot_autocorrelation | Autocorrelation plot, adapted from pandas.plotting.autocorrelation_plot. |
| plot_best_corr | Plot only the most highly correlated tuples. |
| plot_corner | Make a corner plot which consists of scatter plots of all pairs. |
| plot_cost | Plot histogram of costs. |
| plot_cost_convergence | Plot the value of the cost of each chain as a function of iteration. |
| plot_feature_correlations | Plot feature correlations. |
| plot_feature_distributions | Plot feature distributions. |
| plot_full_cost_convergence | Plot the value of the cost of each chain as a function of iteration. |
| plot_parameter_distributions | Plot parameter distributions below and above a split. |
| plot_reduced_feature_distributions | Plot the distributions of a subset of features with violin plots. |
| plot_score_distributions | Plot score distributions below and above a split. |
| plot_selected_emodels | Plot histogram of emodel distances, and clustered distance matrix with selected emodels. |
| plot_step_size | Plot the step size of the first chain in normalized parameter space. |
| plot_top_corner | Plot subcorner of top correlations. |
| run_several_chains | Main function to call to run several chains in parallel. |
| save_selected_emodels | Create a final.json file with selected emodels. |
| select_emodels | Select emodels far away from each other in parameter space. |
Classes
| MarkovChain | Class to set up and run a Markov chain on emodel parameter space. |
- class emodel_generalisation.mcmc.MarkovChain(n_steps=100, result_df_path='result.csv', temperature=1.0, proposal_params=None, emodel=None, access_point=None, stochasticity=False, mcmc_type='metropolis_hastings', weights=None, seed=42, frozen_params=None, random_initial_parameters=True, mcmc_log_file='mcmc_log.txt', cost_type='max', resume=False)
Bases: object
Class to set up and run a Markov chain on emodel parameter space.
- get_random_parameters()
Get random parameters to initialise a chain.
- run(depth=0)
Run the MCMC.
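The MarkovChain signature exposes a temperature and defaults to mcmc_type='metropolis_hastings'. As a hedged illustration of that technique, not of this class's implementation, the sketch below runs a Metropolis chain whose target density is assumed to be proportional to exp(-cost / temperature), with a symmetric Gaussian proposal and no parameter bounds. The names mh_accept and run_chain, the step parameter and the toy quadratic cost are all hypothetical.

```python
import math
import random


def mh_accept(cost_old, cost_new, temperature, rng):
    """Metropolis acceptance test for a target density proportional to exp(-cost / T)."""
    if cost_new <= cost_old:
        # downhill moves are always accepted
        return True
    # uphill moves are accepted with probability exp(-(cost_new - cost_old) / T)
    return rng.random() < math.exp(-(cost_new - cost_old) / temperature)


def run_chain(cost, x0, n_steps=100, step=0.1, temperature=1.0, seed=42):
    """Hypothetical sketch: return (parameters, cost) pairs, initial state first."""
    rng = random.Random(seed)
    x = list(x0)
    c = cost(x)
    samples = [(list(x), c)]
    for _ in range(n_steps):
        # symmetric Gaussian proposal, so the Hastings correction cancels out
        proposal = [xi + rng.gauss(0.0, step) for xi in x]
        c_new = cost(proposal)
        if mh_accept(c, c_new, temperature, rng):
            x, c = proposal, c_new
        samples.append((list(x), c))
    return samples


# toy quadratic cost on a 2D normalized parameter space (assumption)
chain = run_chain(lambda p: sum(v * v for v in p), [0.8, -0.8], n_steps=200)
```

Raising the temperature makes uphill moves more likely to be accepted, so the chain explores more widely at the expense of spending less time near low-cost regions.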
- emodel_generalisation.mcmc.bin_data(p1, p2, f, n=20, mode='mean', _min1=-1.0, _max1=1.0, _min2=-1.0, _max2=1.0)
Bin data to make a heatmap.
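The signature suggests that p1 and p2 are binned on an n x n grid over [_min1, _max1] x [_min2, _max2] and that f is aggregated per bin (mode='mean'). A minimal pure-Python sketch of mean-binning under that reading follows; the name bin_mean_2d, the NaN convention for empty bins and the dropping of out-of-range points are assumptions, not the module's documented behaviour.

```python
import math


def bin_mean_2d(p1, p2, f, n=20, min1=-1.0, max1=1.0, min2=-1.0, max2=1.0):
    """Average f over an n x n grid of (p1, p2) bins; empty bins are NaN."""
    sums = [[0.0] * n for _ in range(n)]
    counts = [[0] * n for _ in range(n)]

    def index(x, lo, hi):
        # map x in [lo, hi] to a bin in 0..n-1; the upper edge goes to the last bin
        return min(int((x - lo) / (hi - lo) * n), n - 1)

    for x, y, v in zip(p1, p2, f):
        if not (min1 <= x <= max1 and min2 <= y <= max2):
            continue  # points outside the ranges are ignored in this sketch
        i, j = index(x, min1, max1), index(y, min2, max2)
        sums[i][j] += v
        counts[i][j] += 1
    return [
        [sums[i][j] / counts[i][j] if counts[i][j] else math.nan for j in range(n)]
        for i in range(n)
    ]


grid = bin_mean_2d([-0.9, -0.9, 0.9], [-0.9, -0.9, 0.9], [1.0, 3.0, 5.0], n=2)
# grid[0][0] == 2.0, grid[1][1] == 5.0, the off-diagonal bins are NaN
```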
- emodel_generalisation.mcmc.dpoint2pointcloud(X, i, metric)
Return the distance from the i-th point in a Euclidean point cloud to the rest of the points.
Copied from ripser.py.
- Parameters:
X (ndarray (n_samples, n_features)) – A numpy array of data
i (int) – The index of the point from which to return all distances
metric (string or callable) – The metric to use when calculating distance between instances in a feature array
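The function accepts any metric. For the Euclidean case alone, the same computation can be sketched in pure Python as below; distances_from_point is a hypothetical name, and the self-distance is set to exactly zero to guard against round-off.

```python
import math


def distances_from_point(X, i):
    """Euclidean distances from X[i] to every row of X (the entry for i itself is 0)."""
    xi = X[i]
    ds = [math.dist(xi, x) for x in X]
    ds[i] = 0.0  # force the self-distance to exactly zero
    return ds


print(distances_from_point([[0, 0], [3, 4], [6, 8]], 0))  # [0.0, 5.0, 10.0]
```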
- emodel_generalisation.mcmc.filter_features(df)
Filter redundant features to reduce the number of IT computations.
- emodel_generalisation.mcmc.filter_local_min(df, dist_split=3, emodel_index=0)
Filter emodels by distance to a given emodel, for local minima filtering.
- emodel_generalisation.mcmc.get_2d_correlations(df, x_col='normalized_parameters', y_col='normalized_parameters', mi_max=1, feature=None, tpe='MI')
Get 2D correlations.
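With tpe='MI' the correlations are presumably mutual information values (an assumption based on the naming). One common way to estimate mutual information between two normalized parameters is a plug-in estimate from a 2D histogram, sketched below in pure Python. The function name, the bin count and the [-1, 1] range are hypothetical, and the module may well use a different estimator.

```python
import math
from collections import Counter


def histogram_mi(x, y, n_bins=10, lo=-1.0, hi=1.0):
    """Plug-in mutual information (in nats) from a 2D histogram of (x, y)."""

    def bin_of(v):
        # clamp so that v == hi falls into the last bin
        return min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1)

    bx = [bin_of(v) for v in x]
    by = [bin_of(v) for v in y]
    n = len(bx)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), c in joint.items():
        p_ij = c / n
        # accumulate p(i, j) * log(p(i, j) / (p(i) * p(j)))
        mi += p_ij * math.log(p_ij * n * n / (px[i] * py[j]))
    return mi


x = [-0.5, -0.5, 0.5, 0.5]
y = [-0.5, 0.5, -0.5, 0.5]
# histogram_mi(x, x, n_bins=2) is log(2); histogram_mi(x, y, n_bins=2) is 0
```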
- emodel_generalisation.mcmc.get_greedy_perm(X, n_perm=None, dist_matrix=False, metric='euclidean')
Compute a furthest point sampling permutation of a set of points.
Copied from ripser.py.
- Parameters:
X (ndarray (n_samples, n_features)) – A numpy array of either data or distance matrix
dist_matrix (bool) – Whether X is a distance matrix; if not, distances between the points of X are computed using the chosen metric.
n_perm (int) – Number of points to take in the permutation
metric (string or callable) – The metric to use when calculating distance between instances in a feature array
- Returns:
idx_perm (ndarray(n_perm)) – Indices of points in the greedy permutation
lambdas (ndarray(n_perm)) – Covering radii at different points
dperm2all (ndarray(n_perm, n_samples)) – Distances from points in the greedy permutation to points in the original point set
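For intuition, here is a pure-Python version of the same greedy (furthest point) strategy, restricted to the Euclidean metric and to point-cloud input: there is no dist_matrix option and no dperm2all output. It mirrors the structure of the ripser.py routine as we understand it; the name greedy_perm is hypothetical.

```python
import math


def greedy_perm(X, n_perm=None):
    """Furthest point sampling on the rows of X (Euclidean), starting from index 0.

    Returns (idx_perm, lambdas): the selected indices and, for each of them, the
    covering radius recorded when the next point is chosen (the last entry is
    the covering radius of the full selection).
    """
    n = len(X)
    n_perm = n if n_perm is None else n_perm
    idx_perm = [0] * n_perm
    lambdas = [0.0] * n_perm
    # ds[k] = distance from point k to the closest point selected so far
    ds = [math.dist(X[0], x) for x in X]
    for i in range(1, n_perm):
        idx = max(range(n), key=ds.__getitem__)  # furthest remaining point
        idx_perm[i] = idx
        lambdas[i - 1] = ds[idx]
        ds = [min(d, math.dist(X[idx], x)) for d, x in zip(ds, X)]
    lambdas[-1] = max(ds)
    return idx_perm, lambdas


print(greedy_perm([[0], [1], [3], [10]]))  # ([0, 3, 2, 1], [10.0, 3.0, 1.0, 0.0])
```

The covering radii are non-increasing, which is what makes a prefix of the permutation a good subsample: the first k points cover the whole set within lambdas[k - 1].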
- emodel_generalisation.mcmc.get_mean_sd(efeatures, feat)
Get experimental mean and sd.
- emodel_generalisation.mcmc.load_chains(run_df, base_path='.', with_single_origin=False, n_chains=None)
Load chains from the main run_df file, where the first row contains the initial condition.
If run_df is a path to a .csv file, it is loaded first.
- emodel_generalisation.mcmc.plot_MI(MI, with_cluster=False)
Plot MI matrix.
- emodel_generalisation.mcmc.plot_autocorrelation(df, n_emodels=10, filename='autocorrelation.pdf')
Autocorrelation plot, adapted from pandas.plotting.autocorrelation_plot.
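pandas.plotting.autocorrelation_plot normalises each lag-h autocovariance by the series length n and by the lag-0 variance. Applied, for example, to one parameter or the cost along a chain (our assumption about how the plot is used here), that computation can be sketched as follows; the helper name autocorrelation is hypothetical.

```python
def autocorrelation(x, lag):
    """Lag-h autocorrelation, normalised as in pandas.plotting.autocorrelation_plot."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n  # lag-0 variance
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n
    return cov / c0


print(autocorrelation([1, 2, 3, 4], 1))  # 0.25
```

Slowly decaying autocorrelation along a chain indicates strongly correlated successive samples, i.e. a small effective sample size.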
- emodel_generalisation.mcmc.plot_best_corr(df, cor, x_col, y_col, filename, sd=5)
Plot only the most highly correlated tuples.
- emodel_generalisation.mcmc.plot_corner(df, feature=None, filename='corner.pdf', n_bins=20, cmap='gnuplot', normalize=False, highlights=None, sort_params=True, with_pearson=False)
Make a corner plot which consists of scatter plots of all pairs.
- emodel_generalisation.mcmc.plot_cost(df, split, filename='figures/costs.pdf')
Plot histogram of costs.
- emodel_generalisation.mcmc.plot_cost_convergence(df, filename='cost_convergence.png')
Plot the value of the cost of each chain as a function of iteration.
- emodel_generalisation.mcmc.plot_feature_correlations(df, split, pearson_thresh=0.6, figure_name='feature_corrs.pdf')
Plot feature correlations.
- emodel_generalisation.mcmc.plot_feature_distributions(df, emodel, access_point, filename='feature_distributions.pdf', log_scale=True)
Plot feature distributions.
- emodel_generalisation.mcmc.plot_full_cost_convergence(df_burnin, df, clip=10, filename='cost_convergence.pdf')
Plot the value of the cost of each chain as a function of iteration.
- emodel_generalisation.mcmc.plot_parameter_distributions(df, split, filename='figures/parameter_distributions.pdf')
Plot parameter distributions below and above a split.
- emodel_generalisation.mcmc.plot_reduced_feature_distributions(df, emodel, access_point, features=None, filename='reduced_feature_distributions.pdf')
Plot the distributions of a subset of features with violin plots.
- emodel_generalisation.mcmc.plot_score_distributions(df, split, filename='figures/score_distributions.pdf')
Plot score distributions below and above a split.
- emodel_generalisation.mcmc.plot_selected_emodels(df, emodel_ids, dists, threshold=4)
Plot histogram of emodel distances, and clustered distance matrix with selected emodels.
- emodel_generalisation.mcmc.plot_step_size(run_df, filename='mcmc_stepsize.pdf')
Plot the step size of the first chain in normalized parameter space.
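Assuming that "step size" means the Euclidean distance between successive states of a chain in normalized parameter space, the plotted quantity could be computed as below. The helper step_sizes is hypothetical, not the module's code. Rejected proposals leave the state unchanged, so they show up as zero-length steps.

```python
import math


def step_sizes(chain):
    """Euclidean distance between consecutive parameter vectors of one chain."""
    return [math.dist(a, b) for a, b in zip(chain, chain[1:])]


print(step_sizes([[0, 0], [3, 4], [3, 4]]))  # [5.0, 0.0]
```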
- emodel_generalisation.mcmc.plot_top_corner(df, out_path)
Plot subcorner of top correlations.
- emodel_generalisation.mcmc.run_several_chains(n_chains=50, n_steps=100, results_df_path='chains', run_df_path='run_df.csv', temperature=1.0, proposal_params=None, access_point=None, emodel_dir=None, recipes_path=None, final_path=None, legacy_dir_structure=True, emodel=None, with_seeds=False, stochasticity=False, mcmc_type='metropolis_hastings', parallel_lib='multiprocessing', frozen_params=None, random_initial_parameters=True, mcmc_log_file='mcmc_log.txt', weights=None, chain_df=None)
Main function to call to run several chains in parallel.
- emodel_generalisation.mcmc.save_selected_emodels(df, emodel_ids, emodel='cADpyr_L5TPC', final_path='selected_final.json')
Create a final.json file with selected emodels.
- emodel_generalisation.mcmc.select_emodels(df, threshold=4.0, method='local_min')
Select emodels far away from each other in parameter space.
If method == 'local_min', we look for local minima within a radius of threshold. If method == 'distance', we naively search for points that are at least threshold apart.
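Read literally, the two methods could be sketched as follows over a list of normalized parameter vectors and their costs. Both helpers are hypothetical. Treating a lower cost as better, and visiting points in best-cost-first order in the distance method, are assumptions made for illustration; the module's own implementation works on a DataFrame and may differ.

```python
import math


def select_local_min(points, costs, threshold):
    """Keep indices whose cost is the lowest among all points closer than threshold."""
    keep = []
    for i, (p, c) in enumerate(zip(points, costs)):
        neighbour_costs = [
            costs[j] for j, q in enumerate(points) if j != i and math.dist(p, q) < threshold
        ]
        if all(c <= cj for cj in neighbour_costs):
            keep.append(i)
    return keep


def select_by_distance(points, costs, threshold):
    """Greedily keep points, best cost first, at least threshold away from all kept ones."""
    keep = []
    for i in sorted(range(len(points)), key=costs.__getitem__):
        if all(math.dist(points[i], points[k]) >= threshold for k in keep):
            keep.append(i)
    return keep


points = [[0, 0], [1, 0], [5, 0]]
costs = [1.0, 0.5, 2.0]
print(select_local_min(points, costs, 2.0), select_by_distance(points, costs, 2.0))  # [1, 2] [1, 2]
```

The local-minimum rule can keep two nearby points when neither has a lower-cost neighbour inside the radius, while the greedy distance rule guarantees that every pair of selected points is at least threshold apart.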