emodel_generalisation.information
Module to compute information-theoretic measures on MCMC samples of emodels.
Functions
| compute_higher_order | Compute higher order IT. |
| compute_higher_orders | Compute higher order IT measures. |
| create_reduced_tuple_set | Create a reduced tuple set. |
| create_reduced_tuple_set_features | Select reduced set of tuples using lower percentile of previous order. |
| create_reduced_tuple_set_parameters | Select reduced set of tuples using top previous tuples. |
| get_jidt_calc | Get the jidt information theory calculator of given type. |
| log | Log function. |
| mi_gaussian | MI with gaussian approximation. |
| oinfo_gaussian | Compute Oinfo with gaussian approximation. |
| plot_pair_correlations | Scatter plots of pairs with pearson larger than min_corr, and pearson correlation matrix. |
| plot_tuple_distributions | Plot tuple distributions. |
| reduce_features | Reduce number of features to non-correlated features. |
| reduce_matrix_percentile | Reduce matrix percentile. |
| rsi_gaussian | RSI calculation with gaussians (assuming first element is y). |
| setup_jidt | Setup the java env for jidt code. |
- emodel_generalisation.information.compute_higher_order(df, order=3, column_1='features', column_2='normalized_parameters', correlation_type='MI', n_workers=50, batch_size=100, param_tuples=None)
Compute higher order IT.
- emodel_generalisation.information.compute_higher_orders(df, min_order=3, max_order=5, column_1='features', column_2='normalized_parameters', correlation_type='MI', n_workers=50, batch_size=100, top=100, min_order_select=3, output_folder='IT_data', with_largests=True)
Compute higher order IT measures.
- Parameters:
df (dataframe) – MCMC output dataframe
min_order (int) – min order to compute
max_order (int) – max order to compute
column_1 (str) – name of column one (features for example)
column_2 (str) – name of column two (usually the parameter column)
correlation_type (str) – type of measure to compute, 'MI' or 'Oinfo'
n_workers (int) – number of parallel workers to use (too many leads to a memory error)
batch_size (int) – number of IT evaluations for each worker
min_order_select (int) – order after which only the top/bottom best tuples are used
output_folder (str) – folder to save .csv for each order
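The number of tuples of a given order grows combinatorially with the number of columns, which is why the evaluation is split into batches and later orders are restricted to a reduced tuple set. The sketch below only illustrates how tuples of one order could be enumerated and grouped into batches of batch_size; the helper iter_batches and the feature names are hypothetical and are not part of this module.

```python
from itertools import combinations, islice


def iter_batches(names, order, batch_size):
    """Yield lists of at most `batch_size` tuples of length `order`."""
    tuples = combinations(names, order)  # lazy: never holds all tuples at once
    while True:
        batch = list(islice(tuples, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical feature names, for illustration only.
features = ["f0", "f1", "f2", "f3", "f4"]
batches = list(iter_batches(features, order=3, batch_size=4))
```

Each batch could then be handed to one worker, so that batch_size controls the amount of work per task.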
- emodel_generalisation.information.create_reduced_tuple_set(df, data_folder, order, column_1='features', column_2='normalized_parameters', corr_type='Oinfo', top=100, with_largests=True)
Create a reduced tuple set.
- emodel_generalisation.information.create_reduced_tuple_set_features(df, data_folder, order, column_1='features', column_2='normalized_parameters', corr_type='Oinfo', top=100, with_largests=True)
Select reduced set of tuples using lower percentile of previous order.
- emodel_generalisation.information.create_reduced_tuple_set_parameters(df, data_folder, order, column_1='features', column_2='normalized_parameters', corr_type='Oinfo', top=100, with_largests=True)
Select reduced set of tuples using top previous tuples.
- emodel_generalisation.information.get_jidt_calc(tpe='MI', algo_type='gaussian')
Get the jidt information theory calculator of given type.
- emodel_generalisation.information.log(x, unit='nats')
Log function.
- emodel_generalisation.information.mi_gaussian(x)
MI with gaussian approximation.
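For reference, the standard closed form for the mutual information of two jointly Gaussian variables with correlation coefficient ρ is I = -½ ln(1 - ρ²), in nats. The sketch below implements that textbook formula with NumPy; the function name gaussian_mi_pair is hypothetical, and the exact convention used by mi_gaussian for its input x is not stated here.

```python
import numpy as np


def gaussian_mi_pair(x, y):
    """Mutual information (nats) of two variables under a Gaussian assumption."""
    rho = np.corrcoef(x, y)[0, 1]  # sample Pearson correlation
    return -0.5 * np.log(1.0 - rho**2)


# Synthetic data with a true correlation of 0.6 (illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.6 * x + 0.8 * rng.normal(size=100_000)
mi = gaussian_mi_pair(x, y)
```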
- emodel_generalisation.information.oinfo_gaussian(x)
Compute Oinfo with gaussian approximation.
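The O-information of n variables is commonly defined as Ω(X) = (n - 2) H(X) + Σᵢ [H(Xᵢ) - H(X₋ᵢ)], where X₋ᵢ denotes all variables except Xᵢ. Positive values indicate redundancy-dominated tuples and negative values synergy-dominated ones. Under a Gaussian approximation each entropy depends only on the covariance matrix. The sketch below follows that definition; gaussian_entropy and gaussian_oinfo are hypothetical names, and this is not necessarily how oinfo_gaussian is implemented.

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy (nats) of a Gaussian with covariance `cov`."""
    cov = np.atleast_2d(cov)
    k = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (k * np.log(2 * np.pi * np.e) + logdet)


def gaussian_oinfo(samples):
    """O-information (nats) of the columns of `samples` (rows are observations)."""
    cov = np.cov(samples, rowvar=False)
    n = cov.shape[0]
    total = (n - 2) * gaussian_entropy(cov)
    for i in range(n):
        # Covariance of all variables except variable i.
        rest = np.delete(np.delete(cov, i, axis=0), i, axis=1)
        total += gaussian_entropy(cov[i, i]) - gaussian_entropy(rest)
    return total


rng = np.random.default_rng(0)
n_samples = 20_000
noise_std = np.sqrt(0.1)

# Redundant triplet: three noisy copies of the same source.
z = rng.normal(size=n_samples)
redundant = np.column_stack(
    [z + noise_std * rng.normal(size=n_samples) for _ in range(3)]
)

# Synergistic triplet: the third variable is a noisy sum of two independent ones.
a = rng.normal(size=n_samples)
b = rng.normal(size=n_samples)
synergistic = np.column_stack([a, b, a + b + noise_std * rng.normal(size=n_samples)])

oinfo_redundant = gaussian_oinfo(redundant)
oinfo_synergistic = gaussian_oinfo(synergistic)
```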
- emodel_generalisation.information.plot_pair_correlations(df, split=None, min_corr=0.3, column_1='normalized_parameters', column_2=None, filename='parameter_pairs.pdf', clip=0.4, correlation_type='pearson', with_plots=False, plot_top_only_perc=None)
Scatter plots of pairs with Pearson correlation larger than min_corr, and the Pearson correlation matrix.
If column_2 is provided, the correlation matrix will be non-square and no clustering will be applied.
- Parameters:
min_corr (float) – minimum correlation for plotting scatter plot
clip (float) – value to clip correlation matrix
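As an illustration of the roles of min_corr and clip, the sketch below selects the column pairs whose Pearson correlation exceeds min_corr and clips the correlation matrix to [-clip, clip]. It uses only pandas calls; the helper correlated_pairs is hypothetical, and comparing the absolute value of the correlation is an assumption, not a documented behaviour of plot_pair_correlations.

```python
import numpy as np
import pandas as pd


def correlated_pairs(df, min_corr=0.3, clip=0.4):
    """Return (pairs, clipped_matrix) for the columns of `df`."""
    corr = df.corr(method="pearson")
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only
            r = float(corr.iloc[i, j])
            if abs(r) > min_corr:  # assumption: threshold on |r|
                pairs.append((cols[i], cols[j], r))
    return pairs, corr.clip(lower=-clip, upper=clip)


# Synthetic data: "b" tracks "a", "c" is independent (illustration only).
rng = np.random.default_rng(0)
a = rng.normal(size=1000)
data = pd.DataFrame(
    {"a": a, "b": a + 0.1 * rng.normal(size=1000), "c": rng.normal(size=1000)}
)
pairs, clipped = correlated_pairs(data)
```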
- emodel_generalisation.information.plot_tuple_distributions(data_folder='data', figure_name='IT_corr.pdf', correlation_type='Oinfo', min_order=3, max_order=20, column_1='features', with_min=True, with_max=True, n_top_tuples=100, tuple_freq_thresh=0.01)
Plot tuple distributions.
- emodel_generalisation.information.reduce_features(df, threshold=0.9)
Reduce the number of features to non-correlated features.
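One common way to obtain such a reduced set is a greedy filter: walk through the columns and keep a column only if its absolute correlation with every column already kept does not exceed threshold. The sketch below shows that approach with pandas; drop_correlated is a hypothetical helper, and the selection rule used by reduce_features may differ.

```python
import numpy as np
import pandas as pd


def drop_correlated(df, threshold=0.9):
    """Greedily keep columns whose |correlation| with kept columns is <= threshold."""
    corr = df.corr().abs()
    keep = []
    for col in corr.columns:
        if all(corr.loc[col, kept] <= threshold for kept in keep):
            keep.append(col)
    return df[keep]


# Synthetic data: "b" is nearly a copy of "a" (illustration only).
rng = np.random.default_rng(0)
a = rng.normal(size=1000)
data = pd.DataFrame(
    {"a": a, "b": a + 0.01 * rng.normal(size=1000), "c": rng.normal(size=1000)}
)
reduced = drop_correlated(data, threshold=0.9)
```

With this rule the result depends on column order, since the first column of a correlated group is the one that is kept.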
- emodel_generalisation.information.reduce_matrix_percentile(df, percentile, data=None)
Reduce matrix percentile.
- emodel_generalisation.information.rsi_gaussian(x)
RSI calculation with gaussians (assuming first element is y).
- emodel_generalisation.information.setup_jidt(jarlocation='/gpfs/bbp.cscs.ch/home/arnaudon/code/jidt/infodynamics.jar')
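The redundancy-synergy index is usually written as RSI = I(X₁, …, Xₙ; Y) - Σᵢ I(Xᵢ; Y), so that positive values indicate synergy and negative values redundancy. The sketch below evaluates it under a Gaussian approximation, taking the first column as y to match the description above. The helper names are hypothetical, and the sign convention of rsi_gaussian is an assumption, since some references use the opposite sign.

```python
import numpy as np


def _logdet(cov):
    """Log-determinant of a covariance block (scalars are promoted to 1x1)."""
    return np.linalg.slogdet(np.atleast_2d(cov))[1]


def gaussian_mi_blocks(cov, a, b):
    """I(A; B) in nats for index lists `a` and `b` of a joint Gaussian."""
    ab = a + b
    return 0.5 * (
        _logdet(cov[np.ix_(a, a)])
        + _logdet(cov[np.ix_(b, b)])
        - _logdet(cov[np.ix_(ab, ab)])
    )


def gaussian_rsi(samples):
    """RSI (nats) with column 0 as y and the remaining columns as x_i."""
    cov = np.cov(samples, rowvar=False)
    y = [0]
    xs = list(range(1, cov.shape[0]))
    joint = gaussian_mi_blocks(cov, xs, y)
    singles = sum(gaussian_mi_blocks(cov, [i], y) for i in xs)
    return joint - singles


rng = np.random.default_rng(0)
n_samples = 20_000
noise_std = np.sqrt(0.1)

# Synergistic case: y is a noisy sum of two independent inputs.
x1 = rng.normal(size=n_samples)
x2 = rng.normal(size=n_samples)
y_syn = x1 + x2 + noise_std * rng.normal(size=n_samples)
rsi_synergy = gaussian_rsi(np.column_stack([y_syn, x1, x2]))

# Redundant case: both inputs are noisy copies of y.
y_red = rng.normal(size=n_samples)
r1 = y_red + noise_std * rng.normal(size=n_samples)
r2 = y_red + noise_std * rng.normal(size=n_samples)
rsi_redundancy = gaussian_rsi(np.column_stack([y_red, r1, r2]))
```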
Setup the java env for jidt code.