API Reference
faultmap package
faultmap.config_setup
Setup functions used to read configuration files.
- class faultmap.config_setup.Locations(data_loc, config_loc, save_loc, infodynamics_loc)[source]
Directories used for data, configuration, results, and JIDT.
- class faultmap.config_setup.CaseSetup(save_loc, case_config_dir, case_dir, infodynamics_loc)[source]
Directories resolved for a specific case run.
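The two containers above can be pictured as plain named tuples. The following is a minimal sketch, not the package's actual definitions: the field names come from the signatures above, while the `pathlib.Path` field types and the example paths are assumptions made for illustration.

```python
from pathlib import Path
from typing import NamedTuple


class Locations(NamedTuple):
    """Sketch of the directory container; the Path types are assumed."""
    data_loc: Path
    config_loc: Path
    save_loc: Path
    infodynamics_loc: Path


class CaseSetup(NamedTuple):
    """Sketch of the per-case container; the Path types are assumed."""
    save_loc: Path
    case_config_dir: Path
    case_dir: Path
    infodynamics_loc: Path


# Hypothetical paths, for illustration only.
locations = Locations(
    Path("data"), Path("config"), Path("results"), Path("infodynamics.jar")
)

# Named tuples support both attribute access and positional unpacking.
data_loc, config_loc, save_loc, infodynamics_loc = locations
```

Because a named tuple unpacks positionally, the order of the fields in the signature matters to any caller that unpacks the result.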
- faultmap.config_setup.get_locations(mode='cases')[source]
Gets all required directories related to the specified mode.
TODO: Remove the need for this by using proper test fixtures
- Parameters:
mode (string) – Either ‘test’ or ‘cases’. Specifies whether the test or user configurable cases directories should be set. Test directories are read from test_config.json which is bundled with the code, while cases directories are read from case_config.json which must be created by the user.
- Returns:
A named tuple containing the data_loc, config_loc, save_loc, and infodynamics_loc paths.
- Return type:
- faultmap.config_setup.run_setup(mode, case)[source]
Gets all required directories from the case configuration file.
- Parameters:
mode (Literal['test', 'tests', 'cases']) – Either ‘test’ or ‘cases’. Specifies whether the test or user configurable cases directories should be set. Test directories are read from test_config.json which is bundled with the code, while cases directories are read from case_config.json which must be created by the user.
case (str) – The name of the case that is to be run. Points to dictionary in either test or case config files.
- Returns:
CaseSetup named tuple containing the save_loc, case_config_dir, case_dir, and infodynamics_loc paths.
- Return type:
faultmap.data_processing
Data processing support tasks.
- faultmap.data_processing.shuffle_data(input_data)[source]
Returns a (seeded) randomly shuffled array of data. The data input needs to be a two-dimensional numpy array.
- faultmap.data_processing.gen_iaaft_surrogates(data, iterations)[source]
Generates iterative amplitude adjusted Fourier transform (IAAFT) surrogates.
- class faultmap.data_processing.ResultReconstructionData(mode, case)[source]
Creates a data object from file and/or function definitions for use in array creation methods.
- faultmap.data_processing.process_aux_file(filename, bias_correct=True, mi_scale=False, allow_neg=False)[source]
Processes an auxiliary file and returns a list of affected_vars, weight_array as well as relative significance weight array.
- faultmap.data_processing.create_arrays(data_dir, variables, bias_correct, mi_scale, generate_diffs)[source]
data_dir is the location of the auxiliary data and weights folders for the specific case that is under investigation
variables is the list of variables
- Parameters:
data_dir (Path)
- faultmap.data_processing.create_signtested_directionalarrays(datadir, writeoutput)[source]
Checks whether the directional weight arrays have corresponding absolute positive entries, writes another version with zeros if absolutes are negative.
datadir is the location of the auxdata and weights folders for the specific case that is under investigation
tsfilename is the file name of the original time series data file used to generate each case and is only used for generating a list of variables
- faultmap.data_processing.extract_trends(datadir, writeoutput)[source]
datadir is the location of the weight_array and delay_array folders for the specific case that is under investigation
tsfilename is the file name of the original time series data file used to generate each case and is only used for generating a list of variables
- faultmap.data_processing.result_reconstruction(mode, case)[source]
Reconstructs the weight_array and delay_array for different weight types from data generated by run_weightcalc process.
WIP: For transient cases, generates difference arrays between boxes.
The results are written to the same folders where the files are found.
- faultmap.data_processing.trend_extraction(mode, case, write_output)[source]
Extracts dynamic trend of weights and delays out of weight_array and delay_array results between multiple boxes generated by the run_createarrays process for transient cases.
The results are written to the trends results directory.
- faultmap.data_processing.bandgap(min_freq, max_freq, vardata)[source]
Bandgap filter based on FFT/IFFT concatenation.
- faultmap.data_processing.bandgapfilter_data(raw_tsdata, normalised_tsdata, variables, low_freq, high_freq, saveloc, case, scenario)[source]
Bandgap filters the data between the specified high and low frequencies. Also writes the filtered data to a standard format for easy analysis in other software, for example TOPCAT.
- faultmap.data_processing.read_connectionmatrix(connection_loc)[source]
Imports the connection scheme for the data. The format of the CSV file should be:
empty space, var1, var2, etc… (first row)
var1, value, value, value, etc… (second row)
var2, value, value, value, etc… (third row)
etc…
value = 1 if the column variable points to the row variable (causal relationship)
value = 0 otherwise
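To make the layout described above concrete, here is a minimal sketch that parses such a file with the standard `csv` module. The three-variable file content and the helper name `parse_connection_matrix` are hypothetical; the sketch does not reproduce how `read_connectionmatrix` itself is implemented or what it returns.

```python
import csv
import io

# Hypothetical three-variable connection file in the layout described above.
# The first cell of the header row is left empty.
CSV_TEXT = """,var1,var2,var3
var1,0,0,0
var2,1,0,0
var3,0,1,0
"""


def parse_connection_matrix(text):
    """Return (variables, matrix), where matrix[row][col] is 1 if the
    column variable points to the row variable and 0 otherwise."""
    rows = list(csv.reader(io.StringIO(text)))
    variables = rows[0][1:]  # skip the empty top-left cell
    matrix = [[int(value) for value in row[1:]] for row in rows[1:]]
    return variables, matrix


variables, matrix = parse_connection_matrix(CSV_TEXT)
# In this example var1 -> var2 (column var1, row var2) and var2 -> var3.
```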
- faultmap.data_processing.read_scale_limits(scaling_loc)[source]
Imports the scale limits for the data. The format of the CSV file should be:
var, low, nominal, high, vartype (first row)
var1, float, float, float, [‘D’, ‘S’] (second row)
var2, float, float, float, [‘D’, ‘S’] (third row)
etc…
type ‘D’ indicates a disturbance variable, and the maximum deviation will be used
type ‘S’ indicates a state variable, and the minimum deviation will be used
- faultmap.data_processing.read_biasvector(biasvector_loc)[source]
Imports the bias vector for faultmap purposes. The format of the CSV file should be:
var1, var2, etc… (first row)
bias1, bias2, etc… (second row)
- faultmap.data_processing.read_header_values_datafile(location)[source]
This method reads a CSV data file of the form:
header, header, header, etc… (first row)
value, value, value, etc… (second row)
etc…
- faultmap.data_processing.read_matrix(matrix_loc)[source]
This method reads a matrix scheme for a specific scenario.
Might need to pad matrix with zeros if it is non-square
- faultmap.data_processing.build_graph(variables, gain_matrix, connections, bias_vector)[source]
Builds a directed graph using the given variables, gain matrix, connections, and bias vector.
- Parameters:
variables (list) – A list of variable names.
gain_matrix (numpy.ndarray) – A 2D array of gains.
connections (numpy.ndarray) – A 2D array of connections.
bias_vector (numpy.ndarray) – A 1D array of biases.
- Returns:
A directed graph with weights and biases.
- Return type:
- faultmap.data_processing.rank_backward(variables, gainmatrix, connections, biasvector, dummyweight, dummycreation)[source]
This method adds a unit gain node to all nodes with an out-degree of 1 in order for the relative scale to be retained. Therefore all nodes with pointers should have 2 or more edges pointing away from them.
It uses the number of dummy variables to construct these gain, connection and variable name matrices.
- faultmap.data_processing.get_box_endates(clean_df, window, overlap, freq)[source]
Gets the end dates of boxes from a dataframe that are continuous over the window and guaranteed to have a maximum overlap.
clean_df: clean dataframe with nan assigned to all bad data
window: size of window in steps at desired frequency
overlap: size of minimum overlap desired in steps at desired frequency
- faultmap.data_processing.get_continuous_boxes(clean_df, window, overlap, freq)[source]
Splits a DataFrame into continuous boxes of a specified window size and overlap.
- Parameters:
- Returns:
- (array_boxes, boxdates) where array_boxes is a list of arrays per box and boxdates is a list of arrays with start/end timestamps per box.
- Return type:
- faultmap.data_processing.split_time_series_data(input_data, sample_rate, box_size, box_num)[source]
Splits the input data into arrays useful for analyzing the change of weights over time.
- Parameters:
input_data (numpy.ndarray) – A numpy array containing values for a single variable after sub-sampling.
sample_rate (float) – The rate of sampling in time units (after sub-sampling).
box_size (int) – The size of each returned dataset in time units.
box_num (int) – The number of boxes that need to be analyzed.
- Returns:
A list of numpy arrays, where each array represents a box of data.
- Return type:
Notes
Boxes are evenly distributed over the provided dataset. The boxes will overlap if box_size * box_num is more than the simulated time, and will have spaces between them if it is less.
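The even distribution described in the note can be sketched as follows. This is an illustrative reading, not the library's code: the helper name `split_into_boxes` is hypothetical, and it assumes that `sample_rate` is the time between samples, so that a box holds `box_size / sample_rate` samples.

```python
def split_into_boxes(samples, sample_rate, box_size, box_num):
    """Split a sequence into box_num equally sized boxes whose start
    positions are spread evenly from the first sample to the last
    position at which a full box still fits."""
    # Assumption: sample_rate is the sampling interval in time units.
    box_length = int(round(box_size / sample_rate))
    if box_length > len(samples):
        raise ValueError("box_size exceeds the length of the data")
    last_start = len(samples) - box_length
    if box_num == 1:
        starts = [0]
    else:
        step = last_start / (box_num - 1)
        starts = [int(round(i * step)) for i in range(box_num)]
    return [samples[start:start + box_length] for start in starts]


data = list(range(100))
exact = split_into_boxes(data, 1.0, 20, 5)    # boxes tile the data exactly
gapped = split_into_boxes(data, 1.0, 20, 3)   # spaces between boxes
overlap = split_into_boxes(data, 1.0, 20, 9)  # neighbouring boxes overlap
```

With 100 samples and boxes of 20 samples, five boxes cover the data exactly, three boxes leave gaps, and nine boxes overlap, which matches the behaviour the note describes.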
- faultmap.data_processing.calc_signal_entropy(var_data, weight_calc_data, estimator='kernel')[source]
Calculates single signal differential entropies by making use of the JIDT continuous box-kernel implementation.
- Parameters:
weight_calc_data (WeightCalcData)
estimator (Literal['gaussian', 'kernel', 'kozachenko'])
- Return type:
- faultmap.data_processing.vectorselection(data, timelag, sub_samples, k=1, l=1)[source]
Generates sets of vectors from tags time series data for calculating transfer entropy.
For notation references see Shu2013.
Takes into account the time lag (number of samples between vectors of the same variable).
In this application the prediction horizon (h) is set equal to the time lag.
The first vector in the data array should be the samples of the variable to be predicted (x), while the second vector should be the samples of the variable used to make the prediction (y).
sub_samples is the number of samples in the dataset used to calculate the transfer entropy between two vectors and must satisfy sub_samples <= samples
The required number of samples is extracted from the end of the vector. If the vector is longer than the number of samples specified plus the desired time lag, then the remainder of the data will be discarded.
k refers to the dimension of the historical data to be predicted (x)
l refers to the dimension of the historical data used to do the prediction (y)
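For the simplest case of k = 1 and l = 1, the selection described above can be sketched as follows. This is one possible reading of the description, not the function's actual code: the helper name `select_vectors` and the order of the returned vectors are assumptions, and higher embedding dimensions are not handled.

```python
def select_vectors(x, y, timelag, sub_samples):
    """Return (x_pred, x_hist, y_hist) for k = l = 1.

    x_pred holds the last sub_samples values of x. The history vectors
    hold the values of x and y that occurred timelag samples earlier.
    Anything before the required window is discarded.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if sub_samples + timelag > len(x):
        raise ValueError("not enough samples for this time lag")
    n = len(x)
    x_pred = x[n - sub_samples:]
    x_hist = x[n - sub_samples - timelag:n - timelag]
    y_hist = y[n - sub_samples - timelag:n - timelag]
    return x_pred, x_hist, y_hist


x = list(range(10))
y = list(range(100, 110))
x_pred, x_hist, y_hist = select_vectors(x, y, timelag=2, sub_samples=5)
# x_pred == [5, 6, 7, 8, 9]
# x_hist == [3, 4, 5, 6, 7]
# y_hist == [103, 104, 105, 106, 107]
```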
faultmap.datagen
Generates various test and demo data sets.
- faultmap.datagen.connectionmatrix_2x2()
Generates a 2x2 connection matrix for use in test.
- faultmap.datagen.connectionmatrix_4x4()
Generates a 4x4 connection matrix for use in test.
- faultmap.datagen.connectionmatrix_5x5()
Generates a 5x5 connection matrix for use in test.
- faultmap.datagen.seed_randn(seed, samples)
Set random seed.
- faultmap.datagen.seed_rand(seed, samples)
Set random seed.
- faultmap.datagen.autoreg_gen(params)[source]
Generates an autoregressive set of vectors.
A constant seed is used for testing comparison purposes.
- faultmap.datagen.delay_gen(params)[source]
Generates a normally distributed random data vector and a pure delay companion.
- Parameters:
params (list) – List with the first entry being the sample length of the returned signals and the second entry the delay between them.
- Returns:
data – Array containing the generated signals arranged in columns.
- Return type:
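The idea of a random signal with a pure delay companion can be sketched with the standard library alone. The helper name `delay_gen_sketch`, the `seed` argument, and the row-of-tuples return layout are assumptions for illustration; the real function takes a `params` list and returns an array.

```python
import random


def delay_gen_sketch(samples, delay, seed=0):
    """Return rows of (source, delayed), where the delayed column repeats
    the source column `delay` samples later."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    raw = [rng.gauss(0.0, 1.0) for _ in range(samples + delay)]
    source = raw[delay:]     # source[t] = raw[t + delay]
    delayed = raw[:samples]  # delayed[t] = raw[t] = source[t - delay]
    return list(zip(source, delayed))


data = delay_gen_sketch(samples=50, delay=5)
```

Drawing `samples + delay` values first means both columns are fully populated, so no padding is needed at the start of the delayed signal.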
- faultmap.datagen.autoreg_datagen(delay, timelag, samples, sub_samples, k=1, l=1)[source]
Generates autoregressive data for a specific time lag (equal to the prediction horizon).
sub_samples is the number of samples in the dataset used to calculate the transfer entropy between two vectors (taken from the end of the dataset). sub_samples <= samples
Currently only supports k = 1; l = 1
You can search through a set of time lags in an attempt to identify the original delay. The transfer entropy should have a maximum value when timelag = delay used to generate the autoregressive dataset.
- faultmap.datagen.sinusoid_shift_gen(params, period=100, noise_amplitude=0.1, n_signals=5, add_noise=False)[source]
Generates sinusoid signals, optionally with added uniform noise. The signals are shifted by a quarter period.
- Parameters:
params (list) – List with the first (and only) entry being the sample length of the returned signals.
period (int, default=100) – The period of the sinusoid in terms of samples.
noise_amplitude (float, default=0.1) – A multiplier for mean-centred uniform noise to be added to the signal. The amplitude of the sine is unity.
n_signals (int, default=5) – How many signals to return.
add_noise (bool, default=False) – If True, noise is added to the sinusoidal signals.
- Returns:
data – Array containing the generated signals arranged in columns.
- Return type:
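A noise-free version of the quarter-period shift can be sketched as follows. The helper name `sinusoid_shift_sketch` is hypothetical, the direction of the shift (each signal lagging the previous one) is an assumption, and the optional noise is left out.

```python
import math


def sinusoid_shift_sketch(samples, period=100, n_signals=5):
    """Return rows of n_signals unit-amplitude sinusoids, where each
    signal lags the previous one by a quarter period."""
    shift = period / 4
    return [
        [
            math.sin(2 * math.pi * (t - i * shift) / period)
            for i in range(n_signals)
        ]
        for t in range(samples)
    ]


data = sinusoid_shift_sketch(200)
# Signal 0 peaks at t = 25, signal 1 peaks a quarter period later at t = 50.
```

With five signals, the last one is shifted by a full period and therefore coincides with the first.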
- faultmap.datagen.sinusoid_gen(params, period=100, noise_amplitude=1.0)[source]
Generates sinusoid signals, optionally with added uniform noise. The signals are shifted by a quarter period.
- Parameters:
params (list) – List with the first (and only) entry being the sample length of the returned signals.
period (int, default=100) – The period of the sinusoid in terms of samples.
noise_amplitude (float, default=1.0) – A multiplier for mean-centred uniform noise to be added to the signal. The amplitude of the sine is unity.
- Returns:
data – Array containing the generated signals arranged in columns.
- Return type:
faultmap.infodynamics
Methods used in the calculation of transfer entropy. A JIDT wrapper.
- faultmap.infodynamics.check_jvm(infodynamics_path)[source]
Check if the Java Virtual Machine (JVM) is started and start it if it is not.
- Parameters:
infodynamics_path (str) – The file path to the infodynamics jar file.
- Returns:
None
- faultmap.infodynamics.setup_te(infodynamics_path, method, **parameters)[source]
Prepares the teCalc class of the Java Infodynamics Toolkit (JIDT) in order to calculate transfer entropy according to the kernel or Kraskov estimator method. Also supports discrete transfer entropy calculation.
- faultmap.infodynamics.calc_te(infodynamics_path, calc_method, affected_data, causal_data, **parameters)[source]
Calculates the transfer entropy for a specific time lag (equal to prediction horizon) between two sets of time series data.
This implementation makes use of the infodynamics toolkit: https://jlizier.github.io/jidt/
The transfer entropy should have a maximum value when time lag = delay used to generate an autoregressive dataset, or will otherwise indicate the dead time between data indicating a causal relationship.
- faultmap.infodynamics.setup_mi_calculator(infodynamics_path, method, **parameters)[source]
Instantiates a mutual information class of the Java Infodynamics Toolkit (JIDT) to calculate mutual information according to the kernel or Kraskov estimator method. Also supports discrete mutual information calculation.
The Kraskov method is the recommended method and also provides methods for auto-embedding. The max corr AIS auto-embedding method will be enabled as the default.
- faultmap.infodynamics.setup_entropy_calculator(infodynamics_path, estimator='kernel', kernel_bandwidth=0.1, multivariate=False)[source]
Instantiates an entropy calculator from a class of the Java Infodynamics Toolkit (JIDT) to calculate differential entropy (continuous signals) according to the estimation method specified.
- Parameters:
infodynamics_path (path) – Location of infodynamics.jar
estimator (string, default='kernel') – Either ‘kernel’ or ‘gaussian’. Specifies the estimator to use in determining the required probability density functions.
kernel_bandwidth (float) – The width of the kernels for the kernel method. If normalisation is performed, these are in terms of standard deviation, otherwise absolute.
multivariate (bool, default=False) – Indicates whether the entropy is to be calculated on a univariate or multivariate signal.
- Returns:
entropy_calc
- Return type:
EntropyCalculator JIDT object
- faultmap.infodynamics.calc_entropy(entropy_calculator, data, estimator)[source]
Estimates the entropy of a single signal.
- Parameters:
entropy_calculator (EntropyCalculator JIDT object) – The estimation method is determined during initialisation of this object beforehand.
data (one-dimensional numpy.ndarray) – The uni-variate signal.
estimator (Literal['gaussian', 'kernel', 'kozachenko'])
- Returns:
entropy – The entropy of the signal.
- Return type:
Notes
The entropy calculated with the Gaussian estimator is in nats, while that calculated by the kernel estimator is in bits. Nats can be converted to bits by division with ln(2).
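The unit conversion in the note is a one-line calculation. The helper name `nats_to_bits` below is hypothetical and is not part of the documented API.

```python
import math


def nats_to_bits(entropy_nats):
    """Convert an entropy value from nats to bits by dividing by ln(2)."""
    return entropy_nats / math.log(2)


# Differential entropy of a standard normal distribution: 0.5 * ln(2*pi*e)
# nats, which is roughly 1.42 nats or about 2.05 bits.
gaussian_entropy_nats = 0.5 * math.log(2 * math.pi * math.e)
gaussian_entropy_bits = nats_to_bits(gaussian_entropy_nats)
```

Applying this conversion to Gaussian-estimator results puts them on the same scale as kernel-estimator results before the two are compared.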
faultmap.weightcalc
This module provides methods for calculating the gains (weights) of edges connecting variables in the directed graph.
Calculation of both Pearson’s correlation and transfer entropy is supported. Transfer entropy is calculated according to the global average of local entropy method. All weights are optimized with respect to time shifts between the time series data vectors (i.e. cross-correlated).
The delay giving the maximum weight is returned, together with the maximum weights.
All weights are tested for significance. The Pearson’s correlation weights are tested for significance according to the parameters presented by Bauer2005. The transfer entropy weights are tested for significance using a non-parametric rank-order method using surrogate data generated according to the iterative amplitude adjusted Fourier transform method (iAAFT).
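The delay search described above can be illustrated with a plain Pearson correlation. This is a minimal sketch under stated assumptions, not the module's implementation: the helper names `pearson` and `best_delay` are hypothetical, the absolute value of the correlation is used as the weight, and significance testing is omitted.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def best_delay(causal, affected, delays):
    """Shift the affected series behind the causal series by each delay
    and return (maximum weight, delay that produced it)."""
    best_weight, best_lag = -1.0, None
    for delay in delays:
        shifted_causal = causal[:len(causal) - delay]
        shifted_affected = affected[delay:]
        weight = abs(pearson(shifted_causal, shifted_affected))
        if weight > best_weight:
            best_weight, best_lag = weight, delay
    return best_weight, best_lag


# Test signal: white noise and a copy of it delayed by three samples.
rng = random.Random(0)
causal = [rng.gauss(0.0, 1.0) for _ in range(500)]
affected = [0.0] * 3 + causal[:-3]
max_weight, delay = best_delay(causal, affected, range(10))
```

For this test signal the search recovers the delay of three samples, because the two series line up exactly only at that shift.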
- class faultmap.weightcalc.WeightCalcData(mode, case, single_entropies, fft_calc, do_multiprocessing, use_gpu)[source]
Creates a data object from files or functions for use in weight calculation methods.
- Parameters:
- faultmap.weightcalc.writecsv_weightcalc(filename, items, header)[source]
CSV writer customized for use in weightcalc function.
- faultmap.weightcalc.calculate_weights(weight_calc_data, method, scenario, write_output)[source]
Determines the maximum weight between two variables by searching through a specified set of delays.
- Parameters:
weight_calc_data (WeightCalcData)
method (str) – Can be one of the following:
‘cross_correlation’
‘partial_correlation’ – does not support time delays
‘transfer_entropy_kernel’
‘transfer_entropy_kraskov’
scenario (str)
write_output (bool)
TODO: Fix partial correlation method to make use of time delays
Returns:
- faultmap.weightcalc.weight_calc(mode, case, writeoutput=False, single_entropies=False, calc_fft=False, do_multiprocessing=False, use_gpu=False)[source]
Reports the maximum weight as well as the associated delay obtained by shifting the affected variable behind the causal variable by a specified set of delays.
- Parameters:
mode (str) – Either ‘test’ or ‘cases’. Test data are generated dynamically and stored in specified folders. Case data are read from file and stored under organized headings in the saveloc directory specified in config.json.
case (str) – The name of the case that is to be run. Points to dictionary in either test or case config files.
single_entropies (bool) – Flags whether the entropies of single signals should be calculated.
calc_fft (bool) – Indicates whether the FFT of all individual signals should be calculated.
do_multiprocessing (bool) – Indicates whether the weight calculation operations should run in parallel processing mode where all available CPU cores are utilized.
writeoutput (bool)
use_gpu (bool)
- Return type:
None
Notes
Supports calculating weights according to either correlation or transfer entropy metrics.
faultmap.weightcalc_onesource
Calculates weight and auxiliary data for each source variable and writes to files.
All weight data file output writers are now called at this level, making the process interruption tolerant up to a single source variable analysis.
faultmap.weightcalculators
This module stores the weight calculator classes used by the weightcalc module.
- faultmap.weightcalculators.flexiblemethod(method)[source]
Decorator to allow methods to be defined as either static or instance methods.
- class faultmap.weightcalculators.WeightCalculator(weight_calc_data, *_)[source]
Abstract base class for weight calculators.
- Parameters:
weight_calc_data (WeightCalcData)
- calculate_surrogate_weight(*args, **kwargs)[source]
Calculates surrogate weights for significance testing.
- class faultmap.weightcalculators.CorrelationWeightCalculator(weight_calc_data)[source]
Implementation of WeightCalculator for correlation-based weight calculation.
Calculates correlation using covariance with optional standardisation and de-trending. This allows the effect of Skogestad scaling to be reflected in final result.
- Parameters:
weight_calc_data (WeightCalcData)
- static calculate_weight(source_var_data, destination_var_data, *_)[source]
Calculates the correlation between two vectors containing time series data.
- calculate_surrogate_weight(source_var, destination_var, box, trials)[source]
Calculates surrogate correlation values for significance threshold purposes.
Two methods for generating surrogate data are available: iAAFT (Schreiber 2000a) or random_shuffle in time.
Returns a list of surrogate correlation values of length num.
- thresh_rankorder(surr_corr, surr_dirindex)[source]
Calculates the minimum threshold required for a correlation value to be considered significant.
Makes use of a 95% single-sided certainty and a rank-order method. This corresponds to taking the maximum transfer entropy from 19 surrogate transfer entropy calculations as the threshold, see Schreiber2000a.
Alternatively, the second highest from 38 observations can be taken, etc.
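The rank-order rule stated above (the maximum of 19 surrogates, or the second highest of 38) can be sketched as follows. The helper name `rankorder_threshold` and its single-list signature are hypothetical simplifications; the documented method takes two arguments whose handling is not shown here.

```python
def rankorder_threshold(surrogate_weights, group_size=19):
    """Return the rank-order significance threshold: the highest of 19
    surrogate values, the second highest of 38, and so on."""
    rank = max(1, len(surrogate_weights) // group_size)
    return sorted(surrogate_weights, reverse=True)[rank - 1]


threshold_19 = rankorder_threshold(list(range(19)))  # highest value: 18
threshold_38 = rankorder_threshold(list(range(38)))  # second highest: 36
```

A measured weight is then treated as significant only if it exceeds the returned threshold.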
- thresh_stdevs(surr_corr, surr_dirindex, stdevs)[source]
Calculates the minimum threshold required for a transfer entropy value to be considered significant.
Makes use of a six sigma Gaussian check as done in Bauer2005 with 30 samples of surrogate data.
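A standard-deviation check of this kind is commonly written as the surrogate mean plus a multiple of the surrogate standard deviation. The sketch below assumes that form; the helper name `stdev_threshold` and the use of the sample (rather than population) standard deviation are assumptions, not a description of the documented method.

```python
import statistics


def stdev_threshold(surrogate_weights, stdevs):
    """Return mean + stdevs * standard deviation of the surrogate values."""
    mean = statistics.mean(surrogate_weights)
    spread = statistics.stdev(surrogate_weights)  # sample standard deviation
    return mean + stdevs * spread


threshold = stdev_threshold([1.0, 2.0, 3.0], stdevs=3)
# mean = 2.0 and sample standard deviation = 1.0, so the threshold is 5.0
```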
- class faultmap.weightcalculators.TransferEntropyWeightCalculator(weight_calc_data, estimator)[source]
Transfer entropy based weight calculation.
- Parameters:
weight_calc_data (WeightCalcData)
estimator (Literal['kernel', 'kraskov', 'discrete'])
- calculate_weight(cause_var_data, affected_var_data, *_)[source]
Calculates the transfer entropy between two vectors containing time series data.
- report(source_var_index, destination_var_index, weight_list, box, proplist, milist)[source]
Calculates and reports the relevant output for each combination of variables tested.
- calculate_surrogate_weight(source_var, destination_var, box, delay_index, trials)[source]
Calculates surrogate transfer entropy values for significance threshold purposes.
Two methods for generating surrogate data are available: iAAFT (Schreiber 2000a) or random_shuffle in time.
Returns list of surrogate transfer entropy values of length num.
- static threshold_rankorder(surrogate_directional_weights, surrogate_absolute_weights)[source]
Calculates the minimum threshold required for a transfer entropy value to be considered significant.
Makes use of a 95% single-sided certainty and a rank-order method. This corresponds to taking the maximum transfer entropy from 19 surrogate transfer entropy calculations as the threshold, see Schreiber2000a.
Alternatively, the second highest from 38 observations can be taken, etc.
faultmap.noderank
faultmap.graphreduce
Receives a weighted directed graph in GML format and deletes all edges that connect nodes that are also connected via some other path. Only the longest paths are retained.
The graph should be available in the “graphs” directory in the case data folder. A reduced graph will have the same title as the original file with the suffix “_simplified”.
A <casename>_graphreduce.json configuration file needs to be available in the case directory root.
- class faultmap.graphreduce.GraphReduceData(mode, case)[source]
Creates a data object from file and/or function definitions for use in the graph reduce method.
- faultmap.graphreduce.compute_edge_threshold(graph, percentile)[source]
Calculates the threshold that should be used to delete edges from the original graph based on determined templates.
- faultmap.graphreduce.delete_lowval_edges(graph, weight_threshold, remove_self_loops=True)[source]
Deletes all edges with weight below the threshold value. Also deletes all self-looping edges.
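The filtering rule can be sketched on a plain dictionary of edges. The helper name `delete_lowval_edges_sketch` and the `{(source, target): weight}` representation are assumptions chosen to keep the example self-contained; the documented function operates on a graph object.

```python
def delete_lowval_edges_sketch(edges, weight_threshold, remove_self_loops=True):
    """Return a copy of the edge dictionary without edges whose weight is
    below the threshold and, optionally, without self-loops."""
    kept = {}
    for (source, target), weight in edges.items():
        if weight < weight_threshold:
            continue
        if remove_self_loops and source == target:
            continue
        kept[(source, target)] = weight
    return kept


edges = {("a", "b"): 0.9, ("b", "c"): 0.2, ("c", "c"): 0.8}
reduced = delete_lowval_edges_sketch(edges, weight_threshold=0.5)
# Only ("a", "b") remains: ("b", "c") is too weak and ("c", "c") is a self-loop.
```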
- faultmap.graphreduce.decompose(input_, output_)[source]
Decomposes (flattens) a list of lists into a simple list.
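Flattening with an output accumulator, as the two-argument signature suggests, might look like the sketch below. The helper name `decompose_sketch`, the handling of deeper nesting by recursion, and the returned value are all assumptions.

```python
def decompose_sketch(input_, output_):
    """Append every non-list item found in input_ to output_, descending
    into nested lists, and return output_."""
    for item in input_:
        if isinstance(item, list):
            decompose_sketch(item, output_)
        else:
            output_.append(item)
    return output_


flat = decompose_sketch([[1, 2], [3, [4, 5]], 6], [])
# flat == [1, 2, 3, 4, 5, 6]
```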
- faultmap.graphreduce.delete_loworder_edges(graph, max_depth, weight_discretion)[source]
Returns a simplified graph with higher order connections eliminated. All self-loops are also deleted.
The level up to which the search for higher order connections should be completed is indicated by the ‘max_depth’ parameter. A value of 1 means that children of children will be investigated, while a value of 2 means that children of children of children will be included in the search, and so on. If depth is set to “full”, then the search is completed until no more children are found.
If the ‘weight_discretion’ boolean is True, a higher order connection between a source node and a child will not be eliminated if this connection weight is higher than the weight of the connection between the last higher-order child to the destination node under question.
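The depth-limited elimination described above can be sketched on a set of unweighted edges. This is an illustrative reading only: the helper names are hypothetical, edges are plain `(source, target)` tuples rather than a graph object, and the ‘weight_discretion’ option is not modelled.

```python
def has_indirect_path(children, source, target, max_hops):
    """True if a simple path of 2 to max_hops edges leads from source to target."""

    def search(node, hops, visited):
        if hops == max_hops:
            return False
        for child in children.get(node, ()):
            if child == target:
                if hops + 1 >= 2:  # ignore the direct edge itself
                    return True
                continue
            if child not in visited and search(child, hops + 1, visited | {child}):
                return True
        return False

    return search(source, 0, {source})


def delete_loworder_edges_sketch(edges, max_depth):
    """Drop self-loops and every edge whose end points are also linked by
    a longer path within the search depth."""
    edges = {(s, t) for s, t in edges if s != t}
    children = {}
    for s, t in edges:
        children.setdefault(s, set()).add(t)
    nodes = {node for edge in edges for node in edge}
    # max_depth 1 looks at children of children, i.e. paths of two edges.
    max_hops = len(nodes) if max_depth == "full" else max_depth + 1
    return {
        (s, t)
        for s, t in edges
        if not has_indirect_path(children, s, t, max_hops)
    }


triangle = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "c")}
reduced = delete_loworder_edges_sketch(triangle, max_depth=1)
# reduced == {("a", "b"), ("b", "c")}
```

In the example, the direct edge from a to c is removed because c is also reachable from a through b, and the self-loop on c is dropped.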
faultmap.networkgen
faultmap.type_definitions
Type definitions used throughout the library.