Input data formats
==================

This section describes the required and optional input data formats.

Time series data format
-----------------------

The primary input is time series data from process sensors, provided as
comma-separated value (CSV) files.

Format requirements:

* The first row must be a header line, with the first column label being
  ``Time``.
* The first column must contain timestamps in UNIX time (seconds since epoch).
* Remaining columns contain raw measurement data for each process tag. No
  normalization is needed, as this is handled automatically during
  preprocessing.

Example::

    Time,TAG_001,TAG_002,TAG_003
    1546300800,45.2,101.3,7.81
    1546300860,45.5,101.1,7.79
    1546300920,45.3,101.4,7.82

Descriptive labels
------------------

Optional descriptive labels can be associated with each process data tag.
Provide these as a CSV file named ``tag_descriptions.csv`` with two columns:

* First column: ``Tag name``
* Second column: ``Description``

Example::

    Tag name,Description
    TAG_001,Reactor temperature
    TAG_002,Feed flow rate
    TAG_003,Product pH

These descriptions are used in output reports and graph labels to make results
more readable.

Connectivity information
------------------------

Optionally, you can constrain the analysis to only consider specific
connections between process elements. This is useful when plant topology
information is available.

.. note::

   Adding connectivity information is not always beneficial. In some cases it
   can produce poorer root cause analysis, because higher-order connections may
   play an important role in amplifying a node's centrality score. Use this
   option with care and compare results with and without connectivity
   constraints.
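
Before running an analysis, it can help to check a time series file against the
format requirements listed under "Time series data format". The following is a
minimal sketch using only the Python standard library. The function name
``load_time_series`` is hypothetical and is not part of the tool. The sketch
also assumes that timestamps may carry fractional seconds and that every cell
holds a numeric value, so missing or non-numeric cells are rejected.

```python
import csv
import io


def load_time_series(text):
    """Parse CSV text and check it against the documented format.

    Hypothetical helper for illustration only; not part of the tool.
    Returns (tag_names, rows), where each row is (timestamp, [values]).
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if not header or header[0] != "Time":
        raise ValueError("first column of the header row must be 'Time'")

    tag_names = header[1:]
    rows = []
    for line_no, record in enumerate(reader, start=2):
        if not record:
            continue  # tolerate blank lines (an assumption of this sketch)
        if len(record) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, "
                f"got {len(record)}"
            )
        # UNIX time in seconds since epoch; float() raises ValueError
        # if the cell is not numeric.
        timestamp = float(record[0])
        # Raw measurements, one per process tag; no normalization here.
        values = [float(cell) for cell in record[1:]]
        rows.append((timestamp, values))
    return tag_names, rows


sample = """Time,TAG_001,TAG_002,TAG_003
1546300800,45.2,101.3,7.81
1546300860,45.5,101.1,7.79
1546300920,45.3,101.4,7.82
"""

tag_names, rows = load_time_series(sample)
print(tag_names)
print(len(rows))
```

A file whose header does not start with ``Time``, or whose rows contain
non-numeric cells, raises ``ValueError`` in this sketch. How the tool itself
reports such problems is not specified here.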