Input data formats

This section describes the required and optional input data formats.

Time series data format

The primary input is time series data from process sensors, provided as comma-separated value (CSV) files.

Format requirements:

The first row must be a header line, with the first column label being Time.
The first column must contain timestamps in UNIX time (seconds since epoch).
Remaining columns contain raw measurement data for each process tag – no normalization is needed, as this is handled automatically during preprocessing.

Example:

Time,TAG_001,TAG_002,TAG_003
1546300800,45.2,101.3,7.81
1546300860,45.5,101.1,7.79
1546300920,45.3,101.4,7.82

Descriptive labels

Optional descriptive labels can be associated with each process data tag. Provide these as a CSV file named tag_descriptions.csv with two columns:

First column: Tag name
Second column: Description

Example:

Tag name,Description
TAG_001,Reactor temperature
TAG_002,Feed flow rate
TAG_003,Product pH

These descriptions are used in output reports and graph labels to make results more readable.

Connectivity information

Optionally, you can constrain the analysis to only consider specific connections between process elements. This is useful when plant topology information is available.

Note

Adding connectivity information is not always beneficial. In some cases it can produce poorer root cause analysis, because higher-order connections may play an important role in amplifying a node’s centrality score. Use this option with care and compare results with and without connectivity constraints.