Milestone: Feb 3, 2025 – Apr 30, 2025 (expired)
Development of Data Import and Workflow Training UI
Plan, implement, and validate a UI in tikiwiki for data import and workflow training, using the confidence attribution workflows as the first case study to validate the implementation.
Use Case 1: Data Import and Analysis for Model Training
Actors:
- User (Data Analyst, Data Scientist, Software Engineer)
- Data Import and Processing System
Objective:
Enable users to import data for analysis through workflows, select relevant variables, and train reliable predictors.
Preconditions:
- The system must expose an API that supports PUT requests with a flag indicating it is an import.
- The import process must be asynchronous and processable in the background.
- Imported data must contain appropriate timestamps for identifying past events.
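The preconditions above can be illustrated with a minimal sketch of what an import request body might look like. The field names (`import`, `async`, `series`, `data`) and the payload shape are assumptions for illustration; only the requirements themselves (an import flag and per-event timestamps) come from this use case.

```python
import json
from datetime import datetime

def build_import_payload(series_name, points):
    """Assemble a hypothetical PUT body for an asynchronous import.

    `points` is a list of (ISO-8601 timestamp, value) pairs. The field
    names here are illustrative assumptions, not the real API.
    """
    for ts, _value in points:
        # Every data point must carry a timestamp so past events can be
        # identified once the background import completes.
        datetime.fromisoformat(ts)  # raises ValueError if malformed
    return json.dumps({
        "import": True,   # flag marking this PUT as an import
        "async": True,    # processed in the background
        "series": series_name,
        "data": [{"timestamp": ts, "value": v} for ts, v in points],
    })

payload = build_import_payload(
    "sensor_temperature",
    [("2025-02-03T00:00:00+00:00", 21.5),
     ("2025-02-04T00:00:00+00:00", 22.1)],
)
```

Validating timestamps up front keeps a malformed series from failing deep inside the asynchronous pipeline (alternative flow 4a).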
Main Flow:
1. The user initiates a data import process via the API.
2. The system processes the import asynchronously and stores the data.
3. The system calculates correlations between the imported series and other existing series in the domain using Pearson's coefficient (or another method).
4. Upon completion, the system notifies the user via email, providing a link to continue workflow configuration.
5. The user accesses the interface and views a form with select boxes to choose series for analysis.
6. The system displays the series list sorted by degree of correlation.
7. The user can mark series as synonyms and exclude them from the training set.
8. The system enforces restrictions on series selection for prediction: the number of selected series must fall within a range (e.g., min 2, max 5), and each series must have a correlation between a minimum x and a maximum y.
9. The user selects the series, network layers, and neurons, then starts predictor training.
10. The system samples the imported series, allowing selection of a specific period or random sampling.
11. The model is trained and validated.
12. The result (model error) is presented to the user.
13. The user decides whether to continue the process, applying confidence to imported and new data.
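Steps 3, 6, and 7a (correlate, rank, and drop weakly correlated series) can be sketched as follows. The threshold value 0.5 is an assumed example for "x", and the function names are illustrative.

```python
from math import sqrt

def pearson(xs, ys):
    # Pearson's correlation coefficient between two equal-length series.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_by_correlation(target, candidates, min_corr=0.5):
    """Rank candidate series by |correlation| with the imported series,
    silently dropping those below the threshold (alternative flow 7a).
    min_corr=0.5 is an assumed example value for "x"."""
    scored = {name: pearson(target, series) for name, series in candidates.items()}
    kept = {name: c for name, c in scored.items() if abs(c) >= min_corr}
    return sorted(kept.items(), key=lambda item: abs(item[1]), reverse=True)

target = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "a": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated
    "b": [5.0, 4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated
    "c": [3.0, 1.0, 4.0, 1.0, 5.0],   # weakly correlated, dropped
}
ranked = rank_by_correlation(target, candidates)
```

Ranking by absolute value keeps strongly anti-correlated series available as predictor inputs, which a plain descending sort on the raw coefficient would bury at the bottom of the list.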
Alternative Flows:
- (4a) If the import fails, the system informs the user of the error.
- (6a) If a model for this data type already exists, the user is notified of its generation time and error.
- (7a) If a series lacks sufficient correlation, the system disregards it automatically.
- (13a) If the user does not continue, the model is deleted. The user can restart from step 5.
- (13b) If the user continues, the model is saved, and an input workflow is created to apply confidence when new data of the same type (domain, unit, dev) is inserted. The user can return to step 5 to create models for other data types or replace the existing one.
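The branch at step 13 (alternative flows 13a/13b) can be sketched as a small handler. The container names (`model_store`, `workflows`) and the dict shapes are illustrative assumptions, not the real API.

```python
def finalize_training(model_store, workflows, model, data_type, user_continues):
    """Hypothetical step-13 handler.

    - If the user continues (13b), the model is saved and an input
      workflow is registered so confidence is applied when new data of
      the same type (domain, unit, dev) is inserted.
    - If not (13a), the trained model is deleted; the user may restart
      the series selection from step 5.
    """
    if user_continues:
        model_store[data_type] = model
        workflows[data_type] = {"apply_confidence": True, "model": model["id"]}
        return "saved"
    model_store.pop(data_type, None)  # discard the trained model
    return "deleted"

store, flows = {}, {}
key = ("domainA", "celsius", "dev42")  # illustrative (domain, unit, dev) key
status = finalize_training(store, flows, {"id": "m1", "error": 0.07},
                           key, user_continues=True)
```

Keying both the saved model and the input workflow on the same (domain, unit, dev) tuple is one way to guarantee that new data of that type is routed to the matching confidence model.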
Postconditions:
- The predictive model is trained and ready for use.
- Imported data has been processed and validated.
- Data confidence has been updated.
- The system can apply confidence to future data based on the trained model.
Use Case 2: Data Analysis for Model Training
Actors:
- User (Data Analyst, Data Scientist, Software Engineer)
- Data Import and Processing System
Objective:
Enable users to select data series on the platform for analysis through workflows, choose relevant series, and train predictors.
Preconditions:
- The system must expose an API that supports SEARCH requests with a flag indicating model training.
- The process must be asynchronous and processable in the background.
Main Flow:
1. The user initiates a model training process via the API, specifying the list of series to analyze.
2. The system processes the request asynchronously.
3. The system calculates correlations between the data series using Pearson's coefficient (or another method).
4. Upon completion, the system notifies the user via email, providing a link to continue workflow configuration.
5. The user accesses the interface and views a form with select boxes to choose the analysis series.
6. The system displays the series list sorted by degree of correlation.
7. The user can mark series as synonyms or exclude them from the training set.
8. The system enforces restrictions on series selection for prediction: the number of selected series must fall within a range (e.g., min 2, max 5), and each series must have a correlation between a minimum x and a maximum y.
9. The user selects the series, network layers, and neurons, then starts predictor training.
10. The system samples the selected series, allowing selection of a specific period or random sampling.
11. The model is trained and validated.
12. The result (model error) is presented to the user.
13. The user decides whether to continue, applying confidence to existing and new data on the platform.
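The sampling step (step 10 in both use cases) can be sketched as follows: either a specific time window is kept, or a fixed-size random sample is drawn. The function signature and the seeded generator are illustrative assumptions.

```python
import random

def sample_series(points, period=None, k=None, seed=0):
    """Step-10 sketch: select training samples from a (timestamp, value)
    list, either by a specific period or by random sampling.
    The signature and the fixed seed are assumptions for illustration."""
    if period is not None:
        start, end = period
        # Specific-period selection: keep points whose timestamp
        # falls inside the requested window (inclusive).
        return [(t, v) for t, v in points if start <= t <= end]
    # Random sampling: draw k points; seeding makes the draw
    # reproducible across training and validation runs.
    rng = random.Random(seed)
    return rng.sample(points, k)

points = [(t, float(t) * 1.5) for t in range(10)]
in_window = sample_series(points, period=(2, 5))
randomly = sample_series(points, k=3)
```

Seeding the random draw is a design choice: it lets a failed training run be reproduced exactly when debugging the model error reported in step 12.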
Alternative Flows:
- (5a) If a confidence model for any data already exists, the interface informs the user about the model, its input series, and error rate.
- (7a) If a series lacks sufficient correlation, the system disregards it automatically.
- (13a) If a confidence model already exists, the interface informs the user of its input series and error rate, allowing the user to choose the new model if its error is lower.
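Alternative flow 13a (replace the existing confidence model only when the new one is better) reduces to a small comparison. The dict shape (`id`, `error`, `inputs`) is an assumption for illustration.

```python
def choose_model(existing, candidate):
    """13a sketch: keep the existing confidence model unless the newly
    trained candidate has a strictly lower validation error. The dict
    shape is an illustrative assumption."""
    if existing is None or candidate["error"] < existing["error"]:
        return candidate
    return existing

old = {"id": "m1", "error": 0.12, "inputs": ["a", "b"]}
new = {"id": "m2", "error": 0.08, "inputs": ["a", "c"]}
chosen = choose_model(old, new)
```

Requiring a strictly lower error means a tie keeps the model already in production, avoiding churn in the input workflows for no accuracy gain.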
Postconditions:
- The predictive model is trained and ready for use.
- Data confidence has been updated.
- The system can apply confidence to future data based on the trained model.