Factory Scene Data Modeling Guide

This document is intended for industrial engineers, process technicians, and data analysts. It systematically explains how to identify valuable scenarios, collect data, classify variables, and build effective production models in a factory environment.


1. Scenario Identification and Value Assessment

1.1 What is a "Scenario"?

In industrial data modeling, a scenario refers to a complete production process unit, including:

  • Clear inputs (raw materials, parameter settings)
  • Observable process states (temperature, pressure, flow rate, etc.)
  • Quantifiable output results (product quality, yield, energy consumption, etc.)

Example Scenarios:

| Scenario Type | Description | Typical Industry |
|-------|---------|---------|
| Reactor Batch Control | Batch reaction process in chemical/pharmaceuticals | Chemical, Pharmaceutical |
| Fermentation Process Optimization | Temperature, pH, and dissolved oxygen control in microbial fermentation | Food, Biological |
| Extrusion Molding Process | Extrusion temperature, pressure, and speed control for plastics/rubber | Material Processing |
| Drying Process Control | Temperature, humidity, and time control in hot air drying | Food, Agricultural Products |
| Batch Mixing System | Proportioning accuracy control of multi-component raw materials | Food, Feed |

1.2 How to Judge the Core Value of a Scenario?

Not all scenarios are worth modeling. Use the following evaluation framework:

Value Assessment Matrix

                     High Business Value
                              │
              ┌───────────────┼───────────────┐
              │  Prioritize   │   Strategic   │
              │   Modeling    │    Reserve    │
  High Data   │   (Act Now)   │  (Long-term)  │   Low Data
 Availability ├───────────────┼───────────────┤ Availability
              │   Quick POC   │    Shelve     │
              │    (Pilot)    │    (Wait)     │
              │               │               │
              └───────────────┴───────────────┘
                     Low Business Value

Value Assessment Checklist

Business Value Dimension (1-5 points each):

| Evaluation Item | Scoring Criteria | Score |
|-------|---------|---------|
| Quality Issue Frequency | 5=Multiple times a month, 1=Rarely occurs | ___ |
| Quality Loss Amount | 5=Annual loss > 1M, 1=< 100k | ___ |
| Process Optimization Space | 5=Obvious room for optimization, 1=Already very mature | ___ |
| Replicability | 5=Applicable to multiple lines, 1=Dedicated to a single line | ___ |
| Management Attention | 5=Senior management focus, 1=Shop-floor initiative only | ___ |

Data Availability Dimension (1-5 points each):

| Evaluation Item | Scoring Criteria | Score |
|-------|---------|---------|
| Historical Data Volume | 5=More than 1 year of complete data, 1=Almost no data | ___ |
| Data Quality | 5=Complete and accurate, 1=Many missing values/errors | ___ |
| Collection Automation | 5=Fully automatic, 1=Fully manual recording | ___ |
| Key Variable Measurability | 5=All measurable online, 1=Mostly offline testing | ___ |
| IT System Support | 5=Has MES/SCADA, 1=No IT system | ___ |

Score Interpretation:

  • Business Value ≥ 20 AND Data Availability ≥ 20: Prioritize Modeling
  • Business Value ≥ 20 BUT Data Availability < 20: Improve Data Collection First
  • Business Value < 15: Temporarily Shelve, Seek Higher Value Scenarios
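The interpretation rules can be sketched as a small function. This is a minimal illustration, not platform code: the function name `classify_scenario` is hypothetical, and the handling of business-value totals from 15 to 19 is an assumption, because the rules above do not cover that band.

```python
def classify_scenario(business_value: int, data_availability: int) -> str:
    """Map the two checklist totals (5 items x 1-5 points, so 5-25 each) to an action."""
    for name, score in (("business_value", business_value),
                        ("data_availability", data_availability)):
        if not 5 <= score <= 25:
            raise ValueError(f"{name} must be between 5 and 25, got {score}")
    if business_value >= 20 and data_availability >= 20:
        return "Prioritize Modeling"
    if business_value >= 20:
        return "Improve Data Collection First"
    if business_value < 15:
        return "Temporarily Shelve"
    # Assumption: totals of 15-19 are not covered by the stated rules,
    # so they are flagged for a case-by-case decision instead of guessed.
    return "Not covered by the rules: decide case by case"


print(classify_scenario(25, 20))
print(classify_scenario(22, 18))
```

Returning an explicit "not covered" label keeps the gap in the thresholds visible instead of silently assigning those scenarios to one of the three stated outcomes.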

1.3 Scenario Priority Ranking Example

Scenario Assessment for a Food Processing Plant:

| Scenario | Business Value | Data Availability | Priority | Action Recommendation |
|-------|---------|---------|---------|---------|
| Sterilization Temp Control | 25 pts | 20 pts | ⭐⭐⭐⭐⭐ | Start Immediately |
| Batching Accuracy Opt. | 22 pts | 18 pts | ⭐⭐⭐⭐ | Start after adding sensors |
| Packaging Seal Inspection | 15 pts | 22 pts | ⭐⭐⭐ | Low priority, consider when resources are ample |
| Raw Material Inbound Insp. | 12 pts | 15 pts | ⭐⭐ | Temporarily Shelve |

2. Data Collection Strategy

2.1 Four Levels of Data Collection

Level 1: Manual Recording
    └── Paper records, Excel manual entry
    └── Suitable for: Initial exploration, no automation system
    └── Disadvantages: Error-prone, low frequency, hard to trace

Level 2: Semi-Automatic Collection
    └── Instrument data export + manual sorting
    └── Suitable for: Key equipment exists but no system integration
    └── Improvement: Establish standardized export templates

Level 3: Automatic Collection
    └── PLC/SCADA automatic recording
    └── Suitable for: Automated control systems exist
    └── Advantages: High frequency, accurate, traceable

Level 4: Integrated Platform
    └── MES/ERP/Data Lake integration
    └── Suitable for: Highly digitalized factories
    └── Advantages: Data correlation, full-link traceability

2.2 Data Collection Planning Template

Create a data collection plan for each scenario:

markdown
## Scenario Name: [Fill in]

### 1. Controlled Variables (Y)
| Variable Name | Measurement Method | Frequency | Data Location | Notes |
|-------|---------|---------|---------|------|
| Product Quality Index | Lab testing | Per batch | LIMS | 2-hour testing cycle |
| Product Yield | Auto statistics | Real-time | MES | - |

### 2. Feature Variables (X)
| Variable Name | Variable Type | Measurement Method | Frequency | Data Location |
|-------|---------|---------|---------|---------|
| Reaction Temp | Set/Manipulated Variable | Temp Sensor | 1 min | SCADA |
| Material Batch | Disturbance Variable | Barcode Scan | Per batch | ERP |
| Ambient Temp | Disturbance Variable | Temp/Humidity Meter | 1 hour | Manual |

### 3. Data Collection Cycle
- Historical Data Traceback: [ ] months
- New Data Collection: Starting from [Date]
- Target Sample Size: At least [ ] batches/cycles

### 4. Data Quality Assurance
- [ ] Sensor calibration plan
- [ ] Outlier handling rules
- [ ] Missing value imputation strategy
- [ ] Data review process

2.3 Data Collection Best Practices

DO:

  • ✅ Record complete batch information (time, operator, material batch)
  • ✅ Collect both normal and abnormal condition data
  • ✅ Annotate known abnormal events (equipment failure, material change, etc.)
  • ✅ Keep timestamps consistent across data sources
  • ✅ Regularly back up raw data

DON'T:

  • ❌ Only collect "good" data and discard "bad" data
  • ❌ Have inconsistent timestamps from different sources
  • ❌ Manually transcribe without keeping original records
  • ❌ Use a sampling frequency that is too low (cannot capture dynamics)
  • ❌ Use a sampling frequency that is too high (generates massive redundant data)
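Timestamp consistency matters most when high-frequency sensor data has to be joined to batch-level records. The sketch below shows one way to do that with the standard library only: every timestamp is converted to UTC before the readings inside a batch window are averaged. The function names and the sample data are hypothetical, and it assumes each source writes ISO 8601 timestamps with an explicit UTC offset.

```python
from datetime import datetime, timezone
from statistics import mean


def to_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with an explicit offset and convert it to UTC."""
    parsed = datetime.fromisoformat(ts)
    if parsed.tzinfo is None:
        # A naive timestamp cannot be aligned safely, so refuse it.
        raise ValueError(f"timestamp has no UTC offset: {ts!r}")
    return parsed.astimezone(timezone.utc)


def batch_mean(readings, batch_start: str, batch_end: str) -> float:
    """Average the (timestamp, value) readings that fall inside one batch window."""
    start, end = to_utc(batch_start), to_utc(batch_end)
    values = [value for ts, value in readings if start <= to_utc(ts) <= end]
    if not values:
        raise ValueError("no readings inside the batch window")
    return mean(values)


# Hypothetical data: SCADA logs in UTC, the batch record in local time (UTC+8).
scada = [
    ("2024-01-01T00:00:00+00:00", 80.0),
    ("2024-01-01T00:01:00+00:00", 82.0),
    ("2024-01-01T00:02:00+00:00", 84.0),
    ("2024-01-01T00:03:00+00:00", 90.0),
]
avg_temp = batch_mean(scada, "2024-01-01T08:00:00+08:00", "2024-01-01T08:02:00+08:00")
print(avg_temp)
```

Rejecting timestamps without an offset is a deliberate choice here: a silent guess about the time zone is exactly the kind of inconsistency the list above warns against.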

3. Variable Classification System

3.1 Core Variable Definitions (Industry Standard)

In the field of industrial modeling and control, we follow these standard variable definitions:

| Abbreviation | Full Name | Chinese | Description |
|-------|---------|---------|---------|
| SV | Set Value | 设定值 | Target value set for the manipulated variable, modifiable on DCS |
| MV | Manipulated Variable | 操纵变量 | Valves, pumps, etc. that operators / APC can directly adjust |
| DV | Disturbance Variable | 扰动变量 | Uncontrollable, unadjustable disturbance factors |
| CV | Controlled Variable | 被控变量 | Core target to be controlled and optimized |
| PV | Process Value | 过程测量值 | Actual values measured by instruments / sensors |

When modeling data, we map these variables to the model's inputs (X) and outputs (Y):

text
┌─────────────────────────────────────────────────────────────────────┐
│                  Variable Modeling Mapping System                   │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   ┌─────────────────┐    ┌──────────────┐    ┌──────────────┐       │
│   │ Set/Manipulated │    │ Disturbance  │    │  Controlled  │       │
│   │     (SV/MV)     │    │     (DV)     │    │     (CV)     │       │
│   └────────┬────────┘    └──────┬───────┘    └──────┬───────┘       │
│            │                    │                   │               │
│            ▼                    ▼                   ▼               │
│   Parameters we can   Variables we cannot   Targets we want to      │
│   actively adjust(X)  control but affect(X) predict/optimize(Y)     │
│                                                                     │
│   Example: Reaction   Example: Ambient      Example: Product        │
│     temp setpoint(SV)   temperature           purity (CV)           │
│     Valve opening(MV)   Material fluctuation  (Usually shown as PV) │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

3.2 Detailed Variable Types

Set Value/Manipulated Variable (SV/MV)

Definition: Input parameters that operators or control systems can directly adjust. In actual industrial production, in most cases, operators adjust Set Values (SV) to indirectly control actuators, while Manipulated Variables (MV) are typically automatic outputs from low-level PID or control systems.

Characteristics:

  • Can be actively changed (mainly modifying SV)
  • Usually have clear operation ranges
  • Main focus for process optimization

Common Examples:

| Industry | Set/Manipulated Variable Examples |
|-------|---------|
| Chemical | Reaction temperature, pressure, stirring speed, catalyst dosage |
| Food | Sterilization temp, holding time, ingredient ratio, drying wind speed |
| Pharmaceutical | Heating rate, holding time, cooling rate, pH setpoint |
| Metallurgy | Heating power, cooling water flow, rolling speed |

Role in Modeling:

  • As the core component of X variables
  • Key focus of VIP analysis
  • Direct operation targets for process optimization

Disturbance Variables (DV)

Definition: Variables that affect process output but cannot be actively controlled, or can be controlled only with difficulty.

Characteristics:

  • Exist independently of operator actions and are hard to influence deliberately
  • May change over time
  • Factors to consider for model robustness

Common Examples:

| Type | Disturbance Variable Examples | Coping Strategy |
|-------|---------|---------|
| Material | Batch differences, moisture fluctuation, impurity content | Inbound inspection, feedforward control |
| Environment | Ambient temp, humidity, atmospheric pressure | Environmental compensation, AC control |
| Equipment | Equipment wear, catalyst decay, heat exchanger fouling | Regular maintenance, online correction |
| Operation | Operator differences, shift handover impact | SOP standardization, training |

Role in Modeling:

  • As a supplement to X variables
  • Help explain model residuals
  • Identify sources of "uncontrollable" variation

Controlled Variable (CV)

Definition: Process outputs or quality indicators that we want to control within target ranges.

Characteristics:

  • Results of the process
  • Usually have clear quality standards
  • Targets for model prediction (Y)

Common Examples:

| Industry | Controlled Variable Examples |
|-------|---------|
| Chemical | Product purity, conversion rate, selectivity, byproduct content |
| Food | Moisture content, color, taste score, microbial indicators |
| Pharmaceutical | Active ingredient content, dissolution rate, impurity profile |
| Material | Tensile strength, hardness, surface finish |

Role in Modeling:

  • As Y variables (Controlled variables)
  • Objects for model prediction and optimization
  • Core indicators for evaluating model performance

Process Value (PV)

Definition: Process values actually measured by instruments or sensors.

Characteristics:

  • True reflection of physical or chemical states
  • The basis for calculating or evaluating CV
  • May contain measurement noise or errors

Common Examples:

| Industry | Process Value Examples |
|-------|---------|
| Chemical | Actual temperature measured by thermocouple, flow meter reading |
| Food | Online moisture meter reading, actual pH value |
| Pharmaceutical | Stirring motor current fed back by sensor |

Role in Modeling:

  • Used to characterize controlled variables (Y)
  • Feed back the current system state for optimization and control

3.3 Variable Classification Decision Tree

3.4 Variable Classification Example

Scenario: Chemical Reactor Batch Control

| Variable Name | Variable Type | Classification Reason | Modeling Role |
|-------|---------|---------|---------|
| Reaction Temp Setpoint | Set/Manipulated Variable | Adjustable via DCS | X |
| Reaction Pressure | Set/Manipulated Variable | Adjustable via valves | X |
| Actual Reaction Pressure | Process Value | Pressure sensor feedback | X |
| Stirring Speed | Set/Manipulated Variable | Inverter control | X |
| Catalyst Dosage | Set/Manipulated Variable | Determined at batching | X |
| Material Batch | Disturbance Variable | Determined by procurement | X |
| Material Moisture | Disturbance Variable | Natural fluctuation | X |
| Ambient Temp | Disturbance Variable | Uncontrollable | X |
| Product Conversion Rate | Controlled Variable | Process result | Y |
| Product Selectivity | Controlled Variable | Quality indicator | Y |
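The classification in this example can be kept as data, so that the X and Y column lists are derived from it instead of being typed by hand. The sketch below is illustrative only: `VarType`, `DEFAULT_ROLE`, and `split_xy` are hypothetical names, and the default of treating a Process Value as X is an assumption that can be overridden per variable.

```python
from enum import Enum


class VarType(Enum):
    SV_MV = "Set/Manipulated Variable"
    DV = "Disturbance Variable"
    PV = "Process Value"
    CV = "Controlled Variable"


# Default modeling role per type. Assumption: a PV defaults to X; override it
# per variable when the PV is the measured stand-in for a CV.
DEFAULT_ROLE = {VarType.SV_MV: "X", VarType.DV: "X", VarType.PV: "X", VarType.CV: "Y"}

REACTOR_VARIABLES = {
    "Reaction Temp Setpoint": VarType.SV_MV,
    "Reaction Pressure": VarType.SV_MV,
    "Actual Reaction Pressure": VarType.PV,
    "Stirring Speed": VarType.SV_MV,
    "Catalyst Dosage": VarType.SV_MV,
    "Material Batch": VarType.DV,
    "Material Moisture": VarType.DV,
    "Ambient Temp": VarType.DV,
    "Product Conversion Rate": VarType.CV,
    "Product Selectivity": VarType.CV,
}


def split_xy(variables, role_overrides=None):
    """Return (x_columns, y_columns) in the order the variables were declared."""
    overrides = role_overrides or {}
    x_cols, y_cols = [], []
    for name, var_type in variables.items():
        role = overrides.get(name, DEFAULT_ROLE[var_type])
        (x_cols if role == "X" else y_cols).append(name)
    return x_cols, y_cols


x_cols, y_cols = split_xy(REACTOR_VARIABLES)
print(len(x_cols), y_cols)
```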

4. Scenario Modeling Practice

4.1 Pre-Modeling Preparation

Data Preparation Checklist

markdown
## Pre-Modeling Checklist

### Data Integrity
- [ ] Sample size ≥ 30 (rule-of-thumb minimum for PLS)
- [ ] Number of X variables < Sample size/2 (Avoid overfitting)
- [ ] No severe missing values (<10%)
- [ ] Timestamps correctly aligned

### Variable Confirmation
- [ ] Set/Manipulated variables (SV/MV) identified and marked
- [ ] Disturbance variables identified and marked
- [ ] Process values (PV) identified and evaluated
- [ ] Controlled variables (CV) clarified
- [ ] Variable units unified

### Business Understanding
- [ ] Understand normal operating ranges
- [ ] Understand common abnormal patterns
- [ ] Clarify modeling goals (Prediction/Optimization/Monitoring)
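The data-integrity items in this checklist are easy to automate. The following is a minimal sketch with hypothetical names (`check_dataset`, `is_missing`). It assumes the data is a list of row dictionaries, one per sample, and it reads the "<10%" missing-value limit as a per-variable share, which is an interpretation rather than something the checklist states.

```python
import math


def is_missing(value) -> bool:
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def check_dataset(rows, x_names, min_samples=30, max_missing=0.10):
    """Run the data-integrity checks on a list of row dicts (one dict per sample)."""
    n = len(rows)
    if n == 0:
        raise ValueError("dataset is empty")
    missing_share = {
        name: sum(is_missing(row.get(name)) for row in rows) / n for name in x_names
    }
    return {
        "sample_size_ok": n >= min_samples,
        "x_count_ok": len(x_names) < n / 2,
        "missing_ok": all(share < max_missing for share in missing_share.values()),
        "missing_share": missing_share,
    }


# Hypothetical batch table: 40 batches, 2 X variables, 2 missing pressure readings.
rows = [{"temp": 80.0 + i, "pressure": 1.0 + 0.01 * i} for i in range(40)]
rows[3]["pressure"] = None
rows[17]["pressure"] = float("nan")
report = check_dataset(rows, ["temp", "pressure"])
print(report)
```

Timestamp alignment is not covered here because it depends on how each source records time.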

4.2 Modeling Workflow

4.3 Step-by-Step Modeling Guide

Step 1: Data Import and Configuration

  1. Import Data: Import the prepared Excel data into the platform
  2. Set Header Row: Mark the variable name row
  3. Configure X Variables:
    • Select all Set/Manipulated variables (SV/MV)
    • Select important Disturbance variables (DV)
    • (Optional) If concerned with process states, introduce Process Values (PV)
  4. Configure Y Variables: Select Controlled variables (CV)

Step 2: Exploratory Analysis (PCA)

Purpose: Understand data structure, identify abnormal samples

Operations:

  1. Create a PCA model using only X variables
  2. View the Score Plot
  3. Identify outliers far from the main cluster
  4. View T² and SPE plots, mark statistical anomalies

Interpretation:

  • Normal batches should cluster in the core area of the principal component space
  • Points far from the cluster need investigation for causes
  • Combine business knowledge to decide whether to exclude
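For readers who want to see what the T² and SPE plots are built from, here is a minimal NumPy sketch. It is not the platform's implementation: the function name `pca_t2_spe` and the synthetic data are assumptions. The data is autoscaled, the PCA is fitted by SVD, T² measures distance inside the model plane, and SPE measures the squared residual outside it.

```python
import numpy as np


def pca_t2_spe(X, n_components):
    """Return (scores, T2, SPE) for a PCA fitted on autoscaled X."""
    X = np.asarray(X, dtype=float)
    # Autoscale: mean-center and divide by the sample standard deviation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                      # loadings, shape (n_vars, A)
    T = Z @ P                                    # scores, shape (n_samples, A)
    score_var = S[:n_components] ** 2 / (Z.shape[0] - 1)
    t2 = np.sum(T ** 2 / score_var, axis=1)      # Hotelling's T2 per sample
    residual = Z - T @ P.T
    spe = np.sum(residual ** 2, axis=1)          # squared prediction error per sample
    return T, t2, spe


# Synthetic example: three sensors that normally move together, plus one
# sample (index 40) that breaks the correlation pattern.
rng = np.random.default_rng(0)
latent = rng.normal(size=40)
X = np.outer(latent, [1.0, 1.0, 1.0]) + 0.05 * rng.normal(size=(40, 3))
X = np.vstack([X, [3.0, -3.0, 3.0]])

scores, t2, spe = pca_t2_spe(X, n_components=1)
print("largest SPE at sample index", int(np.argmax(spe)))
```

A sample with a large SPE breaks the usual relationship between variables, while a large T² with a small SPE means the sample follows the usual pattern but at an extreme level. Control limits for both statistics are omitted here.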

Step 3: Build PLS Regression Model

Operations:

  1. Configure X (Set/Manipulated + Disturbance) and Y (Controlled)
  2. Click "Fit" to train the model
  3. View model indicators:
    • R²Y: Goodness of fit
    • Q²Y: Predictive ability (>0.5 acceptable, >0.9 excellent)

Diagnostics:

  • If Q²Y < 0.5: Check variable selection, increase sample size
  • If R²Y is high but Q²Y is low: Overfitting, reduce the number of latent variables
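The difference between R²Y and Q²Y is easiest to see in code: R²Y is computed on the data the model was fitted to, while Q²Y is computed from predictions for samples that were held out during fitting. The sketch below uses a compact single-response PLS (NIPALS) written in NumPy. It is an illustration under stated assumptions, not the platform's algorithm: the function names, the interleaved 5-fold split, and the synthetic data are all hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit single-response PLS with NIPALS; return a predict(X_new) function."""
    x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
    y_mean = y.mean()
    E, f = (X - x_mean) / x_std, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w = w / np.linalg.norm(w)        # weight vector
        t = E @ w                        # score vector
        tt = t @ t
        p = E.T @ t / tt                 # X loading
        c = f @ t / tt                   # y loading
        E = E - np.outer(t, p)           # deflate X
        f = f - c * t                    # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)   # coefficients for autoscaled X
    return lambda X_new: ((X_new - x_mean) / x_std) @ beta + y_mean


def explained(y_true, y_pred):
    """1 - residual sum of squares / total sum of squares."""
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)


def r2y_q2y(X, y, n_components, n_folds=5):
    """R2Y from the full fit, Q2Y from k-fold cross-validated predictions."""
    r2y = explained(y, pls1_fit(X, y, n_components)(X))
    y_cv = np.empty_like(y)
    folds = np.arange(len(y)) % n_folds          # interleaved folds
    for k in range(n_folds):
        held_out = folds == k
        predict = pls1_fit(X[~held_out], y[~held_out], n_components)
        y_cv[held_out] = predict(X[held_out])
    return r2y, explained(y, y_cv)


# Synthetic example: y depends on two of three X variables plus noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=60)
r2y, q2y = r2y_q2y(X, y, n_components=2)
print(f"R2Y={r2y:.3f}  Q2Y={q2y:.3f}")
```

Refitting the scaling and the model inside each fold keeps the held-out samples out of the fit, which is what makes Q²Y a measure of predictive ability.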

Step 4: VIP Analysis and Variable Selection

Purpose: Find the X variables that have the greatest impact on Y

Operations:

  1. View the VIP plot
  2. Identify key variables with VIP > 1
  3. Consider excluding variables with VIP < 0.5
  4. Remodel and validate

Business Interpretation:

  • Set/Manipulated variables with high VIP are the focus of process optimization
  • Disturbance variables with high VIP require enhanced monitoring
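As background for the thresholds above, VIP can be computed from the weights, scores, and y-loadings of a fitted PLS model. The sketch below uses the common textbook formula and is not necessarily the exact computation the platform performs. The function name `vip_scores` and the small hand-made arrays are hypothetical.

```python
import numpy as np


def vip_scores(W, T, q):
    """VIP per X variable.

    W: (n_x, A) PLS weights, T: (n_samples, A) scores, q: (A,) y-loadings.
    """
    W, T, q = (np.asarray(a, dtype=float) for a in (W, T, q))
    ssy = q ** 2 * np.sum(T ** 2, axis=0)        # Y sum of squares explained per component
    w_unit = W / np.linalg.norm(W, axis=0)       # unit-length weight vectors
    return np.sqrt(W.shape[0] * (w_unit ** 2 @ ssy) / ssy.sum())


# Hypothetical two-component model with three X variables.
W = [[0.8, 0.0],
     [0.6, 0.0],
     [0.0, 1.0]]
T = [[2.0, 0.1], [-2.0, -0.1], [1.0, 0.1], [-1.0, -0.1]]
q = [1.0, 0.5]

vip = vip_scores(W, T, q)
print(np.round(vip, 3))
```

With this definition the squared VIP values average to 1 across the X variables, which is why 1 is the usual dividing line between above-average and below-average importance.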

Step 5: Model Validation

Internal Validation:

  • Cross-validation Q²Y
  • Check residual distribution

External Validation (If conditions permit):

  • Test with newly collected data
  • Compare predicted values with actual values

4.4 Typical Scenario Modeling Case

Case: Fermentation Process Optimization

Scenario Description:

  • Product: An amino acid produced by fermentation
  • Goal: Increase product concentration (Y)
  • Cycle: 48-hour batch

Variable Classification:

| Type (Algorithm) | Control System Term | Variable Examples | Description |
|-------|---------|---------|---------|
| Set/Manipulated Variables (X) | SV / MV (Set Value/Manipulated Variable) | Temperature setpoint (SV), pH setpoint (SV), stirring speed (MV), aeration rate (MV) | Parameters actively adjusted by operators or APC in the DCS (usually by modifying SV) |
| Disturbance Variables (X) | DV (Disturbance Variable) | Seed batch, medium batch, ambient temperature | Factors that affect the system but cannot be controlled by operators |
| Controlled Variables (Y) | CV (Controlled Variable) | Product concentration, conversion rate | Results measured by offline testing or online instruments |
| Process Values (X or Y) | PV (Process Value), the measured value that can represent a CV | Actual temperature (PV), actual pH (PV) | Process states fed back by sensors |

Modeling Results:

  • R²Y = 0.92, Q²Y = 0.85
  • Key variables with VIP > 1: pH setpoint, aeration rate, temperature setpoint
  • Finding: pH control accuracy has the greatest impact on product concentration

Optimization Suggestions:

  1. Upgrade the pH control system to improve control accuracy
  2. Establish a pH feedforward compensation model
  3. Expected to increase product concentration by 8-12%

5. Tool-Assisted Variable Selection

5.1 Variable Selection Toolbox

In the platform, you can use the following tools to assist in selecting variables:

Tool 1: Correlation Analysis

Purpose: Identify collinearity among X variables

Operations:

  1. Use a heatmap to view correlations between X variables
  2. Identify highly correlated variable pairs with |r| > 0.8
  3. Keep one of them and exclude the redundant variable

Example:

If "Reaction Temp" and "Reactor Wall Temp" have a correlation of 0.95
→ Only keep "Reaction Temp" (more directly controllable)
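The same screening can be done outside the platform with a few lines of plain Python. This is a minimal sketch: the function names and the sample readings are hypothetical, and the 0.8 threshold is taken from the steps above.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def collinear_pairs(columns, threshold=0.8):
    """List (name_1, name_2, r) for every pair of X variables with |r| > threshold."""
    pairs = []
    for (name_1, col_1), (name_2, col_2) in combinations(columns.items(), 2):
        r = pearson(col_1, col_2)
        if abs(r) > threshold:
            pairs.append((name_1, name_2, round(r, 3)))
    return pairs


# Hypothetical readings from five batches.
columns = {
    "Reaction Temp": [80, 82, 85, 87, 90],
    "Reactor Wall Temp": [78, 81, 83, 86, 88],
    "Stirring Speed": [120, 118, 121, 119, 120],
}
pairs = collinear_pairs(columns)
print(pairs)
```

The function only reports the pairs. Which variable of a pair to keep remains a process decision, as in the example above where the more directly controllable one is retained.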

Tool 2: PCA Loading Analysis

Purpose: Understand the internal structure among variables

Operations:

  1. View the PCA Loading Plot
  2. Identify groups of variables clustered together (representing similar information)
  3. Choose the most representative variable from each group

Tool 3: VIP Iterative Selection

Purpose: Gradually optimize the variable set

Workflow:

Round 1: All variables → Calculate VIP
Round 2: Exclude variables with VIP<0.5 → Remodel
Round 3: Check Q²Y change
      ↓
   If Q²Y drops <5%: Accept simplified model
   If Q²Y drops >10%: Restore some excluded variables
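The decision logic of this workflow can be sketched as two small helpers. The names are hypothetical. Two points are assumptions: the drop in Q²Y is treated as a relative change, and a drop between 5% and 10% is labeled borderline, because the workflow does not say what to do in that band.

```python
def variables_to_exclude(vip_by_name, threshold=0.5):
    """Round 2: names of the variables whose VIP is below the threshold."""
    return [name for name, vip in vip_by_name.items() if vip < threshold]


def review_simplified_model(q2_full, q2_reduced):
    """Round 3: compare Q2Y before and after exclusion."""
    if q2_full <= 0:
        raise ValueError("q2_full must be positive to compute a relative drop")
    drop = (q2_full - q2_reduced) / q2_full      # assumption: relative drop
    if drop < 0.05:
        return "accept simplified model"
    if drop > 0.10:
        return "restore some excluded variables"
    # Assumption: the 5-10% band is not specified, so it is flagged for review.
    return "borderline: review with process knowledge"


# Hypothetical VIP values and Q2Y results.
vip_by_name = {"Temp": 1.8, "Pressure": 1.5, "Material Batch": 0.3, "Ambient Humidity": 0.4}
excluded = variables_to_exclude(vip_by_name)
decision = review_simplified_model(q2_full=0.85, q2_reduced=0.83)
print(excluded, decision)
```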

Tool 4: Variable Importance Ranking Table

Comprehensive Evaluation Framework:

| Variable | VIP | Controllability | Measurement Cost | Overall Score | Suggestion |
|-------|---------|---------|---------|---------|---------|
| Temp | 1.8 | High | Low | ⭐⭐⭐⭐⭐ | Keep |
| Pressure | 1.5 | High | Low | ⭐⭐⭐⭐⭐ | Keep |
| Material Batch | 0.3 | Low | Medium | ⭐⭐ | Exclude |
| Ambient Humidity | 0.4 | Low | High | ⭐ | Exclude |

5.2 Variable Selection Decision Process

5.3 Variable Selection Best Practices

DO:

  • ✅ Prioritize retaining Set/Manipulated variables (optimizable)
  • ✅ Retain variables with high VIP that are easy to measure
  • ✅ Retain variables that process knowledge says are important
  • ✅ Use cross-validation to test the simplified model

DON'T:

  • ❌ Only look at VIP and completely ignore business knowledge
  • ❌ Exclude too many variables at once
  • ❌ Exclude variables only because their VIP is low when they are also cheap to measure
  • ❌ Over-screen when the sample size is very small

6. FAQs and Best Practices

6.1 Frequently Asked Questions

Q1: What if the sample size is insufficient?

A:

  • Minimum requirement: Sample size > Number of X variables
  • Ideal situation: Sample size ≥ 3 × Number of X variables
  • If insufficient:
    1. Reduce X variables (prioritize excluding those with low VIP)
    2. Extend the data collection cycle
    3. Consider using PCA for dimensionality reduction first

Q2: How to handle missing values?

A:

  • Missing <5%: Impute with mean/median
  • Missing 5-20%: Impute with interpolation or regression prediction
  • Missing >20%: Consider excluding the variable or sample
  • The platform supports multiple missing value handling strategies
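The thresholds in this answer can be written down as a small helper, followed by the simplest of the listed strategies, median imputation. The function names are hypothetical. A share of exactly 5% or 20% is assigned to the middle band here, which is an assumption about how the boundaries are meant.

```python
from statistics import median


def missing_value_strategy(missing_fraction: float) -> str:
    """Pick a handling strategy from the share of missing values (0.0-1.0)."""
    if not 0.0 <= missing_fraction <= 1.0:
        raise ValueError("missing_fraction must be between 0 and 1")
    if missing_fraction == 0.0:
        return "nothing to impute"
    if missing_fraction < 0.05:
        return "impute with mean/median"
    if missing_fraction <= 0.20:
        return "impute with interpolation or regression prediction"
    return "consider excluding the variable or sample"


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]


print(missing_value_strategy(0.02))
print(impute_median([1.0, None, 3.0]))
```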

Q3: What if the boundary between Set/Manipulated variables and Disturbance variables is blurred?

A:

  • Judgment criterion: Can it be actively adjusted under current technology/cost conditions?
  • Example: Ambient temperature is theoretically controllable (AC), but the cost is too high → treated as a disturbance
  • Both are X in the model; the difference lies only in the optimization strategy

Q4: What if the model performs poorly on new data?

A:

  • Check if the new data is within the range of the training data (extrapolation risk)
  • Check if new disturbance factors have emerged
  • Consider model updates (incremental learning or retraining)
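The first check, whether new data lies inside the training range, can be sketched per variable as follows. The function name and the sample values are hypothetical. This only catches univariate extrapolation: a sample can sit inside every individual range and still be an unusual combination, which is what multivariate statistics such as T² and SPE are for.

```python
def out_of_training_range(train_rows, new_row):
    """Return the variables in new_row that fall outside the min/max seen in training."""
    flagged = {}
    for name, value in new_row.items():
        seen = [row[name] for row in train_rows]
        low, high = min(seen), max(seen)
        if not low <= value <= high:
            flagged[name] = {"value": value, "train_min": low, "train_max": high}
    return flagged


# Hypothetical training batches and one new batch.
train = [
    {"temp": 78.0, "ph": 6.8},
    {"temp": 85.0, "ph": 7.2},
    {"temp": 81.0, "ph": 7.0},
]
flags = out_of_training_range(train, {"temp": 92.0, "ph": 7.1})
print(flags)
```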

Q5: How to report modeling results to management?

A:

  • Avoid technical jargon, focus on business value
  • Use specific numbers: "Expected to increase yield by X% after optimization"
  • Display visualizations: Score plot, VIP plot
  • Provide clear action recommendations

6.2 Modeling Success Checklist

markdown
## Project Delivery Checklist

### Model Quality
- [ ] Q²Y > 0.5 (Minimum threshold)
- [ ] R²Y - Q²Y < 0.2 (Avoid overfitting)
- [ ] No obvious patterns in residuals
- [ ] VIP of key variables > 1

### Business Validation
- [ ] Key variables conform to process common sense
- [ ] Abnormal samples have reasonable explanations
- [ ] Model prediction error is within an acceptable range
- [ ] Validated with at least one independent batch of data

### Document Completeness
- [ ] Variable classification list
- [ ] Data collection method description
- [ ] Model performance report
- [ ] Application suggestions and risk warnings
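The numeric items under Model Quality can be checked automatically before a report is handed over. The sketch below uses a hypothetical function name. The R²Y and Q²Y values are the ones reported for the fermentation case in Section 4.4, while the individual VIP values are made-up placeholders, since the case only states that these variables have VIP > 1.

```python
def model_quality_checks(r2y, q2y, key_vips):
    """Evaluate the numeric Model Quality items of the delivery checklist."""
    return {
        "q2y_above_0.5": q2y > 0.5,
        "r2y_q2y_gap_below_0.2": (r2y - q2y) < 0.2,
        "key_vips_above_1": all(vip > 1.0 for vip in key_vips.values()),
    }


# R2Y and Q2Y from the fermentation case; VIP values are hypothetical placeholders.
result = model_quality_checks(
    r2y=0.92,
    q2y=0.85,
    key_vips={"pH setpoint": 1.4, "aeration rate": 1.2, "temperature setpoint": 1.1},
)
print(result)
```

The residual-pattern item is left out because it needs a visual or statistical check of the residuals rather than a single threshold.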

6.3 Continuous Improvement Suggestions

Model Lifecycle Management:

Months 1-2: Model Building and Validation
    └── Collect data, build initial model
    └── Internal validation, parameter tuning

Months 3-6: Trial Run and Optimization
    └── Small-scale trial
    └── Collect feedback, correct issues

Months 6-12: Official Deployment
    └── Full application
    └── Establish monitoring mechanism

After 12 Months: Regular Maintenance
    └── Evaluate model performance quarterly
    └── Data drift detection
    └── Retrain when necessary

Appendix: Quick Reference Cards

Variable Classification Quick Reference

| Question | Set/Manipulated Variable (SV/MV) | Disturbance Variable (DV) | Controlled Variable (CV) | Process Value (PV) |
|-------|---------|---------|---------|---------|
| Actively adjustable? | ✅ Yes (mainly modifying SV) | ❌ No | N/A (It's a result) | N/A (It's a result) |
| Role in model | X | X | Y | X/Y |
| Optimization value | High (direct operation) | Medium (monitoring and early warning) | Target | State feedback |
| Example | Temperature setpoint | Ambient temperature | Product purity | Actual temp reading |

Model Selection Quick Reference

| Scenario | Recommended Model | Key Indicators |
|-------|---------|---------|
| Only X, explore structure | PCA | R²X, Score Plot |
| X→Y Prediction (Continuous) | PLS | R²Y, Q²Y, VIP |
| X→Y Classification (Discrete) | PLS-DA | Accuracy, F1, AUC |

VIP Interpretation Quick Reference

| VIP Value | Importance | Suggestion |
|-------|---------|---------|
| > 1.5 | Very Important | Focus |
| 1.0-1.5 | Important | Keep |
| 0.5-1.0 | General | Can keep |
| < 0.5 | Unimportant | Consider excluding |

This document is a companion guide for the Data Insight Platform, combining actual industrial scenarios to help users systematically conduct data modeling work.

Let data speak, make decisions simpler.