
Key Terms Explanation

If you are new to PCA/PLS, the abbreviations can look like gibberish: R², Q², VIP, T², SPE...

Don't panic. This page explains the most common terms in the platform in plain language: what question each one answers, how to interpret it, and where people most often go wrong.

💡 Here's a "cheat sheet" for you:

  • R² (Fitting): How well the model explains the "known data".
  • Q² (Prediction): How accurate the model is on "unseen data" (obtained from cross-validation).
  • T² (In-model Anomaly): The point lies in the model plane but sits too far from the center (a maverick).
  • SPE (Out-of-model Anomaly): The point does not fit the model plane at all (large residuals).
  • VIP / Loading (Variable Contribution): Which variables are "driving the result".
  • Accuracy: The share of classification predictions that are correct (can be misleading when classes are imbalanced).
  • F1 Score: The balanced all-rounder: it cares equally about not falsely accusing good samples and not letting bad ones slip through.
  • AUC (Discrimination): The overall ability of a classification model to distinguish positive from negative classes.

✅ R² (r squared): Goodness of Fit

🧠 What question is it answering?

R² answers: how well does the model explain the training data? Think of it as how well the model can recite homework it has already seen.

In the platform you will typically see:

  • R2X: How much information of X (features) the model explains.
  • R2Y: How much variation of Y (target) the model explains.

🧐 How to interpret the value?

  • Range: Usually between 0 and 1; higher is better.
  • Intuition: R2Y = 0.90 roughly means "90% of Y's variation is explained by the model".
  • Note: A high R² does not guarantee accurate predictions, because it only measures how well the model reproduces the training set.
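The textbook definition behind R2Y is R² = 1 - SS_res / SS_tot: the share of Y's variation around its mean that the fitted values reproduce. The sketch below assumes you already have the model's fitted values for the training samples; the function name is hypothetical and the platform's exact computation (for example how R2X is summed over X columns) may differ.

```python
def r_squared(y_true, y_fit):
    """R² = 1 - SS_res / SS_tot, computed on the data the model was fitted to."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)            # total variation of Y
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_fit))  # variation left unexplained
    return 1.0 - ss_res / ss_tot

# Illustrative values, not platform output
y = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(y, fitted), 3))  # 0.98
```

Because the fitted values come from a model that has already seen these samples, this number can stay high even when predictions on new data are poor.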

⚠️ Common Misconceptions

  • Looking only at R², not Q²: It is easy to miss a model that has "memorized" the training data (overfitting).
  • Comparing R² across different tasks: R² has limited comparability between regression and discriminant (classification) models, or across data on different scales. Compare it within the same business data and modeling goal.

✅ Q² (q squared): Predictive Ability

🧠 What question is it answering?

Q² answers: how well does the model predict data it has not seen? Think of it as the model's ability to generalize.

In the platform, Q² generally comes from cross-validation (see the next section), which is closer to what you actually care about: how well the model works after deployment.

🧐 How to interpret the value?

  • Range: Usually between 0 and 1; higher is better. It can also be lower, even negative, which means the model predicts worse than simply guessing the mean of Y.
  • Practical experience:
    • Low Q²: The model generalizes poorly. Usually you need to reduce components, clean anomalies, or recheck the X/Y configuration.
    • High R² but much lower Q²: A typical sign of overfitting.
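A common textbook form is Q² = 1 - PRESS / SS_tot, where PRESS (PRediction Error Sum of Squares) uses predictions made for each sample by a model that did not see it. The sketch below assumes those cross-validated predictions are already available; the function name and numbers are hypothetical, and some software computes Q² per component and accumulates it, which may differ from the platform's exact formula.

```python
def q_squared(y_true, y_cv_pred):
    """Q² = 1 - PRESS / SS_tot. y_cv_pred[i] must come from a model that did NOT
    see sample i during training (i.e. from cross-validation)."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    press = sum((y - p) ** 2 for y, p in zip(y_true, y_cv_pred))
    return 1.0 - press / ss_tot

y = [1.0, 2.0, 3.0, 4.0]
good_cv = [1.3, 1.8, 3.4, 3.6]  # held-out predictions close to the truth
bad_cv = [4.0, 1.0, 1.0, 2.0]   # held-out predictions worse than guessing the mean
print(round(q_squared(y, good_cv), 3))  # 0.91
print(round(q_squared(y, bad_cv), 3))   # -2.6
```

The second case shows how Q² turns negative: the held-out errors (PRESS) exceed the total variation of Y.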

💡 A good habit:

Treat R² as "fitness checkup" and Q² as "prediction checkup". Only when both are healthy is the model reliable.


✅ Cross Validation: The Model's "Mock Exam"

🧠 What question is it answering?

The core idea of cross-validation is simple:

  • First, "hide" some samples as exam questions;
  • Train the model with the remaining samples;
  • Then use the model to predict the "hidden part" and see how well it performs;
  • Repeat multiple times to get a more stable evaluation.

This is one of the sources of Q² in the platform, and also an important basis for automatically finding the optimal number of principal components (PCA) or components (PLS) (see Modeling Analysis).

🧩 Common Practices (You don't need to calculate by hand)

  • K-fold Cross Validation: Split the samples into K parts and use each part in turn for validation.
  • Leave-One-Out (LOO): Hold out a single sample for validation each time; common when samples are few.

⚠️ When cross-validation can go wrong

  • Too few samples: The evaluation fluctuates a lot, so Q² becomes unstable.
  • Data leakage: For example, samples from the same batch end up in both the training and validation sets, which "inflates" the results.
  • Extreme class imbalance (classification): Q² or Accuracy alone may be overly optimistic; combine them with metrics such as AUC.
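One way to avoid the batch-leakage problem is to split by group rather than by sample, so that every sample from a batch is held out together. The sketch below is an illustrative group-aware K-fold splitter; the function name and batch labels are hypothetical and this is not the platform's own splitting logic.

```python
def group_kfold(groups, k):
    """Yield (train_idx, val_idx) pairs in which all samples sharing a group
    label (e.g. a production batch) fall into the same validation fold."""
    unique = sorted(set(groups))
    if not 2 <= k <= len(unique):
        raise ValueError("k must be between 2 and the number of groups")
    for fold in range(k):
        held_out = set(unique[fold::k])  # deal groups round-robin into k folds
        val_idx = [i for i, g in enumerate(groups) if g in held_out]
        train_idx = [i for i, g in enumerate(groups) if g not in held_out]
        yield train_idx, val_idx

batches = ["B1", "B1", "B2", "B2", "B3", "B3"]
for train_idx, val_idx in group_kfold(batches, k=3):
    print("train:", train_idx, "validate:", val_idx)
# train: [2, 3, 4, 5] validate: [0, 1]
# train: [0, 1, 4, 5] validate: [2, 3]
# train: [0, 1, 2, 3] validate: [4, 5]
```

Plain K-fold is the special case where every sample is its own group, and Leave-One-Out is that case with k equal to the number of samples.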

✅ Fit: Let the Model "Learn" Relationships

Fit in the platform means "training the model":

  • PCA (Exploration): Learn to summarize X's main variation patterns with fewer dimensions (dimensionality reduction).
  • PLS (Prediction): Learn the relationship between X and Y, use X to explain/predict Y.

🎛️ Why is there a "number of components/latent variables"?

You can think of the number of components as the "number of patterns the model is allowed to remember":

  • Too few components: The model is too simple and learns too few patterns (underfitting).
  • Too many components: The model is too complex and even memorizes noise (overfitting).

The platform's C+1/C-1 turns this "complexity knob", and Q² (cross-validation) helps you choose a more stable complexity (see Modeling Analysis).
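To make the idea of "using Q² to choose complexity" concrete, here is one simple stopping rule: keep adding components while cross-validated Q² still improves by at least a minimum gain. Both the rule and the `min_gain` value are assumptions for illustration; this page does not document the platform's actual rule, and other software uses different criteria.

```python
def choose_n_components(q2_by_n, min_gain=0.01):
    """Hypothetical rule: q2_by_n[i] is the cross-validated Q² of a model with
    i + 1 components. Stop once one more component adds less than min_gain."""
    best = 1
    for n in range(2, len(q2_by_n) + 1):
        if q2_by_n[n - 1] - q2_by_n[n - 2] < min_gain:
            break  # the extra component no longer pays off in prediction
        best = n
    return best

q2_curve = [0.42, 0.61, 0.68, 0.685, 0.64]  # illustrative Q² per component count
print(choose_n_components(q2_curve))        # 3
```

Note how Q² drops again at 5 components: that is the overfitting side of the knob, while R² would typically keep rising.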


✅ Classification vs Regression

These are two "tracks" that must be clarified before modeling. Your Y (target variable) determines which track you will run on.

🧠 Classification (Category Prediction Task)

  • Goal: Predict a category (discrete value).
  • Examples: Judge whether a product is "qualified" or "unqualified"; judge whether a patient is "positive" or "negative".
  • What indicators to look at: Accuracy, F1 Score, AUC.
  • In the platform: If your Y column contains category labels (such as "qualified"/"unqualified" or 0/1), the platform will automatically use PLS-DA (Partial Least Squares Discriminant Analysis).

🧠 Regression (Continuous Prediction Task)

  • Goal: Predict a specific value (continuous value).
  • Examples: Predict tomorrow's temperature (e.g. 25.3℃ vs. 26.1℃); predict a product's purity (e.g. 98.5% vs. 99.2%).
  • What indicators to look at: R², Q², RMSE (Root Mean Squared Error).
  • In the platform: If your Y column is continuous numbers, the platform will automatically use standard PLS.
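RMSE is the square root of the average squared prediction error, so it is expressed in the same units as Y (℃, %, and so on). A minimal sketch; the function name is hypothetical and the predicted values are illustrative, not platform output.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Squared Error, in the same units as Y."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)

purity = [98.5, 99.2, 98.9]      # measured purity, %
predicted = [98.7, 99.0, 98.9]   # illustrative model predictions, %
print(round(rmse(purity, predicted), 3))  # 0.163 (percentage points)
```

Computed on training predictions, RMSE describes fit; computed on cross-validated predictions, it describes prediction error, in the same way R² and Q² differ.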

✅ Accuracy: How many classification questions were answered correctly

🧠 What question is it answering?

Accuracy = Number of correctly predicted samples / Total number of samples. For example, if 92 out of 100 samples are predicted correctly, Accuracy = 0.92.

⚠️ When can Accuracy be misleading?

When classes are extremely imbalanced, Accuracy can look "good" while telling you nothing useful.

Example: 95 out of 100 samples are "qualified", 5 are "unqualified".

  • If you always predict "qualified", Accuracy is still 95%;
  • But this model misses every "unqualified" sample, which is exactly the most dangerous error for the business.
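The 95/5 example can be reproduced in a few lines: a "lazy" model that always answers "qualified" scores 95% Accuracy while catching none of the unqualified samples. Labels and names below are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

y_true = ["qualified"] * 95 + ["unqualified"] * 5
always_qualified = ["qualified"] * 100  # a "lazy" model that never flags anything

print(accuracy(y_true, always_qualified))  # 0.95
caught = sum(t == p == "unqualified" for t, p in zip(y_true, always_qualified))
print(caught, "of", y_true.count("unqualified"), "unqualified samples caught")  # 0 of 5
```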

✅ More reliable combinations

  • Look at the confusion matrix (TP/FP/TN/FN), recall, and precision together.
  • Combine them with the F1 Score and ROC/AUC (see the following sections) to judge the model's overall performance under different conditions.
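The confusion-matrix counts and the precision/recall derived from them can be sketched as follows, treating "unqualified" as the positive class (the class we want to catch). The model's predictions here are made up for illustration: it catches 4 of the 5 unqualified samples and raises 3 false alarms.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP/FP/TN/FN, treating `positive` as the class we want to catch."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["TP" if t == positive else "FP"] += 1
        else:
            counts["FN" if t == positive else "TN"] += 1
    return counts

y_true = ["qualified"] * 95 + ["unqualified"] * 5
y_pred = ["qualified"] * 92 + ["unqualified"] * 3 + ["unqualified"] * 4 + ["qualified"]

c = confusion_counts(y_true, y_pred, positive="unqualified")
precision = c["TP"] / (c["TP"] + c["FP"])  # of the flagged samples, how many are truly unqualified
recall = c["TP"] / (c["TP"] + c["FN"])     # of the truly unqualified, how many were flagged
print(c)                                   # {'TP': 4, 'FP': 3, 'TN': 92, 'FN': 1}
print(round(precision, 3), round(recall, 3))  # 0.571 0.8
```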

✅ F1 Score: The "Balanced Master"

🧠 What question is it answering?

When Accuracy becomes misleading under class imbalance (such as 95 qualified vs. 5 unqualified), turn to the F1 Score. It combines two metrics:

  • Precision: Of the samples you predicted as "unqualified", how many are actually unqualified? (Don't falsely accuse good samples.)
  • Recall: Of the samples that are actually "unqualified", how many did you catch? (Don't let bad samples slip through.)

🧐 How to interpret?

F1 Score is the "harmonic mean" of Precision and Recall: it requires good grades in both subjects, with no favoritism.

  • Range: 0 to 1; the closer to 1, the stronger the model.
  • Intuition: If the model guesses blindly or focuses on only one class, the harmonic mean is dragged down by its weaker score. This forces the model to be a "balanced all-rounder".
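The harmonic mean F1 = 2 · P · R / (P + R) can be computed directly from the confusion-matrix counts. The three illustrative cases below use the 95/5 scenario: a model that never flags anything, one that flags everything, and a more balanced one. The function name is hypothetical.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(tp=0, fp=0, fn=5))             # 0.0   never flags: recall is 0
print(round(f1_score(tp=5, fp=95, fn=0), 3))  # 0.095 flags everything: precision is 0.05
print(round(f1_score(tp=4, fp=3, fn=1), 3))   # 0.667 catches 4 of 5 with 3 false alarms
```

The first model has 95% Accuracy and the second only 5%, yet both get a low F1 because each is weak on one side.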

✅ ROC Curve and AUC: The "Threshold Checkup" for Classification Models

🧠 What is the ROC curve?

Classification models often output a score (or probability) expressing how much a sample "looks like the positive class". Choosing different thresholds yields different values of:

  • True Positive Rate (TPR, recall): The proportion of positive classes caught.
  • False Positive Rate (FPR, false alarm): The proportion of negative classes misjudged as positive.

The ROC curve connects the (FPR, TPR) points obtained at different thresholds. The closer it bends toward the upper-left corner, the better the model; the closer it stays to the diagonal, the closer the model is to random guessing.
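The threshold sweep can be sketched directly: for each distinct score, call every sample at or above it "positive" and record (FPR, TPR). Scores, labels, and the function name are illustrative; the sketch assumes both classes are present.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, sweeping the threshold over every distinct score.
    labels: 1 = positive class, 0 = negative class (both must be present).
    A sample is predicted positive when its score >= threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```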

🧮 What is AUC?

AUC is the area under the ROC curve. Intuitively:

  • AUC = 0.5: Close to random guess.
  • AUC closer to 1: Stronger ability to distinguish between positive and negative classes.
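AUC also has an equivalent reading: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties count as half). The sketch below computes it that way with a simple pairwise count; the data are illustrative, and for large datasets a rank-based or trapezoid computation over the ROC points would be faster.

```python
def auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as 0.5. labels: 1 = positive, 0 = negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
print(round(auc(scores, labels), 3))  # 0.889: 8 of the 9 positive/negative pairs ranked correctly
```

Because it only compares rankings, this value does not depend on any single threshold.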

✅ Why is AUC often more reliable than Accuracy?

  • AUC does not depend on one fixed threshold, so it better reflects the model's overall discrimination ability.
  • Under class imbalance, AUC is usually more stable than Accuracy and less likely to be "good-looking but useless".

✅ Hotelling T²: In-model Anomaly

🧠 What question is it answering?

T² measures how far a sample is from the center within the model's "main space" (the score space):

  • The sample still varies along the directions the model describes, but goes too far;
  • Typically caused by some variables taking particularly extreme values; these are in-model mavericks.

🧐 How to look at it?

  • There is usually a Limit (confidence limit) that serves as a warning line.
  • Points above the Limit: Key anomaly candidates to investigate.
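For a PCA model, a common form of Hotelling's T² for a sample is the sum over components of (score²/score variance). The NumPy sketch below assumes mean-centered data and leaves scaling to the caller; it does not compute the confidence Limit (often derived from an F distribution), and the platform's exact implementation may differ. The function name and simulated data are hypothetical.

```python
import numpy as np

def hotelling_t2(X, n_components):
    """T² of each row of X in a PCA model with n_components components.
    X is mean-centered here; scaling (e.g. unit variance) is left to the caller."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T            # loadings (variables x components)
    T = Xc @ P                         # scores (samples x components)
    score_var = T.var(axis=0, ddof=1)  # variance of each score column
    return np.sum(T**2 / score_var, axis=1)

# Simulated data: 4 variables driven by one latent factor, plus a little noise
rng = np.random.default_rng(0)
t = rng.normal(size=50)
t[0] = 6.0  # sample 0 sits far out along the model's main direction
X = np.outer(t, [1.0, 1.0, 1.0, 1.0]) + 0.1 * rng.normal(size=(50, 4))

t2 = hotelling_t2(X, n_components=1)
print(int(np.argmax(t2)))  # 0: the in-model maverick has the largest T²
```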

For more intuitive chart explanations, see: Hotelling T² Chart


✅ SPE (Squared Prediction Error): Out-of-model Anomaly

🧠 What question is it answering?

SPE is closely related to DModX (some software reports a normalized version of the same residual distance under that name). It measures the "vertical distance" (residual) from the sample to the model plane:

  • Large SPE: The variation pattern of this sample cannot be explained by the model;
  • Common when operating conditions change, new patterns appear, or data quality problems occur; these are out-of-model outliers.
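SPE (also known as the Q residual) is the squared length of the part of a sample that the retained components cannot reconstruct. The NumPy sketch below assumes a mean-centered PCA model; the function name and simulated data are hypothetical, and any DModX-style normalization the platform applies is not reproduced here.

```python
import numpy as np

def spe(X, n_components):
    """Squared Prediction Error of each row of X in a PCA model: the squared
    distance between the sample and its projection onto the model plane."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T   # loadings
    X_hat = (Xc @ P) @ P.T    # reconstruction from the retained components
    E = Xc - X_hat            # residuals: what the model cannot explain
    return np.sum(E**2, axis=1)

rng = np.random.default_rng(1)
t = rng.normal(size=50)
X = np.outer(t, [1.0, 1.0, 1.0, 1.0]) + 0.1 * rng.normal(size=(50, 4))
X[0] += np.array([2.0, -2.0, 2.0, -2.0])  # sample 0 breaks the usual correlation pattern

print(int(np.argmax(spe(X, n_components=1))))  # 0: the out-of-model sample
```

Sample 0 is not extreme along the shared direction, so its T² would stay modest; its deviation is orthogonal to the model plane and shows up almost entirely in SPE.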

🧩 How to use T² and SPE together?

  • High T², low SPE: The model can explain the point, but it is extreme (in-model anomaly).
  • Low T², high SPE: Not extreme along the model directions, but the model cannot explain it (out-of-model anomaly).
  • Both high: Both extreme and unexplainable; usually investigate these first.
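The four cases above amount to a simple lookup once each sample's T² and SPE are compared with their limits. The sketch below is illustrative only: the function name and the limit values are hypothetical, not values produced by the platform.

```python
def diagnose(t2, spe, t2_limit, spe_limit):
    """Map one sample's T² and SPE onto the four T²/SPE cases."""
    high_t2, high_spe = t2 > t2_limit, spe > spe_limit
    if high_t2 and high_spe:
        return "both high: investigate first"
    if high_t2:
        return "in-model anomaly (extreme but explainable)"
    if high_spe:
        return "out-of-model anomaly (model cannot explain it)"
    return "normal"

# Hypothetical limits and sample values
print(diagnose(t2=14.2, spe=0.8, t2_limit=9.5, spe_limit=2.0))
print(diagnose(t2=3.1, spe=5.6, t2_limit=9.5, spe_limit=2.0))
```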

For more intuitive chart explanations, see: SPE Chart


✅ VIP (Variable Importance in Projection): Variable Importance Ranking

🧠 What question is it answering?

VIP is used in PLS: It tells you which X variables are most critical for explaining/predicting Y.

You can think of VIP as a ranking of "who contributes the most to the result".

🧐 How to interpret?

  • Common rule of thumb: VIP = 1 serves as a reference threshold; variables above it are treated as "important".
  • The higher the VIP, the more likely the variable is a key control point.
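A widely used VIP definition weights each variable's squared (normalized) PLS weight by how much Y variation each component explains: VIP_j = sqrt(p · Σ_a SS_a · (w_ja / ‖w_a‖)² / Σ_a SS_a). The sketch below fits a single-response PLS model with NIPALS and applies that formula. It assumes NumPy, centered data, and one Y column; the function name and simulated data are hypothetical, and the platform's implementation may differ in details such as scaling or multi-Y handling.

```python
import numpy as np

def pls1_vip(X, y, n_components):
    """Fit single-response PLS with NIPALS and return the VIP of each X column.
    X and y are centered here; autoscaling is left to the caller."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    n_vars = X.shape[1]
    W, T, Q = [], [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)      # unit-length X weights for this component
        t = X @ w                   # X scores
        tt = t @ t
        p_load = X.T @ t / tt       # X loadings
        q = (y @ t) / tt            # y loading (regression of y on t)
        X = X - np.outer(t, p_load) # deflate X
        y = y - q * t               # deflate y
        W.append(w); T.append(t); Q.append(q)
    W, T, Q = np.array(W).T, np.array(T).T, np.array(Q)
    ss = Q**2 * np.sum(T**2, axis=0)  # y variation explained by each component
    return np.sqrt(n_vars * (W**2 @ ss) / ss.sum())

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)  # only columns 0 and 1 matter
vip = pls1_vip(X, y, n_components=2)
print(np.round(vip, 2))  # columns 0 and 1 should be well above 1, the rest below
```

With unit-length weights, the squared VIPs always sum to the number of variables, which is why 1 works as a natural reference line: it is the "average" importance.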

For more intuitive chart explanations, see: VIP Variable Contribution


✅ Loading: Variables' "Positions" in the Model

🧠 What question is it answering?

Loading describes: The contribution direction and magnitude of each variable on a certain principal component/latent variable.

In short: Score plots look at samples (rows), loading plots look at variables (columns).

🧐 How to look at loading plots?

  • Farther from the origin: The variable has a greater impact on this component.
  • Two variables pointing the same way and close together: They often carry similar information (positive correlation).
  • Two variables on opposite sides of the origin: They often represent a trade-off (negative correlation).
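The loading patterns above can be seen on simulated data: two variables that move together get same-sign loadings on the first principal component, a variable that moves against them gets the opposite sign, and an unrelated variable lands near zero. This NumPy sketch autoscales the data first (see the reminder below on standardization); the variable layout is hypothetical, and the overall sign of a component is arbitrary, so only the relative signs matter.

```python
import numpy as np

rng = np.random.default_rng(3)
base = rng.normal(size=100)
X = np.column_stack([
    base + 0.1 * rng.normal(size=100),   # variable A
    base + 0.1 * rng.normal(size=100),   # variable B: moves with A
    -base + 0.1 * rng.normal(size=100),  # variable C: moves against A
    rng.normal(size=100),                # variable D: unrelated to the others
])
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # autoscale: unit variance per column
_, _, Vt = np.linalg.svd(Xs, full_matrices=False)
p1 = Vt[0]  # PC1 loadings, one per variable
print(np.round(p1, 2))  # A and B share a sign, C has the opposite sign, D is near 0
```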

💡 Small reminder:

Interpreting loadings and scores depends heavily on whether the data is standardized and how many components you choose. Read conclusions together with the Model Summary and business knowledge.

For more intuitive chart explanations, see: Loading Plot


📌 Final Cheat Sheet: What Each One "Catches"

  • R²: How well it explains/fits overall (training set).
  • Q²: How accurate the overall prediction is (cross-validation).
  • Accuracy: How many predictions are correct (can mislead under class imbalance).
  • F1 Score: The balance between precision and recall.
  • AUC: The model's core ability to distinguish positive from negative classes.
  • T²: In-model deviation (running too far along the model direction).
  • SPE: Out-of-model deviation (large residuals the model cannot explain).
  • VIP: Ranking of the X variables that contribute most to Y (PLS).
  • Loading: Each variable's contribution and correlation structure on the components (PCA/PLS).

If you run into "high R² but low Q²" or "both T² and SPE exceed their limits" in a real analysis, it is usually not an operating mistake on your part but a hint from the data: there are points to clean, variables to reconfigure, or complexity to reduce.

Let the data speak, and make decisions simpler.