Step 1: Data Preparation
The first step in data analysis is like preparing ingredients before cooking.
📥 Drag in the Data
Open the software, and you'll see a clean welcome interface.
Prepare the file: Find your Excel (.xlsx) file.
🔗 Regression Example Data: Click to download Steel_demo.xlsx
🔗 Classification Example Data: Click to download Wine_demo.xlsx
One-click import: Simply drag the file into the blue dashed box, or click the button to select.
💡 Tip: Ideally, the first row holds the point locations and the second row holds the column headers (such as "Temperature", "Pressure", "Yield Rate"), so the software knows what each column represents.
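To make the recommended layout concrete, here is a minimal Python sketch of how a sheet arranged this way could be read. It is only an illustration, not the software's own parser: the `parse_sheet` helper, the tag names such as "TI-101", and the sample values are all hypothetical.

```python
# Hypothetical sheet contents, already loaded as a list of rows.
raw_rows = [
    ["TI-101", "PI-202", "QI-303"],             # row 1: point locations (made-up tags)
    ["Temperature", "Pressure", "Yield Rate"],  # row 2: column headers
    [350.0, 1.2, 0.93],                         # row 3 onward: data
    [352.0, 1.3, 0.95],
]


def parse_sheet(rows, point_row=1, header_row=2):
    """Split a sheet into a header-to-point mapping and a list of data records.

    Row numbers are 1-based, matching how rows are counted in Excel.
    """
    points = rows[point_row - 1]
    headers = rows[header_row - 1]
    first_data = max(point_row, header_row)  # 0-based index of the first data row
    records = [dict(zip(headers, row)) for row in rows[first_data:]]
    return dict(zip(headers, points)), records


point_of, records = parse_sheet(raw_rows)
print(point_of["Temperature"])    # TI-101
print(records[0]["Yield Rate"])   # 0.93
```

With clear headers in a dedicated row, every value can be addressed by a meaningful column name instead of a column position.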
⚙️ Tell the Software How to Read
After uploading the file, first give it a home by saving it as an instance, then briefly "introduce" the data to the software.
1. Save Instance and Create Model
- Save instance: After successful upload, click save, and the system will automatically set it as the current working instance.
- Create model: Click "Add New Model" to enter model creation, and give your first model a name.
- Select data table: If your Excel has multiple Sheets, remember to select the one containing the data.
2. Who is the Name? Who is the Data?
- Header Row: Use the "Set Header Row" button to tell the software which row contains the column names. Usually this is row 1.
- Name Row: Use the "Set Name Row" button to tell the software which row contains the names. Usually this is row 2.
- Data Pre-cleaning: After completing the settings above, click any column to link it with the pre-analysis chart on the right.
  - Data Overview: View the mean, standard deviation (Std), number of valid samples, and null-value count at a glance.
  - Quick Imputation: Null values can be filled directly with the mean value.
  - Outlier Removal: Using the trend chart and distribution chart below, select outliers with the lasso tool, then delete entire rows or columns with the buttons to the right of the row numbers.
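For readers who want to see what the overview statistics and mean imputation amount to, here is a small Python sketch using only the standard library. It is an illustration under assumptions, not the software's implementation: the function names and sample values are hypothetical, `None` stands in for an empty cell, the fill value is assumed to be the mean of the column's valid values, and the sample standard deviation (dividing by n - 1) is used because the convention the software follows is not stated.

```python
import math
import statistics


def column_overview(values):
    """Summarize one column; None marks a null cell."""
    valid = [v for v in values if v is not None]
    return {
        "mean": statistics.mean(valid),
        "std": statistics.stdev(valid),  # sample std (n - 1); an assumption
        "valid": len(valid),
        "nulls": len(values) - len(valid),
    }


def fill_nulls_with_mean(values):
    """Replace every null cell with the mean of the valid cells."""
    valid = [v for v in values if v is not None]
    mean = statistics.mean(valid)
    return [mean if v is None else v for v in values]


temperature = [350.0, 352.0, None, 348.0, 350.0]  # made-up readings
overview = column_overview(temperature)
filled = fill_nulls_with_mean(temperature)

print(overview["mean"], overview["valid"], overview["nulls"])  # 350.0 4 1
print(filled)  # [350.0, 352.0, 350.0, 348.0, 350.0]
```

Mean filling keeps the column mean unchanged but slightly shrinks its spread, so it is best suited to columns with only a few null values.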
3. Assign Roles
This is like casting actors in their roles:
- Observation Column: This is the "ID number" for each sample, such as timestamp or batch number. Select it!
- X Columns (Features): These are the factors that affect the result, such as raw material ratio or reaction temperature. Check all of them.
- Y Column (Target): This is the result you want to predict, such as product purity. If you are building a prediction model, be sure to select it.
- Dataset Partitioning:
  - Test Set Proportion: Set aside part of the data (such as 20%) as a test set to verify model accuracy.
  - Random Split: Whether to shuffle the data order before partitioning. For time-series data (in chronological order), leaving this unchecked is recommended.
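The two partitioning options can be sketched in a few lines of Python. This is a rough illustration rather than the software's actual procedure: `split_dataset` and its parameters are hypothetical, and it assumes that when shuffling is off the test set is taken from the end of the data, which is the usual choice for chronological data.

```python
import random


def split_dataset(rows, test_ratio=0.2, shuffle=True, seed=0):
    """Split rows into (train, test), holding out test_ratio of the rows."""
    n_test = round(len(rows) * test_ratio)
    indices = list(range(len(rows)))
    if shuffle:
        # Fixed seed so the same split can be reproduced later.
        random.Random(seed).shuffle(indices)
    cut = len(rows) - n_test
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


rows = list(range(10))  # stand-in for 10 samples in chronological order

# Random Split unchecked: order is preserved, the last 20% becomes the test set.
ts_train, ts_test = split_dataset(rows, test_ratio=0.2, shuffle=False)
print(ts_test)  # [8, 9]

# Random Split checked: rows are shuffled before the cut.
rnd_train, rnd_test = split_dataset(rows, test_ratio=0.2, shuffle=True)
print(len(rnd_train), len(rnd_test))  # 8 2
```

Keeping the order for time-series data means the model is evaluated on samples that come after everything it was trained on, which mirrors how it would be used in practice.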
After configuration, click "Confirm Configuration" and the data is ready! ✅