📊 Data Analysis Terminology
Essential Terms for Data-Driven Decision Making
🎯 Understanding Data Analysis
What is Data Analysis?
Data analysis is the process of inspecting, cleaning, transforming, and modeling data to discover useful information, draw conclusions, and support decision-making. In business, it's the key to understanding customers, optimizing operations, and driving growth.
- Informed Decisions: Move from gut feelings to data-backed insights
- Identify Trends: Spot patterns and predict future outcomes
- Measure Performance: Track KPIs and progress toward goals
- Optimize Resources: Allocate budget and efforts efficiently
- Competitive Advantage: Outperform competitors with better insights
📈 Types of Analysis
- Descriptive: What happened?
- Diagnostic: Why did it happen?
- Predictive: What will happen?
- Prescriptive: What should we do?
🔧 Common Tools
- Excel / Google Sheets
- SQL databases
- Python (Pandas, NumPy)
- R programming
- Tableau / Power BI
💼 Business Applications
- Sales forecasting
- Customer segmentation
- Marketing ROI
- Inventory optimization
- Risk assessment
📚 Fundamental Terms
Core Concepts
Data
Raw facts, figures, or information collected for analysis. Data can be numbers, text, images, or any observable information that can be measured or described.
Basic Foundation
Dataset
A collection of related data organized in a structured format, typically in tables with rows and columns. Each row represents a record, and each column represents a variable or attribute.
Basic
Variable
A characteristic, number, or quantity that can be measured or counted. Variables can change or vary across observations.
Basic
- Age, income, temperature
- Number of purchases
- Sales revenue
- Gender, color, category
- Customer satisfaction
- Product type
Metric
A quantifiable measure used to track and assess performance or progress. Metrics are specific numbers that represent business activities.
Basic KPI
- Monthly Revenue: $250,000
- Customer Acquisition Cost: $45
- Website Traffic: 50,000 visitors/month
- Conversion Rate: 3.5%
- Churn Rate: 2.1%
KPI (Key Performance Indicator)
A critical metric that directly measures progress toward strategic business objectives. KPIs are the most important metrics that leadership tracks regularly.
Basic Business
- Sales: Monthly Recurring Revenue (MRR), Sales Growth Rate
- Marketing: Customer Acquisition Cost (CAC), Return on Ad Spend (ROAS)
- Customer Success: Net Promoter Score (NPS), Customer Lifetime Value (CLV)
- Operations: Order Fulfillment Time, Inventory Turnover
Dashboard
A visual display of key metrics and data points, typically showing real-time or near-real-time information. Dashboards provide at-a-glance views of business performance.
Basic Visualization
📐 Statistical Concepts
Measures of Central Tendency
Mean (Average)
The sum of all values divided by the number of values. The most commonly used measure of central tendency.
Basic Statistics
Mean = ($1,000 + $1,200 + $900 + $1,500 + $1,400) / 5 = $1,200
Median
The middle value when data is arranged in order. If there's an even number of values, it's the average of the two middle values. Less affected by extreme values (outliers) than the mean.
Basic
Median = $50K (middle value)
Mean = $78K (skewed by the $200K outlier)
Mode
The value that appears most frequently in a dataset. A dataset can have one mode, multiple modes, or no mode.
Basic
Mode = 5 (appears 4 times, most frequent)
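As a minimal sketch, the three measures of central tendency can be computed with Python's standard `statistics` module. The sales figures come from the mean example above. The salary and rating lists are hypothetical values (assumptions, not the original data) chosen so that they produce the median, mean, and mode quoted above.

```python
from statistics import mean, median, mode

# Sales values from the mean example above.
sales = [1000, 1200, 900, 1500, 1400]

# Hypothetical salaries (in $K) with one large outlier; an assumption
# chosen to match the quoted median and mean, not the original data.
salaries_k = [40, 45, 50, 55, 200]

# Hypothetical ratings in which the value 5 occurs four times.
ratings = [5, 4, 5, 3, 5, 4, 5, 2]

sales_mean = mean(sales)             # 1200
salary_median = median(salaries_k)   # 50, unaffected by the outlier
salary_mean = mean(salaries_k)       # 78, pulled up by the 200 outlier
rating_mode = mode(ratings)          # 5
```

The gap between the salary median and mean shows why the median is often preferred for skewed data.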
Measures of Spread
Range
The difference between the highest and lowest values in a dataset. Shows the spread of the data.
Basic
Range = 2,000 - 1,200 = 800 visitors
Standard Deviation
A measure of how spread out numbers are from the mean. A low standard deviation means data points are close to the mean; a high standard deviation means they're spread out over a wider range.
Intermediate Statistics
Store B daily sales: $500, $1,500, $800, $1,200, $1,000 (sample SD ≈ $381 - volatile)
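The Store B figures can be checked with a short sketch using Python's standard `statistics` module. The result depends on whether the data is treated as a whole population (divide by n) or as a sample (divide by n - 1), so both are shown; which one a given tool reports is worth confirming before comparing numbers.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

# Store B daily sales from the example above.
store_b = [500, 1500, 800, 1200, 1000]

avg = mean(store_b)              # 1000

# Population versions divide the squared deviations by n.
pop_var = pvariance(store_b)     # 116000
pop_sd = pstdev(store_b)         # about 340.6

# Sample versions divide by n - 1.
sample_var = variance(store_b)   # 145000
sample_sd = stdev(store_b)       # about 380.8
```

In both cases the standard deviation is the square root of the corresponding variance.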
Variance
The average of the squared differences from the mean. Variance is the square of standard deviation. Used to measure variability.
Intermediate
Percentile
A value below which a certain percentage of observations fall. The 75th percentile means 75% of values are below this point.
Intermediate
Quartile
Values that divide data into four equal parts. Q1 (25th percentile), Q2 (50th percentile/median), Q3 (75th percentile).
Intermediate
Q1: $50 (bottom 25% spend ≤ $50)
Q2: $100 (median, 50% spend ≤ $100)
Q3: $200 (top 25% spend > $200)
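A minimal sketch of computing quartiles with `statistics.quantiles` from the standard library. The spend values are hypothetical (an assumption, not the original data), picked so the quartiles match the example above. Different tools interpolate quartiles slightly differently, so results on real data may not agree exactly across libraries.

```python
from statistics import quantiles

# Hypothetical customer spend values in dollars (assumption).
spend = [20, 50, 100, 200, 500]

# n=4 returns the three cut points Q1, Q2, Q3.
# method="inclusive" treats the data as the full population, so the
# minimum and maximum act as the 0th and 100th percentiles.
q1, q2, q3 = quantiles(spend, n=4, method="inclusive")
```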
🗄️ Data Types & Structures
Structured Data
Highly organized data that fits neatly into tables with rows and columns. Easily searchable and analyzable. Stored in relational databases.
Basic
- Excel spreadsheets, SQL databases
- Customer data (ID, Name, Email, Age, City)
- Transaction records (Date, Amount, Product, Customer)
- Inventory systems (SKU, Quantity, Price, Location)
Unstructured Data
Data that doesn't fit into a traditional row-column structure, such as text, images, videos, and audio files. Harder to analyze but often contains valuable insights.
Intermediate
- Customer reviews and feedback
- Social media posts and comments
- Email content
- Images, videos, audio recordings
- PDF documents, Word files
Semi-Structured Data
Data that doesn't fit perfectly into tables but has some organizational structure like tags or markers. Often in formats like JSON or XML.
Intermediate
Categorical Data
Data that can be grouped into categories. Values are labels or names, not numbers (though they may be coded as numbers).
Basic
- Gender: Male, Female
- Color: Red, Blue, Green
- Product Category: Electronics, Clothing
- Education: High School, Bachelor's, Master's
- Rating: Poor, Fair, Good, Excellent
- Size: Small, Medium, Large
Numerical Data
Data expressed in numbers that can be measured or counted. Mathematical operations can be performed on numerical data.
Basic
- Number of customers: 1, 2, 3...
- Items sold: 50, 51, 52...
- Employee count: 25, 26, 27...
- Temperature: 72.5°F
- Revenue: $1,234.56
- Time: 3.25 hours
Time Series Data
Data points indexed in time order. Used to track changes over time and identify trends, patterns, and seasonality.
Intermediate Forecasting
- Daily sales revenue over a year
- Hourly website traffic
- Monthly customer acquisition
- Stock prices by minute
🔬 Analysis Techniques
Descriptive Analytics
Summarizes historical data to understand what happened in the past. Uses aggregation, data mining, and visualization to describe patterns.
Basic Business Intelligence
- What was our total revenue last quarter?
- How many customers did we acquire last month?
- What were our top-selling products?
- What is our average order value?
Diagnostic Analytics
Examines data to understand why something happened. Involves drilling down into data, finding correlations, and identifying causal relationships.
Intermediate
- Why did sales drop in Q3?
- What caused the spike in customer churn?
- Why did conversion rates improve?
- What factors contribute to high customer lifetime value?
Predictive Analytics
Uses historical data, statistical algorithms, and machine learning to forecast future outcomes. Identifies the likelihood of future events.
Advanced Machine Learning
- What will our revenue be next quarter?
- Which customers are likely to churn?
- What's the expected demand for this product?
- Which leads are most likely to convert?
Prescriptive Analytics
Recommends actions to take based on predictions. Uses optimization and simulation algorithms to suggest the best course of action.
Advanced
- What should we do to maximize revenue?
- How should we allocate our marketing budget?
- What pricing strategy will optimize profit?
- Which inventory levels should we maintain?
Correlation
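A statistical measure of the strength and direction of the linear relationship between two variables. When one variable changes, the other tends to change in a predictable way. The correlation coefficient ranges from -1 (perfect negative) to +1 (perfect positive); 0 means no linear relationship. Correlation alone does not prove that one variable causes the other.
Intermediate Statistics
- As advertising spend ↑, sales ↑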
- As temperature ↑, ice cream sales ↑
- As price ↑, demand ↓
- As distance ↑, delivery speed ↓
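The sketch below computes the Pearson correlation coefficient by hand, which is one common way to measure correlation on the -1 to +1 scale. All figures are hypothetical and only illustrate a positive and a negative relationship like those listed above.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator of the coefficient).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data (assumptions for illustration only).
ad_spend = [10, 20, 30, 40, 50]
sales = [120, 190, 310, 390, 500]   # rises as ad spend rises
price = [5, 6, 7, 8, 9]
demand = [100, 90, 80, 70, 60]      # falls linearly as price rises

r_positive = pearson_r(ad_spend, sales)   # close to +1
r_negative = pearson_r(price, demand)     # -1 up to floating-point rounding
```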
Regression Analysis
A statistical method to model the relationship between a dependent variable and one or more independent variables. Used for prediction and understanding relationships.
Advanced
Predict sales based on advertising spend, season, and pricing:
Sales = 10,000 + (5 × Ad Spend) + (2,000 × Summer) - (100 × Price)
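As a small illustration, the equation above can be evaluated as a function. The coefficients are taken directly from the example; the function name and the input values are hypothetical, and the sketch does not show how the coefficients would be fitted from data.

```python
def predict_sales(ad_spend, is_summer, price):
    """Evaluate the example regression equation for one observation.

    is_summer is a 0/1 indicator (dummy variable) for the summer season.
    """
    return 10_000 + 5 * ad_spend + 2_000 * int(is_summer) - 100 * price

# Hypothetical inputs: $1,000 ad spend, summer season, $20 price.
summer_forecast = predict_sales(ad_spend=1_000, is_summer=True, price=20)   # 15000
winter_forecast = predict_sales(ad_spend=1_000, is_summer=False, price=20)  # 13000
```

Each coefficient reads as the expected change in sales for a one-unit change in that input, holding the others fixed.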
Segmentation
Dividing a population or dataset into distinct groups based on shared characteristics. Each segment contains similar members, while different segments are dissimilar.
Intermediate Marketing
- High-Value Loyalists: Frequent buyers, high spend, long tenure
- Bargain Hunters: Purchase only during sales, price-sensitive
- New Explorers: Recent sign-ups, low purchase history
- At-Risk: Declining engagement, haven't purchased recently
A/B Testing
A method of comparing two versions (A and B) to determine which performs better. Randomly split your audience and measure which version achieves better results.
Intermediate Optimization
- Version A (Control): Blue "Buy Now" button → 2.5% conversion
- Version B (Test): Red "Buy Now" button → 3.2% conversion
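- Result: Red button wins with a 28% relative increase in conversion rate (3.2% vs. 2.5%), provided the sample is large enough for the difference to be statistically significant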
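A rough sketch of how the result could be checked. The relative lift uses the two conversion rates from the example. The significance check is a pooled two-proportion z-test, a technique the text above does not name, and it assumes 10,000 visitors per version because the example gives no sample sizes.

```python
from math import sqrt
from statistics import NormalDist

def relative_lift(rate_a, rate_b):
    """Relative improvement of B over A, as a percentage."""
    return (rate_b - rate_a) / rate_a * 100

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

lift = relative_lift(2.5, 3.2)   # about 28 (%)

# Assumption: 10,000 visitors per version, so 250 and 320 conversions.
p_value = two_proportion_p_value(250, 10_000, 320, 10_000)
```

With these assumed sample sizes the p-value falls below the conventional 0.05 threshold; with much smaller samples the same rates might not be distinguishable from chance.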
💼 Key Business Metrics
Revenue & Growth Metrics
Revenue
The total amount of money generated from sales of goods or services. Also called top-line or gross revenue.
Basic
MRR (Monthly Recurring Revenue)
Predictable revenue generated from subscription-based customers each month. Key metric for SaaS and subscription businesses.
Basic SaaS
ARR (Annual Recurring Revenue)
The annualized version of MRR (ARR = MRR × 12). Total predictable revenue generated in a year from subscriptions.
Basic
Growth Rate
The percentage increase or decrease in a metric over time. Shows the pace of business expansion or contraction.
Basic
Growth Rate = (($120K - $100K) / $100K) × 100% = 20% growth
Customer Metrics
CAC (Customer Acquisition Cost)
The average cost of acquiring a new customer: total marketing and sales expenses divided by the number of customers acquired.
Intermediate Marketing
CAC = $10,000 / 200 = $50 per customer
CLV/LTV (Customer Lifetime Value)
The total revenue a business expects to earn from a customer throughout their entire relationship. Critical for determining how much to invest in acquisition.
Intermediate
CLV = $100 × 24 = $2,400
Churn Rate
The percentage of customers who stop using your product or service during a given time period. Lower is better.
Intermediate
Monthly Churn = (50 / 1,000) × 100% = 5% churn rate
Retention Rate
The percentage of customers who continue using your product over time. The opposite of churn rate.
Intermediate
Retention = ((950 - 100) / 1,000) × 100% = 85% retention
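The retention example can be written as a small function. The labels for the three numbers did not survive in the example, so this sketch assumes the common reading: 950 customers at the end of the period, 100 of them newly acquired during it, and 1,000 at the start.

```python
def retention_rate(start_customers, end_customers, new_customers):
    """Percentage of starting customers still active at period end.

    New customers are subtracted so that acquisitions during the
    period do not mask customers who left.
    """
    return (end_customers - new_customers) / start_customers * 100

# Assumed meaning of the example's numbers (see note above).
retention = retention_rate(start_customers=1_000, end_customers=950, new_customers=100)
```

Under this reading, 850 of the original 1,000 customers remain, giving the 85% shown in the example.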
NPS (Net Promoter Score)
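A customer loyalty metric based on one question: "How likely are you to recommend us to a friend?", answered on a 0-10 scale. The score is the percentage of promoters minus the percentage of detractors, so it ranges from -100 to +100.
Intermediate Customer Success
- Promoters (9-10): Loyal, enthusiastic customers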
- Passives (7-8): Satisfied but unenthusiastic
- Detractors (0-6): Unhappy, may spread negative word-of-mouth
Example: 60% promoters, 10% detractors = NPS of 50 (excellent!)
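A minimal sketch of the NPS calculation from raw survey answers, using the promoter, passive, and detractor ranges listed above. The response list is hypothetical, built to give 60% promoters and 10% detractors as in the example.

```python
def net_promoter_score(scores):
    """NPS from 0-10 survey answers: % promoters minus % detractors."""
    promoters = sum(1 for s in scores if s >= 9)    # scores 9-10
    detractors = sum(1 for s in scores if s <= 6)   # scores 0-6
    # Passives (7-8) count toward the total but not toward either group.
    return (promoters - detractors) / len(scores) * 100

# Hypothetical responses: 6 promoters, 3 passives, 1 detractor.
responses = [10, 9, 9, 10, 9, 10, 8, 7, 8, 3]
nps = net_promoter_score(responses)   # 50.0
```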
Conversion & Efficiency Metrics
Conversion Rate
The percentage of users who complete a desired action out of the total number of visitors or users.
Basic Marketing
Conversion Rate = (30 / 1,000) × 100% = 3%
ROI (Return on Investment)
A measure of profitability that shows the gain or loss from an investment relative to its cost. Expressed as a percentage.
Basic
ROI = (($4,000 - $1,000) / $1,000) × 100% = 300% ROI
ROAS (Return on Ad Spend)
The revenue generated for every dollar spent on advertising. Specifically measures advertising effectiveness.
Intermediate Advertising
ROAS = $2,500 / $500 = 5:1 (or 500%)
Meaning: Every $1 spent returns $5 in revenue
Bounce Rate
The percentage of visitors who leave a website after viewing only one page without taking any action. High bounce rate may indicate poor user experience.
Basic Web Analytics
Bounce Rate = (400 / 1,000) × 100% = 40%
🔧 Data Quality & Processing
Data Cleaning
The process of detecting and correcting (or removing) corrupt, inaccurate, or irrelevant data. Essential first step in any analysis.
Intermediate
- Remove duplicate records
- Fix typos and inconsistent formatting
- Handle missing values
- Remove outliers (if appropriate)
- Standardize data formats (dates, currency)
- Correct data entry errors
Missing Data
Data points that are absent from a dataset. Can occur due to errors, non-responses, or data loss. Must be handled appropriately in analysis.
Intermediate
- Deletion: Remove rows with missing data (if few)
- Imputation: Fill with mean, median, or predicted values
- Flag: Create indicator variable for missing values
- Forward Fill: Use previous value (time series)
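The four strategies above can be sketched in plain Python. The readings are hypothetical, with `None` standing in for a missing value; libraries such as Pandas offer equivalent operations, but this sketch avoids any dependency.

```python
from statistics import mean

# Hypothetical daily readings; None marks a missing value (assumption).
readings = [10.0, None, 14.0, None, 18.0]

# Deletion: keep only the observed entries.
deleted = [v for v in readings if v is not None]

# Imputation: replace missing entries with the mean of observed values.
fill_value = mean(deleted)
imputed = [fill_value if v is None else v for v in readings]

# Flag: record which entries were missing so the model can use that fact.
was_missing = [v is None for v in readings]

# Forward fill: carry the last observed value forward (time series).
forward_filled = []
last_seen = None
for v in readings:
    if v is not None:
        last_seen = v
    forward_filled.append(last_seen)
```

Note that forward fill leaves a leading gap unfilled (`None`) when the series starts with a missing value.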
Outlier
A data point that differs significantly from other observations. Can be due to variability, errors, or truly exceptional cases.
Intermediate
Investigation needed: Is this a data entry error (extra zero?) or a real exceptional sale?
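One common rule of thumb for flagging candidates to investigate is the 1.5 × IQR rule, sketched below. This rule is not named in the text above; it is one of several possible approaches, and the order values are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical order values; the last one may have an extra zero.
order_values = [100, 110, 120, 130, 140, 150, 1500]
flagged = iqr_outliers(order_values)   # [1500]
```

A flagged value is only a prompt for investigation, not proof of an error.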
Normalization
Scaling data to fit within a specific range (typically 0 to 1). Makes variables with different units comparable.
Advanced
After normalization, both range from 0 to 1, making them comparable in analysis.
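A minimal sketch of min-max scaling, one common way to map values onto the 0 to 1 range. The age and income values are hypothetical and stand in for two variables with very different units.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        # All values identical: no spread to scale, so return zeros.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

# Hypothetical variables on very different scales (assumptions).
ages = [20, 35, 50]
incomes = [30_000, 75_000, 120_000]

scaled_ages = min_max_normalize(ages)        # [0.0, 0.5, 1.0]
scaled_incomes = min_max_normalize(incomes)  # [0.0, 0.5, 1.0]
```

Because the minimum and maximum drive the scaling, a single extreme outlier can compress all other values toward 0.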
Data Aggregation
Combining multiple data points into summary statistics. Reduces granularity while preserving important information.
Basic
Jan 1: $1,000, Jan 2: $1,200, ... Jan 31: $1,500
→ January Total: $35,000
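The daily-to-monthly roll-up can be sketched as a simple group-and-sum. The records below are a hypothetical handful of days, not the full month from the example, so the totals differ from the $35,000 shown above.

```python
from collections import defaultdict

# Hypothetical (date, revenue) records. ISO dates are assumed so that
# the first seven characters identify the month.
daily_sales = [
    ("2024-01-01", 1000),
    ("2024-01-02", 1200),
    ("2024-01-31", 1500),
    ("2024-02-01", 900),
]

# Group by month and sum revenue within each group.
monthly_totals = defaultdict(int)
for day, revenue in daily_sales:
    monthly_totals[day[:7]] += revenue
```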
ETL (Extract, Transform, Load)
The process of extracting data from sources, transforming it into a usable format, and loading it into a destination system (data warehouse).
Advanced Data Engineering
- Extract: Pull data from CRM, website, sales system
- Transform: Clean, standardize, aggregate data
- Load: Store in data warehouse for analysis
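The three steps above can be sketched end to end with the standard library. Everything here is a stand-in: an in-memory CSV string plays the role of a source system export, and an in-memory SQLite database plays the role of the data warehouse. The table and column names are hypothetical.

```python
import csv
import io
import sqlite3

# Extract: read raw rows (an in-memory CSV stands in for a CRM export).
raw_csv = "customer,amount\n alice ,10.50\nBOB,20\n alice ,4.50\n"
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: standardize names, parse amounts, aggregate per customer.
totals = {}
for row in rows:
    name = row["customer"].strip().lower()
    totals[name] = totals.get(name, 0.0) + float(row["amount"])

# Load: write the result to a table (SQLite stands in for a warehouse).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_totals (customer TEXT PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO customer_totals VALUES (?, ?)", totals.items())
conn.commit()

loaded = conn.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
```

Production pipelines usually add scheduling, error handling, and incremental loads, none of which are shown here.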
📈 Data Visualization
Data Visualization
The graphical representation of data and information using visual elements like charts, graphs, and maps. Makes complex data easier to understand.
Basic
- Quickly identify patterns and trends
- Communicate insights effectively
- Support data-driven decisions
- Spot outliers and anomalies
Bar Chart
Uses rectangular bars to compare values across categories. Bar length represents the value. Great for comparing discrete categories.
Basic
Example: Monthly sales comparison across different stores
Line Chart
Displays data points connected by lines, showing trends over time. Perfect for visualizing continuous data and time series.
Basic
Example: Daily website visitors over the past 3 months
Pie Chart
A circular chart divided into slices showing proportions or percentages of a whole. Each slice represents a category's contribution.
Basic
Example: Revenue breakdown by product line (Software: 45%, Hardware: 30%, Services: 25%)
Scatter Plot
Displays values for two variables as points on a grid. Shows relationships, patterns, and correlations between variables.
Intermediate
Example: Advertising spend (x-axis) vs. Sales (y-axis) to see correlation
Heatmap
Uses color intensity to represent data values in a matrix format. Depending on the color scale, darker or brighter colors indicate higher values.
Intermediate
Example: Website traffic by hour of day and day of week (darker = more traffic)
Histogram
Shows the distribution of numerical data by grouping values into ranges (bins). Each bar represents frequency within that range.
Intermediate
Example: Distribution of customer ages (bin: 18-25, 26-35, 36-45, etc.)
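Before a histogram is drawn, values are counted into bins. The sketch below does that counting for the age bins in the example; the ages themselves are hypothetical, and plotting is left to whichever charting tool is in use.

```python
from collections import Counter

# Hypothetical customer ages (assumption).
ages = [19, 22, 25, 27, 31, 34, 35, 38, 41, 44]

# Bin boundaries from the example above, inclusive on both ends.
bins = [(18, 25), (26, 35), (36, 45)]

def bin_label(age):
    """Return the label of the bin containing age, or 'other'."""
    for low, high in bins:
        if low <= age <= high:
            return f"{low}-{high}"
    return "other"

# Frequency of ages within each bin: the bar heights of the histogram.
histogram = Counter(bin_label(age) for age in ages)
```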
Funnel Chart
Visualizes stages in a process showing progressive reduction. Each stage is narrower than the previous, representing drop-off.
Intermediate Conversion
Example - Sales Funnel:
- Website Visitors: 10,000
- Product Page Views: 3,000 (30%)
- Add to Cart: 600 (6%)
- Checkout: 400 (4%)
- Purchase: 300 (3%)
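The percentages in the funnel above are each stage's share of the original visitors. A short sketch reproduces them and also computes the step-to-step conversion, which the example does not list but which often shows where the largest drop-off occurs.

```python
# Stage counts from the sales funnel example above.
funnel = [
    ("Website Visitors", 10_000),
    ("Product Page Views", 3_000),
    ("Add to Cart", 600),
    ("Checkout", 400),
    ("Purchase", 300),
]

top = funnel[0][1]

# Share of the original visitors remaining at each stage (percent).
overall_pct = [round(count / top * 100, 1) for _, count in funnel]

# Conversion from each stage to the next (percent).
step_pct = [
    round(nxt / cur * 100, 1)
    for (_, cur), (_, nxt) in zip(funnel, funnel[1:])
]
```

Here the steepest step is from product page views to add to cart, where only 20% continue.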
⚡ Quick Reference Guide
Common Data Analysis Workflow
1. Define Objectives
What questions do you want to answer? What decisions will this inform?
2. Collect Data
Gather relevant data from databases, APIs, spreadsheets, or surveys
3. Clean Data
Remove duplicates, handle missing values, fix errors, standardize formats
4. Explore Data
Calculate summary statistics, create visualizations, identify patterns
5. Analyze
Apply statistical methods, build models, test hypotheses
6. Interpret Results
Draw conclusions, extract insights, answer original questions
7. Communicate Findings
Create dashboards, reports, presentations for stakeholders
8. Take Action
Implement data-driven decisions, monitor outcomes, iterate
Chart Selection Guide
| Your Goal | Best Chart Type | Example Use Case |
|---|---|---|
| Compare categories | Bar Chart | Sales by product category |
| Show trends over time | Line Chart | Monthly revenue growth |
| Show parts of whole | Pie Chart | Market share distribution |
| Show relationship | Scatter Plot | Ad spend vs. revenue |
| Show distribution | Histogram | Customer age distribution |
| Show process flow | Funnel Chart | Conversion funnel |
| Show patterns in matrix | Heatmap | Traffic by time/day |
Essential Formulas Cheat Sheet
Growth Rate = ((New - Old) / Old) × 100%
Conversion Rate = (Conversions / Visitors) × 100%
CAC = Marketing Cost / New Customers
ROI = ((Revenue - Cost) / Cost) × 100%
Churn Rate = (Lost Customers / Total) × 100%
CLV = Avg Purchase × Frequency × Lifespan
ROAS = Revenue from Ads / Ad Spend
MRR = Customers × Avg Revenue/Customer
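As a convenience, the formulas in this cheat sheet can be written as small functions and checked against the worked examples from earlier sections. The function names are hypothetical. The CLV check assumes the earlier "$100 × 24" example means $100 per purchase, one purchase per month, over 24 months.

```python
def growth_rate(new, old):
    return (new - old) / old * 100

def conversion_rate(conversions, visitors):
    return conversions / visitors * 100

def cac(marketing_cost, new_customers):
    return marketing_cost / new_customers

def roi(revenue, cost):
    return (revenue - cost) / cost * 100

def churn_rate(lost_customers, total_customers):
    return lost_customers / total_customers * 100

def clv(avg_purchase, frequency, lifespan):
    return avg_purchase * frequency * lifespan

def roas(ad_revenue, ad_spend):
    return ad_revenue / ad_spend

def recurring_revenue(customers, avg_revenue_per_customer):
    # Last formula in the list above: customers times average revenue each.
    return customers * avg_revenue_per_customer

# Worked examples from earlier sections (results are approximate floats).
examples = {
    "growth": growth_rate(120_000, 100_000),   # about 20 (%)
    "conversion": conversion_rate(30, 1_000),  # about 3 (%)
    "cac": cac(10_000, 200),                   # 50 ($ per customer)
    "roi": roi(4_000, 1_000),                  # 300 (%)
    "churn": churn_rate(50, 1_000),            # about 5 (%)
    "clv": clv(100, 1, 24),                    # 2400 ($), see assumption above
    "roas": roas(2_500, 500),                  # 5 ($ returned per $1 spent)
}
```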
© 2024 AiPro Institute | Member Exclusive Resource
Master Data Analysis • Make Informed Decisions • Drive Business Growth