Data visualization is the graphical representation of information and data using visual elements such as charts, graphs, maps, and dashboards. It transforms raw data into an accessible format that helps identify trends, outliers, and patterns quickly.
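Common tools for data visualization include Python (matplotlib, seaborn, plotly) and R (ggplot2). SQL does not draw charts itself, but it is widely used to query and aggregate the data that these tools plot.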
| Chart Type | Best Used For | Example Function (Python) |
|---|---|---|
| Bar Chart | Comparing discrete categories | plt.bar() |
| Histogram | Distribution of continuous data | plt.hist() |
| Line Graph | Trends over time | plt.plot() |
| Scatter Plot | Relationship between two numeric variables | plt.scatter() |
| Pie Chart | Proportions of a whole | plt.pie() |
| Heat Map | Magnitude across a matrix or geography | sns.heatmap() |
| Box Plot | Distribution, median, and outliers | sns.boxplot() |
import matplotlib.pyplot as plt
# Bar Chart
categories = ['Math', 'Science', 'English']
scores = [85, 92, 78]
plt.bar(categories, scores, color='steelblue')
plt.title('Student Scores by Subject')
plt.xlabel('Subject')
plt.ylabel('Score')
plt.show()
# Scatter Plot
import matplotlib.pyplot as plt
height = [150, 160, 170, 180, 190]
weight = [50, 60, 70, 80, 90]
plt.scatter(height, weight, color='red')
plt.title('Height vs Weight')
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.show()
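import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# Heat Map (Correlation Matrix)
# Illustrative values; the columns are not perfectly collinear, so the matrix shows a range of correlations
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [4, 3, 6, 5], 'C': [9, 7, 8, 6]})
sns.heatmap(df.corr(), annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Matrix')
plt.show()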
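The table lists histograms and box plots for distributions, but the examples above cover only bar charts, scatter plots, and heat maps. The following is a minimal sketch of the other two, using made-up exam scores (the `scores` list and its single outlier are assumptions chosen purely for illustration).

```python
import matplotlib.pyplot as plt

# Made-up exam scores for illustration; 120 is a deliberate outlier.
scores = [52, 55, 58, 60, 61, 63, 64, 66, 68, 70, 71, 73, 75, 120]

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: counts how many values fall into each of 7 equal-width bins.
counts, bin_edges, _ = ax_hist.hist(scores, bins=7, color='steelblue', edgecolor='white')
ax_hist.set_title('Distribution of Scores')
ax_hist.set_xlabel('Score')
ax_hist.set_ylabel('Frequency')

# Box plot: the box spans the first to third quartile, the inner line is the
# median, and points beyond 1.5 * IQR from the box are drawn as outliers.
box = ax_box.boxplot(scores)
ax_box.set_title('Score Spread and Outliers')
ax_box.set_ylabel('Score')

# Values matplotlib classified as outliers ("fliers").
outliers = list(box['fliers'][0].get_ydata())

fig.tight_layout()
```

Call `plt.show()` afterwards to display the figure. The histogram shows where most scores cluster, while the box plot makes the outlier stand out as a separate point.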
SQL itself does not render charts, but query results can be exported to visualization tools. Some database environments (e.g., MySQL Workbench, PostgreSQL with Grafana) support basic chart generation.
-- Example: Aggregate data for visualization
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department
ORDER BY employee_count DESC;
This result can then be fed into Python or R to create a bar chart.
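As a rough sketch of that hand-off in Python, the example below runs the same query with the standard-library `sqlite3` module and passes the rows to matplotlib. The in-memory `employees` table and its rows are hypothetical stand-ins; a real project would connect to its own database instead.

```python
import sqlite3

import matplotlib.pyplot as plt

# Hypothetical in-memory table standing in for a real `employees` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [
        ("Ana", "Engineering"), ("Ben", "Engineering"), ("Cy", "Engineering"),
        ("Dee", "Sales"), ("Eli", "Sales"),
        ("Fay", "HR"),
    ],
)

# Same aggregation as the SQL example: one row per department.
query = """
    SELECT department, COUNT(*) AS employee_count
    FROM employees
    GROUP BY department
    ORDER BY employee_count DESC;
"""
rows = conn.execute(query).fetchall()
conn.close()

# Split the (department, count) tuples into x and y values for the chart.
departments = [row[0] for row in rows]
counts = [row[1] for row in rows]

fig, ax = plt.subplots()
ax.bar(departments, counts, color="steelblue")
ax.set_title("Employees per Department")
ax.set_xlabel("Department")
ax.set_ylabel("Employee Count")
```

Call `plt.show()` to display the chart. Because the query already sorts by `employee_count`, the bars appear from largest to smallest without any extra work in Python.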
# Using ggplot2
library(ggplot2)
df <- data.frame(subject = c('Math','Science','English'), score = c(85,92,78))
ggplot(df, aes(x = subject, y = score, fill = subject)) +
  geom_col() +  # same result as geom_bar(stat = 'identity')
  ggtitle('Student Scores by Subject')
A Heat Map uses color intensity to represent values across a data matrix or geographical area.
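To make the idea of color intensity concrete, the sketch below shows one way a value can be turned into a color: each value is rescaled to the range 0 to 1 and then looked up in a colormap. The matrix values and the choice of the `viridis` colormap are assumptions for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

# Illustrative 2 x 2 data matrix.
matrix = np.array([[1.0, 2.0],
                   [3.0, 5.0]])

# Step 1: rescale values linearly so the minimum maps to 0 and the maximum to 1.
norm = Normalize(vmin=matrix.min(), vmax=matrix.max())
scaled = norm(matrix)

# Step 2: look up each rescaled value in a colormap to get an RGBA color.
cmap = plt.get_cmap("viridis")
rgba = cmap(scaled)  # shape: (rows, columns, 4)

# imshow performs the same two steps internally when given cmap and norm.
fig, ax = plt.subplots()
image = ax.imshow(matrix, cmap=cmap, norm=norm)
fig.colorbar(image, ax=ax, label="Value")
ax.set_title("Heat Map of a Data Matrix")
```

Call `plt.show()` to display the figure. The color bar serves as the legend that lets a reader translate a cell's color back into an approximate value.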
A Dashboard is an interactive information management tool that brings multiple related visualizations and key metrics together in a single view.
Popular dashboard tools include Tableau, Power BI, and Python's Dash library.
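Dedicated dashboard tools add interactivity on top of this, but the basic layout idea, several charts arranged in one view, can be approximated with a static matplotlib figure. The sketch below reuses the sample data from the earlier bar chart and scatter plot; `build_static_dashboard` is a hypothetical helper name, not part of Dash, Tableau, or Power BI.

```python
import matplotlib.pyplot as plt


def build_static_dashboard():
    """Return a two-panel figure as a static, non-interactive stand-in for a dashboard."""
    fig, (ax_bar, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

    # Panel 1: category comparison (same data as the bar chart example).
    ax_bar.bar(['Math', 'Science', 'English'], [85, 92, 78], color='steelblue')
    ax_bar.set_title('Student Scores by Subject')
    ax_bar.set_xlabel('Subject')
    ax_bar.set_ylabel('Score')

    # Panel 2: relationship between two variables (same data as the scatter example).
    ax_scatter.scatter([150, 160, 170, 180, 190], [50, 60, 70, 80, 90], color='red')
    ax_scatter.set_title('Height vs Weight')
    ax_scatter.set_xlabel('Height (cm)')
    ax_scatter.set_ylabel('Weight (kg)')

    fig.suptitle('Static Dashboard Sketch')
    fig.tight_layout()
    return fig


fig = build_static_dashboard()
```

Call `plt.show()` to display the result. Unlike a real dashboard, this figure does not update or respond to user input; it only illustrates how several views are combined on one screen.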
Visualizations play a key role in communicating hypothesis testing results: