Data Visualization in Python: A Complete Guide to Matplotlib & Seaborn
Learn professional data visualization in Python. A complete tutorial from basic plots to interactive dashboards with Matplotlib, Seaborn, and Plotly.
Table of contents
- Why data visualization matters
- Overview of Python visualization libraries
- Matplotlib: Basic plots and customization
- Seaborn: Statistical and advanced plots
- Plotly: Interactive visualizations
- Which chart for which purpose?
- Best practices and design principles
- Building a dashboard with Plotly Dash
- Practical example: Sales dashboard
1. Why data visualization matters
What is data visualization?
Data visualization is the graphical representation of information and data using visual elements such as charts, maps, and infographics. It helps you discover patterns, trends, and insights that stay hidden in raw data.
Faster insights
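A well-chosen chart lets readers spot a pattern at a glance that would take much longer to extract from text or a table. (The often-quoted claim that the brain processes visuals 60,000x faster than text has no traceable source.)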
Simplifying complexity
Make complex data understandable for non-technical stakeholders.
Identifying trends
Spot patterns and correlations that would otherwise stay hidden.
Communicating effectively
Tell a story with data to support better decision-making.
Examples of impact
- Business intelligence: Sales trends, KPI dashboards, management reports
- Data science: Model performance, feature importance, cluster visualization
- Financial analysis: Portfolio performance, market trends, risk analysis
- Operational efficiency: Process flows, performance metrics, real-time monitoring
- Customer analytics: Segmentation, behavior patterns, funnel analysis
| Without visualization | With visualization | Impact |
|---|---|---|
| Excel tables with 1000+ rows | Interactive dashboard with trends | 70% faster decision-making |
| Reading technical reports | One chart that says more than 1000 words | Better stakeholder understanding |
| Manual data analysis | Automatic anomaly detection | Proactive instead of reactive |
| Static PDF reports | Interactive web dashboards | Real-time decisions possible |
2. Overview of Python visualization libraries
Matplotlib
The base library for all plots. Extensive customization options.
- Level: Low (a lot of control)
- Best for: Publication-quality plots
- Learning curve: Steep
Seaborn
High-level interface on top of Matplotlib. Statistically oriented.
- Level: High (less code)
- Best for: Data exploration
- Learning curve: Gentle
Plotly
Interactive, web-based visualizations. Modern look.
- Level: High
- Best for: Dashboards, web apps
- Learning curve: Gentle
Installation and imports
# Basic installation (run in a terminal, not in Python)
pip install matplotlib seaborn plotly pandas numpy
# For Plotly Dash (dashboards)
pip install dash
# Import statements
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np
# Set a style (optional but recommended)
plt.style.use('seaborn-v0_8-whitegrid') # Clean grid style; this name requires Matplotlib 3.6+
sns.set_style("whitegrid") # Seaborn style
sns.set_palette("husl") # Color palette
# Jupyter notebooks only: render static plots inline
%matplotlib inline
# Or, for interactive plots in Jupyter (requires the ipympl package)
%matplotlib widget
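The style name `seaborn-v0_8-whitegrid` only exists in Matplotlib 3.6 and later; older releases call it `seaborn-whitegrid`. A minimal sketch, assuming you want the same script to run on either: `pick_style` is a hypothetical helper (not part of Matplotlib) that returns the first candidate found in `plt.style.available`.

```python
import matplotlib.pyplot as plt


def pick_style(candidates, fallback="default"):
    """Return the first style name that this Matplotlib installation knows."""
    for name in candidates:
        if name in plt.style.available:
            return name
    return fallback  # "default" is always accepted by plt.style.use


style_name = pick_style(["seaborn-v0_8-whitegrid", "seaborn-whitegrid"])
plt.style.use(style_name)
print(f"Using Matplotlib style: {style_name}")
```

If neither name is available, the helper falls back to Matplotlib's built-in default style instead of raising an error.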
3. Matplotlib: Basic plots and customization
Basic line plot
# Simple line plot
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.figure(figsize=(10, 6)) # Figure size
plt.plot(x, y, label='Line 1', color='blue', linewidth=2, marker='o')
# Labels and title
plt.xlabel('X-axis label', fontsize=12)
plt.ylabel('Y-axis label', fontsize=12)
plt.title('Simple Line Plot', fontsize=14, fontweight='bold')
# Legend and grid
plt.legend()
plt.grid(True, alpha=0.3)
# Save and show
plt.savefig('line_plot.png', dpi=300, bbox_inches='tight')
plt.show()
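The example above uses the pyplot state machine (`plt.plot`, `plt.xlabel`). The later sections of this guide use Matplotlib's object-oriented interface instead, where you call methods on explicit `Figure` and `Axes` objects. As a sketch, the same line plot in that style might look like this:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Create the figure and a single axes explicitly
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, y, label="Line 1", color="blue", linewidth=2, marker="o")

# Setter methods on the axes replace plt.xlabel / plt.ylabel / plt.title
ax.set_xlabel("X-axis label", fontsize=12)
ax.set_ylabel("Y-axis label", fontsize=12)
ax.set_title("Simple Line Plot", fontsize=14, fontweight="bold")
ax.legend()
ax.grid(True, alpha=0.3)
```

Saving then goes through the figure object (`fig.savefig(...)`), which makes it unambiguous which figure is written when several are open.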
Line plot example
Multiple plots in one figure
# Prepare the data
np.random.seed(42)
x = np.arange(0, 10, 0.1)
y1 = np.sin(x)
y2 = np.cos(x)
y3 = np.sin(x) * np.cos(x)
# Subplots (2 rows, 2 columns)
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
# Plot 1: Line plot
axes[0, 0].plot(x, y1, color='red', label='sin(x)')
axes[0, 0].set_title('Sine Wave')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
# Plot 2: Scatter plot
axes[0, 1].scatter(x, y2, color='green', alpha=0.6, label='cos(x)')
axes[0, 1].set_title('Cosine Scatter')
axes[0, 1].legend()
# Plot 3: Bar plot
axes[1, 0].bar(x[::10], y3[::10], color='blue', alpha=0.7)
axes[1, 0].set_title('Bar Chart')
axes[1, 0].set_xlabel('X values')
# Plot 4: Histogram
axes[1, 1].hist(y3, bins=30, color='purple', alpha=0.7, edgecolor='black')
axes[1, 1].set_title('Histogram')
axes[1, 1].set_xlabel('Value')
axes[1, 1].set_ylabel('Frequency')
# Overall title and layout optimization
plt.suptitle('Different Plot Types', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()
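`plt.subplots(2, 2)` returns a 2-D NumPy array of axes, which is why the panels above are addressed as `axes[0, 0]` and so on. When every panel gets the same treatment, iterating over `axes.flat` avoids the repeated indexing. A small sketch with made-up curves:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 10, 0.1)
curves = {
    "sin(x)": np.sin(x),
    "cos(x)": np.cos(x),
    "sin(x) * cos(x)": np.sin(x) * np.cos(x),
    "sin(2x)": np.sin(2 * x),
}

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# axes.flat walks the 2x2 grid row by row
for ax, (name, values) in zip(axes.flat, curves.items()):
    ax.plot(x, values)
    ax.set_title(name)
    ax.grid(True, alpha=0.3)

fig.suptitle("Four curves, one loop", fontsize=16, fontweight="bold")
fig.tight_layout()
```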
Professional customization
# Professional plot with extensive customization
fig, ax = plt.subplots(figsize=(12, 7))
# Data
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
sales_2023 = [120, 135, 150, 165, 180, 210]
sales_2024 = [130, 140, 160, 175, 195, 225]
# Plot the lines with styling
ax.plot(months, sales_2023, label='2023 Sales',
        color='#3498db', linewidth=3, marker='o',
        markersize=8, markerfacecolor='white',
        markeredgewidth=2, markeredgecolor='#3498db')
ax.plot(months, sales_2024, label='2024 Sales',
        color='#e74c3c', linewidth=3, marker='s',
        markersize=8, markerfacecolor='white',
        markeredgewidth=2, markeredgecolor='#e74c3c')
# Fill between: shade the area between the two lines where 2024 is higher
ax.fill_between(months, sales_2023, sales_2024,
                where=(np.array(sales_2024) > np.array(sales_2023)),
                color='green', alpha=0.2, label='Growth')
# Labels and title
ax.set_xlabel('Month', fontsize=12, fontweight='bold')
ax.set_ylabel('Sales (€1000)', fontsize=12, fontweight='bold')
ax.set_title('Monthly Sales Comparison: 2023 vs 2024',
             fontsize=14, fontweight='bold', pad=20)
# Grid and spines
ax.grid(True, which='both', linestyle='--', linewidth=0.5, alpha=0.7)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
# Legend
ax.legend(loc='upper left', frameon=True, fancybox=True,
          shadow=True, fontsize=10)
# Annotations: label each 2024 point with its growth over 2023
for i, (v1, v2) in enumerate(zip(sales_2023, sales_2024)):
    growth = ((v2 - v1) / v1) * 100
    if growth > 0:
        ax.annotate(f'+{growth:.1f}%',
                    xy=(i, v2),
                    xytext=(0, 10),
                    textcoords='offset points',
                    ha='center',
                    color='green',
                    fontweight='bold')
# Set the y-axis limits (the axis does not start at 0 here, so the differences look larger)
ax.set_ylim(100, 250)
# Tight layout and save
plt.tight_layout()
plt.savefig('professional_sales_plot.png', dpi=300,
            bbox_inches='tight', transparent=False)
plt.show()
4. Seaborn: Statistical and advanced plots
Why use Seaborn?
- Less code: Complex plots in 1-2 lines of code
- Statistical integration: Automatic error bars, regression lines
- Good-looking defaults: Professional colors and styling
- Categorical data: Well suited to grouped and faceted plots
Distribution plots
# Prepare the data
np.random.seed(42)
data = pd.DataFrame({
    'Category': np.repeat(['A', 'B', 'C'], 100),
    'Value': np.concatenate([
        np.random.normal(50, 10, 100),
        np.random.normal(60, 15, 100),
        np.random.normal(55, 12, 100)
    ])
})
# Subplot grid
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# 1. Histogram with KDE
sns.histplot(data=data, x='Value', kde=True, ax=axes[0, 0])
axes[0, 0].set_title('Histogram with KDE')
# 2. Box plot
sns.boxplot(data=data, x='Category', y='Value', ax=axes[0, 1])
axes[0, 1].set_title('Box Plot per Category')
# 3. Violin plot
sns.violinplot(data=data, x='Category', y='Value', ax=axes[1, 0])
axes[1, 0].set_title('Violin Plot')
# 4. ECDF plot
sns.ecdfplot(data=data, x='Value', hue='Category', ax=axes[1, 1])
axes[1, 1].set_title('Empirical CDF')
plt.suptitle('Distribution Analysis with Seaborn', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()
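To read a box plot correctly it helps to know what the box and whiskers represent. With the default settings in Matplotlib and Seaborn, the box spans the first to third quartile and the whiskers reach to the most extreme points within 1.5 times the interquartile range; anything further out is drawn as an outlier. The sketch below computes those numbers with NumPy for a small made-up sample.

```python
import numpy as np

values = np.array([42, 45, 47, 50, 51, 53, 55, 58, 60, 95])

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"Whiskers: {whisker_low} to {whisker_high}, outliers: {outliers.tolist()}")
```

In this sample the value 95 lies beyond the upper fence, so a box plot would draw it as a separate point rather than stretching the whisker.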
Relational and categorical plots
# Example dataset
tips = sns.load_dataset("tips") # Built-in Seaborn dataset (downloaded on first use)
# Subplots
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# 1. Scatter plot with regression line
sns.regplot(data=tips, x='total_bill', y='tip', ax=axes[0, 0])
axes[0, 0].set_title('Scatter with Linear Regression')
# 2. Categorical bar plot
sns.barplot(data=tips, x='day', y='total_bill', hue='sex',
            errorbar='sd', ax=axes[0, 1]) # errorbar= requires seaborn 0.12+
axes[0, 1].set_title('Bar Plot with Error Bars')
# 3. Heatmap (correlation matrix)
numeric_cols = tips.select_dtypes(include=[np.number]).columns
corr_matrix = tips[numeric_cols].corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm',
            center=0, ax=axes[1, 0])
axes[1, 0].set_title('Correlation Heatmap')
# The fourth panel stays unused: pairplot creates its own figure, so hide the empty axes
axes[1, 1].axis('off')
fig.tight_layout()
# 4. Pair plot (scatter matrix), shown as a separate figure
pair_plot = sns.pairplot(tips, hue='sex', diag_kind='kde')
pair_plot.figure.suptitle('Pair Plot of the Tips Dataset', y=1.02)
plt.show()
FacetGrid and complex visualizations
# Create a more complex dataset
np.random.seed(42)
n_samples = 500
complex_data = pd.DataFrame({
    'x': np.random.randn(n_samples),
    'y': np.random.randn(n_samples) * 2 + 1,
    'category': np.random.choice(['A', 'B', 'C'], n_samples),
    'size': np.random.uniform(10, 200, n_samples),
    'value': np.random.exponential(2, n_samples)
})
# FacetGrid: multiple subplots based on a categorical variable
g = sns.FacetGrid(complex_data, col='category', height=4, aspect=1.2)
# map_dataframe forwards column names, so size='size' maps marker size to that column;
# size_norm keeps the size scale identical across facets
g.map_dataframe(sns.scatterplot, x='x', y='y', size='size',
                sizes=(20, 200), size_norm=(10, 200), alpha=0.6)
g.set_titles("{col_name}")
g.set_axis_labels("X Value", "Y Value")
g.figure.suptitle('FacetGrid: Scatter Plots per Category', y=1.02)
plt.show()
# Joint plot: scatter with marginal distributions
joint = sns.jointplot(data=complex_data, x='x', y='y',
                      hue='category', kind='scatter',
                      height=7, ratio=5, space=0.2)
joint.figure.suptitle('Joint Plot: Scatter with Marginal Distributions', y=1.02)
# Relational plot with multiple variables
fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(data=complex_data, x='x', y='y',
                hue='category', size='size',
                sizes=(20, 200), alpha=0.7,
                palette='viridis', ax=ax)
# Move the legend outside the plot area
sns.move_legend(ax, 'upper left', bbox_to_anchor=(1, 1))
ax.set_title('Multi-variable Scatter Plot', fontsize=14, fontweight='bold')
fig.tight_layout()
plt.show()
5. Plotly: Interactive visualizations
Why use Plotly?
- Interactive: Hover, zoom, pan, selection
- Web-ready: Export to HTML, embed in websites
- Animation: Animated plots and transitions
- 3D visualizations: Surface plots, 3D scatter
- Dash integration: Build complete web apps
Basic interactive plots
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
# Sample data
df = px.data.gapminder().query("year == 2007")
# 1. Scatter plot with Plotly Express (simple syntax)
fig1 = px.scatter(df, x="gdpPercap", y="lifeExp",
size="pop", color="continent",
hover_name="country",
size_max=60,
title="GDP per Capita vs Life Expectancy (2007)")
fig1.show()
# 2. Bar plot
fig2 = px.bar(df, x="continent", y="pop",
color="continent",
title="Population per Continent (2007)")
fig2.update_layout(xaxis_title="Continent",
yaxis_title="Population")
fig2.show()
# 3. Animated plot (over time)
df_all = px.data.gapminder()
fig3 = px.scatter(df_all, x="gdpPercap", y="lifeExp",
                  animation_frame="year",
                  animation_group="country",
                  size="pop", color="continent",
                  hover_name="country",
                  size_max=45,
                  log_x=True,  # GDP spans several orders of magnitude
                  range_x=[100, 100000],
                  range_y=[25, 90],
                  title="Life Expectancy vs GDP Over Time")
fig3.show()
Advanced Plotly Graph Objects
# Complex visualization with Graph Objects
fig = make_subplots(
rows=2, cols=2,
subplot_titles=('Line Plot', 'Bar Plot', 'Scatter Plot', '3D Surface'),
specs=[[{"type": "xy"}, {"type": "xy"}],
[{"type": "xy"}, {"type": "scene"}]]
)
# Data
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
# 1. Line plot
fig.add_trace(
go.Scatter(x=x, y=y1, mode='lines', name='sin(x)',
line=dict(color='blue', width=2)),
row=1, col=1
)
# 2. Bar plot
fig.add_trace(
go.Bar(x=['A', 'B', 'C', 'D'],
y=[10, 20, 15, 25],
name='Bar Data',
marker_color=['red', 'green', 'blue', 'orange']),
row=1, col=2
)
# 3. Scatter plot
fig.add_trace(
go.Scatter(x=x, y=y2, mode='markers', name='cos(x)',
marker=dict(size=8, color='green',
line=dict(width=1, color='darkgreen'))),
row=2, col=1
)
# 4. 3D Surface plot
X = np.linspace(-5, 5, 50)
Y = np.linspace(-5, 5, 50)
X, Y = np.meshgrid(X, Y)
Z = np.sin(np.sqrt(X**2 + Y**2))
fig.add_trace(
go.Surface(z=Z, x=X, y=Y, colorscale='Viridis'),
row=2, col=2
)
# Adjust the layout
fig.update_layout(
title_text="Interactive Dashboard with Plotly",
height=800,
showlegend=True,
hovermode='closest'
)
# Update axes labels
fig.update_xaxes(title_text="X Values", row=1, col=1)
fig.update_yaxes(title_text="sin(x)", row=1, col=1)
fig.update_xaxes(title_text="Categories", row=1, col=2)
fig.update_yaxes(title_text="Values", row=1, col=2)
fig.show()
# Save as HTML (interactive in the browser)
fig.write_html("interactive_dashboard.html")
6. Which chart for which purpose?
Line Chart
Time series, trends
Bar Chart
Category comparison
Pie Chart
Proportions, percentages
Scatter Plot
Correlations, outliers
Histogram
Distributions
Box Plot
Statistical spread
Heatmap
Density, intensity
Network Graph
Relationships, connections
| Goal | Recommended chart | Python code | Best for |
|---|---|---|---|
| Show time series | Line plot | plt.plot() or px.line() | Sales trends, stock prices |
| Compare categories | Bar chart | plt.bar() or px.bar() | Product performance, regional sales |
| Show distributions | Histogram | plt.hist() or px.histogram() | Customer age, income distribution |
| Find correlations | Scatter plot | plt.scatter() or px.scatter() | Marketing spend vs revenue |
| Show proportions | Pie/doughnut chart | plt.pie() or px.pie() | Market share, budget allocation |
| Geographic data | Choropleth map | px.choropleth() | Regional sales, demographic data |
Chart selection rules
- Time-series data: Prefer line charts; bar charts only work for a small number of discrete periods
- More than 5 categories: Bar charts instead of pie charts
- Decorative 3D effects: Avoid them, because perspective distorts the values; use 3D only when the data itself is three-dimensional, such as a surface
- Color use: Consistent, meaningful, accessible
- Maximize the data-ink ratio: Minimal ink, maximum data
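The table and rules above can be condensed into a small lookup. The sketch below is one simplified reading of them, not a complete decision procedure; `suggest_chart` and its goal names are hypothetical and chosen only for illustration.

```python
def suggest_chart(goal, n_categories=0):
    """Suggest a chart type for an analysis goal (simplified reading of the table)."""
    suggestions = {
        "time_series": "line",
        "comparison": "bar",
        "distribution": "histogram",
        "correlation": "scatter",
        "proportion": "pie",
        "geographic": "choropleth",
    }
    if goal not in suggestions:
        raise ValueError(f"Unknown goal: {goal!r}")
    chart = suggestions[goal]
    # Rule from the list: with more than 5 categories, use bars instead of a pie
    if chart == "pie" and n_categories > 5:
        chart = "bar"
    return chart


print(suggest_chart("time_series"))                 # line
print(suggest_chart("proportion", n_categories=4))  # pie
print(suggest_chart("proportion", n_categories=8))  # bar
```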
7. Best practices and design principles
Color use
- Use colorblind-friendly palettes
- Consistent colors for the same categories
- Gradients for numeric data
- At most 6-8 colors per chart
Typography
- Sans-serif fonts for on-screen display
- Clear hierarchy in text size
- Sufficient contrast with the background
- Consistent labeling
Layout
- White space for readability
- Alignment of elements
- Consistent scale and aspect ratio
- Grid system for multiple charts
Data integrity
- Start the y-axis at 0 for bar charts; if you truncate the axis on a line chart, label it clearly
- Clear source attribution
- Contextual annotations
- Transparency about data limitations
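Why the zero baseline matters for bar charts can be shown with simple arithmetic: the reader compares bar heights, and a truncated axis changes the ratio of those heights. The sketch below uses made-up values and a hypothetical helper, `visual_ratio`.

```python
def visual_ratio(small, large, axis_start=0.0):
    """Ratio of the drawn bar heights when the y-axis starts at axis_start."""
    if axis_start >= small:
        raise ValueError("axis_start must be below the smaller value")
    return (large - axis_start) / (small - axis_start)


sales_a, sales_b = 100, 110  # B is 10% higher than A

honest = visual_ratio(sales_a, sales_b, axis_start=0)
truncated = visual_ratio(sales_a, sales_b, axis_start=90)

print(f"Axis from 0:  bar B looks {honest:.1f}x as tall as bar A")
print(f"Axis from 90: bar B looks {truncated:.1f}x as tall as bar A")
```

With the axis starting at 90, a 10% difference is drawn as a bar twice as tall, which is the distortion the rule is meant to prevent.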
# Professional plot template
def create_professional_plot(data, title, xlabel, ylabel):
    """
    Template for professional plots
    """
    # Set up the figure
    fig, ax = plt.subplots(figsize=(12, 7))
    # Colorblind-friendly palette (Okabe-Ito)
    colors = ['#0072B2', '#E69F00', '#009E73', '#D55E00',
              '#CC79A7', '#56B4E9', '#F0E442']
    # Plot the data, one line per column
    for i, column in enumerate(data.columns):
        ax.plot(data.index, data[column],
                color=colors[i % len(colors)],
                linewidth=2,
                marker='o' if len(data) < 20 else None,
                label=column)
    # Styling
    ax.set_title(title, fontsize=16, fontweight='bold', pad=20)
    ax.set_xlabel(xlabel, fontsize=12, fontweight='bold')
    ax.set_ylabel(ylabel, fontsize=12, fontweight='bold')
    # Grid and spines
    ax.grid(True, linestyle='--', alpha=0.3)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    # Legend
    ax.legend(frameon=True, fancybox=True,
              shadow=True, fontsize=10)
    # Annotation for the data source
    ax.text(0.02, 0.02, 'Source: DataPartner365 Analysis',
            transform=ax.transAxes, fontsize=9,
            style='italic', alpha=0.7)
    # Tight layout
    plt.tight_layout()
    return fig, ax
# Usage
import pandas as pd
data = pd.DataFrame({
    'Product A': [100, 120, 150, 180, 200],
    'Product B': [80, 90, 110, 130, 150],
    'Product C': [60, 70, 85, 95, 110]
}, index=['Q1 2023', 'Q2 2023', 'Q3 2023', 'Q4 2023', 'Q1 2024'])
fig, ax = create_professional_plot(
    data=data,
    title='Quarterly Product Sales Performance',
    xlabel='Quarter',
    ylabel='Sales (€1000)'
)
plt.savefig('professional_sales_chart.png', dpi=300,
            bbox_inches='tight', transparent=False)
plt.show()
8. Building a dashboard with Plotly Dash
What is Plotly Dash?
Dash is a Python framework for building analytical web applications. No JavaScript is required: everything is written in Python. It is well suited to data dashboards and BI tools.
Basic Dash app
# app.py
import dash
from dash import dcc, html
from dash.dependencies import Input, Output
import plotly.express as px
import pandas as pd
import numpy as np
# Sample data
np.random.seed(42)
n_samples = 1000
df = pd.DataFrame({
'Date': pd.date_range(start='2024-01-01', periods=n_samples, freq='D'),
'Sales': np.random.randn(n_samples).cumsum() + 100,
'Region': np.random.choice(['North', 'South', 'East', 'West'], n_samples),
'Product': np.random.choice(['A', 'B', 'C'], n_samples),
'Profit': np.random.uniform(10, 100, n_samples)
})
# Initialize Dash app
app = dash.Dash(__name__)
# App layout
app.layout = html.Div([
# Header
html.H1('Sales Dashboard', style={'textAlign': 'center', 'color': '#2c3e50'}),
# Filters
html.Div([
html.Label('Select Region:'),
dcc.Dropdown(
id='region-dropdown',
options=[{'label': region, 'value': region}
for region in df['Region'].unique()],
value=['North', 'South'],
multi=True
),
html.Label('Date Range:', style={'marginTop': '20px'}),
dcc.DatePickerRange(
id='date-picker',
min_date_allowed=df['Date'].min(),
max_date_allowed=df['Date'].max(),
start_date=df['Date'].min(),
end_date=df['Date'].max()
),
html.Label('Sales Threshold:', style={'marginTop': '20px'}),
dcc.Slider(
id='sales-slider',
min=0,
max=200,
value=50,
marks={0: '0', 50: '50', 100: '100', 150: '150', 200: '200'}
)
], style={'width': '20%', 'display': 'inline-block', 'verticalAlign': 'top', 'padding': '20px'}),
# Graphs
html.Div([
# Row 1
html.Div([
dcc.Graph(id='sales-time-series'),
dcc.Graph(id='region-bar-chart')
], style={'display': 'flex', 'justifyContent': 'space-between'}),
# Row 2
html.Div([
dcc.Graph(id='product-pie-chart'),
dcc.Graph(id='profit-scatter')
], style={'display': 'flex', 'justifyContent': 'space-between', 'marginTop': '20px'})
], style={'width': '75%', 'display': 'inline-block'}),
# Summary statistics
html.Div(id='summary-stats', style={'marginTop': '30px', 'padding': '20px', 'backgroundColor': '#f8f9fa'})
])
# Callbacks for interactivity
@app.callback(
[Output('sales-time-series', 'figure'),
Output('region-bar-chart', 'figure'),
Output('product-pie-chart', 'figure'),
Output('profit-scatter', 'figure'),
Output('summary-stats', 'children')],
[Input('region-dropdown', 'value'),
Input('date-picker', 'start_date'),
Input('date-picker', 'end_date'),
Input('sales-slider', 'value')]
)
def update_dashboard(selected_regions, start_date, end_date, sales_threshold):
    # A cleared multi-select dropdown returns None or an empty list
    selected_regions = selected_regions or []
    # Filter data (the date picker passes dates as ISO strings)
    filtered_df = df[
        (df['Region'].isin(selected_regions)) &
        (df['Date'] >= pd.to_datetime(start_date)) &
        (df['Date'] <= pd.to_datetime(end_date)) &
        (df['Sales'] >= sales_threshold)
    ]
    # 1. Time series plot
    time_series = px.line(filtered_df, x='Date', y='Sales',
                          title='Sales Over Time')
    # 2. Bar chart by region
    region_sales = filtered_df.groupby('Region')['Sales'].sum().reset_index()
    bar_chart = px.bar(region_sales, x='Region', y='Sales',
                       title='Total Sales by Region',
                       color='Region')
    # 3. Pie chart by product
    product_sales = filtered_df.groupby('Product')['Sales'].sum().reset_index()
    pie_chart = px.pie(product_sales, values='Sales', names='Product',
                       title='Sales Distribution by Product')
    # 4. Scatter plot
    scatter = px.scatter(filtered_df, x='Sales', y='Profit',
                         color='Product', size='Sales',
                         title='Sales vs Profit')
    # 5. Summary statistics
    total_sales = filtered_df['Sales'].sum()
    avg_profit = filtered_df['Profit'].mean()
    total_orders = len(filtered_df)
    summary = html.Div([
        html.H3('Summary Statistics'),
        html.P(f"Total Sales: €{total_sales:,.2f}"),
        html.P(f"Average Profit: €{avg_profit:.2f}"),
        html.P(f"Total Orders: {total_orders:,}"),
        html.P(f"Date Range: {start_date} to {end_date}")
    ])
    return time_series, bar_chart, pie_chart, scatter, summary
if __name__ == '__main__':
    # app.run replaces the older app.run_server, which recent Dash releases no longer accept
    app.run(debug=True, port=8050)
To run: execute python app.py and open http://localhost:8050 in your browser.
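Callbacks are easier to test when the filtering logic lives in a plain function that does not depend on Dash. A minimal sketch, assuming the same column names as the app above (`Region`, `Date`, `Sales`); `filter_sales` is a hypothetical helper and the four-row frame is made-up test data.

```python
import pandas as pd


def filter_sales(df, regions, start_date, end_date, sales_threshold):
    """Return the rows that match the dashboard filters."""
    regions = regions or []  # a cleared multi-select yields None or []
    mask = (
        df["Region"].isin(regions)
        & (df["Date"] >= pd.to_datetime(start_date))
        & (df["Date"] <= pd.to_datetime(end_date))
        & (df["Sales"] >= sales_threshold)
    )
    return df[mask]


sample = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]),
    "Region": ["North", "South", "North", "East"],
    "Sales": [120.0, 40.0, 80.0, 150.0],
})

result = filter_sales(sample, ["North", "South"], "2024-01-01", "2024-01-03", 50)
print(result)
```

The callback can then call this function and pass the result to the Plotly Express figures, while the function itself can be covered by ordinary unit tests.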
9. Practical example: Sales dashboard
Complete sales analysis dashboard
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import matplotlib.pyplot as plt
import seaborn as sns
# 1. Data simulation
np.random.seed(42)
n_customers = 1000
n_days = 365
# Customer data
customer_data = pd.DataFrame({
'customer_id': range(1, n_customers + 1),
'region': np.random.choice(['North', 'South', 'East', 'West'], n_customers),
'customer_type': np.random.choice(['Retail', 'Business', 'Wholesale'], n_customers, p=[0.6, 0.3, 0.1]),
'signup_date': pd.to_datetime('2023-01-01') + pd.to_timedelta(np.random.randint(0, n_days, n_customers), unit='D')
})
# Sales transactions
n_transactions = 5000
transactions = pd.DataFrame({
'transaction_id': range(1, n_transactions + 1),
'customer_id': np.random.choice(customer_data['customer_id'], n_transactions),
'date': pd.to_datetime('2024-01-01') + pd.to_timedelta(np.random.randint(0, n_days, n_transactions), unit='D'),
'product_category': np.random.choice(['Electronics', 'Clothing', 'Home', 'Books'], n_transactions),
'amount': np.random.exponential(100, n_transactions),
'quantity': np.random.randint(1, 10, n_transactions)
})
# Merge data
sales_data = pd.merge(transactions, customer_data, on='customer_id')
# 2. Data aggregations for visualization
# Daily sales
daily_sales = sales_data.groupby('date').agg({
'amount': 'sum',
'transaction_id': 'count'
}).reset_index()
daily_sales.columns = ['date', 'daily_revenue', 'daily_transactions']
# Monthly sales
sales_data['month'] = sales_data['date'].dt.to_period('M').astype(str)
monthly_sales = sales_data.groupby('month').agg({
'amount': 'sum',
'transaction_id': 'count',
'customer_id': 'nunique'
}).reset_index()
monthly_sales.columns = ['month', 'monthly_revenue', 'monthly_transactions', 'unique_customers']
# 3. Matplotlib/Seaborn static analysis
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Revenue trend
axes[0, 0].plot(daily_sales['date'], daily_sales['daily_revenue'], color='#3498db', linewidth=2)
axes[0, 0].set_title('Daily Revenue Trend', fontweight='bold')
axes[0, 0].set_xlabel('Date')
axes[0, 0].set_ylabel('Revenue (€)')
axes[0, 0].grid(True, alpha=0.3)
# Revenue by region
region_revenue = sales_data.groupby('region')['amount'].sum().sort_values()
axes[0, 1].bar(region_revenue.index, region_revenue.values, color=['#1abc9c', '#3498db', '#9b59b6', '#e74c3c'])
axes[0, 1].set_title('Revenue by Region', fontweight='bold')
axes[0, 1].set_xlabel('Region')
axes[0, 1].set_ylabel('Total Revenue (€)')
for i, v in enumerate(region_revenue.values):
    axes[0, 1].text(i, v + max(region_revenue.values)*0.01, f'€{v:,.0f}',
                    ha='center', fontweight='bold')
# Product category distribution
category_sales = sales_data.groupby('product_category')['amount'].sum()
axes[1, 0].pie(category_sales.values, labels=category_sales.index,
autopct='%1.1f%%', colors=['#1abc9c', '#3498db', '#9b59b6', '#e74c3c'])
axes[1, 0].set_title('Revenue by Product Category', fontweight='bold')
# Customer type analysis
customer_revenue = sales_data.groupby('customer_type')['amount'].sum()
axes[1, 1].barh(customer_revenue.index, customer_revenue.values,
color=['#1abc9c', '#3498db', '#9b59b6'])
axes[1, 1].set_title('Revenue by Customer Type', fontweight='bold')
axes[1, 1].set_xlabel('Total Revenue (€)')
plt.suptitle('Sales Performance Analysis Dashboard', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.savefig('sales_dashboard_static.png', dpi=300, bbox_inches='tight')
plt.show()
# 4. Plotly interactive dashboard
fig = make_subplots(
rows=3, cols=3,
subplot_titles=('Monthly Revenue Trend', 'Revenue by Region', 'Revenue by Product',
'Customer Distribution', 'Transaction Distribution', 'Revenue Heatmap',
'Customer Type Revenue', 'Monthly Growth', 'Top Customers'),
specs=[[{'type': 'scatter'}, {'type': 'bar'}, {'type': 'pie'}],
[{'type': 'histogram'}, {'type': 'box'}, {'type': 'heatmap'}],
[{'type': 'bar'}, {'type': 'scatter'}, {'type': 'bar'}]],
vertical_spacing=0.08,
horizontal_spacing=0.08
)
# 1. Monthly Revenue Trend
fig.add_trace(
go.Scatter(x=monthly_sales['month'], y=monthly_sales['monthly_revenue'],
mode='lines+markers', name='Revenue',
line=dict(color='#3498db', width=3)),
row=1, col=1
)
# 2. Revenue by Region
fig.add_trace(
go.Bar(x=region_revenue.index, y=region_revenue.values,
marker_color=['#1abc9c', '#3498db', '#9b59b6', '#e74c3c'],
name='Region Revenue'),
row=1, col=2
)
# 3. Revenue by Product Category
fig.add_trace(
go.Pie(labels=category_sales.index, values=category_sales.values,
hole=0.3, marker_colors=['#1abc9c', '#3498db', '#9b59b6', '#e74c3c']),
row=1, col=3
)
# 4. Customer Distribution by Region (unique customers, not transaction counts)
region_customers = sales_data.groupby('region')['customer_id'].nunique()
fig.add_trace(
    go.Bar(x=region_customers.index, y=region_customers.values,
           marker_color=['#1abc9c', '#3498db', '#9b59b6', '#e74c3c'],
           name='Customers by Region'),
    row=2, col=1
)
# 5. Transaction Amount Distribution
fig.add_trace(
go.Box(y=sales_data['amount'], name='Transaction Amount',
boxpoints='outliers', marker_color='#3498db'),
row=2, col=2
)
# 6. Revenue Heatmap (Day of Week vs Month)
# The simulated dates carry no time of day, so an hour-based heatmap would collapse to one column
sales_data['day_of_week'] = sales_data['date'].dt.day_name()
weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                 'Friday', 'Saturday', 'Sunday']
heatmap_data = sales_data.pivot_table(values='amount',
                                      index='day_of_week',
                                      columns='month',
                                      aggfunc='sum').reindex(weekday_order).fillna(0)
fig.add_trace(
    go.Heatmap(z=heatmap_data.values,
               x=heatmap_data.columns,
               y=heatmap_data.index,
               colorscale='Viridis',
               name='Revenue Heatmap'),
    row=2, col=3
)
# 7. Revenue by Customer Type
fig.add_trace(
go.Bar(x=customer_revenue.index, y=customer_revenue.values,
marker_color=['#1abc9c', '#3498db', '#9b59b6'],
name='Customer Type Revenue'),
row=3, col=1
)
# 8. Monthly Growth
monthly_sales['growth'] = monthly_sales['monthly_revenue'].pct_change() * 100
fig.add_trace(
go.Scatter(x=monthly_sales['month'], y=monthly_sales['growth'],
mode='lines+markers', name='Growth %',
line=dict(color='#27ae60', width=3),
fill='tozeroy'),
row=3, col=2
)
# 9. Top 10 Customers
top_customers = sales_data.groupby('customer_id')['amount'].sum().nlargest(10)
fig.add_trace(
go.Bar(x=[f'Customer {i}' for i in top_customers.index],
y=top_customers.values,
marker_color='#e74c3c',
name='Top Customers'),
row=3, col=3
)
# Update layout
fig.update_layout(
title_text='Interactive Sales Dashboard - Complete Analysis',
height=1200,
showlegend=False,
title_font_size=20,
title_font_color='#2c3e50'
)
# Update axes
fig.update_xaxes(tickangle=45)
fig.update_yaxes(title_text="Revenue (€)", row=1, col=1)
fig.update_yaxes(title_text="Revenue (€)", row=1, col=2)
fig.update_yaxes(title_text="Count", row=2, col=1)
fig.update_yaxes(title_text="Amount (€)", row=2, col=2)
fig.update_yaxes(title_text="Growth %", row=3, col=2)
fig.update_yaxes(title_text="Revenue (€)", row=3, col=3)
# Save interactive dashboard
fig.write_html("interactive_sales_dashboard.html")
print("✅ Dashboard created successfully!")
print("📊 Static dashboard saved as: sales_dashboard_static.png")
print("🖱️ Interactive dashboard saved as: interactive_sales_dashboard.html")
print("📈 Summary statistics:")
print(f" • Total Revenue: €{sales_data['amount'].sum():,.2f}")
print(f" • Total Transactions: {len(sales_data):,}")
print(f" • Unique Customers: {sales_data['customer_id'].nunique():,}")
print(f" • Average Transaction: €{sales_data['amount'].mean():.2f}")
print(f" • Date Range: {sales_data['date'].min().date()} to {sales_data['date'].max().date()}")
Key Metrics Dashboard
This dashboard combines several visualization techniques:
- Matplotlib: For static reports and publication-quality plots
- Seaborn: For quick data exploration and statistical plots
- Plotly: For interactive dashboards and web applications
- Dashboard features: Filters, interactivity, real-time updates
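The Monthly Growth panel in the example relies on `pct_change`, which compares each value with the one before it, so the first month has no growth figure. A small sketch with made-up revenue numbers:

```python
import pandas as pd

monthly = pd.DataFrame({
    "month": ["2024-01", "2024-02", "2024-03"],
    "monthly_revenue": [100.0, 110.0, 99.0],
})

# (current - previous) / previous, expressed as a percentage
monthly["growth"] = (monthly["monthly_revenue"].pct_change() * 100).round(1)
print(monthly)
```

The first row comes out as `NaN`, which Plotly skips when drawing the line.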
Conclusion and next steps
Data visualization in Python is a powerful skill that lets you make complex data understandable. You have now learned:
- Matplotlib: For full control and publication-quality plots
- Seaborn: For statistical analysis and quick data exploration
- Plotly: For interactive visualizations and dashboards
- Best practices: For effective and ethical data presentation
Next steps:
- Start with simple line plots of your own data
- Experiment with Seaborn for data exploration
- Build a simple dashboard with Plotly Dash