How We Predict Berlin Rents: 80 Features, Satellite Data, and AI Photos

A transparent look inside our ML pipeline — from raw listings to high-accuracy predictions.

Machine Learning
Methodology
Berlin
Satellite
XGBoost
RentSignal’s rent prediction model explained: XGBoost with 80 features including satellite imagery, AI photo analysis, and spatial rent intelligence. We show exactly how it works and why it outperforms the Mietspiegel.
Author

Klaus Redel

Published

March 22, 2026

Keywords

rent prediction machine learning, XGBoost real estate, Berlin rent model, satellite data rent prediction, SHAP explainability rental

Why Build a Rent Prediction Model?

Germany’s rental market is heavily regulated. The Mietpreisbremse (rent brake) caps new-lease rents at 10% above the local reference rent (Mietspiegel). But the Mietspiegel is a blunt instrument — it groups apartments into broad categories and assigns a single range.

Two apartments in the same Mietspiegel category can have vastly different market rents. A renovated Altbau with original floorboards on a quiet courtyard is not the same as an unrenovated flat on a busy street — even if the Mietspiegel says they are.

Our model captures what the Mietspiegel misses. With 80 features across five intelligence layers, it explains over 81% of the variation in rents (R² above 0.81), compared with roughly 35% for the Mietspiegel alone.

The Data

Our training data comes from ImmoScout24, Berlin’s largest rental platform:

Metric | Value
Listings analyzed | 8,259
Regular market listings (training) | 4,828
Photos analyzed by AI | 55,000+
Listings with coordinates | 99.9%
Data period | March 2026

We exclude apartment swap listings (Tauschwohnungen) from training — these have artificially low rents that don’t reflect market pricing. They’re flagged automatically via NLP title analysis and handled separately.

Figure 1: Berlin rent landscape — our 8,259 listings colored by rent level

Five Intelligence Layers

Figure 2: Each feature layer adds meaningful prediction accuracy

Layer 1: Structural Features

The basics from the listing: size, rooms, floor, year built, amenities (kitchen, balcony, elevator), condition, heating type. These alone give R²≈0.69.
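For readers less familiar with the metric, here is a minimal sketch of how R² is computed from actual and predicted rents. The function name and the sample values are hypothetical and only illustrate the formula; they are not taken from our pipeline.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left over after the prediction
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: variation around the mean rent
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical rents in EUR/m², for illustration only
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]
score = r_squared(actual, predicted)
```

An R² of 0.69 therefore means the structural features account for about 69% of the variation in rents around the mean.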

Layer 2: Spatial Features — OSM + Satellite

For each apartment’s coordinates, we compute distances to transit, parks, schools, and water bodies. We count restaurants, cafés, and shops within walking distance. From Sentinel-2 satellite imagery, we extract vegetation (NDVI), water proximity (NDWI), and urban density (NDBI) at multiple scales.
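The three satellite indices are all normalized band differences. The sketch below uses the standard definitions and assumes the usual Sentinel-2 bands (B3 green, B4 red, B8 near-infrared, B11 short-wave infrared). NDWI has more than one definition in the literature; the green/NIR variant shown here is an assumption, and the reflectance values are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when both bands are zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def spectral_indices(green, red, nir, swir):
    """Compute the three indices from per-pixel surface reflectances."""
    return {
        "ndvi": normalized_difference(nir, red),    # vegetation
        "ndwi": normalized_difference(green, nir),  # open water (green/NIR variant)
        "ndbi": normalized_difference(swir, nir),   # built-up density
    }


# Hypothetical reflectances for a vegetated pixel
pixel = spectral_indices(green=0.08, red=0.05, nir=0.45, swir=0.20)
```

To get a feature "at multiple scales", one option is to average each index over all pixels inside buffers of different radii around the apartment.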

Restaurant density within 1km is consistently a top spatial predictor — it captures neighborhood vibrancy better than any single location variable.
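As a rough illustration of how a density feature like this can be derived, the sketch below counts points of interest within a radius using the haversine distance. The function names and coordinates are hypothetical; a production pipeline would more likely use a spatial index than a linear scan.

```python
import math

EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_within(lat, lon, points, radius_m):
    """Number of (lat, lon) points no farther than radius_m."""
    return sum(1 for p_lat, p_lon in points
               if haversine_m(lat, lon, p_lat, p_lon) <= radius_m)


def nearest_distance_m(lat, lon, points):
    """Distance in meters to the closest point."""
    return min(haversine_m(lat, lon, p_lat, p_lon) for p_lat, p_lon in points)


# Hypothetical restaurant coordinates around an apartment in central Berlin
restaurants = [(52.5205, 13.4060), (52.5270, 13.4050), (52.5400, 13.4050)]
restaurants_1km = count_within(52.5200, 13.4050, restaurants, 1000)
nearest_m = nearest_distance_m(52.5200, 13.4050, restaurants)
```

The same two helpers cover both kinds of spatial feature mentioned above: counts within walking distance and distance to the nearest amenity.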

Layer 3: NLP Title Features

The listing title contains signal that structured fields miss. We extract indicators for apartment swaps, furnished listings, Altbau/Neubau mentions, and renovation keywords. The number of listing photos is itself a quality signal.
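A minimal sketch of keyword-based title indicators, assuming simple regular expressions are enough for illustration. The keyword lists and feature names are hypothetical and certainly less complete than a production rule set.

```python
import re

# Hypothetical keyword patterns; the (?<!un) lookbehind skips negated forms
# such as "unrenoviert" or "unmöbliert".
TITLE_PATTERNS = {
    "is_swap": re.compile(r"tausch", re.IGNORECASE),
    "is_furnished": re.compile(r"(?<!un)(?:möbliert|furnished)", re.IGNORECASE),
    "mentions_altbau": re.compile(r"altbau", re.IGNORECASE),
    "mentions_neubau": re.compile(r"neubau", re.IGNORECASE),
    "mentions_renovation": re.compile(
        r"(?<!un)(?:saniert|renoviert|modernisiert)", re.IGNORECASE
    ),
}


def title_features(title):
    """Map a listing title to 0/1 indicator features."""
    return {name: int(bool(pattern.search(title)))
            for name, pattern in TITLE_PATTERNS.items()}


swap = title_features("Tauschwohnung: Sanierter Altbau in Kreuzberg")
plain = title_features("Unrenovierte, unmöblierte 2-Zimmer-Wohnung")
```

The `is_swap` indicator is the kind of flag that can be used to keep apartment swap listings out of the training set.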

Layer 4: AI Photo Features

We analyze listing photos with AI to extract visual quality scores — interior condition, kitchen/bathroom quality, floor type, ceiling height, architectural style, and building facade condition. Read more about our AI photo pipeline →

Layer 5: Neighborhood Rent Intelligence

The newest and most powerful layer. For each apartment, we compute what nearby apartments actually rent for — within 500m and 1km. The median rent in the same postal code provides a stable anchor. Rent dispersion (price variation) captures gentrification dynamics.
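The sketch below shows one way such neighborhood features could be computed. It makes several assumptions: the target listing is excluded from its own neighborhood to avoid leaking its rent into its own features, dispersion is measured as a standard deviation within 1km, and all field names and values are hypothetical.

```python
import math
from statistics import median, pstdev


def distance_m(a, b):
    """Equirectangular approximation, adequate at city scale."""
    mean_lat = math.radians((a["lat"] + b["lat"]) / 2)
    dx = math.radians(b["lon"] - a["lon"]) * math.cos(mean_lat)
    dy = math.radians(b["lat"] - a["lat"])
    return 6_371_000 * math.hypot(dx, dy)


def neighborhood_features(target, listings):
    """Rent statistics of the other listings around the target."""
    others = [item for item in listings if item is not target]

    def rents_within(radius_m):
        return [o["rent_sqm"] for o in others if distance_m(target, o) <= radius_m]

    near_500, near_1k = rents_within(500), rents_within(1000)
    same_plz = [o["rent_sqm"] for o in others if o["plz"] == target["plz"]]
    return {
        "median_rent_500m": median(near_500) if near_500 else None,
        "median_rent_1km": median(near_1k) if near_1k else None,
        "median_rent_plz": median(same_plz) if same_plz else None,
        "rent_dispersion_1km": pstdev(near_1k) if len(near_1k) > 1 else None,
    }


# Hypothetical listings (rent in EUR/m²)
listings = [
    {"lat": 52.5000, "lon": 13.4000, "rent_sqm": 15.0, "plz": "10999"},
    {"lat": 52.5020, "lon": 13.4000, "rent_sqm": 14.0, "plz": "10999"},
    {"lat": 52.5040, "lon": 13.4000, "rent_sqm": 18.0, "plz": "10999"},
    {"lat": 52.5070, "lon": 13.4000, "rent_sqm": 12.0, "plz": "10997"},
    {"lat": 52.5300, "lon": 13.4000, "rent_sqm": 20.0, "plz": "10999"},
]
features = neighborhood_features(listings[0], listings)
```

When features like these are built from rents, they should be computed only from listings the model is allowed to see at training time; otherwise validation scores can look better than they are.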

What Matters Most: Feature Importance

Figure 3: Top 15 features by importance (SHAP values) — all five layers contribute

All five layers contribute to the top 15. No single layer dominates — the model needs structural data, spatial context, NLP signals, visual quality, AND neighborhood pricing to achieve its full accuracy.
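A ranking like this is typically built from the mean absolute SHAP value per feature. The sketch below assumes the per-listing SHAP values have already been computed (for example with a tree explainer) and only shows the aggregation step; the feature names, layer mapping, and numbers are hypothetical.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Average |SHAP value| per feature across all listings."""
    n = len(shap_rows)
    return {name: sum(abs(row[j]) for row in shap_rows) / n
            for j, name in enumerate(feature_names)}


def top_features(importance, k):
    """Names of the k most important features, highest first."""
    return sorted(importance, key=importance.get, reverse=True)[:k]


def importance_by_layer(importance, layer_of):
    """Sum feature importances within each feature layer."""
    totals = {}
    for name, value in importance.items():
        totals[layer_of[name]] = totals.get(layer_of[name], 0.0) + value
    return totals


# Hypothetical SHAP values: one row per listing, one column per feature
feature_names = ["living_area", "restaurants_1km", "median_rent_500m"]
layer_of = {
    "living_area": "structural",
    "restaurants_1km": "spatial",
    "median_rent_500m": "neighborhood",
}
shap_rows = [[1.0, -0.5, 2.0], [-3.0, 0.5, 1.0]]

importance = mean_abs_shap(shap_rows, feature_names)
ranking = top_features(importance, 2)
by_layer = importance_by_layer(importance, layer_of)
```

Taking absolute values matters: a feature that pushes some rents up and others down would otherwise average out to zero and look unimportant.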

Prediction Intervals: How Confident Is This?

A point estimate isn’t enough. We use Conformalized Quantile Regression — a method that provides adaptive prediction intervals with a statistical coverage guarantee.

Figure 4: Prediction intervals adapt to each apartment — wider for unusual ones, tighter for typical ones
  • Typical apartment: interval width ~€6-8/m²
  • Easy to predict: width ~€4/m² (common apartment types with many comparables)
  • Unusual apartment: width €12-20/m² (luxury, micro-studios — model correctly signals uncertainty)

The intervals maintain 80% coverage — meaning 80% of actual rents fall within the predicted range.
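To make the calibration step concrete, here is a minimal sketch of the conformalization used in Conformalized Quantile Regression. It assumes lower and upper quantile predictions already exist for a held-out calibration set; the function names and all numbers are hypothetical. Each calibration listing gets a score measuring how far its actual rent falls outside its predicted interval, and a quantile of those scores becomes a margin applied to every new interval.

```python
import math


def cqr_margin(cal_lower, cal_upper, cal_actual, alpha=0.2):
    """Margin that gives (1 - alpha) coverage on exchangeable data."""
    # Positive score: the actual rent lies outside the interval by that amount.
    # Negative score: it lies inside, with that much room to spare.
    scores = sorted(max(lo - y, y - hi)
                    for lo, hi, y in zip(cal_lower, cal_upper, cal_actual))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    return scores[rank - 1] if rank <= n else math.inf


def conformalize(lower, upper, margin):
    """Widen (or shrink, if margin is negative) a predicted interval."""
    return lower - margin, upper + margin


# Hypothetical calibration set: quantile predictions and actual rents in EUR/m²
cal_lower = [10.0] * 10
cal_upper = [14.0] * 10
cal_actual = [11.0, 12.0, 13.0, 12.0, 9.5, 14.5, 15.0, 8.0, 12.0, 13.5]

margin = cqr_margin(cal_lower, cal_upper, cal_actual, alpha=0.2)
interval = conformalize(11.0, 16.0, margin)
```

The margin is the same for every apartment. The adaptive widths come from the underlying quantile predictions, which are already wider for unusual apartments before the margin is applied.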

Spatial Validation: Is the Model Biased?

We mapped prediction residuals across all Berlin districts to check for geographic bias:

Figure 5: Prediction bias across Berlin — green = accurate, larger dots = more listings in that area
Figure 6: Mean prediction bias by Berlin district — all within ±€0.30/m² of zero

The model shows no systematic geographic bias at the district level: the mean prediction error in every district is within ±€0.30/m² of zero. This is critical for fairness, because it means the model doesn’t systematically over- or under-predict in any district.
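A district-level bias check of this kind can be sketched as a grouped mean of residuals. The sign convention (predicted minus actual, so positive means over-prediction), the function names, and the sample records are assumptions made for illustration.

```python
from collections import defaultdict


def mean_bias_by_district(records):
    """Mean of (predicted - actual) per district, in EUR/m²."""
    sums = defaultdict(lambda: [0.0, 0])
    for district, actual, predicted in records:
        sums[district][0] += predicted - actual
        sums[district][1] += 1
    return {district: total / count for district, (total, count) in sums.items()}


def districts_outside(bias, tolerance=0.30):
    """Districts whose mean bias exceeds the tolerance in absolute value."""
    return sorted(d for d, b in bias.items() if abs(b) > tolerance)


# Hypothetical records: (district, actual rent, predicted rent)
records = [
    ("Mitte", 18.0, 18.5),
    ("Mitte", 20.0, 19.7),
    ("Pankow", 14.0, 14.1),
    ("Pankow", 15.0, 15.1),
]
bias = mean_bias_by_district(records)
flagged = districts_outside(bias)
```

A mean close to zero can hide large errors that cancel out, so a check like this complements per-listing error metrics and does not replace them.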

What This Means for You

For Tenants

Use our free compliance checker to see if your rent exceeds the legal maximum. Our model shows what the market actually pays for apartments like yours.

For Property Managers

Upload your apartments with photos for the most accurate prediction. The model rewards quality: renovated interiors, modern kitchens, and bright spaces all increase predicted rent. The feature worth table shows exactly what each feature contributes in €/month.

For Researchers

Our methodology is documented in our GitHub repository. We welcome collaboration on spatial econometrics, causal inference, and AI-powered property valuation.


→ Check your apartment’s predicted rent

→ Create a free account


This article describes the methodology behind RentSignal — the data-driven rent intelligence platform for the German rental market.


Note: Summary (translated from German)

How we predict Berlin rents. Our model uses 80 features from five layers: structural data, spatial analysis (OSM + satellite), title analysis, AI image analysis, and neighborhood rent intelligence. We explain over 81% of rent variation, compared with ~35% for the Mietspiegel. Try it for free now →