Every IoT system ultimately exists to collect, process, and act upon data. The sensors, gateways, communication protocols, and cloud platforms are all infrastructure in service of a single goal: producing reliable, actionable data that drives better decisions. Yet data quality is often an afterthought in IoT projects, leading to deployments that generate large volumes of data but little genuine insight.
This article explores why data quality matters in IoT, how to collect sensor data effectively, and how to turn raw measurements into operational intelligence.
Why Data Quality Matters
The consequences of poor data quality in IoT deployments are significant and often underestimated:
- Incorrect decisions: If energy monitoring data is inaccurate, decisions based on it, such as equipment replacements, load shifting, or capacity planning, may be wrong and costly.
- Missed savings: Energy waste can only be identified if the monitoring data is accurate and granular enough to reveal it. A sensor with a 10% error margin can mask significant waste.
- Compliance failures: Regulatory reporting requirements, such as those under the EU Energy Efficiency Directive, demand accurate data. Inaccurate reporting can lead to penalties.
- Eroded trust: When stakeholders discover that data is unreliable, trust in the entire monitoring system collapses, often permanently.
The Sensor Data Pipeline
Understanding the end-to-end data pipeline helps identify where quality issues can be introduced and how to prevent them.
1. Physical Measurement
The pipeline begins with the physical sensor. For energy monitoring, this typically involves current transformers (CTs) measuring current and voltage taps measuring the supply voltage. The accuracy of the entire system is fundamentally limited by the accuracy of these physical measurements.
Key considerations at this stage:
- CT accuracy class: CTs are rated by accuracy class (e.g., Class 0.5, Class 1). A Class 1 CT has a maximum error of 1% at rated current. Choose the accuracy class appropriate for your application.
- CT sizing: A CT sized for 400A will be inaccurate when measuring 20A of actual load, because accuracy degrades at low percentages of the rated current. Size CTs for the expected operating range, not the maximum circuit capacity.
- Installation quality: A CT installed backwards will report negative power. A CT not fully closed around the conductor will produce inaccurate readings. Installation quality directly impacts data quality.
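The CT sizing point above can be captured as a simple pre-installation check. This is a minimal sketch: the ~10% utilisation threshold is a rule of thumb, not a datasheet figure, and real limits depend on the CT's accuracy class.

```python
# Illustrative sanity check of CT sizing against the expected operating
# range. The ~10% threshold is a simplified assumption, not a standard.

def check_ct_sizing(rated_current_a: float, expected_load_a: float) -> str:
    """Flag CTs that will operate outside their accurate range."""
    utilisation = expected_load_a / rated_current_a
    if utilisation > 1.0:
        return "overloaded: expected load exceeds the CT rating"
    if utilisation < 0.10:
        return "oversized: accuracy degrades below ~10% of rated current"
    return "ok"

print(check_ct_sizing(400, 20))   # the 400A CT on a 20A load from above
print(check_ct_sizing(100, 60))
```

Running this flags the 400A/20A example as oversized, while a 100A CT on a 60A load passes.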
2. Analogue-to-Digital Conversion
The analogue signal from the sensor is converted to a digital value by the measurement device. The resolution and sampling rate of the analogue-to-digital converter (ADC) determine how precisely and frequently the physical measurement is captured.
For energy monitoring, the measurement device typically calculates derived quantities such as active power (kW), reactive power (kVAR), apparent power (kVA), power factor, and energy (kWh) from the raw voltage and current samples. The algorithms used for these calculations, and their sampling rates, affect accuracy.
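The derivation of these quantities from raw samples can be sketched as follows. This is a textbook RMS/mean-power calculation, not a description of any particular device's firmware; real meters apply filtering, cycle detection, and calibration on top of this.

```python
import math

def power_metrics(voltage_samples, current_samples):
    """Compute RMS voltage/current, active power (W), apparent power (VA)
    and power factor from synchronous samples spanning whole mains cycles."""
    n = len(voltage_samples)
    v_rms = math.sqrt(sum(v * v for v in voltage_samples) / n)
    i_rms = math.sqrt(sum(i * i for i in current_samples) / n)
    p = sum(v * i for v, i in zip(voltage_samples, current_samples)) / n
    s = v_rms * i_rms
    return {"v_rms": v_rms, "i_rms": i_rms, "active_w": p,
            "apparent_va": s, "power_factor": p / s if s else 0.0}

# Synthetic test signal: 230V RMS, 10A RMS, current lagging 60 degrees,
# so the expected power factor is cos(60 deg) = 0.5.
n, phase = 1000, math.pi / 3
v = [230 * math.sqrt(2) * math.sin(2 * math.pi * k / n) for k in range(n)]
i = [10 * math.sqrt(2) * math.sin(2 * math.pi * k / n - phase) for k in range(n)]
m = power_metrics(v, i)
```

With the synthetic signal above, active power comes out near 1150 W (230 V x 10 A x 0.5), illustrating how sampling rate and calculation method together determine the accuracy of derived quantities.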
3. Data Transmission
Data must be transmitted from the sensor to the gateway, and from the gateway to the cloud platform. At each stage, data can be lost, delayed, or corrupted:
- Wireless interference: ZigBee, Wi-Fi, and other wireless protocols can experience interference, leading to dropped packets and data gaps.
- Network outages: Internet connectivity interruptions can prevent data from reaching the cloud. Edge buffering on the gateway mitigates this risk.
- Clock synchronisation: If sensor clocks are not synchronised, timestamps become unreliable, making it impossible to correlate data from different sensors.
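The edge-buffering idea mentioned above can be sketched as a simple store-and-forward queue. The capacity bound and drop-oldest policy here are illustrative design choices, not a description of any specific gateway.

```python
from collections import deque

class EdgeBuffer:
    """Minimal store-and-forward sketch: readings queue locally and drain
    in order once the uplink returns."""

    def __init__(self, capacity: int = 10_000):
        self._queue = deque(maxlen=capacity)  # oldest readings drop when full

    def store(self, reading: dict) -> None:
        self._queue.append(reading)

    def flush(self, send) -> int:
        """Deliver buffered readings via send(); stop at the first failure
        so nothing is lost if the uplink drops mid-flush."""
        sent = 0
        while self._queue and send(self._queue[0]):
            self._queue.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._queue)
```

During an outage, `send` fails and readings accumulate locally; when connectivity returns, `flush` drains them in order, preserving delivery sequence.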
4. Data Processing and Storage
Once data reaches the cloud platform, it is processed, validated, and stored. Processing may include:
- Deduplication of data received multiple times due to retransmissions
- Gap detection and flagging of missing data periods
- Unit conversion and normalisation
- Aggregation into time-aligned intervals (e.g., 15-minute averages)
- Anomaly detection to flag unlikely values
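Two of the processing steps above, interval aggregation and gap detection, can be sketched briefly. This assumes simple (timestamp, value) pairs and 15-minute slots; a production pipeline would also deduplicate and validate first.

```python
from datetime import datetime, timedelta, timezone

def to_15min_slot(ts: datetime) -> datetime:
    """Floor a timestamp to its 15-minute interval boundary."""
    return ts.replace(minute=(ts.minute // 15) * 15, second=0, microsecond=0)

def aggregate(readings):
    """Average (timestamp, value) readings per 15-minute slot."""
    buckets = {}
    for ts, value in readings:
        buckets.setdefault(to_15min_slot(ts), []).append(value)
    return {slot: sum(v) / len(v) for slot, v in buckets.items()}

def find_gaps(intervals, start, end):
    """Return expected 15-minute slots in [start, end) that have no data."""
    gaps, t = [], start
    while t < end:
        if t not in intervals:
            gaps.append(t)
        t += timedelta(minutes=15)
    return gaps
```

Flagging the gaps explicitly, rather than silently interpolating over them, keeps downstream analytics honest about what was actually measured.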
Best Practices for Data Collection
Choose the Right Sampling Rate
Higher sampling rates provide more granular data but increase storage and transmission costs. For most energy monitoring applications, one-minute intervals provide a good balance between granularity and efficiency. Higher frequencies (one-second or sub-second) are needed for power quality analysis or demand response verification.
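The storage trade-off is easy to quantify. The 100-channel site below is a hypothetical example, but the arithmetic makes the cost of higher sampling rates concrete:

```python
def readings_per_day(interval_seconds: int, channels: int) -> int:
    """Data points generated per day at a fixed sampling interval."""
    return (86_400 // interval_seconds) * channels

# A hypothetical 100-channel site: one-minute vs one-second sampling.
print(readings_per_day(60, 100))   # 144,000 points/day
print(readings_per_day(1, 100))    # 8,640,000 points/day -- 60x the volume
```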
Implement Edge Validation
Validate data as close to the source as possible. The gateway should check for obviously invalid readings (negative energy values, physically impossible power factors, values outside the sensor's measurement range) and flag or discard them before transmission.
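An edge validation routine along these lines might look as follows. The field names and the 500 kW range limit are illustrative assumptions; real limits come from the sensor datasheet and the circuit being monitored.

```python
def validate_reading(reading: dict, sensor_max_kw: float = 500.0) -> list:
    """Return the rule violations for one reading (empty list == valid).
    Thresholds are illustrative, not taken from any real sensor."""
    problems = []
    if reading.get("kwh", 0.0) < 0.0:
        problems.append("negative energy value")
    pf = reading.get("power_factor")
    if pf is not None and not -1.0 <= pf <= 1.0:
        problems.append("power factor outside [-1, 1]")
    kw = reading.get("kw")
    if kw is not None and not 0.0 <= kw <= sensor_max_kw:
        problems.append("power outside sensor measurement range")
    return problems
```

Flagging rather than silently discarding is usually the better default: a run of invalid readings is itself diagnostic information about a failing sensor or installation fault.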
Use Standardised Data Formats
Adopt standardised data formats and naming conventions from the start. Every data point should include, at minimum:
- A unique device identifier
- A channel or measurement type identifier
- A UTC timestamp (ISO 8601 format)
- The measurement value
- The unit of measurement
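A data point carrying all five fields might be serialised like this. The field names and identifier scheme are made up for illustration; the point is that every record is self-describing.

```python
import json
from datetime import datetime, timezone

# Hypothetical field names -- adapt to your platform's conventions.
reading = {
    "device_id": "MON-000123",
    "channel": "active_power",
    "timestamp": datetime(2024, 5, 1, 12, 15, 0, tzinfo=timezone.utc)
                 .isoformat().replace("+00:00", "Z"),   # ISO 8601, UTC
    "value": 42.7,
    "unit": "kW",
}
print(json.dumps(reading, indent=2))
```

Including the unit in every record looks redundant but prevents a whole class of silent errors, such as a kW value being interpreted as W after a firmware change.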
Ensure Clock Synchronisation
All devices in the system should synchronise their clocks using NTP (Network Time Protocol) or a similar mechanism. Clock drift of even a few seconds can cause problems when correlating data from multiple sensors or when calculating interval-based metrics like 15-minute demand.
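A few seconds of drift is enough to misplace a reading. The sketch below shows the same physical event, timestamped by a clock that is three seconds slow, landing in the wrong 15-minute demand interval:

```python
from datetime import datetime, timezone

def slot_15min(ts: datetime) -> datetime:
    """Floor a timestamp to its 15-minute demand interval."""
    return ts.replace(minute=(ts.minute // 15) * 15, second=0, microsecond=0)

utc = timezone.utc
true_time = datetime(2024, 5, 1, 12, 0, 1, tzinfo=utc)     # just after boundary
slow_clock = datetime(2024, 5, 1, 11, 59, 58, tzinfo=utc)  # same event, 3s drift

# The same physical event lands in two different demand intervals.
print(slot_15min(true_time), slot_15min(slow_clock))
```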
Plan for Data Gaps
No IoT system achieves 100% data availability indefinitely. Plan for gaps by implementing local buffering on gateways, designing your analytics to handle missing intervals gracefully, and establishing procedures for identifying and addressing persistent data gaps.
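One way to make analytics gap-tolerant is to compute over available intervals only and report coverage alongside the result, so consumers can judge how much to trust the figure. A minimal sketch, using `None` to mark missing intervals:

```python
def mean_with_coverage(values):
    """Average the available intervals only, reporting coverage so
    downstream consumers can judge whether the figure is trustworthy.
    None marks a missing interval."""
    present = [v for v in values if v is not None]
    coverage = len(present) / len(values) if values else 0.0
    mean = sum(present) / len(present) if present else None
    return mean, coverage

# Hourly kW averages with two missing intervals.
day = [10.0, 12.0, None, 11.0, None, 9.0]
print(mean_with_coverage(day))
```

A dashboard can then, for example, grey out any figure computed from less than 90% coverage rather than presenting it as authoritative.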
From Data to Insight
Raw sensor data has limited value until it is transformed into actionable insight. The key stages of this transformation are:
- Visualisation: Presenting data in dashboards that make patterns and anomalies immediately visible. Time-series charts, heatmaps, and comparison views are essential.
- Benchmarking: Comparing consumption against baselines, budgets, or similar sites to identify underperformance.
- Alerting: Configuring thresholds and anomaly detection rules that notify operators when something requires attention.
- Analysis: Using the data to answer specific questions: Why did consumption spike last Thursday? Which circuits are responsible for the baseload? How does consumption correlate with outdoor temperature?
- Optimisation: Using insights to drive operational changes: adjusting HVAC schedules, replacing inefficient equipment, shifting loads to off-peak periods.
How EpiSensor Ensures Data Quality
EpiSensor's platform is built around data integrity at every stage. The Gateway collects data from sensors via the ZigBee mesh network, validates it at the edge, and buffers it locally to prevent gaps during network outages. Data is transmitted to EpiSensor Core over encrypted connections with delivery confirmation. Core processes, stores, and presents the data through dashboards and APIs, with full audit trails and data export capabilities.
By managing the entire data pipeline from sensor to cloud, EpiSensor ensures that the data you rely on for decisions is accurate, complete, and trustworthy.