This month, we sat down with Aunalytics’ Vice President of Predictive Modeling, David Cieslak, PhD, to discuss his work on Internet of Things (IoT) analysis. We talked about the unique challenges of this type of project and what he’s learned, interesting insights he has discovered, and his thoughts on the future of the IoT field.
Q: How does an IoT project differ from other kinds of machine learning problems you have faced, and how are they similar? Are there differences within the data itself? Do you tend to use different methods of analysis?
DC: It is exciting for Aunalytics to engage in the Internet of Things. A data scientist or machine learning researcher is especially captivated by this kind of problem for a variety of reasons. Beyond working on a high profile, cutting-edge area that brings a lot of hype and excitement, IoT data is very rich. Like responsibly collected web traffic utilized in clickstream analysis, a data scientist can walk into an IoT project with reasonable assumptions for high levels of data capture and information quality as data is being collected via autonomous, digital sensors. Such sensors do a good job of collecting a variety of data points pertaining to device operation, location, and user interaction. These values are often logically and consistently encoded; in some cases they are tailor-made with analytics or data science in mind. The implication is that a data scientist can often assume a higher starting point.
While the overall signal quality often presents a better opportunity, such a challenge can put a data scientist’s quantitative skills and creativity to the test. Algorithmically, data scientists might be asked to get out of their comfort zone and utilize advanced signal processing methods in order to provide digestible information. Likewise, time series analysis and anomaly detection feature more heavily in the IoT domain. Whereas hand-crafted featurization was often sufficient for other types of projects, IoT almost necessitates advanced, automated quantization strategies in order to keep up with the pace of data and business. The rules of the IoT game are being written in-flight and it’s critical for a data scientist to be able to learn and adapt to what works and what doesn’t, particularly within this domain. This requires the ability to understand contributions from many different engineering and mathematical disciplines and leveraging prior work from a variety of sources while simultaneously boiling down complexity without making the subject matter overly simplistic.
Q: What specific challenges have you faced in terms of analysis of device data? What lessons have you learned so far?
DC: The biggest issue surrounding IoT data is scale. Historically, a “more is better” approach has been taken with regards to logging, but this can have very practical implications on analysis. Sensors on an IoT enabled device might generate a status snapshot every second or even more frequently. Even a single record generated every second means that you’ll be responsible for storing and processing 86,400 records of data per device every day. If you have a fleet of 10 devices generating 10 pieces of data in every snapshot, it’s easy to imagine how quickly 8.6 million daily records can begin to saturate even the bulkiest big data solutions available. Whereas data scientists have typically been begging for more data, it’s very easy to see how they might drown in this environment. One very real decision that must be made is to determine what data is collected and how often and whether any sampling must be done in order to accommodate analysis. As always, this depends on the application and the availability of data processing resources. Sampling fridge sensors every 5 minutes might lead to a miss in temperature spikes that cause your $50 Ahi steak to go bad. Sampling a subset of vehicle dynamics every 5 minutes might miss a rapidly evolving failure and lead to a fatal accident.
Relatedly, it can be very challenging to boil down a lot of data to the audience. While this is a pervasive challenge in data science, the temporal nature of the signals we’re receiving mean that it’s even more challenging for a data scientist to translate a relevant, high-level inquiry into something structured and measurable. This puts a lot of responsibility on a data scientist to have a sufficiently broad and workable “statistical vocabulary” to satisfy highly curious audiences.
Q: What kinds of “insights” have you been able to uncover with clients?
DC: So far, we’ve looked at IoT through a lens of consumer products. This focus has led us to uncover interesting utilization patterns. Our customers have invested heavily into engineering product features development but what we find is that in many instances customers lock into a small subset and use them habitually. The good news is that the utilized features are often fairly unique per customer, so few features are truly going to waste. This also represents an opportunity for our clients to develop outreach programs for better product education.
While product and engineering research is important, this can also translate into savings for the customer as well. Depending on where they live, they might be able to save money on their electrical bill by using appliances at specific points in the day. Likewise, business experts may be able to help clients use their devices in ways to save them money.
We’re also identifying frustration events in the data, typically where we’re observing “jamming patterns” where particular buttons are pressed obsessively over a short period of time. Likewise, we’re working to identify how sensory signals can be utilized to predict device failure, enabling a potentially interventionist strategy.
Q: What do you see as some of the most exciting developments in IoT so far?
DC: Overall, I’m excited to see many consumer durables and home goods entering the IoT breech. There are many gains to be made from power and environmental savings as well as safety by having such appliances monitored.
IoT has significant potential in fitness and health. EMR systems are largely incidence based—they track information when a patient has a problem, but fitness devices offer an opportunity for broadening an otherwise sparse dataset by quantifying fitness before a health incidence occurs.
There are significant opportunities for IoT within critical infrastructure pieces for our country and the world. Public transportation such as trains and planes stand to benefit from superior safety monitoring. Better monitoring can lead to alterations in operating policies which can also lead to better energy efficiencies. There are tremendous benefits within agriculture as well—farmers can now do a better job of tracking their crop “in flight,” meaning critical growth is at reduced risk for failure. These are only some of the ways that IoT technologies are being applied today.
Q: Where do you see this field going in the future?
DC: IoT is also a useful testbed for a broad class of robotics. Never before have we been able to collect so much data on behavior and activities at such a micro-level. Many of the biggest informatic developments of the last 20 years have been bootstrapped by creating data collection and tagging schemas, such as video and audio signal processing. In turn, collecting such voluminous data will enable robotics research to develop even better understandings of human interactions and activity, and allow them to make significant gains in this area over the next 20 years. Another rising area is Population Health, where we will develop better understandings of the influx of patients into particular healthcare facilities and practices.