Pitfalls of using location streams in building live location features

Developers build live location features in their product because it is core to their business. In the process they end up having to build infrastructure. Developers who have built and operated this underlying infrastructure on their own have a deeper experience of the pains involved. Often this is the first time they start questioning if that part of the stack is core to their business and if there is an API for that. In the chance that they find HyperTrack as that API they were dreaming of a new problem arises.

In many in-house location stacks the *de facto *interface at which the infrastructure stops and the feature begins is the location stream. You generate the location stream on the device log/store it on the server and then consume it in the product. For enterprises using vehicle tracking systems (VTS) and GPS chips there is a legacy reason to go down this path. When other teams and software modules in the stack are consuming this stream it is hard to imagine ripping it out and replacing it with HyperTrack SDKs and APIs.

And then begins: “But it does solve a bunch of things we are dealing with”, “Hey they seem obsessed with it and we are not they must know something we don’t”, “Let’s deploy it on that new feature or new set of users in that new project”, “Let’s run this in parallel with our stack and compare the results before phasing it in”, etc.

It is then that one of the two questions comes up:

“Can we provide our location stream as input to your API and get your location stream as output?”
“Can we use your location stream as input to our API and everything upstream will start working better?”

Tl;dr – Bad idea. Using location streams as the interface is architecture debt. Pay it off.

There are multiple myths buried under. Let’s go over them one-by-one and see why using location streams as an interface for building live location features has significant pitfalls.

The frequency myth

“What is the frequency of your location data?”

Developers believe that location streams are location streams are location streams: you pick a magic interval at which you poll longlat from the OS and then stream it up to the server using a real-time communication library. This data forms the primitive to implement live location features.

Except it doesn’t.

As an example consider a stream which has missing location points. Does this mean the device is out of network coverage? Does this mean the device is out of GPS coverage? Should you implement buffering or caching of location data to solve this or build health monitoring capability to detect outages? Should you add timestamps for getting the location from the device or for receiving them on the server or both?

Consider another example where you get the same location point or worse a cluster of points within a few meters from each other repeatedly. Does this mean the user is stationary at a location? Which location data should you use? If the next point received is a hundred meter away does that mean the user has moved there or is it a stray location point thrown back by the device?

Device health and activity data is critical to get the context that makes location useful. De-coupling the frequency of getting device location/activity and beaming it up to the server is critical to power battery efficiency and location accuracy.

Fixed frequency location streams fail to factor these in.

The logging myth

“I log my location streams and tag them with context (e.g. user) as the need comes up.”

Developers believe that a quick and scalable way to build location infrastructure is to log the data with tags and then query it as the need comes up – say for making assignments or analyzing routes or powering real-time tracking on a map.

It kinda works but here’s the issue.

When assigning a ride to the nearest driver (say) the last location of users is dangerously evasive. At best the latency due to the frequency at which locations were logged will provide an inaccurate picture of where drivers are at. It gets worse when the last location received is either grossly inaccurate or simply out of date because the driver went offline. So then you add timestamps to filter out stale points and use a location filters to sanitize the logs before you use them. However the points are still trickling in with fixed frequency and you do not have the luxury to wait for the next beat before making the assignment decision.

When analyzing a route or powering real-time tracking on a map the logs lead to hopping by design. A cluster of points due to a possible stop can lead to confusing hops around the same location. Far-removed points on a road where the user moved a fair distance between points can lead to jarring hops between distant locations. Plotting raw logs can lead to skids and slides on rooftops and sidewalks. An outage in the logs can lead to experiences where you are staring at a point for a while only to suddenly jump to the destination where an action on the app probably got geotagged with a high-confidence location.

Generating transmitting and storing location data in the context of the business workflow is a necessary investment in the early stages of designing the location stack. Building a two-way communication channel between the device and server is critical so either can reach the other on-demand.

Raw logs of location streams fail to factor this in later.

The timestamp myth

“I have time ticks for every important action in my business. I can just get location at those points in time.”

Operations managers believe that location streams are location streams are location streams: you get the stream from vehicle tracking systems as one source smartphones as another source cheap GPS chips fitted on assets as a third source and so on. In conjunction with the frequency myth and logging myth managers are led to believe that the master log of this location data is a giant source of truth which can be used to build any live location feature by a back-end engineer.

Besides the fallacies of this myth as discussed in the sections above add in the complexity of true time. Timestamps of barcode scanners vehicle trackers custom chips smartphones and multiple servers are hardly ever in sync. The infrastructure to synchronize true time between these digital devices is non-trivial to build. The notion of fixed frequency streams further makes it a complex and challenging ask for any engineer to try and put this location data to use.

Building the infrastructure for true time is critical for putting location data to use.

The hostage myth

“If I use you full-stack from SDK to API to visuals I will never be able to exit you.”

When a developer takes the first serious look at a stack that can generate location data expose feature-ready APIs in the cloud and generate tracking views to integrate in product experiences it all seems too monolithic. It seems that using one simple clean interface – a location stream – as input and output gives the flexibility to plug in and out when you want.

Let’s first get this out of the way. As much as we would be sad to see you go (and in the short life of our company we have never lost a customer) you have the freedom to leave at any time and take your data with you. Here’s what we have done to make your exit easy:

Our SDKs are open sourced. They are free. They are built to work with our APIs of course but can easily be made to work with yours.
Our dashboard and tracking views are available as hyperlinks and open-sourced SDKs. Ripping them out or pointing them to yours is easy.
All your location data is available to you in our format that includes useful meta-data (e.g. trips stops events) in addition to locations points with time stamps.
Last but not least our APIs are built with much thought and tender love & care. Our goal is to be valuable to your location stack as your design partners even if you decide to not use us. These are elegant entities and abstractions that you would look at and go “hey that’s how I should design the interfaces of my location infrastructure to build my location features”.

Building infrastructure for live location features is hard

In summary building infrastructure for live location features is hard. It involves stacks for the device (app) cloud (business workflow) and visuals (dashboards product experiences) to perform a beautiful dance with each other in real-time to be useful. We believe that plugging an SDK into the app and mapping your business workflows with generic APIs is a better way to build location features than interfacing directly with location streams. Don’t live with the pain. Give us a spin. It only takes a few minutes to start seeing your users’ location data and then a few lines of code per feature.

The frequency myth

The logging myth

The timestamp myth

The hostage myth

Building infrastructure for live location features is hard

Subscribe to Your guide to business automation in the logistics industry

You Might Also Like...