First Thing for Autonomous Vehicles: Proper Annotation

With the driverless or autonomous vehicle buzz going around, and almost every car manufacturer trying something of their own to generate news at the car shows, we are already seeing more driver assistance for driving, parking and so on. A fully autonomous car plying our roads (and believe me, this would be restricted to the few countries where good driving habits exist and heavy penalties are levied for traffic infractions) would come about through a series of fundamental steps over the next few years. There are more than three, but for this discussion I am limiting it to three:

(i)                 Put a camera, or multiple cameras, on a car (called the host car or the ego vehicle) and drive around the entire town and its outskirts, recording video and images of everything along the way;

(ii)               Annotate (tag or mark) those images to give the objects seen in them a consistent meaning; a video has many frames, and each frame contains many objects outside the host car. This annotated data becomes the input for all further processing, and producing it is a manually intensive task (a minimal sketch of what one annotated frame might look like follows this list); and

(iii)              Train on these large datasets in a distributed computing environment (this is where Artificial Intelligence, or AI, kicks in), producing scores of models for various situations, so that the cars can recognize such objects and patterns and make decisions on the fly.
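To make step (ii) concrete, here is a minimal sketch of what a single annotated frame might look like. The class names, field names and the COCO-style bounding-box convention are my assumptions for illustration, not any OEM's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class BoundingBox:
    # Pixel coordinates of the top-left corner plus width/height,
    # a common (COCO-style) convention; other tools use corner pairs.
    x: float
    y: float
    w: float
    h: float

@dataclass
class Annotation:
    label: str               # e.g. "pedestrian", "cyclist", "traffic_sign"
    box: BoundingBox
    occluded: bool = False   # partially hidden behind another object?
    truncated: bool = False  # cut off at the edge of the frame?

@dataclass
class AnnotatedFrame:
    video_id: str
    frame_index: int
    annotations: list[Annotation] = field(default_factory=list)

# One frame from a hypothetical drive, annotated by hand:
frame = AnnotatedFrame(
    video_id="drive_cam_front_001",
    frame_index=1482,
    annotations=[
        Annotation("cyclist", BoundingBox(412.0, 220.5, 96.0, 180.0)),
        Annotation("traffic_sign", BoundingBox(1040.0, 95.0, 40.0, 40.0),
                   occluded=True),
    ],
)
```

Multiply one such record by thousands of frames per drive, and the scale of the manual effort in step (ii) becomes apparent.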

On a granular level, any tagging would have to cover at least three different groups (a sketch of how such a taxonomy might be encoded in code follows the list):

(i)                 Static – road signs, lane boundaries and footpath boundaries; everything out there that is fixed to the road and the parking spaces.

(ii)               Dynamic (moving) – on top of the static layer, the system must be able to identify, besides the host car itself, other vehicles, pedestrians and even animals (in certain countries where they roam the roads freely); and

(iii)              Semantic or contextual (immobile) – objects that are part of the scenery, such as hills, the sky and so on.
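A hedged sketch of how these three groups might be encoded as a label taxonomy; the group names follow the list above, but the specific class lists and the layout are illustrative assumptions, not a standard:

```python
from enum import Enum

class LabelGroup(Enum):
    STATIC = "static"        # fixed to the road: signs, lanes, kerbs
    DYNAMIC = "dynamic"      # moving: vehicles, pedestrians, animals
    SEMANTIC = "semantic"    # scenery context: sky, hills, vegetation

# Illustrative class-to-group mapping; a real project would maintain
# a much larger, carefully versioned taxonomy.
LABEL_TAXONOMY: dict[str, LabelGroup] = {
    "traffic_sign":   LabelGroup.STATIC,
    "lane_boundary":  LabelGroup.STATIC,
    "footpath_edge":  LabelGroup.STATIC,
    "car":            LabelGroup.DYNAMIC,
    "pedestrian":     LabelGroup.DYNAMIC,
    "cyclist":        LabelGroup.DYNAMIC,
    "animal":         LabelGroup.DYNAMIC,
    "sky":            LabelGroup.SEMANTIC,
    "hill":           LabelGroup.SEMANTIC,
}

def group_of(label: str) -> LabelGroup:
    """Look up which annotation group a class label belongs to."""
    return LABEL_TAXONOMY[label]
```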

Having said this, all these activities are very time consuming, and although automatic annotation tools exist, their accuracy levels are so low that the OEMs prefer to annotate all the images manually themselves. Consider this alongside what Google can already provide – the 3D building view, GPS coordinates, road conditions across various seasons, its mapping facility and all the optimized algorithms associated with them – and we are slowly converging on a complete picture set that every car would carry, one that would change every day. For example, if a road gets dug up for wiring or plumbing, that stretch has to be disabled instantly across all the autonomous cars on the road so that no driving errors result from it. Likewise, depending on road conditions, lanes can change or be adjusted, and those changes have to be delivered instantly into the brains of the autonomous cars. Weather conditions, accidents, light conditions and planned road blocks all have to be captured in real time, so this data has to be refreshed frequently, perhaps every hour (a sketch of what such an update feed might look like follows).
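As a rough illustration of the kind of update such a refresh cycle might push to the fleet, here is a minimal sketch; the event fields, the road-segment identifier scheme and the one-hour validity window are all assumptions made purely for illustration:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class RoadSegmentUpdate:
    segment_id: str        # hypothetical map identifier for a stretch of road
    status: str            # e.g. "open", "closed", "lane_shift"
    reason: str            # e.g. "construction", "accident", "flooding"
    issued_at: datetime
    expires_at: datetime   # updates age out; the next refresh re-asserts them

def is_current(update: RoadSegmentUpdate, now: datetime) -> bool:
    """An update is only trusted within its validity window."""
    return update.issued_at <= now < update.expires_at

# A road dug up for plumbing, pushed to every car in the fleet:
update = RoadSegmentUpdate(
    segment_id="seg-4412-nb",
    status="closed",
    reason="construction",
    issued_at=datetime(2021, 6, 1, 9, 0),
    expires_at=datetime(2021, 6, 1, 10, 0),  # re-issued on the hourly refresh
)
```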

I am sure that, with all this complexity, Google is sitting pretty to address this problem once and for all, and to keep supplying the updates, selling them as an irresistible package that every car manufacturer will consume down the road (pun intended). When it comes to autonomous cars, it is all about navigating (mapping) and positioning (absolute, and relative to others), and both need to be absolutely precise at any given point in time. If a label is wrong, there are serious consequences: fatalities can happen if the trained car misses a cyclist in a frame!

Turning this data into information that the car can act on involves zero tolerance for errors, as safety is the critical need here, so this machine learning using neural networks has to be done rigorously and tested across various parameters and conditions before any real-time use (one such check is sketched below). The output of the computation will only be as good as the inputs one feeds into the system. In fact, it may even be necessary to rebuild roads to accommodate autonomous driving, which would mean large capital expenditure by countries and plenty of human labor. The entire transportation system would become more intelligent and automated.
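As one small illustration of that rigorous testing, here is a sketch that computes per-class recall from evaluation counts, so a model that misses too many cyclists would fail a safety gate before deployment. The counts and the 0.995 threshold are invented for illustration, not any regulator's actual figures:

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of real objects the model actually detected."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0

# Hypothetical evaluation counts per class: (detected, missed)
eval_counts = {
    "car":        (98_400, 310),
    "pedestrian": (41_200, 95),
    "cyclist":    (12_700, 98),
}

SAFETY_THRESHOLD = 0.995  # illustrative gate, not a real standard

for label, (tp, fn) in eval_counts.items():
    r = recall(tp, fn)
    verdict = "OK" if r >= SAFETY_THRESHOLD else "FAIL: too many missed objects"
    print(f"{label:10s} recall={r:.4f}  {verdict}")
```

With these assumed numbers, the cyclist class falls below the gate and the model would be sent back for more training data rather than onto the road.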

All this data, captured as video, hogs the bandwidth even in these times of fast internet, and hence requires co-located servers to store and process it (a back-of-the-envelope estimate follows). Pretty soon, once autonomous cars are in place, it would not be too far before traffic robots take over from the policeman manning the streets.
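To get a feel for the volumes involved, here is a back-of-the-envelope estimate; the camera count, bitrate and driving hours are all illustrative assumptions:

```python
# Rough data-volume estimate for one test car; every figure below
# is an assumption chosen only to show the order of magnitude.
cameras = 6          # surround-view rig
bitrate_mbps = 25    # high-quality compressed stream per camera
hours_per_day = 8    # one shift of test driving
days_per_week = 5

bits_per_week = (cameras * bitrate_mbps * 1e6
                 * 3600 * hours_per_day * days_per_week)
terabytes_per_week = bits_per_week / 8 / 1e12

print(f"~{terabytes_per_week:.1f} TB of video per car per week")
# ~2.7 TB/week for a single car: multiply by a test fleet, and
# hauling it all over the public internet quickly becomes impractical.
```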

Having talked about the annotation tool software, when it comes to licenses there are two major criteria for choosing them: duration, and usage (users). First, the duration of the license: vendors offer either a term license for a given period (say six months or a year), which must be renewed in full each period after it expires, or a perpetual license of indefinite duration, where updates are usually free for the first one or two years, after which one pays a fraction of the full license cost every year as maintenance. Second is how the software can be used: the license is either node-locked to a client machine, or floating on a server (nowadays usually in the cloud) from which a license key is fetched. So a term license can work out more expensive than a perpetual one over a long period of use, and a floating license is usually more expensive than a node-locked one. Annotation projects typically involve many annotators using the tool for more than two years, since the video feed arrives in terabytes every week. So it makes sense to negotiate the best possible rate for a floating perpetual license from the vendor (a simple cost comparison is sketched below).
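A quick, hypothetical cost comparison to back up that recommendation; the prices and the 20% maintenance rate are invented purely for illustration:

```python
def term_cost(annual_fee: float, years: int) -> float:
    """Term license: pay the full fee again every year."""
    return annual_fee * years

def perpetual_cost(list_price: float, years: int,
                   free_update_years: int = 2,
                   maintenance_rate: float = 0.20) -> float:
    """Perpetual license: pay once, then a fraction of list price
    per year for updates after the free period."""
    paid_years = max(0, years - free_update_years)
    return list_price + list_price * maintenance_rate * paid_years

# Invented prices for one floating seat, for illustration only:
ANNUAL_FEE = 4_000.0   # term license per year
LIST_PRICE = 9_000.0   # perpetual list price

for years in (1, 2, 3, 5):
    t = term_cost(ANNUAL_FEE, years)
    p = perpetual_cost(LIST_PRICE, years)
    print(f"{years} yr: term ${t:,.0f} vs perpetual ${p:,.0f}")
```

With these assumed numbers the perpetual license overtakes the term license around the third year of use, which is exactly the horizon a multi-year annotation project sits on.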

All these may be distant dreams that will not see the light of day in my time. But the research and the work coming out of autonomous cars will find their way into many other functions around you, making life easier, I suppose.
