Digital Transformation – what is the buzz?

Over the past couple of years, with more focus on efficient ways of working, there has been a lot of buzz around Digital Transformation (DT). Many job opportunities exist in this space, asking for relevant experience, and this short overview should give you the confidence that what you have done, in full or in bits and pieces, fits under the enterprise DT umbrella.

Many innovations and technological advances of the past decade or so, each with tremendous potential, are being integrated under the single umbrella of Digital Transformation. It has become a must-have in the vision statement of companies worldwide: an end-to-end goal to realize and a tool to compete smartly and get ahead in engaging and delighting customers.

According to HBR and Salesforce, DT is the new mantra of using digital technology to improve customer experience, improve existing business processes or invent new ones, and create more efficient, automated processes that bring about a new WoW (Way of Working) and a total operational change within the company. Although one person may be tasked with leading the DT journey inside a corporation, it takes the joint and cohesive effort of all stakeholders to create a new way of doing things that suits ever-changing market requirements and business conditions, which in turn results in great customer experiences.

 There are three main ingredients behind any Digital Transformation:

(i) (Digital) technology that is data-centric and is about ‘modernizing the legacy’.

(ii) Business processes: ‘automating the repeated tasks’ and ‘simplifying the flow’.

(iii) Customer experience: delivered in real time, to create better business value.

Contrary to popular perception, the IT department does not solely own this transformation. Its ability to drive change depends on the business units’ willingness to adopt newer technology to deliver better value to their customers. Although this enterprise transformation pivots around IT per se (an IT team headed by a CIO, with a CDO and a PMO), the positive changes are seen only in the business units it caters to, so they need to champion the effort end-to-end, with the full sponsorship of the top leadership team, to realize tremendous gains. This eventually leads to data-driven decision making across the organization and to a more fruitful culture that stitches this belief and trust together. I have always believed that culture is a derivative effect of how things get done within the company, and that it acts as the glue that makes teams work effectively with an end purpose in mind. After the transformation, or as it begins to bear fruit, you will also notice anything from a subtle to a drastic change in the ways of working within the company.

What are the new digital technologies that make DT happen? To name a few buzzwords doing the rounds at present, we have the Internet of Things (IoT), analytics and business intelligence, Machine Learning and Artificial Intelligence (ML and AI), automation (RPA), Cloud (be it private, public or hybrid), Augmented and Virtual Reality (AR/VR, a much-needed enhancement to how manufacturing, education and healthcare do things now), Blockchain, Big Data warehouses and data lakes, Edge computing, etc. One must be able to use most of these technologies, connect them in a flow chart, and see how the entire company can benefit from using them to deliver better business value.

Changing legacy systems within any organization is never easy. If it is not broken, why fix or change it? Changing any infrastructure or system can involve many criticalities and complexities, which have to be understood and approved, after which there has to be a roadmap that introduces changes in phases until the full benefits are reaped. It is not about adding an incremental change, but about a total revamp of the existing setup into something newer and scalable that delivers better results and offers flexibility.

With remote work slated to be the norm going forward due to this pandemic, joining group meetings, logging in and clocking work hours, sharing designs during development, the educational experiences of students, etc. are all things we can look forward to doing through VR/AR. With Bring Your Own Device (BYOD), and with smartphones and tablets moving into the computing space rather than being restricted to communication, cyber security will be a focus area for corporations so that no disruption occurs in their operations. Manufacturing floors are now monitored remotely, thanks to the many sensors and video cameras fitted there, which has led to less downtime and higher manufactured volumes.

All these digital technologies are centred on new ways of generating and receiving data from many sources, in large volumes and at great speed, and on verifying the truthfulness of that data to derive good intelligence from it. Big Data, Analytics and Business Intelligence have been the fulcrum of many banking and healthcare companies for the past few years in realizing how data can be interpreted to offer better services to their customers and patients. It is no longer the classic databases but the newer data lakes and warehouses, which contain not only structured data but also unstructured data like text, images and video, from which actions are taken in real time (like chatbots for customer support, telemedicine for remote surgery and health consulting, or object detection in autonomous vehicles). If you have the entire organization taking decisions based on trustworthy data, I would say a significant transformation has been put in place.

Processes, the ‘necessary evil’ in many companies, deliver a consistent way of doing things and get updated regularly to make them more efficient. You must be able to connect the activities happening in various teams to make something of value, and this is possible only with established processes within the organization. Existing processes are being radically re-engineered and repeatable tasks automated (RPA, Robotic Process Automation), improving both Time to Market (TTM) and efficiency. Given the lessons of the COVID pandemic, it is imperative for companies to think about remote work as a permanent option (not all work, of course, can be done remotely; lab support and manufacturing, for example, need to be at the workplace) and to start changing human resources policies and processes accordingly.

No business will buy into change for its own sake; a clear business case is the primary premise of any transformation. Hence it is imperative to quantify the before and after with a significant business value that looks rational and real. This business value can be quantified through increased productivity or efficiency, increased revenue, good customer scores and better TTM, to name a few, and any or all of these would certainly interest any sane business leader. As long as you have satisfied customers (first) and delighted customers (next), you can get an edge over any competition, and this translates to better $ numbers. Customer experience starts from the time you get them hooked, continues through engaging them throughout your combined journey, and extends to meeting all commitments with consistently high quality.

Stitching together the technologies, a data-driven approach, efficient processes and better customer experience is a test of resilience for many companies, and doing this as a whole team is the mission that makes the transformation happen. Choosing the right technology for every problem and integrating them all needs a good technical team as the backbone of the transformation. Most companies that fail do so because they do not have the right talent to make the transformation happen, or because they do not invest enough in their own people. The only way to get the entire organization to buy into the transformation is to show proof through smaller gains in phases and to include everyone in the journey.

In conclusion, anybody who leads such an IT transformation is accountable for transforming the technology of all the organization’s systems to support business growth and increase operational efficiency. A senior role like the CIO must champion and drive organizational and technological change by building and maintaining strong relationships with all the relevant business teams, partnering with them to serve their customers better and in a more agile manner.

With proper collaboration across the entire organization, they should be able to identify solutions, prioritize roadmaps and deliver on commitments, with all the stakeholders in the loop. It means modernizing the IT infrastructure to support a good user experience and automating repeatable processes, while keeping security and scalability as critical features that cannot be compromised. Through AI-powered bots, aim for maximum contactless sales and service, and create an outstanding user experience by maximizing real-time decisions (for example, through well-designed APIs and microservices, any organization can drive its e-commerce business along with a stable logistics backend).

Drive a culture within the organization in which data drives all decisions. One should stay on top of innovations by conducting business digitally and experimenting with changing digital trends and audience preferences.

 The author comes with 25+ yrs of experience and is an active consultant and a trainer, an information seeker, and a yoga enthusiast.

Why are Flash storage companies the acquisition flavor of this season?

Recently Dell agreed to acquire EMC for $67 billion, and then Western Digital announced its intent to acquire SanDisk for $19 billion. Both EMC and SanDisk made their names through storage expertise, although at opposite ends of the spectrum: EMC on the enterprise side of data storage and SanDisk on the client and mobile side. So, why are storage companies the flavor of the season for acquisitions?

Other than SanDisk, the three companies mentioned above are going through slow demand and facing intense competition for their products, be it PCs (nearing the end of life as we know them), legacy storage or mechanical hard drives. Dell’s server expertise combines well with EMC’s storage expertise to make an entity that can take the flag position in the data center (read: Cloud) market. I assume there will be some overlap in their storage products, which may lead to some consolidation. Also, with the advent of Flash as enterprise storage, EMC may not be able to compete effectively on price with its Fibre Channel arrays. Everyone knows SanDisk from the consumer space, but acquisitions it made in the enterprise space a few years ago gave it effective solutions across all the computing market segments.

If you look at Flash as storage (a solid-state drive, or SSD, stores data electrically and is non-volatile), it has been used mostly in the consumer space (embedded applications): laptops, cameras, tablets and mobile phones. It is also more rugged, as physically dropping a device that has Flash storage does not lead to data loss as it may with magnetic drives. Presently it is about 5 to 10 times more expensive than the well-known desktop storage, the ‘mechanical’ hard disk. But prices usually drop with volume, and higher volume is a result of more use cases for the technology. As data centers and cloud are the latest buzzwords, it is clear that enterprise-class storage will have to go with Flash and its related array technology: Flash has no moving parts, which results in less acoustic noise, less power consumed and less heat generated, and it comes with a thinner profile and is about 100x faster. All of this is a big win for the huge data centers being planned, as less cooling and less real estate would be required, making them ‘greener’. The biggest drawback of Flash today is that it has lower longevity and old-age reliability than regular hard drives. Even so, be it public or private cloud, it looks like the future bodes well for Flash technology.

If you look at the competition in Flash as a storage medium, SanDisk has only a handful of rivals: the combination of Intel and Micron (individually leaders in Flash technology, and together partners in a joint venture as well), Samsung and Toshiba. These companies both own Flash-related IP, which they license out, and sell end products directly. Intel has for a few years now seen its personal computing segment nearing the end of the road and has gone nowhere in the mobile and tablet business despite acquiring Infineon’s wireless business, yet it is a leader in the data center and server farm business, though indirectly (meaning others use Intel’s technology and products to operate their own data centers). It is raking in the dollars from this enterprise segment; in fact, this has become its most profitable business, with customers like Amazon and Microsoft. Intel has the expertise and the IP behind sub-micron manufacturing techniques, which helps it hold a generation lead in Flash technology. Samsung, the world leader in the memory space, is a major Flash supplier to other storage companies and would like to stay that way given its business model. Then comes Toshiba, a credible innovator and technology provider in the Flash space. In fact, Toshiba invented both the NOR and NAND Flash memory technologies, thanks to Dr. Masuoka; NOR is generally more expensive per GB and slower to write and erase than NAND.

Western Digital and SanDisk combined would offer a lot across the overall storage space and all market segments, and may well become the biggest supplier to many enterprise storage companies. Flash can be packaged in different formats for different applications, and it is catching the eye of chip makers and of enterprise storage and server companies, potentially leading to more partnerships and mergers in this space in the days to come. Enterprise storage is the biggest wave rising with the cloud and the big data and analytics segment, and it looks like Flash will be the biggest beneficiary. Expect some strong M&A moves in the near future by HP Enterprise, IBM, Hitachi, NetApp, Seagate and Kingston, and even the networking giants, as technology forecasters are already predicting that the Flash storage market could exceed the hard disk storage market as early as next year.

This article was written originally in 2014

A Beginner’s guide to Big Data, Analytics and Cloud.

I have been hearing a lot about the buzzwords in the title for a couple of years now, and luckily had a couple of opportunities to work on them over the past 6 months or so, thanks to my own consultancy. That made me read a few books and articles on these subjects, which got me curious to know more. With no claim to being an expert in these areas, I have gained a fair understanding that I thought I would share here purely for educational purposes, so that everyone who hears these buzzwords knows what they are about and can hold a good conversation around them. I must thank the references below, as most of the writing here is just a highly edited summary of the details found in them, kept at the layman’s level. Consider this a crisp primer for the uninitiated, from both the technology and the consumption angle.

Business Process Outsourcing (BPO), which made India its right-sourcing capital during the first decade of this century, has slowly moved on to the shores of the Philippines and Vietnam. Now everyone seems to be talking about the Analytics and Cloud wave hitting India, either through typical right-sourcing to analytics companies here or through captive analytics centers for big multinational companies. I certainly see more value being created in this wave than in the BPO wave, and it looks like India can establish its credentials using its early-mover advantage. Big names like HCL and IBM have won major contracts to maintain the entire IT departments of Fortune 500 companies on a cloud model, and this is a growing area.

The digital economy (sales) is ready to surpass the physical economy. Nowadays, all organizations are asking what their customers want and what they generally do. With your private information on social media not exactly as private as you think, they want to know who your friends are and what they like, who influences you and whom you influence. They have large quantities of data to analyze for this information, and use it to design a product, or a feature of a product, that you are bound to be happy with. Companies are thinking on their feet, in real time, and reacting very quickly to the feedback you have given them; at least, good companies strive to do this and are placing their bets in this direction. It is no longer about studying a group of customers or a cross-section of people: they want to know YOU as an individual customer. Oh, how lovely you feel!

Analytics is used in both B2C and B2B, but the former is more challenging than the latter because predicting an end consumer’s buying behavior, which is usually emotional and irregular, is tough. Businesses buy or consume in a more regular and rational fashion, usually through a well-known process. What makes B2C modelling much harder is that the data is more complex in both volume and variety, as more than half of it is ‘unstructured’.


Big Data is simply ‘data so large that it cannot be processed by conventional methods’. McKinsey defines Big Data as large data sets that cannot be captured or analyzed by typical database software tools. So today’s Big Data may not be tomorrow’s Big Data: the tools will have caught up to analyze today’s volumes, but tomorrow’s data will be orders of magnitude larger, so the same problem remains. Hence it is safe to say that we are just at the beginning of an explosion of data. Big Data is all about the Internet of Things, social and mobile put together.

The industry has defined Big Data by three V characteristics, Volume, Variety and Velocity, and sometimes a fourth gets added: Veracity. Volume is measured by the sheer size of the data, variety refers to the assortment of data (structured vs. unstructured), and velocity to the speed at which the data gets created or processed. Veracity concerns the accuracy of this huge data, the trust in the data sources, and how to remove the noise to arrive at useful information that makes good business sense. Data can be machine generated or human generated. Machine-generated data includes structured data such as sensor data, web log data and transaction data, and unstructured data such as satellite images, scientific data and multimedia. Human-generated data includes structured data such as survey input and click-stream data, and unstructured data such as emails, social media data and SMS. Each kind of unstructured data can be an analytics domain of its own, like text analytics, and a lot of research is going into them. Structured data is usually stored in tables in an RDBMS and can be queried through SQL.
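To make the point about structured data concrete, here is a minimal sketch using Python’s built-in sqlite3 module as a stand-in for an RDBMS. The table name, columns and values are made up purely for illustration; a real system would have its own schema.

```python
import sqlite3

# Hypothetical transaction table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer TEXT, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("asha", "Chennai", 1200.0),
        ("asha", "Mumbai", 300.0),
        ("ravi", "Chennai", 450.0),
    ],
)

# Because the data has a fixed shape (rows and typed columns),
# a single SQL statement can summarize it.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM transactions "
    "GROUP BY customer ORDER BY customer"
).fetchall()
totals = dict(rows)
conn.close()

print(totals)
```

Unstructured data such as free text or images has no such fixed shape, which is why it needs its own analytics techniques before it can be queried this way.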

Traditionally we knew only ‘structured’ data, the kind that can be put into a database. For the past few years, thanks to the explosion of social media and smartphones, we have had ‘unstructured’ data in the form of text (emails, SMS), multimedia (audio, video), (A)GPS and other location-based data, data from sensors, etc., and it seems to be exploding daily. This ‘unstructured’ data is becoming less private because you like to share it across social platforms, and corporations want to build a strong direct relationship with you based on it. They want to do everything they can to acquire new customers, and to retain and cross-sell to existing ones. If you sneeze, the corporations catch a cold; this is how close they get to you.

Analytics is the way corporations handle this complexity and speed of data to arrive at business value that gives them a competitive advantage. Analytics is the interface between this large data and the business model. It uses mathematics to derive meaning from data. Much of analytics has its roots in Google, Yahoo and Amazon, who are considered pioneers in it and in the technology being used. In the earlier days, analysts worked only on ‘samples’, smaller subsets of the data, discarding all the outliers, and made predictions from those. Nowadays, with affordable storage, networking and computing power, and even pay-as-you-use models, all the generated data gets analyzed to arrive at deeper and broader insights. Since decisions are becoming more data-centric, it is imperative that there is proper transformation and cultural change across the corporation in terms of people, process and strong leadership.

Big Data analytics has moved from being descriptive (using statistics on past information, i.e. Business Intelligence, to understand what happened) to inquisitive (why it happened), to predictive (using past information to predict future outcomes, i.e. data mining and forecasting of what is likely to happen), to prescriptive (using past information to direct future results, i.e. optimization to arrive at what should happen). The world has moved from models built on small ‘samples’ to using ALL the data to create more complex models and simulate evolving scenarios. The outcomes of this information management, in the form of reports, dashboards or animated visualizations, get escalated to the senior leadership team to arrive at informed decisions, which become the baseline for the way ahead for corporations. The talent required for all this modelling is essentially a combination of data scientists with a good maths, statistics and technology background and business managers with good economics, behavioral science and social skills.
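The difference between descriptive and predictive analytics can be shown with a toy example. The sketch below assumes four months of made-up sales figures: the descriptive step summarizes what happened (the average), while the predictive step fits a straight line by ordinary least squares and extrapolates one month ahead. Real forecasting models are far richer; this only illustrates the idea.

```python
# Hypothetical monthly sales figures (illustrative only).
months = [1, 2, 3, 4]
sales = [10.0, 12.0, 14.0, 16.0]

# Descriptive: summarize what already happened.
average_sales = sum(sales) / len(sales)

# Predictive: fit sales = intercept + slope * month by least squares.
mean_x = sum(months) / len(months)
mean_y = average_sales
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales))
sxx = sum((x - mean_x) ** 2 for x in months)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Extrapolate the fitted line to the next month.
forecast_month_5 = intercept + slope * 5

print(average_sales, forecast_month_5)
```

A prescriptive step would go one further and search for the action (a price, a stock level) that gives the best forecast outcome.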

Cloud is a means of providing shared computing resources on a pay-as-you-go basis. In IT jargon it is often referred to as XaaS, where X can stand for I, H, P, S, etc. IT services are seen as utilities and one pays only for the time the resources are used, hence cloud is also referred to as utility computing. Infrastructure as a Service (IaaS, also Hardware as a Service or HaaS) is the most common of all cloud services and delivers computing resources on a rental basis. Platform as a Service (PaaS) integrates tools and middleware with IaaS to provide a comprehensive, consistent platform. Software as a Service (SaaS) is an application created and hosted by the developer in a multi-client mode, sitting on top of a PaaS or an IaaS. Cloud can be private (owned and operated by the organization itself), public (owned and operated by a vendor) or hybrid (a combination of both), and it is essential for Big Data. Examples of IaaS would be the Amazon EC2 compute service and RightScale, of PaaS Microsoft Azure, and of SaaS a CRM application. Google has also introduced Data as a Service (DaaS), where one can use the cloud to store and retrieve data. Cloud computing still has some nagging issues of security, privacy and standardization (or lack thereof) which are slowly falling into place, and the old IT organization and the CIO role are being transformed to take this new paradigm into account.


There are many Big Data technologies in use, but the most common today is the Apache Hadoop framework, an open-source platform for both storage and processing of all data varieties. The two critical components of Hadoop are the Hadoop Distributed File System (HDFS), used for storage, and MapReduce, which analyzes the data; both work in a distributed fashion.

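MapReduce was originally designed by Google; it distributes the problem across many machines and later aggregates the results, in batch mode. Google developed the Google File System (GFS) as its distributed storage system, from which Hadoop derived HDFS; Google’s Bigtable, a structured store built on top of GFS, inspired Hadoop’s HBase.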
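The MapReduce idea of distributing the problem and then aggregating the results can be sketched in a few lines of plain Python. This is not Hadoop code and does not use any Hadoop API; it is a single-machine illustration of the three steps (map, shuffle, reduce) using the classic word-count example, with function names chosen only for this sketch.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Aggregate the grouped values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    pairs = []
    # On a real cluster each chunk would be mapped on a different node;
    # here the chunks are simply processed one after another.
    for chunk in chunks:
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))


counts = word_count(["big data big cloud", "cloud data data"])
print(counts)
```

Because each chunk is mapped independently, the map step can be spread across as many machines as there are chunks, which is what makes the approach suit very large data sets.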

Hardware, networking and storage have become more affordable and are constantly getting cheaper, enabling distributed computing in a big way. Cloud gives you all of these through a subscription-based service, with no upfront capital or maintenance costs.

Open source software is key, and was popularized further by Google through its Android mobile OS. It is the way forward for any new technology to be embraced quickly: an ecosystem builds up around the open source efficiently and quickly, and is thus able to deliver all sorts of solutions at very low cost. Smaller companies seem more agile than the big software vendors in delivering a solution for a customer need, and this is creating competition where size does not matter. Software has moved from a classic licensing model to a royalty-based model to an annual-fee model, benefiting the end user, who always has the latest version to work with.

Distributed computing is a fundamental technology that allows independent computing resources to be networked seamlessly across a huge geographical area so that they look like one coherent environment. The shared resources can range from compute to memory to networks to storage, but they all have to work together to execute a program. Over the years, distributed computing has evolved from mainframe computing, where a large computer with multiple processors and massive I/O capacity handled batch and transactional processing, to cluster computing, where several cheap commodity machines are connected by a high-bandwidth network and controlled by specific software tools for parallel computing, to grid computing, an evolution of clusters in which grids are an aggregation of geographically dispersed clusters connected by the Internet and users can ‘consume’ resources just like any other utility.

Distributed computing can be regarded as a superset of parallel computing, the latter implying a tightly coupled system of mostly homogeneous components sharing the same physical memory. Distributed computing encompasses all architectures that use heterogeneous computing elements, not necessarily co-located. The difference between the two is getting blurred, as the terms are often used loosely to mean the same thing: both perform multiple activities in parallel. Since in Big Data the data complexity is high due to volume, variety and distribution, and the computational complexity may also be high, distributed heterogeneous computing fits statistical models and simulations well. Cloud technologies support Big Data by providing large computing resources on demand, large storage for keeping the data, and frameworks for optimized processing of large amounts of data.

The foundation of cloud computing is virtualization, which separates resources and services from the underlying physical system. This logical split can happen at several levels: at the server, through a thin software layer on top of the hardware called a virtual machine monitor (VMM) or hypervisor; at the application level, to make applications OS-independent; at the memory level, where memory is decoupled from the server; at the networking level, through software that makes a pool of connectivity available; or at the storage level. The abstraction that virtualization provides exposes just the relevant information and hides details that may not be relevant, and it makes applications portable across different hardware and software environments. Although not the same thing, this software abstraction is loosely similar to the green-font ‘XTerm’ terminals used with DEC and Sun systems during the 1980s, which front-ended the servers that did the computing at the back. The most common technologies used here are Xen, VMware and Microsoft Hyper-V.


   Analytics has become prevalent in some key areas now and is slowly changing the way we do business:

Financial (Banking and Insurance): perhaps the most prevalent users of analytics, and early adopters as well

Credit Card Fraud: As customers transact in real time, each transaction is validated against the customer’s records and past transactions, their travel schedules (obtained from the travel sites where they made their bookings) and the place of the transaction, to identify any abnormal activity. Rules set for each customer, based on his/her history, are applied to every transaction. If a transaction is believed to be ‘suspicious’, additional ‘verification’ steps are added to make it more secure.
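A rule-based check of this kind can be sketched as follows. Everything here is a simplifying assumption made for illustration: the profile fields, the two rules and the threshold of three times the customer’s usual maximum are invented, and real fraud systems use many more signals and statistical models.

```python
from dataclasses import dataclass


@dataclass
class CustomerProfile:
    home_city: str
    travel_cities: set          # cities from known travel bookings (hypothetical field)
    typical_max_amount: float   # largest amount seen in past history (hypothetical field)


@dataclass
class Transaction:
    city: str
    amount: float


def suspicion_reasons(profile, txn):
    """Return the list of per-customer rules that this transaction breaks."""
    reasons = []
    if txn.city != profile.home_city and txn.city not in profile.travel_cities:
        reasons.append("unexpected location")
    # Assumed threshold: flag anything above 3x the customer's usual maximum.
    if txn.amount > 3 * profile.typical_max_amount:
        reasons.append("amount far above history")
    return reasons


def needs_extra_verification(profile, txn):
    """A transaction that breaks any rule gets an additional verification step."""
    return bool(suspicion_reasons(profile, txn))


profile = CustomerProfile("Chennai", {"Singapore"}, 500.0)
print(suspicion_reasons(profile, Transaction("Singapore", 400.0)))
print(suspicion_reasons(profile, Transaction("Lagos", 2000.0)))
```

Keeping the rules per customer, rather than global, is what lets a purchase in Singapore pass for a traveller who booked a trip there while the same purchase would be flagged for someone else.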

Credit Risk analysis: Banks want to play safe and ensure they can recover their loans from customers, so they look at the past credit history against your name to see if you are a ‘safe bet’. Thanks to credit rating agencies like CRISIL, which do this as their main line of business, information on credit transactions of all kinds is available to banks and lenders, who can verify the details before disbursing a loan or giving you a credit card or line of credit.

Insurance Risk analysis: Right now, your vehicle insurance premium is based on the city you live in, the risk of your neighborhood, the driving points against you and the prior claims you have made. In the USA, a few insurance companies are calculating premiums for INDIVIDUAL customers, customized as a pay-as-you-drive insurance policy. Onboard telematics feeds your insurer data on your braking and acceleration habits, the distance you travel and the roads you frequently use (via GPS); based on this sensor data, a higher premium is charged for more irresponsible driving. This serves two purposes: it makes insurance companies more profitable and it improves one’s driving habits. A shining example of not only where the ‘rubber meets the road’ but also where the ‘engine meets the wallet’!
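As a rough illustration of how a pay-as-you-drive premium could be computed from telematics data, consider the sketch below. The formula and the rates (a per-kilometre charge and a surcharge per harsh driving event) are assumptions made up for this example and are not taken from any insurer.

```python
def monthly_premium(base, km_driven, harsh_brakes, hard_accels,
                    per_km=0.02, per_event=1.5):
    """Toy pay-as-you-drive premium: base fee + distance charge + behavior surcharge.

    All rates are hypothetical and exist only to show the shape of the calculation.
    """
    events = harsh_brakes + hard_accels
    return round(base + per_km * km_driven + per_event * events, 2)


# Two drivers covering the same distance, with different driving habits.
careful = monthly_premium(50.0, 500, harsh_brakes=2, hard_accels=1)
risky = monthly_premium(50.0, 500, harsh_brakes=20, hard_accels=15)
print(careful, risky)
```

The two drivers pay the same distance charge, so the gap between their premiums comes entirely from driving behavior, which is the incentive the policy is designed to create.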


The biggest bang for the buck for analytics, in my opinion, will be in two areas: healthcare, which impacts everyone’s life, and retail, to understand customers better. Healthcare today comes at a cost and depends heavily on the facilities of the hospital or clinic where you are treated and on the knowledge of the doctor attending to you. Like power, healthcare is a critical industry where the government needs to ensure that it is affordable to all citizens and, at the same time, that the best available care reaches everyone.

For all this to happen, a good start would be a patient health record available electronically across the nation and the globe. It would carry the patient’s history of ailments, conditions, surgeries and medications, along with regular health check-up results; this is the Electronic Health Record (EHR) available in the US and other countries. The second requirement would be access to data on all clinical trials, in progress or already FDA-approved, side-effect data for all medications, data on diseases prevalent in particular parts of the world and, of course, the insurance data of the individual. With these two together, any doctor anywhere in the world can advise on the best cure and care for the patient, the best medicine from any pharma company for a particular condition, and the best insurance plan for an individual and family based on the risks they carry. Data now drives most of these integrated decisions, alongside the doctor’s experience in suggesting a remedy; compare that with yesteryear, when such data was not available. This progresses further into telemedicine, where a soldier injured in battle lies in an operating tent while medical gadgets stream data to experienced doctors elsewhere, who guide the surgical procedure and get him out of danger quickly.


     All your purchase patterns and transactions are being collected and analyzed carefully to send you targeted advertisements with e-coupons, to help companies do location-based marketing, to get data on departing customers (where they are going and why), to manage the effectiveness of an ad campaign, and to learn details of acquired customers to improve cross-selling.  The better companies know their customers, the better their sales will be in an industry with thin margins.

      The other areas in retail where big data analytics is already in use are inventory management, logistics optimization, merchandise assortment and pricing optimization, fraud and loss prevention, and vendor rationalization.

    Classic examples of analytics are the Amazon “you may also want” prompts and the Netflix “what your friends thought” movie suggestions, both of which show good results for the retailer.


      Many of the travel sites collect the log files from all the searches made by their users and, based on your stated preferences, strive to increase their bookings ratio. They would also have data from text analytics on your TripAdvisor reviews and, based on what you like and do not like and on your past history on their site and other sites, will be able to offer optimized flight and hotel options that take into account the budget and time inputs you have given.


      Volvo, along with Sweden’s Transportation department, is using a cloud service for car-to-car communication to warn drivers ahead of icy and slippery roads, thus making safety a priority.  They collect the data from the sensors (ESPs) fitted inside their cars – the ESP stabilizes the car and also sends signals of hazardous road conditions through the mobile network to the cloud.  This real-time information is shared with the cars behind that are about to use the same road, so that they are pre-warned about the actual condition of the road; this information complements any blanket weather warning that drivers automatically receive.


   A major part of advertising is the reach and conversion that one gets through any form of media, be it mobile, TV, Web or classic print.   Advertising is what brings money to the media houses.  Despite the numerous ads that appear on any website, only a few get clicked and only a small percentage of these clicks actually turn into a purchase. The marketing world is always challenged with how to make an ad more effective so that the hit ratio increases. Now, with digital cable and dish TV clearly revealing your viewing patterns, your online purchases and shop transactions revealing your buying patterns, websites keeping a history of your visits in some format, the operator knowing what value-added services you have enrolled in, and the world knowing what paper you read, all these combined through analytics would clearly describe a ‘path-to-purchase’ pattern that enables the media houses to target their ads appropriately. It will not be too long before ads customized to your likes stream into your TV or mobile.
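The "hit ratio" mentioned above is usually tracked as a simple funnel: impressions, then clicks, then purchases. Here is a small sketch of that arithmetic; the function name and the sample counts are hypothetical and chosen only to show how quickly the numbers shrink at each stage.

```python
def funnel_rates(impressions, clicks, purchases):
    """Compute basic ad funnel ratios from raw counts."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    click_through_rate = clicks / impressions
    # Conversion is measured against clicks; guard against zero clicks.
    conversion_rate = purchases / clicks if clicks else 0.0
    return {
        "click_through_rate": click_through_rate,
        "conversion_rate": conversion_rate,
        "purchases_per_impression": purchases / impressions,
    }


# Made-up campaign: 100,000 ad views, 500 clicks, 10 purchases.
rates = funnel_rates(100_000, 500, 10)
for name, value in rates.items():
    print(f"{name}: {value:.4%}")
```

Better targeting aims to lift either ratio, since the purchases per impression is just the product of the two.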

    We already have news websites that automatically customize your viewing page to your interests, since this data is already collected and analyzed from your previous visits to the website.


   The business problems that get tackled here through analytics are classified into three buckets: 

  • Sales and Marketing to understand their sales force effectiveness and resource optimization, market assessment and competitive analysis
  • Research and Development for clinical trials and reporting to FDA, safety analysis for the product, and licensing
  • Pricing and contracting for inventory and logistics management, and for setting up contracts and buybacks and rebates etc.

     The other prevalent applications, some of which you use daily without being aware that they are Cloud based, are Google Docs, Gmail and Yahoo Mail; wearable health devices with sensors that routinely monitor vital patient data and feed it back to the hospital or doctor, who can act immediately on any anomalies; gene profiling and protein structure modelling done on community clouds from research institutions; satellite image processing, now used by several countries for natural disaster management; opinion polls during elections; online document storage like Dropbox or iCloud by Apple; all the social networking sites like Facebook and Twitter; online gaming; and casino gambling predictions.

Transformation in the future

    How would you feel if some complex tool used by a company predicted your next behavior with reasonable accuracy?  How can companies use the data you provide and analyze it to make you BUY?   How can healthcare be more focused on your particular problem and provide the best care at the cost you want? How do you get the best travel package for you and your family, based on your likes and dislikes, that would enhance the memories of the trip?  How can your insurance be tailor-made for you based on your own defensive driving habits and your history of no claims? How can the banks give you the best bang for your buck by automatically understanding your financial goals and delivering a better return for you as a privileged customer?  How can airlines make you fly with them frequently by enhancing your particular travel experience every time?  

     Big Data and its associated analytics are used to take on customers one at a time and enhance their experience.  We can still take the old route and use the 80/20 rule, which says that one can easily draw 80% of the effective conclusions and decisions from the top 20% of the overall customer data. The choice is clear.
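For anyone who wants to check how closely their own customer base follows the 80/20 rule, a rough sketch is below. It assumes you have one number per customer (annual spend, say); the function name and the sample figures are made up for illustration.

```python
import math


def top_share(values, top_fraction=0.2):
    """Share of the total contributed by the top `top_fraction` of customers."""
    if not values:
        raise ValueError("values must not be empty")
    ranked = sorted(values, reverse=True)
    # Always include at least one customer in the "top" group.
    top_count = max(1, math.ceil(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:top_count]) / total if total else 0.0


# Hypothetical annual spend for ten customers.
spend = [400, 400, 25, 25, 25, 25, 25, 25, 25, 25]
print(f"Top 20% of customers account for {top_share(spend):.0%} of spend")
```

If the share comes out far below 80%, the shortcut of studying only the top customers loses much more information, and the one-customer-at-a-time approach has more to offer.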


  • Big Data, Big Analytics – Michael Minelli et al., Wiley, 2013
  • Big Data For Dummies – Judith Hurwitz et al., Wiley, 2013
  • Mastering Cloud Computing – Rajkumar Buyya et al., McGraw Hill, 2013

Many thanks to the reviewers of this blog for their valuable feedback – Vishoo, Venki and John, all of them from either an analytics or an e-commerce background.

First thing for Autonomous vehicles: Proper annotation

As the Driverless or autonomous vehicle buzz goes around, with almost all car manufacturers trying something of their own to create enough news around their activities at all the car shows, we obviously see more Driver Assistance coming up in driving, parking, etc.  A fully autonomous car plying your roads (and believe me, this would be restricted to the few countries where good driving habits exist and huge penalties are incurred for any traffic infraction) would follow a series of fundamental steps to make this happen over the next few years (there would be more than three, but for our discussion I am limiting it to three):

(i)                 Put a camera or multiple cameras on a car (called the host car or the EGO vehicle) and drive around the entire town and outskirts, capturing all possible videos and images

(ii)               Annotate (tag or mark) those images to give the objects seen in them some consistent meaning (a video has lots of frames, and each frame contains lots of objects outside of the host car); this becomes the data for future processing and is a manually intensive task, and

(iii)              Use a distributed computing environment to train on these large datasets (this is where Artificial Intelligence, or AI, kicks in), with scores of models developed for various situations so that the cars can interpret and make decisions on the fly by recognizing such objects and patterns.

On a granular level, any tagging would have to be done in at least three different groups:

(i)                 Static – just the road signs, road lane boundaries and foot path boundaries, everything out there that is limited to the road and parking spaces.

(ii)               Dynamic (moving) – on top of the static objects, and other than the host car, it must be able to identify other vehicles, pedestrians and even animals (in certain countries where they freely roam on the road), and

(iii)              Semantic or Contextual (immobile) – like the objects that are part of a scenery like hills, sky etc.
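As a rough illustration of what one tagged object might look like as data, here is a minimal sketch built around the three groups above. The label names, the bounding-box fields and the label-to-group mapping are all assumptions made for this example; real annotation tools define their own, usually much richer, schemas.

```python
from dataclasses import dataclass
from enum import Enum


class LabelGroup(Enum):
    STATIC = "static"      # road signs, lane and footpath boundaries
    DYNAMIC = "dynamic"    # other vehicles, pedestrians, animals
    SEMANTIC = "semantic"  # scenery such as hills and sky


# Hypothetical mapping from label name to group; extend as needed.
LABEL_GROUPS = {
    "road_sign": LabelGroup.STATIC,
    "lane_boundary": LabelGroup.STATIC,
    "vehicle": LabelGroup.DYNAMIC,
    "pedestrian": LabelGroup.DYNAMIC,
    "animal": LabelGroup.DYNAMIC,
    "hill": LabelGroup.SEMANTIC,
    "sky": LabelGroup.SEMANTIC,
}


@dataclass(frozen=True)
class BoundingBox:
    x: int       # left edge in pixels
    y: int       # top edge in pixels
    width: int
    height: int


@dataclass
class Annotation:
    frame_index: int   # which video frame the object appears in
    label: str         # must be a key of LABEL_GROUPS
    box: BoundingBox

    @property
    def group(self) -> LabelGroup:
        # Raises KeyError for labels missing from the mapping.
        return LABEL_GROUPS[self.label]


def group_annotations(annotations):
    """Bucket annotations by static / dynamic / semantic group."""
    grouped = {group: [] for group in LabelGroup}
    for annotation in annotations:
        grouped[annotation.group].append(annotation)
    return grouped


frame_tags = [
    Annotation(0, "road_sign", BoundingBox(10, 20, 30, 40)),
    Annotation(0, "pedestrian", BoundingBox(100, 80, 25, 60)),
    Annotation(1, "sky", BoundingBox(0, 0, 640, 200)),
]
for group, items in group_annotations(frame_tags).items():
    print(group.value, [a.label for a in items])
```

Keeping the group as a property derived from the label, rather than something the annotator types in, is one way to keep the tagging consistent across many annotators.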

    Having said this, all these activities are very time consuming and, although automatic annotation tools exist, their accuracy levels are so low that all the OEMs prefer to annotate all the images manually by themselves. Consider this against what Google can provide in terms of its 3D building view, its GPS coordinates that show the road conditions across various seasons, its mapping facility and all the optimized algorithms associated with it, and we are slowly converging on a complete picture set that every car would carry and that would change constantly every day. For example, if some road gets dug up for wiring or plumbing, then it has to be instantly disabled across all the autonomous cars plying so that no driving errors result. Also, depending on the road conditions, lanes can change or be adapted, and these changes have to be instantly delivered to the brains of the autonomous cars. Weather conditions, accidents, light conditions and intended blocks all have to be captured in real time, and hence this data has to be refreshed frequently, every hour.

     I am sure that, with all this complexity, Google is sitting pretty to address this problem once and for all, to constantly give us the updates, and to sell it as an irresistible package that every car manufacturer will be able to consume down the road (pun intended). When it comes to autonomous cars, it is all about navigating (mapping) and positioning (absolute and relative to others), and both need to be absolutely precise at any given point in time.   If a label is wrong, there are serious consequences (fatalities can happen if the trained car misses a cyclist in the frame!).

     Training on this data to produce information that the car can act on involves zero tolerance for errors, as safety is the critical need here. So this machine learning using neural networks would have to be done rigorously and tested across various parameters and conditions before any use in real time.  The output of the computing would only be as good as the inputs one gives to the system.   In fact, roads may also need to be redone to accommodate autonomous driving, and this would involve lots of capital expenditure by countries and human labor.    The entire transportation system would become more intelligent and automated.

    All this data being captured on video, even in these times of fast internet, hogs the bandwidth and hence would require co-location of servers to store and compute. Pretty soon, once autonomous cars are in place, it would not be too long before Traffic Robots take over from the policemen manning the streets.

  Having talked about annotation tool software, when it comes to licenses there are two major criteria for choosing them: duration, and usage or users. First, for the duration of the license, software vendors offer either a temporary license for a given period (say 6 months or a year), which needs to be renewed in full every period after it expires, or a perpetual license of indefinite period, where usually all updates are free for the first one or two years, after which one pays a fraction of the entire license cost every year. Second is how the software is used: the license is either node locked to a client machine or floating on a server, nowadays usually in the cloud, from which a license key is fetched. A temporary license may therefore work out more expensive than a perpetual one over a longer period of use, and a floating license is usually more expensive than a node-locked one. Annotation projects usually involve many annotators using the tool over more than 2 years, as the amount of video feed is on the order of TB every week. So, it would make sense to negotiate the best rate for a floating perpetual license from the vendor.
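The temporary versus perpetual comparison is easy to sanity-check with a few lines of arithmetic. The sketch below follows the two pricing schemes as described; the prices, the one-year free-update window and the 20% yearly maintenance fraction are hypothetical numbers, so plug in the vendor's actual quote before drawing conclusions.

```python
def temporary_license_cost(fee_per_year, years):
    """Temporary license: the full fee is paid again every year."""
    return fee_per_year * years


def perpetual_license_cost(upfront, years, free_update_years=1,
                           maintenance_fraction=0.2):
    """Perpetual license: one upfront payment, then a yearly fraction of it
    for updates once the free-update window has passed."""
    paid_maintenance_years = max(0, years - free_update_years)
    return upfront + upfront * maintenance_fraction * paid_maintenance_years


# Hypothetical quote: 1,000 per year temporary vs 2,000 upfront perpetual.
for years in range(1, 5):
    temporary = temporary_license_cost(1000, years)
    perpetual = perpetual_license_cost(2000, years)
    cheaper = "temporary" if temporary < perpetual else "perpetual"
    print(f"{years} year(s): temporary={temporary:.0f} "
          f"perpetual={perpetual:.0f} -> {cheaper}")
```

With these made-up figures the perpetual license becomes the cheaper option from the third year onward, which matches the intuition for annotation projects that run longer than two years.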

  All these are distant dreams that may not see the light of day in my lifetime. Still, the research and the work resulting from autonomous cars would get applied to many other functions around you, which would make life easier, I suppose.