Queueing Theory and the Utilization Factor in a Services Company

I recently read an interesting article on queueing theory and the analysis of server utilization.  The theory has been around for a long time and has been researched thoroughly, mostly from the hardware angle.  An idea struck me: why not extend this theory to the utilization and billability concepts that every professional services company talks about, and see whether it really makes sense for them to target greater than 90% utilization of their people?

Let me introduce you to the commonly referenced graph for server utilization.  Every company has a Human Resources department, which means that all employees are treated like resources.  I find this very concept appalling and prefer “Talent Office”, because every employee is a talent for the organization; we live in the days of knowledge management.  But let me not digress; let me first introduce the concept from which I will draw some inferences later.  The graph shows that the wait time climbs steeply once server utilization goes beyond 80%.

The X axis is server utilization and the Y axis is the average wait time.  At a utilization of 1, the average wait time is actually infinite.  Wait time here means how long the jobs that arrive at the server have to wait before being processed.  You can apply the same idea to any retail store: when customers join the line faster than the cashier can serve them, the line grows longer.  The same logic applies to fine-dining restaurants: you see a line outside because all the tables and waiters inside are completely engaged with present customers, who take a significant amount of time to be served and to eat.  I will not get into the semantics of how to classify these types of time.

Now let us split the total time into two segments: the wait time before being served, and the service time during which your job gets executed.  Together they make up the response time (strictly, this would include a variability effect as well).  To highlight the impact of the wait time, let us assume the service time is the same for all jobs or customers; this may be incorrect, but let us do so for this article.  And we are talking averages throughout.  The response time degrades the busier the resource is.  There would always be a safety capacity plan, in excess of expected demand, to cover for the variability effect.

Some observations here:

  • If you are the first job or the first customer, your response time is equal to the service time.
  • If your resources are utilized at 50%, the response time doubles!
  • At 75% utilization, the response time quadruples, and at about 87%, it becomes eight times the service time.
  • At about 94% utilization, the response time is 16 times the service time, and at about 97%, it is 32 times.
  • When you go from 75% to 90% utilization, your response time goes from 4 times to 10 times the service time, an increase of 150%!
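
The multipliers in these observations match the textbook single-server (M/M/1) result, in which the average response time equals the service time divided by (1 - utilization). The article does not name the model behind the graph, so treating it as M/M/1 is an assumption here, and the function name is purely illustrative. A minimal sketch:

```python
def response_time_multiplier(utilization: float) -> float:
    """Average response time as a multiple of service time.

    Assumes an M/M/1 queue (an assumption, not stated in the article):
    multiplier = 1 / (1 - utilization).
    """
    if not 0.0 <= utilization < 1.0:
        # At utilization 1 the average wait is unbounded, so reject it.
        raise ValueError("utilization must be in the range [0, 1)")
    return 1.0 / (1.0 - utilization)


if __name__ == "__main__":
    # Utilization levels taken from the observations above (unrounded where needed).
    for rho in (0.0, 0.5, 0.75, 0.875, 0.9, 0.9375, 0.96875):
        multiplier = response_time_multiplier(rho)
        print(f"{rho:.3%} utilization -> {multiplier:.1f}x service time")
```

Subtracting 1 from the multiplier gives the wait time alone, again as a multiple of the service time.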

So it is obvious that if the machine or the resources are busy, there will be a lot of jobs waiting in the queue.  Now let us correlate this with what every professional services company uses as key objectives: billability and utilization.  Without going into too much detail, the billability rate is usually lower than the utilization rate, meaning that not everything a resource does for a project can be billed.  Utilization is how much of the time a resource is occupied with a chartered project, and billability is how much of that time the services company can bill to the actual customer.  Ideally they would try to make the two equal, which is a fallacy; usually somewhere between 80% and 90% of utilized time becomes billable to the customer.

What are the usual metrics that the services company would go after?

  • Billability ratio of 90% and utilization of 100% ideally.
  • Lots of customers and new clients added per quarter, which necessitates adding capacity (more resources); to keep the cost down, they maximize the hiring of freshers and manage every project with a hierarchy of employees of varied experience.

Now let us see why these metrics may not be the right ones to use.  Let us work from the bottom up and see how much time an average employee actually has for a project.  Since this is a high-level view, and each company has its own way of operating, let us use the following just as an example for our story:

  1. There are 52 weeks in a year.
  2. The days off (holidays and vacation put together) can be anywhere between 4 and 6 weeks depending on which geography you are in. So, effectively one would work for only 46 to 48 weeks in a year.
  3. Let us assume a bare minimum of 1 week set aside for training each employee in new skills.  Usually this is more.  Now we are left with 45 to 47 weeks of possible project work time.  In all honesty, most employees never get trained in a new skill at all!
  4. Some months bring more orders than others.  In those months employees get pulled onto more critical projects for a week or two, on a high-priority basis.  Setting aside 2 weeks for this leaves about 43 to 45 weeks.  Again, these are minimal estimates; this allowance of roughly 5% (about 2 weeks) is there to adjust for the variability effect.
  5. Let us assume we work on projects 4 days out of the 5 official days in a week, meaning your billability is only 80% of utilization.  You may be attending some process meetings, some team meetings to make the project work, some lectures and talks, some business meetings etc., none of which can be directly billed to the customer.  That takes about 9 weeks (20% of 43 to 45 weeks) off for non-billable work, leaving only 34 to 36 weeks of project work.
  6. 34 weeks out of 52 is about 65% (36 weeks would make it 69%).  Let us call it about two thirds for calculation purposes.  The employer still has to pay your salary for the remaining third.  In other words, a services company takes a loss of about a third of the total salary paid, which it has to recover.
  7. For our calculation, let us again assume 35% instead of 33.3% for this loss.  If one takes more than a week for training, or gets pulled into other projects for more than the two weeks allotted, this number increases to 40% or more.  So, to recover that 35% of your salary, the company bills the customer 35% more than the salary (let us say this is the average salary for the company).  This amount, amortized over the 260 possible working days (52 weeks of 5 days), gives the company its contract rate to customers.  Of course, the rate would differ, from small to big, based on the experience of the employee, but again, we are talking averages.
  8. No services company runs as a charity; it will try everything to make money.  Not only does the company not want to take a loss, it has to make a profit.  This is a commercial venture.  So, let us assume the company expects a 20% profit margin.  The services company would then charge 1.55x the average salary (1 + 0.35 + 0.20) and derive the per-day or per-hour contract rate from that!  Again, if you are an accountant trying to read this, sorry, I have simplified it a lot for brevity.
  9. Now the company has a lot of overhead that it cannot bill.  This could be support functions, enabling functions and even managerial cadres.  Let us assume that for every 10 employees there is a manager who cannot be billed, and that his or her average salary is 2x the employee salary.  Amortized over 10 employees, this adds 0.2x per employee to the contract rate, which makes it 1.75x the average salary!  Yes, if you are a mathematician you would see (albeit minor) differences in these calculations and can definitely pick holes in my derivation, but this is close enough.  Close enough is good enough for an engineer!  So, if the company is paying $10 on average for any BILLABLE employee, it would charge the customer a minimum of $17.50!
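
The arithmetic in steps 1 to 9 can be sketched in a few lines. This reproduces the article's additive simplification rather than a proper accounting model, and the function and parameter names are hypothetical, chosen only for illustration:

```python
def billable_weeks(vacation_weeks: float,
                   training_weeks: float = 1.0,
                   surge_weeks: float = 2.0,
                   billable_fraction: float = 0.8,
                   weeks_per_year: float = 52.0) -> float:
    """Weeks per year that can be billed, following steps 1 to 5."""
    # Steps 2 to 4: remove days off, training and high-priority reassignments.
    project_weeks = weeks_per_year - vacation_weeks - training_weeks - surge_weeks
    # Step 5: only 4 of every 5 project days are billable.
    return project_weeks * billable_fraction


def rate_multiplier(unbilled_share: float = 0.35,
                    profit_margin: float = 0.20,
                    manager_salary_ratio: float = 2.0,
                    team_size: int = 10) -> float:
    """Contract rate as a multiple of average salary, following steps 7 to 9."""
    # Step 9: one unbillable manager's salary spread across the team.
    manager_overhead = manager_salary_ratio / team_size
    # The article simply adds the components on top of the base salary.
    return 1.0 + unbilled_share + profit_margin + manager_overhead


if __name__ == "__main__":
    for vacation in (6, 4):
        weeks = billable_weeks(vacation)
        print(f"{vacation} weeks off -> {weeks:.1f} billable weeks "
              f"({weeks / 52:.0%} of the year)")
    print(f"rate multiplier: {rate_multiplier():.2f}x average salary")
```

Changing the defaults, for example more training weeks or a smaller team per manager, shows how quickly the multiplier moves away from 1.75x.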

I walked through these calculations so that the reader understands, at a high level, how the accounting works (there is more to it than what I have written), even though they may not relate directly to what I am driving at.

I am assuming 34 weeks of 100% utilization for every employee based on the above analysis.  The expectation, then, is for all employees to be 100% efficient in these 34 weeks, with no slack whatsoever.  Borrowing from the machine utilization analogy, the following things can be inferred:

  • The slowest resource is going to derail the project: this is the critical-path thinking that every project manager practices, following the Theory of Constraints.  Given this, the efficiency of the slowest resource cannot be equated with that of a normal or faster resource.  Yes, the slowest resource is working 100% of his or her 34 weeks, but the output is lower than the others'!  So, as a project manager, you either add resources to compensate, which adds cost, or push the schedule and eat into your TIME buffers to meet the customer milestones.  If the slowest resource is the long pole in the tent, it does not matter if others finish faster; they all need to wait for this resource to complete!  Plan for lower utilization to address this factor.
  • If something goes wrong in between, there is no slack left to compensate.  With employees on the edge at 100% utilization, the plan implicitly assumes that nothing is allowed to go wrong and that all risks are already mitigated, so there are no surprises.  Living on the edge can never be right.  It causes stress, and stress leads to lower quality of work.  So, even within those 34 weeks, it is quite natural for project managers to add a safety cushion for a given demand (remember, this is NOT the variability-effect cushion, which is based on demand).
  • Not all customers are alike, nor are all projects the same.  Some customers are more demanding than others, and some projects are more complex than others.  So, if the capacity of resources remains the same, the response time for executing any project is going to increase with the higher demands of a customer or with more customers!  Service level agreements are likely to suffer.

With all the above factors, it is extremely difficult to reach a consistent long-term utilization of 90% and above, which automatically means the best billability would be in the range of 75% to 80%.  For the services company to stay competitive on rates, it needs to reduce its operational costs and develop a decent hierarchy of experience that can still deliver high-quality solutions.  It is always about quality assurance during design rather than quality control after design!  Continuously improve the process, but understand that there is an upper limit on utilization for your highest efficiency, and it is not even close to 100%.  Even if you are pushing the envelope, the incremental revenue you may get from pushing utilization (and hence billability) from 90% to 95% or more would definitely be negated by the impact on quality and customer relationships.  Remember, there would be penalty clauses as high as 20% from every customer if the deliverables are not achieved within the specified time.

Remember, we have been talking only about averages here, not even the maximum, which would be the worst case.  Also, I have not even accounted for attrition, which is usually between 8% and 20% for any services company and translates directly into delays in response.  Going from 85% to 90% and beyond has a disproportionately bigger impact on response time.  For a services company to deliver with good quality at all times, the target should therefore preferably be set between 80% and 90%.  Nothing more.  If someone says they have 100% utilization, it is high time we do a due-diligence audit of the calculation!