Note: This post is going to be very technical, but will raise important questions about the redesign of the payment system used for Medicare Part A patients in skilled nursing.
Previously I wrote about the proposed RCS system and included the image below from the technical report. I jokingly captioned it with "Someone explain this to me. Please. Seriously." Since then I have spent some time reading the technical document and trying to understand how Acumen arrived at this table.
The technique is ... interesting. Let's take a look. The technical document is here. I am more or less starting on page 41.
What to Use
It appears that Acumen decided ADLs would be used to help determine PT/OT expenses from the very beginning of the process. It is not at all clear why. Given that, step one was to determine which of the ADLs best predicted PT/OT expense. I'm including the table below. Based on that analysis, Acumen concluded:
Three late-loss self-performance items were selected as the best predictors of PT/OT utilization and the most clinically appropriate indicators of resident function: transfer self-performance, eating self-performance, and toileting self-performance
Okay, let's stop here a moment. Based on those r-squared values, are you prepared to say which 3 ADLs are the best predictors of PT/OT expenditures? No? Let me help:
Personal Hygiene actually has a greater r-squared than toileting and toileting is only the tiniest bit (0.002) higher than Walk In Room*. Yet somehow we emerged with Eating, Transfers and Toileting as the best predictors of PT/OT spending? No particular ADL stands out in that graphic to me as being superior to the others. Fortunately it doesn't matter. Why?
These r-squared values are screaming: "These metrics are not good predictors of PT/OT costs!" That means it's silly to argue about which ones to use. The correct answer is: None of the above. The report touts these numbers as "Significant at the 1% level". That's because a huge amount of data went into the regression, and with enough data even a very weak relationship tests as significant. What that's really saying is that we are highly confident ADL scores are a poor predictor of therapy cost.
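To make the "significant but useless" point concrete, here is a minimal sketch. The r-squared of 0.01 and the sample sizes are hypothetical values I picked for illustration, not numbers from the report, and the p-value uses a normal approximation to the t distribution, which is only reasonable for large samples.

```python
import math

def slope_p_value(r_squared: float, n: int) -> float:
    """Approximate two-sided p-value for a one-predictor linear regression.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and a normal approximation
    to the t distribution (an assumption that is fine for large n).
    """
    r = math.sqrt(r_squared)
    t = r * math.sqrt((n - 2) / (1.0 - r_squared))
    return math.erfc(abs(t) / math.sqrt(2.0))

# Hypothetical: the same weak fit (r-squared = 0.01) at three sample sizes.
for n in (100, 10_000, 1_000_000):
    print(n, slope_p_value(0.01, n))
```

With 100 observations the p-value is roughly 0.32, nowhere near significant. With 10,000 or more it falls far below 0.01, even though the predictor still explains only 1% of the variation. Significance tells you the relationship is probably not zero. It says nothing about whether it is big enough to build a payment system on.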
I think part of the problem is a poor understanding of what ADL scores actually are. It's easy to look at an ADL score, recognize it as a number and start doing regression analysis on it. ADL scores are not metric! That means the gaps between scores are not equal, measurable steps. The extensive category isn't "1 more" than the limited category. In fact, the extensive category is huge compared to the others, which is one explanation as to why there are so many people in there. ADL scores are ordinal. (They're probably nominal, but I'll settle for ordinal.) A more appropriate technique would have been ordinal regression or Bayesian analysis. Ordinary linear regression is not the appropriate tool for this job.
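Here is a small sketch of why the spacing matters. The data is made up, and the Spearman rank correlation is my own stand-in for an order-only method, not the ordinal regression mentioned above and not anything Acumen did. The idea: if only the order of the categories is meaningful, then relabeling them with different gaps should not change the answer. A metric method like the Pearson correlation (which is what linear regression leans on) fails that test.

```python
from statistics import mean

def pearson(x, y):
    """Plain Pearson correlation; treats the inputs as metric numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson on ranks, so only order matters."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical residents: ordinal ADL level and PT/OT cost (invented numbers).
levels = [0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4]
costs = [90, 110, 120, 100, 150, 140, 160, 135, 145, 125, 155, 95]

# Same ordering of categories, different (equally arbitrary) spacing.
recode = {0: 0, 1: 1, 2: 2, 3: 10, 4: 11}
stretched = [recode[v] for v in levels]

print("pearson :", pearson(levels, costs), pearson(stretched, costs))
print("spearman:", spearman(levels, costs), spearman(stretched, costs))
```

The Pearson value changes when the labels are stretched, while the Spearman value stays put. If the result depends on a numbering scheme nobody can defend, the numbering scheme is doing part of the analysis.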
Practical Reasons ADLs Aren't the Answer
The truth is ADL scores have tremendously bad resolution on patient acuity. Don't believe me? Consider the following:
- ADL scoring rules are often poorly understood by those who document them. The best ADL scorers I know are trained on ADL scoring often. Even with all the training, scores will drift towards Limited Assistance over time. Why? Something very similar to what game theorists call minimax.
- ADL scoring is not done consistently throughout the country, far from it. Some people use software that forces users to choose between categories they don't understand. Some software asks questions "interview style" that help arrive at a score. Other facilities score by hand, on paper, in 2017. (How they stand this I do not know.) Some facilities really focus on ADL scoring, others don't, and so on. The upshot is that limited assistance means different things in different facilities.
- Perhaps most importantly, facilities are financially motivated to produce high ADL scores. Some facilities understand and push for higher scores, some don't.
- Also, it's easy to get high scores because the bar for them is incredibly low. It only takes 3 occurrences during a 7-day week of 3 shifts per day to get extensive assistance. That's just not very difficult. Most savvy operators know extensive assistance happens very often, it's just a matter of recognizing it and documenting it. Those who do this get paid more. (If you follow the scoring rules to the letter you get paid the most. It's so simple but some people don't do it.) I've seen a patient get an extensive score in the first shift after they arrive in the building.
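To put a rough number on how low that bar is, here is a back-of-the-envelope sketch. It simplifies the scoring rule to "3 or more documented occurrences out of 21 shifts" and assumes each shift is independent with the same chance of an extensive-assistance episode. Both are my assumptions, and the per-shift probabilities are hypothetical.

```python
from math import comb

SHIFTS = 7 * 3   # 7 days x 3 shifts per day
THRESHOLD = 3    # occurrences needed during the week

def prob_at_least(threshold: int, trials: int, p: float) -> float:
    """Binomial probability of at least `threshold` successes in `trials` tries."""
    below = sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(threshold)
    )
    return 1.0 - below

# Hypothetical per-shift chances that extensive assistance occurs and is documented.
for p in (0.05, 0.10, 0.20):
    print(p, round(prob_at_least(THRESHOLD, SHIFTS, p), 3))
```

Under these assumptions, a resident who needs extensive help on just one shift in five clears the threshold roughly four weeks out of five. The score says more about whether someone wrote it down than about how sick the patient is.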
Ignoring those problems for now, let's continue through the document.
How To Use It
Next in the analysis, Acumen averages PT/OT expenses for each ADL level for the "Chosen 3" and assigns an entirely new rating scale based on average expenses. I'll call these new points "magic" because they appear out of nowhere. Here are the problems with doing that:
- The data you're using to plan for the future is going to change once you implement a new system. ADL scores currently drive reimbursement. Once they don't, why should we expect the scores to look the same?
- This is essentially adding another layer of obfuscation to an already complex system. Limited Assistance (a "3" in RUG IV nomenclature which is worth 2 points) would be worth 6 magic points IF we're talking about transfers or toileting. It's worth 3 magic points for eating.
- There is no mention of the standard deviation within these groups. Limited assistance seemingly correlates with higher PT/OT expenses but how much overlap is there? It's hard to know how arbitrary magic points are without seeing the data.
- The system doesn't pass the common-sense test. For example, in Toileting, people rated as "Supervision" had higher PT/OT expenses than those rated "Extensive", yet Extensive gives me one more magic point than Supervision. Why? I don't know. Are the standard deviations so high that those two are essentially the same thing? Did you just want to match Transfers? Are we concerned with how silly this is going to look? This point alone illustrates how bad ADL scores are for predicting PT/OT expenses.
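The standard deviation question is worth a quick sketch. The report does not publish the spread within each group, so every number below is hypothetical, and I am assuming costs in each group are roughly normal and independent. The question: if two ADL levels have different average costs, how often does a random resident from the "cheaper" level actually cost more than one from the "pricier" level?

```python
import math

def prob_lower_group_outspends(mean_low, sd_low, mean_high, sd_high):
    """P(resident from the lower-mean group costs more than one from the
    higher-mean group), assuming independent, normally distributed costs."""
    z = (mean_low - mean_high) / math.sqrt(sd_low**2 + sd_high**2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical daily PT/OT costs: means 130 vs 140, standard deviation 40 each.
overlap = prob_lower_group_outspends(130, 40, 140, 40)
print(round(overlap, 3))
```

With those made-up numbers the "cheaper" resident wins about 43% of the time, which is not far from a coin flip. Without the real standard deviations there is no way to tell whether the magic points separate groups that are meaningfully different.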
This is the point where it turns from frustrating and sad to serious and expensive. I have consulted with my friends whom I consider to be experts in ADL coding. (This is non-scientific I admit.) To a person they tell me that if they stopped training staff on ADL coding, then in the long run, scores would migrate to... you guessed it, limited assistance for transfers and toileting and independent for eating. (I mentioned earlier this phenomenon looks a lot like minimax.) Those happen to be the ones with the highest magic points and hence the highest payers. At least with the current system you have to do extra work to get ADL scores to be higher (and more accurate). With this system entropy drives reimbursement. Where do I invest?
There is also no mention of what ADL scores do over the course of a resident's stay. Up? Down? I don't know because the data Acumen used is not available to the public. If we're going to use the ADL scores from the 5 day assessment to set the pay for the entire stay, I think it's fair game to see that data so we can ask good questions. (Although I am not sure what you could show me at this point that would convince me using ADLs to set reimbursement for PT/OT is a good idea.) (Note: page 47 claims there is more information on RUG IV ADL payment thresholds in Figure 15 in the appendix. I don't see that data. Figure 15 is a flow chart.)
I don't think I'm the only one who lacks confidence in this analysis. I think the authors do as well. Bed Mobility could have easily been excluded from the mix by simply pointing to the results of the regression analysis. Instead, the authors cite an unknown number of clinicians who said the measure is based on environmental factors that are not consistent between facilities. While I agree with that, why bother with that argument? Doesn't the same argument apply to every other ADL to some extent? We're in the awkward position that we need to trust the regression analysis, but not too much or we'd have to switch to Personal Hygiene.
It seems clear to me that the decision to use ADLs happened very early on in the process, perhaps prior to Acumen even getting involved. Either due to fiat or extreme inertia, no one stopped to consider the idea that this is a square peg/round hole situation. I question whether the function score is even necessary. Clinical category and cognitive impairment might give plenty of resolution. Even if they don't, ADLs are not the answer, especially this strange, modified version of them.
Is There a Fix?
That's a great question. An ideal system would be based on:
1. the reason the person arrived in skilled nursing and the overall state of health at that time
2. the outcome given number 1 and the realities of life
RCS makes an attempt at number 1. The approach is pretty good regarding the use of clinical categories and cognitive impairment. The ADL part of it should be shelved.
Point 2 is not covered in any way by RCS. That's a bigger Achilles heel than ADL scores for certain. That's the subject of my next post.