What I Learned From Rolling Out A Career Leveling Framework

I recently rolled out a career leveling framework at Upwave, the company where I’m currently the Director of Engineering. It was my first time creating or releasing this type of framework and it was definitely an informative experience.

I want to share some specific tactical lessons I learned from the process. The overall goals and design of the framework itself are a large, deep, only semi-related topic so for brevity I’ll mostly skip that and focus just on the lessons I learned from the roll out itself.

If you’re interested in seeing our final framework template, you can find it here: Upwave 2021 Career Growth Framework. And if you’re interested in diving into the design of the framework itself then let me know and maybe I’ll write something more in-depth.

Background

The rollout took about a month, but the conception and creation of the framework took nearly two years. I started thinking about career leveling about a year after I first started managing Upwave’s data team (probably right after completing a set of “semi-annual reviews”, which at the time we did ad hoc and on individual team members’ semi-anniversaries).

Inspired by frameworks like Rent the Runway’s and CircleCI’s, I put together a draft leveling matrix. You’ve probably seen the type, with career levels on the columns and attributes like “reliability” or “collaboration” on the rows.

I trial-ran this early draft matrix on one of my direct reports at the time and it didn’t go well. I presented my line-by-line ratings to him and he objected vigorously to several of them. Overall, the conversation ended feeling more contentious than productive. At the time, we lacked any clear pressing need to have a leveling framework at all and so I chose to put the framework on the shelf rather than revise it or barrel ahead.

Time passed and I eventually Upwave’s Director of Engineering, and thus I became responsible for our whole engineering team. Our team spans the full spectrum of experience levels (from ~2 years to ~20+ years) and the full set of software subspecialties (web, devops, and data engineering) of a typical SaaS startup.

While I have always given constructive feedback to all of our engineers on a regular basis, as the team has grown it has become increasingly challenging to make sure that the feedback I’m giving fits into a cohesive narrative about how each engineer should be investing their efforts to advance their careers.

That’s exactly the type of problem that a career framework can help with, and so over the last ~4 months I dusted off my first draft and started using spare slices of time between meetings to refine it.

Come January, my CTO and I agreed that it was finally time to roll it out.

The Roll Out

We have a biweekly Product Development All-Hands, so I used that meeting to deliver a ~15 minute presentation announcing that the framework (and an initial leveling-setting cycle) was coming and explaining how it would work.

I then created a copy of the matrix spreadsheet (from a template in Google sheets) for each engineer and gave them ~2 weeks to fill out a first draft on the sheet (i.e. by giving themselves a rating on each attribute).

I then asked them to schedule a live 1-hour meeting with me to talk through their self-assessment and for me to share any adjustments I needed to make (along with my reasoning). The final set of ratings implied a final “score” that determined each engineers’ level.

I was able to complete 95% of these conversations within ~2 weeks, with one of them getting delayed another 2 weeks by scheduling conflicts. I finished that last one last Friday.

Overall, I’m quite pleased with how everything went. The conversations were by-and-large amicable and productive, with none of the engineers expressing substantial surprise or dismay at the outcomes (even in cases where I had to down-rate them on several attributes, which in some cases suggested a lower overall level than their self-assessment would have suggested).

It’s too early to draw meaningful conclusions about whether the framework will actually accomplish my primary goals of:

  1. ensuring that we recognize and compensate our engineers as fairly as possible, based on performance and not on characteristics irrelevant to the job.
  2. giving each engineer the clearest guidance we can about what specifically they’ll need to improve in order to achieve the next level of impact, recognition, and rewards.

But even in the several weeks since I completed the earliest of the conversations, I’ve already seen what looks to me like a marked improvement within areas that certain engineers and I together identified as primary opportunities for growth.

Lessons Learned:

1. Rolling out a leveling framework can come as a big, anxiety-provoking surprise

In retrospect this should have been obvious, but since leveling frameworks directly impact peoples’ career trajectories they’re understandably perceived as being high-stakes. They also carry a lot of baggage based on how every other job your engineers have had have handled evaluations, titles, promotions, etc.

As soon as you start talking about titles and levels, people’s minds will go back to their past bad experiences and the onus will be on you to assure them that you are not about to repeat all their past managers’ mistakes.

I made the mistake of announcing the leveling framework during a meeting that already had a packed agenda. There were quite a few questions following my presentation and I took the time to answer them, but that ended up forcing the next presenter to compress his material (an obvious faux pas) and also meant that the Q&A period felt rushed and didn’t afford us time for the quieter engineers to reflect and formulate questions.

I don’t think there’s any great way to announce this kind of thing without provoking any anxiety. But two things I would have done differently are:

  1. Make sure to present the system in a meeting with plenty of time for discussion (and to call a separate designated meeting if I couldn’t fit it neatly into an pre-scheduled one).
  2. Telegraph the system in an initial meeting, saying something like “I wanted to give you all a heads up that I’ve been thinking about a career leveling framework. It’s still in beta mode but I hope to be ready to share my details in 2 weeks. Please let me know in our next 1-1 if you have any thoughts you’d like to share, e.g. about how growth/leveling was handled well or poorly at past companies.” And then listen to input, adjust as necessary, and give a follow up presentation sharing the details of the system when they’re ready.

2. Changing the Framework Post-Release is Hard

Writing a career framework from scratch is a lot of work. There are a lot of details that go into making a complete, usable framework. And it’s easy to spend a nearly unlimited amount of time endlessly tweaking e.g. the descriptions of what it means to be a “level 4 debugger”. Unfortunately, unlike software where you can (and usually should) ship an MVP and then iterate, leveling frameworks are hard to iterate on once they’ve been released.

There are some silly reasons why making changes is hard, e.g. that Google sheets has bugs that make it kind of a nightmare to maintain formatting consistency between copies of spreadsheets.

Then there are very good reasons why making changes is hard, e.g. that if you change the definition of a Level 4 debugger, someone who was previously a Level 4 debugger might now suddenly be a Level 2 debugger. And that person, if they had just barely made it to Level 4 overall, might now by your new definition come out as a Level 3 overall. Unfortunately for both of you, demotions are horrible (see next point) and so you’ll have to make a plan for what you’re going to do if your iterating negatively affects peoples’ levels. And even if you don’t actually change an implied level, you may have just wasted a lot of an engineer’s time by devaluing a skill the engineer may have just invested a great deal of effort improving.

While there’s no way you can get your framework exactly right the first time, you really should try to get it as right as possible so you can minimize the changes you need to make later.

3. Your Framework Musn’t Demote People

Demotions are horrible. They’re personally humiliating to the person being demoted. And they put management into the lose-lose choice of cutting the demoted person’s pay (which could be economically ruinous to them, if they have e.g. a mortgage based on their prior income), or to leave them obviously overpaid relative to peers at their new lower level.

Luckily for me, we had no formalized job titles before this rollout. So when I set Level 4 as “Senior Engineer” and made Level 4 a moderately late-career level, I didn’t have to go around and tell anyone that they needed to remove “Senior” from their job title.

But the very first question I got during my roll out presentation was “what will you do if you make a change to the definitions and it causes someone to lose a level?”

That was a very good question that I was not adequately prepared to answer. I was prepared to (and did) say “we’ve designed the framework to be parsimonious with ratings, specifically to minimize the odds that someone will get over-leveled.”

The part I was not prepared to say (but now believe is the only viable answer here) is “no one will be allowed to lose a level because of a change in the framework, even if the adjusted framework would imply that they should”. I prevaricated on this point because I hadn’t confirmed with my CTO that he’d be comfortable with that commitment.

But on reflection, I think that demotions are just so bad that you should never do them under virtually any circumstances. You especially shouldn’t do them because you the manager made a mistake. If you screwed up and over-promoted someone, the company should pay the literal and figurative price of supporting that person at the higher level until they (hopefully) grow into the higher set of expectations. On the plus side, if your framework is reasonably granular and you’re only making minor tweaks, then the only people who’d be in danger of slipping a level should be people who only just barely got promoted anyway. So it’s a reasonable bet that those people will organically cross the new line soon anyway and that their over-leveling will be short-lived.

You may be tempted to try to avoid this problem by making your framework more vague, thinking that will allow you to make mutually face-saving “fudges” to preserve people’s levels when you make a change. Don’t do this; vagueness in the standards undermines a large portion of the value of these frameworks (see next point).

4. Worry about Clarity, Not Complexity

My current version of our framework has ~33 attributes that apply to any given engineer (exactly which attributes apply depends on their specific engineering specialty). That sounds like a lot. Most other company’s frameworks I used as models had 5-10 attributes.

An intermediate draft of my framework had ~40 levels per engineer. In the final revision, I got cold feet, worried “this will confuse / overwhelm people” and cut/consolidated down to ~33 levels, and worried that might still be way too many.

That was a mistake — I wish I’d stuck with more levels.

Why? Because granularity allows clarity. Because I had so many attributes I could make the definition of “Level 3 reliability” or “Level 5 code reading” very specific and concrete. Those specific definitions made it relatively easy for engineers to run through each attribute and pick an appropriate rating.

In fact, the average time for an engineer to complete their self assessment was probably ~30 minutes. None of the engineers described the framework as complicated, overwhelming, or hard to understand. Several actually described it as faster and less burdensome than “traditional” “write your own review in prose” type performance reviews they’d had at previous employers.

When engineers did have trouble self-assessing on a specific attribute, it was generally for one of two reasons:

  1. The definition was too composite. E.g. I originally had “verbal communication” and “presenting” as separate attributes, but I consolidated them and ended up with level/attributes like “listens patiently and to understand; guides group conversations; comfortably gives clear topical presentations”. Some engineers clearly satisfied the first two points, but had not yet had the opportunity to give a presentation. So it was unclear what percentage of the points in a description they could miss and still hit the level, especially when the points they were missing seemed relatively minor. In the above example, if verbal communication and presenting had still been separate categories, I wouldn’t have needed to downrate the whole verbal communication attribute just because of limited presentation skills.
  2. The definition was too vague. E.g. I had one level definition along the lines of “writes code to solve complex problems” — what defines “complex?” Or I had another along the lines of “owns a medium size system” — what defines “medium”, or distinguishes a medium sized system from a major system? These terms are genuinely hard to define, but if you want to assess people on these attributes (and I bet you are assessing people on them in practice, whether or not you’re doing so explicitly), you’ve got to find a way to define your terms as clearly as you can.

This last point of vagueness is still an open point of concern for me. As long as I’m doing the sync-up assessments myself I can “voice over” the necessary definitions and provide clarity. But once we scale more and have more managers conducting these conversations, I will need to provide more explicit written definitions in order to ensure we’re apply consistent common standards among managers.

5. Start With A Self-Assessment

I mentioned above that my first trial run of my system with a direct report did not go well. A major issue there was that I set my ratings first and presented them to him without soliciting any self-assessment from him first. That left me blind to moments when his self-perception was out of line with my assessment of him.

Ultimately, I’m the manager and right or wrong, my assessment is the one that’s going in the final record. But I realized that by asking engineers to “go first” by doing a self-assessment, I could have an early warning when I was about deliver a rating that was going to be contentious. That allowed me to 1) prepare before the sync-up meeting, to gather as much evidence as I could to support my rating, 2) be tactful and empathetic in how I communicated my disagreement, 3) give the engineer time in advance to dredge their memory for examples that supported a given rating (which they could provide in writing in advance, or have handy-to-mind to discuss in our live conversation).

Remember that a primary goal of the whole exercise is to help the engineer grow, and engineers are less likely to grow if they walk out of a discussion furious that they were short-changed by my rating of their data management skills. Anger and resentment can easily short-circuit the parts of our brain that we need to in order to listen to and carefully consider feedback. If you can make your feedback easier to hear without materially compromising the essential fairness of the system then (within limits) that’s a reasonable tradeoff. Moreover stress can impair memory, and these conversations are unavoidably stressful even for engineers who are doing well. So the less you can rely on either your or the engineer’s in-the-moment memory to provide examples/evidence for discussion, the better.

While you want the conversations to be a two-way dialog, be particularly careful about letting the most confident people badger you relatively more often into higher ratings – you still need to keep the system fair even for the engineers who aren’t inclined to self-advocate.

6. Focus on Demonstrated Performance, Not Potential

Okay this wasn’t something I learned, but I think this is a key part of a successful framework. You cannot accurately predict people’s potential. Sometimes (like in an interview), you have to try and hope for the best. But you’ll often be wrong. And worse, you’ll often be wrong in biased ways (e.g. overrating the potential of a junior engineer who reminds you of yourself, and underrating someone from an underrepresented background that’s dissimilar to your own).

You do not want to find yourself in a debate where you’re telling an engineer that he would not (hypothetically) succeed at refactoring a major system, or that he would not (hypothetically) be successful collaborating on a cross-team project. You’ll find yourself implicitly (or explicitly) asking people to believe your judgement of them, and they won’t (and maybe they shouldn’t!)

So don’t put yourself in that situation. Instead, rate people on what they’ve concretely demonstrated. “You have not yet worked on a successful cross team project” is much harder to dispute than “you would not successfully collaborate on a cross team project”. And if someone responds with “well, actually I did collaborate on Project X with that other team and it went well” then great! You’re free to change your rating on the spot in light of new evidence.

Tell people ahead of time that you’re focusing on demonstrated performance. Rate people only on demonstrated performance. If someone gets a lower level than they’d like because they didn’t have an opportunity to demonstrate a certain proficiency (which becomes a bigger issue at higher levels, where opportunities for “high impact” moments are more rare), then work with that engineer to make a plan to guide her into the necessary opportunities to demonstrate what she needs to over the next six months.

Because discussions of demonstrated performance vs. an explicit set of clearly defined standards are concrete, specific, and factual, it’s harder for them to become too contentious (unless your definitions are too vague — see above).

(You may notice an intentional connection here to the idea of only giving feedback on behaviors “a video camera could capture”, or of the idea of “speaking objectively / inarguably” in Non-Violent Communication).

7. Framing Matters

Also not strictly a “lesson” but an idea that was definitely reinforced. Leveling is not about ranking. Ranking (particularly “stack ranking”) is destructive to teamwork – don’t do it. But when you give people numerical levels, there will be an urge for engineers to compare themselves to one another.

As the leader, you need to emphasize that this whole exercise is about 1) promoting fairness in recognition and 2) promoting individual growth. Be clear that it’s not perfect, and that there will always be mistakes and sometimes under and over-leveling will happen. But that is not the end of the world, because people’s numeric level is not intended (and will not be treated) as of their intrinsic worth or even of their potential.

It’s just a tool to 1) help people understand what kind of behaviors will make them more successful in your particular organization and 2) to help management identify when people who are performing similarly are not being recognized similarly (“similarly”, not “the same”, because no two people’s proficiencies at a given moment in time will be exactly the same).

To help set the right framing, you can make people’s integer levels (and associated titles) public, but don’t make their full rating matrix public. It’s okay for Suzie and Richard to know that they’re both Level 3, even if Suzie thinks she’s a bit better than Richard. It’s unproductive for Richard to obsess over whether it’s fair that he’s really a Level 2 verbal communicator when he knows Suzie is rated as a Level 4 communicator.

Another minor but probably useful technique that was suggested to me is to call this entire system a “growth framework” instead of a “leveling framework”. I disregarded that advice and initially introduced it to the team as a leveling framework, but have since mostly referred to it internally as “our career growth framework”.

I personally prefer the term “level” because I think it’s more clear about what the system literally is. But “growth framework” seems to have less baggage and to generally promote a healthier reception among the team (particularly, it reduces the inclination to think about leveling as equivalent to ranking).

See You In Six Months

I’m sure I missed some learnings, and I’m sure I’ll learn a lot more over the coming months and years as I get to see this system play out over time (and perhaps I’ll have an update to this post to write then). But I think it’s worth sharing these early lessons in hopes that someone reading this can learn from and avoid some of my mistakes.

Obviously I’ve spent a lot of time thinking about this topic, so I’d be happy to chat with anyone else who’s thinking about rolling out a framework. I got quite a bit of advice and input from friends and peers as I was planning this roll out. I’d be happy for the chance to pay that help forward!