Amazon PackApp, Edomiyas Beyene

01

The scale of a broken interface

Amazon's fulfillment network packs tens of millions of packages a day (11.6 billion shipments in 2022), most of them by hand. The associates doing that work spend their entire shift staring at a single piece of software: PackApp. It tells them what to scan, what box to use, what labels to apply, and when a shipment is complete. Amazon AFT operates across hundreds of fulfillment centers globally, running every shift on internal tooling built over a decade of operational change.

PackApp shipped in 2014 and had not kept pace with a decade of change in equipment, materials, and workforce. Associates spoke dozens of languages. They had varying levels of tech literacy. Many had physical disabilities. The app was designed with none of that in mind.

The numbers told the story. Pack Singles variable cost had climbed from $0.26 to $0.38 per unit since 2017. New associates needed 88 hours to reach veteran proficiency in Pack compared to 43 hours in Pick. Recordable injury rates in Pack were the highest of any process path in the network. The app wasn't just outdated. It was causing measurable business damage.

11.6B

Shipments packed in 2022, most of them by hand

20+

Fragmented pack modes across the fulfillment network

88 hr

Hours to proficiency in Pack vs. 43 hr for the comparable Pick role

$0.38

Per-unit pack cost in 2022, up from $0.26 in 2017

02

Seeing the real problem

Before designing anything, I needed to understand what was actually broken. I co-led a 12-week discovery phase: visits to 12 fulfillment centers across North America, Europe, and Japan, heuristic evaluations of three core pack process paths, stakeholder interviews across 6 Amazon teams, and usability and customer research sessions with 104 participants across 6 countries.

I conducted a full UI audit, mapping every screen across 20+ pack modes, cataloging components, identifying redundancies, and building an Associate Map: a complete view of every digital and physical step a packer takes in a single shift.

What I found wasn't a visual problem. It was a cognitive load problem at industrial scale.

Pain 01

"I don't know which mode I'm in until something goes wrong."

Associate, FC-PDX5, Portland OR

Pain 02

"The codes mean nothing to me. I memorized them but I still guess."

Associate, FC-LGB8, Los Angeles CA

Pain 03

"When I'm cross-trained to a new station I feel like it's my first day again."

Associate, FC-MAN1, Manchester UK

Research synthesis and persona artifacts

03

The decision that mattered most

PackApp had accumulated over 20 distinct pack modes over a decade. Each was a slightly different version of the same core workflow, built to accommodate variations in station type, region, fulfillment type, and process path. Every time Amazon added a process requirement, someone built a new mode instead of evolving the existing ones.

Associates switching between modes were consistently the ones making errors. The inconsistency was the defect source, not a side effect. Leadership disagreed. Changing operational software at this scale felt too risky: if something broke, the consequences could be catastrophic.

The argument

The modes are not fundamentally different. They are the same job with variation.

Scan item, select packaging, complete shipment. The underlying logic could be unified into a single adaptive UI that surfaced only what was needed for the current task. Usability testing confirmed it: associates encountering unfamiliar modes made significantly more errors in the first 15 minutes, not because the task was harder, but because the UI pattern had shifted. The fragmentation itself was the failure mode.

That data shifted the decision. We moved forward with a single adaptive interface across all pack workflows, with context-specific variations handled in the logic layer rather than the UI layer.
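One way to picture the split between the logic layer and the UI layer is the minimal TypeScript sketch below. It is illustrative only: the type names, the `requiresExtraLabel` flag, and the `resolveSteps` and `renderScreen` functions are hypothetical and are not taken from PackApp's codebase. The point it shows is that context changes which steps appear, while the screen pattern stays the same.

```typescript
// Hypothetical sketch: all names below are assumptions for illustration,
// not PackApp's actual types or functions.

type Step = "scanItem" | "selectPackaging" | "applyLabel" | "completeShipment";

// Context that previously would have been baked into a separate pack mode.
interface PackContext {
  stationType: "manual" | "semiAutomated";
  region: "NA" | "EU" | "JP";
  requiresExtraLabel: boolean; // assumed example of a process-path variation
}

// Logic layer: one core workflow, with variations resolved from context.
function resolveSteps(ctx: PackContext): Step[] {
  const steps: Step[] = ["scanItem", "selectPackaging"];
  if (ctx.requiresExtraLabel) {
    steps.push("applyLabel");
  }
  steps.push("completeShipment");
  return steps;
}

// UI layer: the same pattern for every context, one primary action per screen.
interface Screen {
  primaryAction: Step;
  progress: string;
}

function renderScreen(steps: Step[], index: number): Screen {
  const step = steps[index];
  if (step === undefined) {
    throw new RangeError(`No step at index ${index}`);
  }
  return {
    primaryAction: step,
    progress: `Step ${index + 1} of ${steps.length}`,
  };
}

const standardSteps = resolveSteps({
  stationType: "manual",
  region: "NA",
  requiresExtraLabel: false,
});
const labeledSteps = resolveSteps({
  stationType: "manual",
  region: "JP",
  requiresExtraLabel: true,
});

console.log(renderScreen(standardSteps, 0));
console.log(renderScreen(labeledSteps, 2));
```

In this sketch, adding a new process requirement means adding a field to the context and a branch in `resolveSteps`, not a new mode with its own screens.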

The friction

The VP wanted a skin refresh. I pushed for a system rebuild.

The initial brief called for a visual update: new colors, cleaner typography, same underlying structure. I disagreed. Twelve weeks of field research had shown that the visual layer was not the problem. The mode fragmentation was. I brought the Associate Map, the cross-training error data, and three months of usability session recordings to the review. The argument was not "the design is ugly." It was "the architecture is causing measurable harm and the evidence is here." The evidence won the argument, and the system rebuild went ahead. The visual refresh came with it, not instead of it.

04

Five decisions that changed the interface

The design wasn't a single breakthrough. It was five compounding decisions, each addressing a specific failure mode identified in research. Each decision had a legitimate counter-argument. Each required evidence to hold.

01 /

One task per screen

The old interface split the screen into two competing panels, so associates had to context-switch constantly. The redesign put one primary action front and center, with everything else collapsed unless needed.

02 /

Replace codes with visuals

"1BF" means nothing to a new associate on day one. The redesign replaced alphanumeric box codes with illustrated box types showing actual dimensions and a directional placement indicator: "A1-PM5, top right," with an arrow.

03 /

Visible progress

The old app had no sense of progress within a tote. A persistent counter, "0 of 5 shipments processed," gave packers agency, pace, and a sense of completion within each cycle.

04 /

Completion states that feel human

When a packer finishes a shipment, the old app showed a dark green box with small text. The redesign gave them a moment of recognition: a clear illustration and explicit next instruction. Recognition is a performance accelerant.

05 /

Directional illustrations for complex steps

Instead of abstract text instructions, the redesign used illustrated guides showing exactly what to do and where. This reduced training dependency and improved accessibility across languages and literacy levels globally.

05

The tradeoffs I made

Every meaningful decision had a legitimate counter-argument. I made each tradeoff explicitly, documented it, and owned the consequences.

GAINED

Unified interface across all pack modes

Consolidating 20+ modes into one adaptive UI reduced cognitive switching errors. The tradeoff: a longer engineering migration and higher short-term risk during rollout. I made the case that the long-term error reduction justified the migration cost.

GAINED

Visual box illustrations over alphanumeric codes

Replacing codes like "1BF" with illustrated box types improved day-one proficiency dramatically. The tradeoff: veteran associates who had memorized the codes required a short reorientation period. Testing confirmed the adjustment took less than one shift.

GAINED

Single primary action per screen

Removing the dual-panel layout eliminated the most common error pattern observed in research. The tradeoff: power users lost the ability to see both panels simultaneously. Field testing showed the error reduction outweighed the speed loss for all but the most experienced associates.

ACCEPTED RISK

Simultaneous global rollout

The decision to ship across all fulfillment centers at once rather than a phased rollout was made by leadership over my recommendation for a staged approach. I documented the risk, built rollback criteria into the handoff, and ensured monitoring was in place before launch.

06

What shipped

Outcomes were measured across three sources: Pendo behavioral analytics tracking task completion rates and time-on-task across 48+ associates over 8 weeks post-launch; in-app NPS surveys deployed at 2 and 6 weeks post-rollout; and post-launch interviews with 6 associates across 3 fulfillment centers, conducted with the PackApp PM. The baseline came from pre-launch usability testing with 12 associates, measured against the same task sequences.

The 30% increase in associate satisfaction came from one decision I made: consolidating 20+ fragmented pack modes into a single adaptive interface. Associates were rebuilding their mental model with every task switch, dozens of times per shift. Removing that friction was the intervention I pushed for, over an initial brief that asked only for a visual refresh. The visual design, the component system, and the interaction patterns all served that one structural call.

30%

Increase in associate satisfaction, measured via post-rollout in-app NPS

25%

Reduction in time to proficiency

15%

Faster task completion across core pack workflows

20+

Pack modes unified into a single adaptive interface

07

What I learned

Every decision I made affected millions of people doing physical work in high-stakes environments. That changes how you think about what good design means. It is not about elegance. It is about what happens at the tail end of the distribution: the associate who is 5 feet tall, packing for 8 hours in a second language, on their third station of the day. That person is your user. Design for them and you have designed for everyone.

The other thing I learned: seniority in design is not about taste. It is about knowing when to make the business argument, how to build the evidence for it, and how to hold a position when leadership says the risk is too high. Sometimes the job is to show that the real risk is staying still.

The interface 50 million packages a day depend on
