Final Solution Architecture

Ergomotion IOR AWS Migration Project Plan

Transition from a monolithic web app with static CSV files to a fully automated SAP → S3 → Glue → Lambda → DynamoDB pipeline with Microsoft 365 SSO.

14 weeks · 6 phases · 5–7 team members · 10K records per batch
01 — Current State

What You Have Today

A FastAPI + React monolith that loads static CSV files at startup, processes uploaded SAP exports in-memory with Pandas, and outputs color-coded 192-column Excel files for customs brokers. Auth is JWT + bcrypt with a JSON file user store.

Frontend — React 18 + Vite
TypeScript, plain CSS, Lucide icons
  • Drag-and-drop CSV/Excel upload
  • Validation dashboard (input, customs lines, addition tab)
  • User management admin panel
  • Auth context with cookie-based JWT (currently bypassed)
  • No router — conditional views via state flags
Backend — FastAPI + Pandas
Python 3.11, Uvicorn, openpyxl
  • 4 CSV master files loaded at startup (LRU cached)
  • 3-component decomposition (NonSteel/Aluminum, Steel, Aluminum)
  • HTS code lookup, tariff calculation, duty computation
  • 192-column Excel output with dual tabs + color coding
  • Audit trail with CALC-ID, SHA-256 hash, geolocation
🔐
Auth — Custom JWT
bcrypt, users.json, admin/operator roles
  • JWT (HS256) with bcrypt password hashing
  • User store: flat JSON file on disk
  • Role-based: admin and operator
  • Password change, reset, forced change flows
  • Testing mode bypasses real auth
📦
Data — Flat CSV Files
~75 products, on-disk, manually updated
  • Product_List.csv — materials, weights, COO, HTS codes
  • HTS Tariff.csv — category/COO tariff mappings
  • HTS_Code.csv — rate lookup with validity dates
  • Packaging_Material.csv — packaging SAP numbers
  • Audit stored locally or in S3 (Object Lock)
🚀
Deployment
Railway, Render, Lightsail configs
  • render.yaml, railway.json, deploy_lightsail.sh
  • No Docker — uses platform buildpacks
  • No CI/CD pipeline
  • Environment vars for CORS, JWT secret, S3 keys
📝
Audit Trail
CALC-ID, 7-year retention, S3 Object Lock
  • Unique ID: CALC-YYYYMMDD-HHMMSS-XXXX
  • File SHA-256 hash, row counts, tariff rates used
  • Server IP geolocation, reconciliation check
  • S3 Object Lock (Compliance mode, 2555 days)
  • Indexed by PO number and filename
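The audit metadata described above can be sketched in a few lines. This sketch assumes that the XXXX suffix of the CALC-ID is four random uppercase hex characters; the existing audit_trail.py may build it differently. The record field names are hypothetical. The retain-until value is what would be passed as `ObjectLockRetainUntilDate`, together with `ObjectLockMode="COMPLIANCE"`, to boto3's `put_object` when the record is written.

```python
import hashlib
import secrets
from datetime import datetime, timedelta, timezone

# 7-year retention in days (365 * 7), matching the Object Lock setting above.
RETENTION_DAYS = 2555


def make_calc_id(now: datetime) -> str:
    """Build an ID in the CALC-YYYYMMDD-HHMMSS-XXXX format.

    Assumption: XXXX is 4 random uppercase hex characters.
    """
    suffix = secrets.token_hex(2).upper()  # 2 random bytes -> 4 hex chars
    return f"CALC-{now:%Y%m%d-%H%M%S}-{suffix}"


def file_sha256(data: bytes) -> str:
    """SHA-256 hex digest of the uploaded file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def audit_record(data: bytes, po_number: str, filename: str,
                 row_count: int, now: datetime) -> dict:
    """Assemble a minimal audit record (hypothetical field names)."""
    return {
        "calc_id": make_calc_id(now),
        "po_number": po_number,
        "filename": filename,
        "file_sha256": file_sha256(data),
        "row_count": row_count,
        "created_at": now.isoformat(),
        # Retain-until date for a COMPLIANCE-mode Object Lock write.
        "retain_until": (now + timedelta(days=RETENTION_DAYS)).isoformat(),
    }


if __name__ == "__main__":
    record = audit_record(b"PO,Material\n", "PO-123", "export.csv", 0,
                          datetime.now(timezone.utc))
    print(record["calc_id"], record["retain_until"])
```

Always pass a timezone-aware UTC `now`. Otherwise the CALC-ID timestamp and the retain-until date depend on the server's local clock.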

02 — Gap Analysis

Current vs. Final State

Every dimension that needs to change to reach the target architecture, with what the final state delivers compared with today.

Dimension | Current State | Final State
Data Source | Static CSVs on disk, manually updated | SAP China sends JSON daily to S3
Data Storage | Flat files on app server | S3 (raw + processed Parquet) + DynamoDB lookups
Data Pipeline | None (loaded at app startup) | AWS Glue ETL, daily scheduled PySpark transforms
Fuzzy Matching | None (exact material match only) | PySpark fuzzy matching for customer name discrepancies
Record Volume | ~75 products | 6,000–10,000 records per daily batch
Authentication | Custom JWT + bcrypt + users.json | Microsoft Entra ID SSO (M365 app launcher)
Hosting | Railway / Render / Lightsail | AWS (Lambda, S3, DynamoDB, Glue, CloudFront)
Cold Storage | None | S3 Glacier lifecycle (Instant → Flexible → Deep Archive)
Monitoring | Console logs only | CloudWatch dashboards, SNS alerts, WAF
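To make the fuzzy-matching row concrete, here is a pure-Python sketch of Levenshtein-based name matching. It stands in for python-Levenshtein's `Levenshtein.distance` so that it runs without extra packages. The normalization rules and the 0.8 similarity threshold are assumptions and should be tuned against real SAP customer names. In the Glue job, a function like `best_match` could be wrapped as a PySpark UDF.

```python
import re
from typing import Optional


def normalize(name: str) -> str:
    """Uppercase, replace punctuation with spaces, collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.upper())
    return " ".join(name.split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical strings, dropping toward 0.0 as edits grow."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def best_match(raw_name: str, master_names: list[str],
               threshold: float = 0.8) -> Optional[tuple[str, float]]:
    """Closest master name and its score, or None if below threshold."""
    target = normalize(raw_name)
    scored = [(m, similarity(target, normalize(m))) for m in master_names]
    if not scored:
        return None
    name, score = max(scored, key=lambda pair: pair[1])
    return (name, score) if score >= threshold else None
```

For example, "Acme Crop Ltd" matches the master record "Acme Corp. Ltd" with a score of about 0.85. The transposed letters cost two substitutions across 13 characters. Names below the threshold return None and can be routed to the quality report rather than matched silently.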

What Carries Forward (No Rebuild Needed)

processor.py
3-component decomposition, steel/aluminum calcs, 192-column output. Moves to Lambda nearly unchanged.
validator.py
Input/output validation, cross-reference checks. Direct port to Lambda.
audit_trail.py
CALC-ID generation, metadata structure. Minor update for Entra ID user fields.
s3_audit_storage.py
Already built for S3 with Object Lock. Reuse directly in Lambda.
Frontend UI
Upload flow, validation tables, admin panel, styling. Only auth layer and API URLs change.
Excel Generation
openpyxl color-coded output with dual tabs (production + audit). Runs in Lambda the same way.

03 — Final Architecture

AWS Target Architecture

Five layers from SAP ingestion through to the user-facing application and 7-year audit archival.

External — SAP China
Daily JSON Push (6K–10K records)
Product Data JSON · HTS Codes JSON · Tariff Rates JSON
Layer 1 — Ingestion
S3 Raw Landing Zone + Lambda Validator
S3 Raw Bucket · Lambda Validator · SNS Alerts · Schema Check · Dedup Detection
Layer 2 — ETL & Transformation
AWS Glue (PySpark) — Daily at 02:00 UTC
Clean & Normalize · Fuzzy Match (Levenshtein) · Quality Checks · Cross-Reference · Parquet Export
Layer 3 — Processed Data
S3 Parquet + DynamoDB Lookup Tables
S3 Parquet (partitioned) · DDB: Product Master · DDB: HTS Codes · DDB: Tariff Categories · DDB: Customer Master · DDB: PO Reference
Layer 4 — Application
React (S3 + CloudFront) → API Gateway → Lambda
CloudFront CDN · Entra ID SSO (MSAL.js) · API Gateway · Lambda: Process · Lambda: Download · Lambda: Audit
Layer 5 — Audit & Archival
S3 Object Lock + Glacier Lifecycle (7 Years)
0–90 days: S3 Standard · 90–365 days: Glacier Instant · 1–3 yrs: Glacier Flexible · 3–7 yrs: Deep Archive · CloudWatch + WAF + KMS
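The Layer 5 tiers map onto a single S3 lifecycle rule. The sketch below builds that rule as a Python dict in the shape boto3's `put_bucket_lifecycle_configuration` accepts for its `LifecycleConfiguration` argument. The rule ID is hypothetical. The empty prefix assumes the rule covers the whole dedicated audit bucket.

```python
# Transition points from the Layer 5 tiers (days since object creation).
AUDIT_TIERS = [
    (90, "GLACIER_IR"),      # 90-365 days: S3 Glacier Instant Retrieval
    (365, "GLACIER"),        # 1-3 years: S3 Glacier Flexible Retrieval
    (1095, "DEEP_ARCHIVE"),  # 3-7 years: S3 Glacier Deep Archive
]


def audit_lifecycle_config(prefix: str = "") -> dict:
    """Build a LifecycleConfiguration dict; "" applies to every object."""
    return {
        "Rules": [
            {
                "ID": "audit-7-year-tiering",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days, "StorageClass": storage_class}
                    for days, storage_class in AUDIT_TIERS
                ],
            }
        ]
    }
```

The rule would be applied with a call such as `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=audit_lifecycle_config())`.

No expiration action is included. Object Lock already blocks deletion for 2,555 days, and the plan does not say whether records are purged after seven years.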

04 — Timeline

14-Week Implementation Gantt

Phases overlap where dependencies allow. The critical path runs SAP schema → Glue ETL → DynamoDB → Lambda → API Gateway → Frontend.

Phase | Focus | Weeks
Phase 0: AWS Foundation | Infrastructure | 1–3
Phase 1: Data Pipeline | SAP → S3 → Glue → DynamoDB | 3–7
Phase 2: App Migration | FastAPI → Lambda | 6–10
Phase 3: Entra ID SSO | Microsoft 365 Auth | 8–11
Phase 4: Frontend Deploy | S3 + CloudFront | 10–12
Phase 5: Hardening | Go-Live | 12–14

Critical Path Dependencies

  • Week 3: SAP JSON schema must be agreed before Glue ETL can be built
  • Week 7: DynamoDB tables must be populated before Lambda functions can be tested
  • Week 8: Entra ID app registration must be done by IT admin before MSAL work starts
  • Week 10: API Gateway must be live before frontend can switch from direct FastAPI calls

05 — Phase Details

Task Breakdown by Phase

Full task list and deliverables for each phase.

P0
Foundation & Infrastructure
Weeks 1–3 · DevOps Engineer · 7 tasks
ID | Task | Deliverable
0.1 | Set up AWS Organization, accounts (dev/staging/prod) | AWS account structure
0.2 | Terraform / CloudFormation IaC repository | Infrastructure as Code repo
0.3 | Create S3 buckets (raw, processed, audit, frontend) with policies | Buckets with versioning, encryption
0.4 | Create DynamoDB tables with GSIs | 5 tables: product, HTS, tariff, customer, PO
0.5 | Set up IAM roles and policies (Glue, Lambda, S3) | Least-privilege IAM
0.6 | Set up CloudWatch log groups, SNS topics | Monitoring baseline
0.7 | Set up CI/CD pipeline (GitHub Actions or CodePipeline) | Automated deploy pipeline
P1
Data Pipeline — SAP to S3 to DynamoDB
Weeks 3–7 · Data Engineer + SAP Team · 10 tasks
ID | Task | Deliverable
1.1 | Define JSON schema contracts with SAP team (products, HTS, tariffs) | Schema documentation
1.2 | SAP team builds daily JSON export to S3 raw landing zone | SAP integration endpoint
1.3 | Build Lambda ingestion validator (schema check, dedup, trigger Glue) | Lambda function
1.4 | Build AWS Glue ETL job (PySpark): clean, normalize, validate 6K–10K records | Glue job
1.5 | Add fuzzy matching for customer name discrepancies (python-Levenshtein) | Fuzzy match module in Glue
1.6 | Add quality checks (missing HTS, zero weights, invalid COO, cross-ref) | Quality report output
1.7 | Write Parquet output to S3 processed zone (partitioned by date) | Parquet files in S3
1.8 | Populate DynamoDB lookup tables from processed Parquet | DynamoDB populated
1.9 | Build data quality dashboard / reporting | Quality monitoring
1.10 | Backfill: migrate current CSV master data into DynamoDB as baseline | Initial data load verified
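The ingestion checks from tasks 1.3 and 1.6 can be sketched as follows. The field names (`material`, `hts_code`, `coo`, `net_weight_kg`) are hypothetical, since the real contract is only settled in task 1.1. Schema failures reject a record, while quality problems only flag it for the report. The plan places quality checks in Glue rather than in the Lambda validator; both are shown together here for brevity.

```python
import hashlib
import json

# Hypothetical contract; real field names come out of task 1.1.
REQUIRED_FIELDS = {
    "material": str,
    "hts_code": str,
    "coo": str,
    "net_weight_kg": (int, float),
}


def schema_errors(record: dict) -> list[str]:
    """Schema check: every required field present with the expected type."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"wrong type for {field}")
    return errors


def quality_flags(record: dict) -> list[str]:
    """Soft checks on a schema-valid record: flag it, don't reject it."""
    flags = []
    hts = record["hts_code"].replace(".", "")
    if not hts:
        flags.append("missing HTS")
    elif not (hts.isdigit() and len(hts) in (8, 10)):  # 8- or 10-digit HTS
        flags.append("malformed HTS")
    if record["net_weight_kg"] <= 0:  # catches zero (and negative) weights
        flags.append("zero weight")
    coo = record["coo"]
    # Shape check only; a real check would use the ISO 3166-1 alpha-2 list.
    if not (len(coo) == 2 and coo.isalpha() and coo.isupper()):
        flags.append("invalid COO")
    return flags


def validate_batch(records: list[dict]) -> dict:
    """Split a batch into accepted, rejected, duplicate, flagged records."""
    accepted, rejected, duplicates, flagged = [], [], [], []
    seen = set()
    for record in records:
        errors = schema_errors(record)
        if errors:
            rejected.append({"record": record, "errors": errors})
            continue
        key = record["material"]
        if key in seen:  # dedup within the batch on material number
            duplicates.append(key)
            continue
        seen.add(key)
        flags = quality_flags(record)
        if flags:
            flagged.append({"material": key, "flags": flags})
        accepted.append(record)
    # A hash of the canonical JSON lets a re-sent batch be detected.
    canonical = json.dumps(records, sort_keys=True).encode("utf-8")
    return {
        "batch_sha256": hashlib.sha256(canonical).hexdigest(),
        "accepted": accepted,
        "rejected": rejected,
        "duplicates": duplicates,
        "flagged": flagged,
    }
```

Storing `batch_sha256` alongside each processed file is one way to reject an identical re-push from SAP before Glue is triggered.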
P2
Application Migration — FastAPI to Lambda
Weeks 6–10 · Backend Engineer · 8 tasks
ID | Task | Deliverable
2.1 | Refactor data_loader.py: replace CSV reads with DynamoDB queries | DynamoDB data layer
2.2 | Refactor processor.py: keep logic, swap data source to DynamoDB | Processor using DynamoDB
2.3 | Package processing code as Lambda function (API Gateway trigger) | Lambda: process
2.4 | Package Excel download as Lambda (presigned S3 URL for output) | Lambda: download
2.5 | Package audit storage as Lambda layer (reuse existing S3 code) | Lambda: audit
2.6 | Set up API Gateway with routes matching current API surface | API Gateway configured
2.7 | Implement S3 lifecycle policies for audit (Standard → Glacier tiers) | Lifecycle rules active
2.8 | Integration testing: end-to-end with DynamoDB data | Test suite passing
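For task 2.1, the core change is swapping file reads for key lookups. The minimal sketch below assumes the product master table uses `material` as its partition key. The table object is injected, so the logic can be tested with a fake or with moto. The `Protocol` names only the `get_item` method of boto3's DynamoDB Table resource that the code relies on.

```python
from typing import Any, Optional, Protocol


class ItemTable(Protocol):
    """The one method of boto3's DynamoDB Table resource used here."""

    def get_item(self, **kwargs: Any) -> dict: ...


class ProductRepository:
    """DynamoDB-backed replacement for the CSV product lookup.

    Assumption: the product master table's partition key is 'material'.
    """

    def __init__(self, table: ItemTable) -> None:
        self._table = table
        # Per-container cache: a warm Lambda reuses it across invocations,
        # much as the LRU cache did for the startup CSV files.
        self._cache: dict[str, Optional[dict]] = {}

    def get_product(self, material: str) -> Optional[dict]:
        if material not in self._cache:
            response = self._table.get_item(Key={"material": material})
            # get_item omits "Item" entirely when no record matches.
            self._cache[material] = response.get("Item")
        return self._cache[material]
```

In Lambda, the table would typically be created once at module level, for example with `boto3.resource("dynamodb").Table(name)` and the table name from task 0.4. Warm invocations then reuse both the client and the cache.

The cache also remembers misses. That is acceptable for a once-daily refresh, but a product added mid-day stays hidden until the container is recycled.

boto3's resource layer returns numeric attributes as `decimal.Decimal`. Tariff and weight arithmetic carried over from the CSV code may therefore need explicit conversion.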
P3
Authentication — Microsoft Entra ID SSO
Weeks 8–11 · Frontend + Backend Engineer · 8 tasks
ID | Task | Deliverable
3.1 | Register app in Microsoft Entra ID (Client ID, Tenant ID, redirect URIs) | App registration
3.2 | Configure OAuth 2.0 scopes, app roles (admin, operator) | RBAC configuration
3.3 | Frontend: Replace AuthContext with MSAL.js (@azure/msal-react) | SSO login flow
3.4 | Lambda: Add JWT validation middleware (Microsoft-issued tokens) | Token validation
3.5 | Map Entra ID roles to existing admin/operator roles | Role mapping tested
3.6 | Update audit trail to capture Entra ID user identity | Audit user fields updated
3.7 | Configure M365 app launcher tile | App in waffle menu
3.8 | Remove old auth/ module (JWT + bcrypt + users.json) | Dead code removed
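Task 3.4's middleware has two halves. The first verifies the token signature against Microsoft's published keys, for example with PyJWT's `PyJWKClient` pointed at `https://login.microsoftonline.com/{tenant}/discovery/v2.0/keys`. The second checks claims and maps roles. The sketch below covers only the second half and assumes v2.0 access tokens. The tenant ID, client ID, and app-role values `IOR.Admin` and `IOR.Operator` are placeholders for what tasks 3.1 and 3.2 produce.

```python
import time
from typing import Optional

# Placeholder identifiers: use values from the app registration (task 3.1).
TENANT_ID = "00000000-0000-0000-0000-000000000000"
CLIENT_ID = "11111111-1111-1111-1111-111111111111"
# Issuer format for v2.0 access tokens.
EXPECTED_ISSUER = f"https://login.microsoftonline.com/{TENANT_ID}/v2.0"

# Hypothetical app-role values (task 3.2) mapped to existing roles (3.5).
ROLE_MAP = {"IOR.Admin": "admin", "IOR.Operator": "operator"}


class TokenRejected(Exception):
    """Raised when claims fail validation; map to HTTP 401/403."""


def check_claims(claims: dict, now: Optional[float] = None) -> dict:
    """Validate claims of an already signature-verified token."""
    now = time.time() if now is None else now
    if claims.get("iss") != EXPECTED_ISSUER:
        raise TokenRejected("unexpected issuer")
    if claims.get("aud") != CLIENT_ID:
        raise TokenRejected("token not issued for this API")
    if claims.get("exp", 0) <= now:
        raise TokenRejected("token expired")
    roles = {ROLE_MAP[r] for r in claims.get("roles", []) if r in ROLE_MAP}
    if not roles:
        raise TokenRejected("no admin or operator role assigned")
    return {
        "oid": claims.get("oid"),  # stable Entra object ID for audit trail
        "name": claims.get("name"),
        "username": claims.get("preferred_username"),
        # A user holding both roles is treated as admin.
        "role": "admin" if "admin" in roles else "operator",
    }
```

Recording `oid` rather than the display name gives the audit trail (task 3.6) an identifier that survives renames.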
P4
Frontend Deployment & Polish
Weeks 10–12 · Frontend Engineer · 5 tasks
ID | Task | Deliverable
4.1 | Deploy React build to S3 + CloudFront (HTTPS, custom domain) | Frontend hosted on AWS
4.2 | Update API calls to point to API Gateway endpoint | API integration verified
4.3 | Add data quality alerts in UI (ETL quality report flags) | Quality indicators
4.4 | Add data freshness indicator ("Last SAP sync: 2h ago") | Freshness badge in header
4.5 | E2E testing with real SAP data through full pipeline | UAT complete
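The freshness badge in task 4.4 only needs the timestamp of the last successful sync. The sketch below is in Python to match the other examples; it could run in the API and return the label with its response, and the logic ports directly to the React header. The 26-hour staleness threshold is an assumption based on the 02:00 UTC daily Glue run plus some slack. Where `last_sync` is stored is left open.

```python
from datetime import datetime, timezone

# Assumption: a daily 02:00 UTC run means data older than ~26 hours
# points to a missed or failed sync.
STALE_AFTER_HOURS = 26


def freshness_label(last_sync: datetime, now: datetime) -> tuple[str, bool]:
    """Badge text such as 'Last SAP sync: 2h ago', plus a staleness flag."""
    minutes = max(0, int((now - last_sync).total_seconds() // 60))
    if minutes < 60:
        age = f"{minutes}m ago"
    elif minutes < 48 * 60:
        age = f"{minutes // 60}h ago"
    else:
        age = f"{minutes // (24 * 60)}d ago"
    return f"Last SAP sync: {age}", minutes > STALE_AFTER_HOURS * 60


if __name__ == "__main__":
    last = datetime(2025, 1, 15, 2, 0, tzinfo=timezone.utc)
    print(freshness_label(last, datetime.now(timezone.utc)))
```

Both arguments should be timezone-aware. Subtracting a naive datetime from an aware one raises a `TypeError`.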
P5
Hardening & Go-Live
Weeks 12–14 · All Team · 6 tasks
ID | Task | Deliverable
5.1 | Load testing (6K–10K records through full pipeline) | Performance baseline documented
5.2 | Security review (IAM, WAF, encryption, Entra ID audit) | Security sign-off
5.3 | Disaster recovery testing (S3 cross-region replication) | DR plan validated
5.4 | Runbook documentation (operations, troubleshooting, escalation) | Ops documentation
5.5 | Staged rollout (pilot users → full team) | Production go-live
5.6 | Decommission old Railway/Render/Lightsail deployment | Old infra shut down

06 — Team Composition

People & Skills Required

5–7 people across 5 roles. Some roles can overlap depending on team size and budget.

☁️
AWS Cloud / DevOps Engineer
1 person · Full-time · Phases 0–5 (14 weeks)
  • AWS accounts, VPC, IAM, S3, DynamoDB, API Gateway, CloudFront
  • Terraform / CloudFormation infrastructure as code
  • CI/CD pipelines (GitHub Actions or CodePipeline)
  • CloudWatch monitoring, SNS alerts, WAF rules
  • S3 lifecycle policies (Standard → Glacier tiers)
  • KMS encryption, Object Lock for audit compliance
Skills: AWS Certified · Terraform · S3 · DynamoDB · Lambda · API Gateway · CloudFront · IAM · CloudWatch · CI/CD · Docker
🔧
Data / ETL Engineer
1–2 people · Full-time → Part-time · Phases 1–2
  • Define JSON schema contracts with SAP China team
  • Build AWS Glue ETL pipeline (PySpark) for 6K–10K records
  • Implement fuzzy matching (python-Levenshtein on Spark)
  • Build data quality checks and cross-reference validation
  • Write Parquet to S3 processed zone, populate DynamoDB
  • Backfill migration of current CSV data into DynamoDB
Skills: Python · PySpark · Pandas · AWS Glue · S3 · DynamoDB · Parquet · Fuzzy Matching · JSON Schema · SQL / Athena
⚙️
Backend / Application Engineer
1–2 people · Full-time · Phases 2–4 (8 weeks)
  • Refactor data_loader.py to read from DynamoDB instead of CSV
  • Package processing pipeline into AWS Lambda functions
  • Set up API Gateway routes matching current API surface
  • Implement Entra ID JWT token validation in Lambda
  • Handle Excel generation in Lambda (presigned S3 URLs)
  • Write integration tests (pytest + moto for AWS mocks)
Skills: Python · FastAPI · Lambda · boto3 · API Gateway · DynamoDB · Entra ID JWT · Pandas · openpyxl · pytest
🎨
Frontend Engineer
1 person · Full-time → Part-time · Phases 3–5 (6 weeks)
  • Replace custom AuthContext with MSAL.js (@azure/msal-react)
  • Implement Entra ID SSO login/logout flows
  • Update API calls to point to API Gateway endpoint
  • Add data quality indicators and freshness badge
  • Deploy frontend to S3 + CloudFront (custom domain, HTTPS)
  • Handle token refresh and error states
Skills: React 18 · TypeScript · MSAL.js · OAuth 2.0 / OIDC · Vite · S3 + CloudFront · CSS · REST APIs
📋
Project Manager / Scrum Master
1 person · Part-time (50%) · Phases 0–5 (14 weeks)
  • Coordinate between team, SAP China, and M365 IT admin
  • Manage phased rollout timeline and dependency tracking
  • Facilitate UAT with customs operations team
  • Manage risk: SAP delays, schema changes, Entra ID access
  • Coordinate security review and compliance sign-off
Skills: Agile / Scrum · Jira / Azure DevOps · AWS Migration · Stakeholder Mgmt · Trade/Customs Domain

07 — Infrastructure Cost

AWS Running Cost Estimate

Estimated monthly AWS service costs after go-live, based on expected usage patterns.

5–7 team members · 14 weeks · 6 phases · $55–155 estimated AWS cost per month

Post Go-Live Service Breakdown

Service | Usage | Est. Monthly Cost
Lambda | ~10K invocations/day, 512 MB, 10 s avg | $15–50
API Gateway | REST API, ~300K requests/month | $3–10
DynamoDB | On-demand, 5 tables, ~50K reads/day | $10–30
S3 (all buckets) | ~50 GB raw + processed + audit | $5–15
S3 Glacier | Growing archive over 7 years | $1–5
Glue | 1 daily job, 2 DPU, ~5 min | $15–25
CloudFront | Frontend CDN, low traffic | $1–5
CloudWatch | Logs, metrics, alarms | $5–15
Total | | $55–155/mo
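The compute-heavy rows can be sanity-checked against their own usage columns. The small calculator below assumes us-east-1 list prices for x86 Lambda, REST API Gateway, and Glue Spark jobs, which should be verified before relying on them. It also ignores free-tier allowances.

```python
# Assumed us-east-1 list prices; confirm against current AWS pricing pages.
LAMBDA_PER_GB_SECOND = 0.0000166667
LAMBDA_PER_MILLION_REQUESTS = 0.20
APIGW_REST_PER_MILLION_REQUESTS = 3.50
GLUE_PER_DPU_HOUR = 0.44
DAYS_PER_MONTH = 30


def lambda_monthly(invocations_per_day: int, memory_mb: int,
                   avg_seconds: float) -> float:
    """GB-second compute charge plus request charge (free tier ignored)."""
    invocations = invocations_per_day * DAYS_PER_MONTH
    gb_seconds = invocations * avg_seconds * (memory_mb / 1024)
    return (gb_seconds * LAMBDA_PER_GB_SECOND
            + invocations / 1_000_000 * LAMBDA_PER_MILLION_REQUESTS)


def api_gateway_monthly(requests_per_month: int) -> float:
    """REST API request charge at the first pricing tier."""
    return requests_per_month / 1_000_000 * APIGW_REST_PER_MILLION_REQUESTS


def glue_monthly(dpus: int, minutes_per_run: float,
                 runs_per_day: int = 1) -> float:
    """DPU-hours per day times the DPU-hour rate, over a 30-day month."""
    dpu_hours_per_day = dpus * minutes_per_run / 60 * runs_per_day
    return dpu_hours_per_day * GLUE_PER_DPU_HOUR * DAYS_PER_MONTH


if __name__ == "__main__":
    print(f"Lambda:      ${lambda_monthly(10_000, 512, 10):.2f}")
    print(f"API Gateway: ${api_gateway_monthly(300_000):.2f}")
    print(f"Glue:        ${glue_monthly(2, 5):.2f}")
```

With the table's usage figures, this gives roughly $25/month for Lambda, which is inside the $15–50 range. API Gateway comes to about $1.05 and Glue to about $2.20. Both sit below their table ranges, so those rows may be budgeting for traffic or jobs beyond the listed usage (for example, dev and staging runs) and are worth rechecking.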
