About the Client!
The client is a large government-funded hospital and medical research institute. The client is currently running, and collaborating on, several R&D programs around the globe that require access to large amounts of data. They are also running many IoT- and data-science-focused programs that need to iterate quickly without having to manage complex infrastructure for each program.
The client had very specific requirements that we felt would benefit greatly from a cloud deployment. The main focus was on:
1. Archiving large amounts of real-time and archival data and making it more accessible to labs, universities, and research centers around the world.
2. Supporting innovation programs exploring use cases in IoT, artificial intelligence/machine learning, and data science, allowing them to quickly iterate and deploy POCs.
The Data Explosion!
Client data comes in from many sources:
- Genomic sequencers
- Devices such as MRI, X-ray, and ultrasound machines
- Sensors and wearables for patients
- Medical equipment telemetry
- Mobile applications
As well as non-clinical sources:
- Human resources
- Supply chain
- Claims and billing
Data from these sources can be structured or unstructured. Some data arrives in streams, such as readings from patient monitors, while other data arrives in batches. Still other data arrives in near-real time, such as HL7 messages. All of this data has retention policies dictating how long it must be stored, and much of it is stored in perpetuity because many systems in use today have no purge mechanism. AWS has services to manage all of these data types as well as their retention, security, and access policies.
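To make the near-real-time HL7 traffic mentioned above concrete: HL7 v2 messages are pipe-delimited text, with one segment per line. Below is a minimal, illustrative sketch of splitting a message into segments and fields; the sample message is fabricated for this example, and real HL7 parsing (repeat fields, escape sequences, components) is far richer.

```python
# Minimal sketch of splitting an HL7 v2 message into segments and
# fields. The sample message below is fabricated; real messages are
# far more complex than this illustrates.
sample = "\r".join([
    "MSH|^~\\&|LAB|HOSP|EHR|HOSP|202401010000||ORU^R01|MSG001|P|2.5",
    "PID|1||12345||DOE^JANE",
    "OBX|1|NM|HR^HeartRate||72|bpm",
])

def parse_hl7(message: str) -> dict:
    """Map each segment ID (MSH, PID, OBX, ...) to its list of fields."""
    segments = {}
    for line in message.split("\r"):   # HL7 v2 separates segments with CR
        fields = line.split("|")       # fields are pipe-delimited
        segments[fields[0]] = fields[1:]
    return segments

parsed = parse_hl7(sample)
print(parsed["OBX"][4], parsed["OBX"][5])  # → 72 bpm
```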
The Main Concern
The hospital is currently hosting 400+ petabytes of data! And it generates a large amount of new data every day. All of that data needs to be transferred to the cloud for the solution to be effective. Even using the fastest available internet connection, it would take a minimum of 12 months to migrate everything. On top of that, data security and HIPAA compliance are major considerations.
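A rough back-of-the-envelope check shows where the 12-month figure comes from, assuming a dedicated 100 Gbps link running at full utilization (a best case; real-world throughput would be lower and the estimate correspondingly longer):

```python
# Back-of-the-envelope transfer-time estimate for 400 PB.
# Assumption: a dedicated 100 Gbps link at 100% utilization.
data_bytes = 400 * 10**15   # 400 PB (decimal petabytes)
link_bps = 100 * 10**9      # 100 Gbps

seconds = data_bytes * 8 / link_bps
days = seconds / 86_400

print(f"{days:.0f} days (~{days / 30:.0f} months)")  # → 370 days (~12 months)
```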
Pictured above is AWS Snowmobile, a self-contained data storage unit housed in a shipping container, with a capacity of 100 petabytes per unit. Two Snowmobile units would bring the data migration time down to a few days.
AWS Snowmobile & AWS Snowball
AWS Snowball and AWS Snowmobile are appropriate for extremely large, secure data transfers, whether one-time or episodic. AWS Glue is a fully managed ETL service for securely moving data from on-premises to AWS, and Amazon Kinesis can be used for ingesting streaming data.
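As a sketch of the streaming side, each Kinesis `PutRecord` call carries a stream name, a binary payload, and a partition key that keeps one device's readings ordered on the same shard. The request is built here as a plain dict so it runs without AWS credentials; with boto3 it would be passed as `kinesis.put_record(**request)`. The stream name and the telemetry payload are hypothetical examples, not part of the client's actual deployment.

```python
# Sketch of the request an ingestion client would send to Kinesis.
# Built as a plain dict so no AWS credentials are needed; with boto3
# this would be passed as kinesis.put_record(**request).
# The stream name and payload below are hypothetical.
import json

reading = {"device_id": "monitor-042", "heart_rate": 72}

request = {
    "StreamName": "patient-telemetry",            # hypothetical stream name
    "Data": json.dumps(reading).encode("utf-8"),  # Kinesis payloads are bytes
    "PartitionKey": reading["device_id"],         # keeps one device's data ordered
}
```

Partitioning by device ID is a common choice because it preserves per-device ordering while spreading load across shards.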
Amazon S3, Amazon S3 Standard-IA, and Amazon Glacier are economical data-storage services with a pay-as-you-go pricing model that expands (or shrinks) with the customer's requirements.
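The tiering between those storage classes is typically expressed as an S3 lifecycle configuration. Below is a hedged sketch of one rule matching the retention behavior described earlier: age data into cheaper tiers and never expire it. The prefix and rule ID are hypothetical; with boto3 this dict could be passed to `put_bucket_lifecycle_configuration`.

```python
# Sketch of an S3 lifecycle rule: transition aging data to cheaper
# storage classes and retain it in perpetuity (no Expiration action).
# The prefix "hl7/" and the rule ID are hypothetical examples.
lifecycle = {
    "Rules": [
        {
            "ID": "clinical-archive",
            "Filter": {"Prefix": "hl7/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "STANDARD_IA"},  # infrequent access
                {"Days": 365, "StorageClass": "GLACIER"},     # long-term archive
            ],
            # No Expiration action: data is stored in perpetuity.
        }
    ]
}
```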
The architecture was designed to allow real-time data ingestion via AWS Direct Connect and Amazon Kinesis, as well as bulk data transfers via Snowmobile and Snowball devices. Amazon S3 is used as the primary storage layer.