Category: AWS

  • AWS Aurora vs. Redshift for Data Warehousing

    At work we are looking into moving from a data dumping ground to a real data warehouse solution, which sent me down a rabbit hole: what should we use to host this ever-expanding database? Since we are hosting in AWS, two commonly considered services for analytical workloads are Amazon Aurora and Amazon Redshift. While both are powerful, they serve different purposes and are optimized for different types of workloads. To sort out which way to go, here’s a brief overview of the two services that helped me work through the decision:

    Understanding Aurora and Redshift

    Amazon Aurora

    Amazon Aurora is a managed relational database engine, offered through Amazon RDS, that provides high performance and availability. It is compatible with both MySQL and PostgreSQL and offers managed features such as automated backups, scaling, and replication.

    Amazon Redshift

    Amazon Redshift is a fully managed data warehouse designed for fast querying and analytical processing over large datasets. It is optimized for Online Analytical Processing (OLAP) workloads and integrates deeply with AWS analytics services like AWS Glue and Amazon QuickSight.

    Key Differences

    Feature           | Amazon Aurora                                                | Amazon Redshift
    ------------------+--------------------------------------------------------------+-----------------------------------------------------------
    Type              | Relational Database (OLTP)                                   | Data Warehouse (OLAP)
    Workload          | Transactional & Mixed Workloads                              | Analytical & Reporting
    Data Structure    | Row-based                                                    | Columnar-based
    Query Performance | Optimized for small queries with high concurrency            | Optimized for complex queries over large datasets
    Scalability       | Scales read replicas horizontally, limited vertical scaling  | Massively parallel processing (MPP) for high scalability
    Storage Model     | Replicated storage across multiple AZs                       | Distributed columnar storage
    Best For          | Applications needing high-performance transactions           | Business Intelligence, Data Lakes, and Analytics

    Which One Should You Choose for Data Warehousing?

    1. Choose Amazon Aurora if:
      • Your workload requires frequent transactions and OLTP-like operations.
      • You need an operational data store with some analytical capabilities.
      • Your dataset is relatively small, and you require real-time access to data.
    2. Choose Amazon Redshift if:
      • Your primary goal is big data analytics.
      • You need to run complex queries over terabytes or petabytes of data.
      • You require a scalable and cost-effective data warehouse with optimized storage and querying.

    Conclusion

    This brief post describes the research I went through. My conclusion: Aurora is best for transactional databases and operational reporting, while Redshift is purpose-built for data warehousing and analytics. If you need real-time analytics on live transactional data, you might even consider using both together, storing operational data in Aurora and periodically ETL-ing it into Redshift for deeper analysis.
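
    If you go the “use both together” route, a minimal sketch of the batch leg might look like the following. Every hostname, the bucket, the table, and the IAM role below are made-up placeholders, and a real pipeline would add incremental extracts and scheduling on top of this:

      # Dump a table from Aurora PostgreSQL to CSV (psql talks to Aurora natively).
      psql -h my-aurora.cluster-abc123.us-east-1.rds.amazonaws.com -U etl_user -d appdb \
        -c "\copy orders TO 'orders.csv' CSV"

      # Stage the file in S3 where Redshift can reach it.
      aws s3 cp orders.csv s3://my-etl-bucket/orders/orders.csv

      # Load it into Redshift (psql also works against Redshift, on port 5439); COPY pulls
      # straight from S3 using an IAM role attached to the Redshift cluster.
      psql -h my-warehouse.abc123.us-east-1.redshift.amazonaws.com -p 5439 -U etl_user -d warehouse \
        -c "COPY orders FROM 's3://my-etl-bucket/orders/' IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole' CSV;"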

  • Migrating Kubernetes Containers on AWS from GP2 to GP3

    At work we have a StackGres Kubernetes cluster that hosts our PostgreSQL databases. It gives us high availability and easy data recovery, and it is generally pretty easy to manage. I admit that when I first started looking at Postgres on Kubernetes I was pretty skeptical, but it has honestly given me very little to complain about. It does have some issues due to how the cluster was initially configured, which I am planning to fix in the future.

    The K8s cluster was set up with GP2 as the default storage class, so a few months ago the topic came up of migrating to GP3 to increase our IOPS and also reduce cost.

    I thought it would be pretty easy to migrate from GP2 to GP3 EBS volumes, as I have migrated the volumes on standard EC2 servers with a quick CLI script or a click in the console. I sent in a ticket to OnGres, the company behind StackGres, to see if they had any guidance on the process, again expecting a simple one-liner kubectl command or script.
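
    For comparison, this is the kind of one-liner I had in mind. On a plain EC2-attached volume the type changes in place, with no detach and no downtime (the volume ID below is a placeholder):

      # Convert a standalone EBS volume to gp3 in place.
      aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --volume-type gp3

      # Optionally watch the modification progress.
      aws ec2 describe-volumes-modifications --volume-ids vol-0123456789abcdef0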

    Instead I received a long procedure and thought I’d document it here…

    1. Make pod “0” the cluster leader. I’m not 100% sure this is needed, but it was in my directions; my guess is that the procedure deletes whichever pod is not the leader, though I didn’t test skipping this step.
      1. kubectl exec -it -n <<namespace>> <<stackgres_pod_name>> -c patroni -- patronictl list
      2. If needed, switch over: kubectl exec -it -n <<namespace>> <<stackgres_pod_name>> -c patroni -- patronictl switchover
    2. Take a backup… take a backup… take a backup! Don’t start this process without a recent backup, as you are going to delete volumes.
    3. Set the cluster size to 1, destroying the replica: kubectl edit sgclusters.stackgres.io -n <<namespace>> <<stackgres_cluster_name>>
    4. Use kubectl get pvc to find the deleted replica’s persistent volume claim and release it by deleting it.
    5. Use kubectl get pv to find the backing persistent volume and then delete the volume as well.
    6. Set the cluster size back to 2, creating a new replica on the new storage class (see the StorageClass sketch after this list): kubectl edit sgclusters.stackgres.io -n <<namespace>> <<stackgres_cluster_name>>
    7. Watch for the replica to be rebuilt and sync up with the leader: kubectl exec -it -n <<namespace>> <<stackgres_pod_name>> -c patroni -- patronictl list
    8. Once the sync is complete, switch over to the replica and then repeat the steps above to delete the old leader’s volume.
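
    One thing the procedure takes for granted is that replacement volumes will actually be provisioned as GP3, which means a gp3 StorageClass has to exist, and be the one the cluster provisions from, before you delete anything. A minimal sketch, assuming the AWS EBS CSI driver, might look like this (the class name and settings are just examples):

      # Create a hypothetical gp3 StorageClass and mark it as the cluster default.
      kubectl apply -f - <<EOF
      apiVersion: storage.k8s.io/v1
      kind: StorageClass
      metadata:
        name: gp3
        annotations:
          storageclass.kubernetes.io/is-default-class: "true"
      provisioner: ebs.csi.aws.com
      parameters:
        type: gp3
      volumeBindingMode: WaitForFirstConsumer
      allowVolumeExpansion: true
      EOF

    If gp2 is currently the default class, you’d also want to clear its annotation (kubectl annotate storageclass gp2 storageclass.kubernetes.io/is-default-class-) so the rebuilt replica’s PVC binds to gp3 instead.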

    I admit that I made a mistake at one point and deleted a PVC that was still in use. Thankfully the OnGres team was able to help me recover from that; I’ll document it in a later post.

  • AWS Solutions Architect Professional

    I had let my AWS Solutions Architect Professional certification expire, as I didn’t have a lot of spare time in my previous role. So I figured that now, with my surplus of time, I would work on renewing it.

    A Cloud Guru

    For all my AWS certifications so far I had used A Cloud Guru, and it worked well enough for me that I decided to use their service again this time around. Pluralsight bought them (or merged with them) sometime in the past few years, and they are still working on combining the two services. My training was caught in the middle of that merger, which is understandable but also unfortunate: it was confusing to log in and find that new videos had been added or that quizzes and tests had been modified.

    The video content was pretty good. If you have some experience or have taken the associate-level exam, some of the content will be familiar to you, but don’t skip too much of the videos; I kept finding little nuggets of information that were helpful on the quizzes. The challenges are good brain problems, making you figure out in your head how you’d respond to a scenario. The demos and labs were OK; some of them felt too easy or not detailed enough to really help in my training, but your mileage may vary.

    My Tips

    • A Cloud Guru / Pluralsight offer a playground; use it. Play with all the things you are learning. There are only a few exceptions you aren’t able to create in the playground, like multi-account setups that centralize permissions and logging.
    • Have your own account to play in; there’s nothing like actually building and supporting your own blog or whatever (I run a mail server).
    • Give yourself lots of time and don’t set your test date too early, but also don’t procrastinate.

    Good Luck!