Tag: Hosting

  • AWS Aurora vs. Redshift for Data Warehousing

    AWS Aurora vs. Redshift for Data Warehousing

    At work we are looking into moving from a data dumping ground into a real data warehouse solution. So this took me down a rabbit hole of what should we use to host this ever expanding database? Since we are hosting in AWS two commonly considered AWS services for analytical workloads are Amazon Aurora and Amazon Redshift. While both are powerful, they serve different purposes and are optimized for different types of workloads. So to sort out which way to go, here’s a brief overview of the two solutions that helped me work through this decision:

    Understanding Aurora and Redshift

    Amazon Aurora

    Amazon Aurora is a relational database service (RDS) that provides high performance and availability. It is compatible with both MySQL and PostgreSQL, offering managed features such as automated backups, scaling, and replication.

    Amazon Redshift

    Amazon Redshift is a fully managed data warehouse designed for fast querying and analytical processing over large datasets. It is optimized for Online Analytical Processing (OLAP) workloads and integrates deeply with AWS analytics services like AWS Glue and Amazon Quicksight.

    Key Differences

    FeatureAmazon AuroraAmazon Redshift
    TypeRelational Database (OLTP)Data Warehouse (OLAP)
    WorkloadTransactional & Mixed WorkloadsAnalytical & Reporting
    Data StructureRow-basedColumnar-based
    Query PerformanceOptimized for small queries with high concurrencyOptimized for complex queries over large datasets
    ScalabilityScales read replicas horizontally, limited vertical scalingMassively parallel processing (MPP) for high scalability
    Storage ModelReplicated storage across multiple AZsDistributed columnar storage
    Best ForApplications needing high-performance transactionsBusiness Intelligence, Data Lakes, and Analytics

    Which One Should You Choose for Data Warehousing?

    1. Choose Amazon Aurora if:
      • Your workload requires frequent transactions and OLTP-like operations.
      • You need an operational data store with some analytical capabilities.
      • Your dataset is relatively small, and you require real-time access to data.
    2. Choose Amazon Redshift if:
      • Your primary goal is big data analytics.
      • You need to run complex queries over terabytes or petabytes of data.
      • You require a scalable and cost-effective data warehouse with optimized storage and querying.

    Conclusion

    This is a brief blog post that describes the research I went through. My conclusion is Aurora is best for transactional databases and operational reporting and Redshift is purpose-built for data warehousing and analytics. If you need real-time analytics on live transactional data, you might even consider using both together—storing operational data in Aurora and periodically ETL-ing it into Redshift for deeper analysis.

  • Replacing Google Workspaces

    Replacing Google Workspaces

    So with the announcement that my freeloading for email hosting on Google is coming to an end I decided to go down the road of setting up my own email server as I figured since I was going to have to pay for email going forward why not just host it. Is this a good idea? I’m not sure, it seems like there are lots of blog posts that tell you not to host your own email but in my situation I have several custom domains that i was feeding into google with around 20 family freeloaders users and cost became an issue.

    Most of my cloud expertise is in AWS which made it a pretty easy decision to use their services to host my server. Since mailcow is now dockerized this was fairly easy to create an ASG that has one EC2 server. I hosted the data on EFS which allows me to kill off the EC2 instance and it will rebuild itself within a few minutes. And to top it off I am using SES for outbound emails as this allows me to avoid getting my sent emails trapped in spam filters.

    So far there have been very few gotchas, the issues that have come up were my own doing for over thinking the problem. Although once the family starts using the server I’m sure that I’ll need to iterate. My only real fear is loosing data and I’ve been able to test that a few times by completely tearing down the stack and rebuilding and reloading from a backup.

    Next step is I’ll be working on importing the families emails into the server and mailcow uses imapsync under the hood and actually getting some real traffic beyond myself and my emails.