Segment is great for consolidating your business data from various sources and routing them to intended destinations. Segment warehouse is a special type of destination, where raw data is bulk loaded in regular intervals. Setting up a Segment warehouse gives you complete ownership over your data, and from there you can run complex queries, generate reports, and gain deeper insights into your business.
DigitalOcean recently started offering fully managed PostgreSQL data clusters. Current offering starts at $15/month for 10GB clusters, with options to scale up to 1TB. The real exciting bit is that they fully support various PostgreSQL extensions, including PostGIS and TimescaleDB, ideal for analytics & reporting for small/medium sized businesses.
Create Terminal Machine
Before creating the database cluster, it's wise to set up an access terminal first. It's more secure to SSH into the terminal machine, and run psql client from there.
- In DigitalOcean console, go to top right Create > Droplets.
- Choose latest Ubuntu image, 19.04 x64.
- Under Standard plan, scroll left and pick the lowest tier ($5/mo). It's sufficient for SSH and psql client. You can scale up the machine any time if more memory is needed for complex queries.
- It's recommend to upload your private key for SSH.
- Once the terminal machine is up and running, SSH into it and run
sudo apt install postgresql-client-11
Create PostgreSQL Cluster
- In DigitalOcean console, top right Create > Databases.
- Choose PostgreSQL 11, node size and datacenter of your choice.
- Once cluster is initialized, restrict Trusted Sources to the terminal droplet created earlier.
- On cluster overview page, set connection details to Flags and click on show password.
- Copy the psql client string and paste it in your SSH terminal. Verify you can connect to the database and run queries. e.g.
Create Segment Warehouse Destination
- In Segment console, under Catalog > Warehouses, choose Postgres.
- Note down the Segment IP for whitelisting.
- Go back to DigitalOcean console, under database cluster overview, edit the Trusted Sources to include the Segment IP (sans the /32 mask).
- Also under cluster overview, note down the Connection Details
- Return to Segment console, enter the database connection details
- Warehouse sync could take up to 24 hours. Check under warehouse detination Sync History tab to verify a sync was successfully run.
- Segment enforces a strict data schema for sync data. Each source has its own schema name, which could be found under source Settings > SQL Settings
This guide is meant to get you started quickly with a Segment warehouse destination on DigitalOcean. Once up and running, there are a few more things you can do to make things more secure.
- In DigitalOcean console, under cluster Overview, create read-only nodes. Run queries against read-only nodes to prevent accidental data corruption.
- Under cluster Users & Databases, create non-default users.
- In the terminal machine, create unix users for better access control.