root@localhost:~#

Blog posts


Why Snaps are Bad

03/18/2024

I was performing a migration on a self-hosted instance of GitLab when I ran into a strange "bug" with Docker resulting in containers being inaccessible. I had set up a Docker container with GitLab a couple of years ago to store my side projects so I could have redundancy and be able to work on them from my laptop, desktop, etc. Something drew my attention to my version needing an upgrade, so I attempted to swap in the latest GitLab image in my docker-compose.yml and carry on. It did not work; GitLab informed me that I was upgrading from x to y and should read some documentation on how to perform the migration, which was nice. It's a little more involved than I was hoping: you must upgrade from version 13.9.x -> 14.0.12, 14.0.12 -> 14.3.6... until the target version (16.x.x). Okay, I can just run those migrations and do other stuff asynchronously and it won't be a big deal. I tried accessing the server after the first migration and got no response. Huh? Ping. Fine. SSH still working... Maybe try another container, perhaps a vanilla nginx one? That didn't work either. curl localhost? Works just fine. After sifting through many Google results for "container not accessible outside host", I found this little gem:

  • https://askubuntu.com/questions/1423293/ubuntu-22-04-docker-containers-not-accessible-from-outside

Which points out a known bug:

  • https://bugs.launchpad.net/ubuntu/+source/ufw/+bug/1968608

Older Ubuntu servers (20.04) use iptables-legacy. There are two backends, xtables and netfilter, and as long as you use one or the other exclusively, things should be fine. If you don't, undefined firewall behavior may occur. Ubuntu even creates symlinks to help ensure this consistency. However, snap packages have been known to ship with their own iptables or nftables.
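For reference, here's a quick way to see which backend is holding which rules (a rough sketch; it assumes both the iptables-legacy and iptables-nft wrapper commands are installed, as they are on recent Ubuntu):

    update-alternatives --display iptables        # which backend the iptables command points at
    sudo iptables-legacy-save | grep -c DOCKER    # Docker rules written via the legacy/xtables backend
    sudo iptables-nft-save | grep -c DOCKER       # Docker rules written via the nftables backend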

Did I use snap?

    sudo snap services
    Service                          Startup  Current   Notes
    docker.dockerd                   enabled  active    -

Was it updated?

    name:      docker
      latest/stable:    24.0.5   2024-02-01 (2915) 136MB

Lo and behold:

  • https://github.com/docker-snap/docker-snap/issues/68

Looking at the output of iptables-save / iptables-legacy-save, I can see that the new Docker routing rules are not there, but all my old ones are (I can tell by the ports certain apps use). iptables-nft-save, on the other hand, does show the new routing rules. Was the snap Docker distribution switching to netfilter a recent major change?

  • https://github.com/docker-snap/docker-snap/releases

Looks like it.

Therefore, it seems that during an unattended upgrade, a version of Docker that uses netfilter instead of the xtables backend preferred by the system was installed, resulting in undefined firewall behavior (a FORWARD DROP rule that appeared in iptables-legacy but not in iptables) and making my containers unreachable from outside the host.

That was fun.

tl;dr - The snap distribution of Docker ignored my system's preferences, resulting in undefined firewall behavior.


Writing a static site with Next.js

03/13/2024

I just rewrote my site with Next.js, coming from Python's Lektor. It only took a few hours, looks much better, and will be easier to customize in the future. Here are some packages I used to help make this go smoothly:

Mantine

A newer UI library that people have been talking about. I used it on the last app I was working on and it was nice to work with.

Unified & Remark

My old blog posts were stored as "content.lr" files with all the data (including content/body) in the frontmatter. So all I had to do was write some logic in getStaticProps to loop through those files and parse the data into json. This seems to be a pretty standard way of doing things:

import rehypeDocument from 'rehype-document'
import rehypeFormat from 'rehype-format'
import rehypeStringify from 'rehype-stringify'
import remarkParse from 'remark-parse'
import remarkRehype from 'remark-rehype'
import {unified} from 'unified'
import {reporter} from 'vfile-reporter'

const file = await unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(rehypeDocument, {title: 'šŸ‘‹šŸŒ'})
  .use(rehypeFormat)
  .use(rehypeStringify)
  .process('# Hello world!')

// report any parsing messages, then print the resulting HTML
console.error(reporter(file))
console.log(String(file))

  • https://www.npmjs.com/package/unified

The only thing left after that was to sort the posts, then write one map loop to create a posts index at the top and another to render the content into blog posts using dangerouslySetInnerHTML (love that name).
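Roughly what that looks like, as a sketch - the content directory, frontmatter field names, and parsing logic here are assumptions rather than my exact code:

import fs from 'fs'
import path from 'path'
import {unified} from 'unified'
import remarkParse from 'remark-parse'
import remarkRehype from 'remark-rehype'
import rehypeStringify from 'rehype-stringify'

export async function getStaticProps() {
  const dir = path.join(process.cwd(), 'content')
  const posts = await Promise.all(
    fs.readdirSync(dir).map(async (name) => {
      // content.lr files separate fields with lines containing only "---"
      const raw = fs.readFileSync(path.join(dir, name), 'utf8')
      const fields = Object.fromEntries(
        raw.split(/\n---\n/).map((block) => {
          const [key, ...rest] = block.split(':')
          return [key.trim(), rest.join(':').trim()]
        })
      )
      const html = String(
        await unified()
          .use(remarkParse)
          .use(remarkRehype)
          .use(rehypeStringify)
          .process(fields.body ?? '')
      )
      return {title: fields.title ?? name, date: fields.pub_date ?? null, html}
    })
  )
  posts.sort((a, b) => new Date(b.date) - new Date(a.date)) // newest first
  return {props: {posts}}
}

// ...and in the page component: one map for the index, one for the post bodies
export default function Blog({posts}) {
  return posts.map((post) => (
    <article key={post.title}>
      <h2>{post.title}</h2>
      <div dangerouslySetInnerHTML={{__html: post.html}} />
    </article>
  ))
}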

Comparison

The downside to this approach is that there are no GUI/CMS-like features for creating and editing posts. But I have markdown support installed in VSCode, and frankly I'm enjoying writing in here more than in Lektor's confusing web interface (adding posts by running a development server, navigating to the page, clicking "create subpage"... you get the picture). Lektor also lets you upload image files, so that's another thing I'll have to write. Perhaps a bigger drawback of this implementation is that I have to write pagination myself, whereas that came built in with Lektor. Even so, when I was trying to update my Lektor version I was getting a strange error in the pagination, and even the quickstart example had build errors (not sure if it was a theme issue). So here I am.

Deployment is the same: run build and copy the output to /var/www/html. It's so refreshingly easy. Lektor did have some feature where you could put the server IP and it would run rsync as a deploy command or something, but I switched to Cloudflare pages recently and use wrangler (see /blog#clis-the-cloud-and-design).

Thinking about writing your own blog with a Javascript framework? I recommend it!


CLIs, the cloud, and design

10/24/2023

At my last job I wrote the CLI for deploying AI/ML models. I had never written a CLI before. It's not particularly difficult to do, but they are deceptively difficult to get "just right". Since my time at that company I've been writing and deploying a lot of sites. Part of my goal with these projects is to keep them no more complex or resource-demanding than necessary, since I'll be footing the server bill each month.

So I started demoing different providers and their offerings. GCP was nice but not ideal for my use case. I had already used AWS before. Then I stumbled upon Cloudflare - they offer free static site hosting that is far easier to set up than GCP's. So I migrated my blog again and came up with a few static sites. You can deploy projects via the GUI, or from the command line via wrangler, their CLI, which is now part of their "workers-sdk" repository^1. It's written in Javascript (which surprised me - Python or Go would have been my first two choices), easy to install, and strangely similar to the CLI I wrote. There are some bigger differences though, and in hindsight I can see why Cloudflare's approach is better.

Auth and prompting the user

The first thing that stuck out: if the user hasn't passed a token, launch an OAuth workflow via the browser. All I had to do was log in to Cloudflare in my browser, where my username and password were already saved. Letting the user's browser handle the credential concern is desirable.

$ npx wrangler pages deploy html/ --project-name myproject --skip-caching
Attempting to login via OAuth...
Opening a link in your default browser: https://dash.cloudflare.com/oauth2/auth?response_type=code&client_id=[redacted]&redirect_uri=http%3A%2F%2Flocalhost%3A8976%2Foauth%2Fcallback&scope=account%3Aread%20user%3Aread%20workers%3Awrite%20workers_kv%3Awrite%20workers_routes%3Awrite%20workers_scripts%3Awrite%20workers_tail%3Aread%20d1%3Awrite%20pages%3Awrite%20zone%3Aread%20ssl_certs%3Awrite%20constellation%3Awrite%20offline_access&state=lcykA2LiHgli6Az1DFDjbquPHBXWRlnr&code_challenge=_9Jkt0gMzkwSfM-vy9xrSC-yy9y7fa0OYCf5zMYS1Pg&code_challenge_method=S256
Successfully logged in.
šŸŒ  Uploading... (2/2)

āœØ Success! Uploaded 2 files (0.74 sec)

āœØ Deployment complete! Take a peek over at [redacted]

Seeing is knowing

Another takeaway was command line semantic consistency and visualizing the command taxonomy^2. I remember implementing a command that was not intuitive and having to refactor it; that could have been avoided by drawing out the planned command(s) beforehand.

Looking for inspiration (and choosing the right one)

Whenever I faced a problem that I didn't have much experience in, or had no intuition for, I would begin searching GitHub for code that solved a similar problem. I did consult Heroku's CLI, but I passed over it. Cloudflare took inspiration from it in the form of "colon-delimited command namespacing". I like how it makes it painfully clear what a command is operating on. User-proofing your interfaces is difficult.

Avoid logic that requires passing in raw ID strings

I am guilty of the third takeaway in the Cloudflare blog post: my delete command required the GUID of the deployment to delete. Being able to issue a command based on a name is a better user experience than having to paste a long string into the command line. I myself am so used to doing this that it simply doesn't bother me anymore, and so I never thought of it. It's dangerous to assume that your experience will be the same as others'!

Conclusion

Designing a CLI just right is more difficult than you would think: it needs to be intuitive for users with a variety of computing backgrounds, and it takes collaboration to do well. While I didn't write the perfect CLI, I did add some interesting features that most CLIs don't have:

  • Typo resilience - a mistyped command will still work (deply vs deploy) by comparing string similarity against the list of available commands (see the sketch after this list).
  • Aliased commands. Sometimes users can't remember the command and type something similar but different (ls instead of list). A few of the commands were aliased like that.
  • Customized command groupings. Most CLIs when invoked with --help give you a large list of alphabetically sorted commands/options. This is notoriously difficult to read, so we hooked into the CLI framework and modified how commands are printed.
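Here's the gist of the typo resilience as a Python sketch (the command list and alias map are made up, and the real version hooked into our CLI framework rather than standing alone):

import difflib

COMMANDS = ["deploy", "delete", "list", "login", "logs"]   # illustrative command set
ALIASES = {"ls": "list", "rm": "delete"}                   # the aliasing idea from above

def resolve_command(typed: str) -> str | None:
    """Map what the user typed to a known command, tolerating small typos."""
    if typed in COMMANDS:
        return typed
    if typed in ALIASES:
        return ALIASES[typed]
    close = difflib.get_close_matches(typed, COMMANDS, n=1, cutoff=0.7)
    return close[0] if close else None

print(resolve_command("deply"))   # -> deploy
print(resolve_command("ls"))      # -> list (via the alias map; it's too short for fuzzy matching)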

More Time

05/31/2023

At the end of March I received some unfortunate news that many have been receiving lately - me and several others got laid off. This kind of took me by surprise since I was always under the impression that things were OK, that our company was ahead of the curve on this (they let go some people over the summer as well). But I understand it's hard to avoid pressures that not even FAANG companies are insulated from. The silver lining in this? I have 45+ more hours per week to do whatever I please with. Time that cannot be bought anywhere else.

After nearly 7 years of working I realized that I might not get this same opportunity again. So, earlier this month I began writing some apps again. I've conceived of, implemented, and deployed 3 apps so far - one per week. They range in complexity from a simple static site with some vanilla JS/CSS to a frontend + backend with a sqlite database. It seems like an efficient pace, and assuming I have no idea which idea is good or will succeed, I won't lose too much time on any one project. I have some access to A.I. now, which I'm experimenting with, and it seems to be adding some value to my development process. I'm finding that chatGPT is actually a nice programming assistant that definitely cannot do everything I can do. But it can answer my questions pretty well and has only hallucinated badly once (imaginary imports and functions).

What will the next generation of apps based on A.I. look like? I think it's already clear that information retrieval is a huge success - I'm testing the limits of this and how much of an "expert" a language model can be. Initial results are impressive but flawed. Hallucination was again a big problem. Each version is improving so much that I'm confident the issue won't be nearly as much of a problem in the future. Mechanisms/workarounds for this already exist - simply tell the agent it is wrong and it attempts to correct its answer. But for a model to be used in production in a broad range of industries, hallucinations are unacceptable. Imagine if it started giving you fake news and facts, dubious medical advice, buggy code... That could give rise to a whole new class of lawsuits.


Advent of Code - not another Leetcode clone

01/31/2022

Lately I've been doing Advent of Code challenges. Somebody suggested it to me recently and I thought why not - many people use this as a measure of proficiency. It's entirely possible that I've been using frameworks / other people's code so much that I got a little rusty at solving problems with just the standard library. I've done some Leetcode and HackerRank before, so I was expecting something exactly like those platforms. I was wrong. Those other sites are platforms with a network and the most common interview questions. Advent of Code is different.

Advent of Code is not nearly as sophisticated - you code locally on your machine and submit the answer in a text form. What it lacks in front end tech it makes up for in originality: each problem is posed as some fictional event occurring on a submarine that is delving into the unknown depths to retrieve the keys to Santa's sleigh. Or something. The problems are released one per day each December. There is a leaderboard - top scorers are determined by how quickly they submit the correct answer. There are a number of unique inputs/outputs to minimize cheating. Each problem has two parts - the first is relatively easier, and the second usually adds another layer of complexity or demands a scalable solution. The site notes that every problem can be computed in 15 seconds or less on old hardware. If you don't have a good solution it could take your code hours or days to finish!

I think my two favorite things about these challenges are the novelty of the problems and the simplicity of the platform. I don't need to write code in the browser, and I prefer running code locally so I can debug if needed. I'm posting my solutions on GitHub. I'm definitely looking forward to next year's problems and may even experiment with different languages and benchmark my answers.


Hacktoberfest 2021

11/29/2021

This year I did something I had been meaning to do for a while: contribute to open source projects. I always loved the idea of contributing back to the open source code I use so much, but spending time on my own projects took priority. One thing stopping me from contributing to open source sooner was not knowing exactly how to get started. It's a common issue for many people because each project has its own contributing guidelines and level of openness to contributions.

However, with a little push from Hacktoberfest I was able to jump in and make some quality contributions in just a few weeks, completing the challenge of 4 accepted pull requests. I was even able to find a bug in one of Microsoft's projects! While my number of lines of code was relatively small, most of the time was spent reviewing open issues, understanding the code base, and trying to gauge whether I could solve the issue quickly or not. Some projects label open issues with "good first issue" so newcomers have a place to get started. Having a set of projects participating in Hacktoberfest with this guidance allowed me to contribute successfully.

And I learned a lot in the process! Every project is different when it comes to the process for committing code. From conforming to style to running and passing tests, there are a number of checks in place to ensure the quality of the project stays high. Though I knew git from private repositories, I definitely became more familiar with git commands and more nuanced things like resetting, rebasing, and squashing. I would like to keep contributing to projects that I use frequently, possibly even adding features of my own. It is time consuming to filter through issues and find a good one to create a pull request for. I may even open source some of my own code if I can refactor it and clean it up enough to be used by other people.


Making Sense of Nonsense

07/25/2021

For my next project I decided I needed to try something that could possibly directly support my hobby financially. So I started creating a very high-level trading algorithm. Instead of focusing on standard mathematical metrics (moving averages, Bollinger bands, RSI, etc.), which have been done to death and optimized at higher frequencies than I'll ever be able to achieve on my own, I decided to look at something a little more niche: /r/wallstreetbets sentiment. Don't worry, it's not another mention-frequency-based meme stock buying bot. I try to decide which stocks to trade based on the positive and negative sentiment present in the body text. The idea is that there is volatility in these stocks and that it may be possible to day trade these tickers based on what people are about to do (assuming my parsing logic extracts the signal the same way humans interpret the posts). Sentiment analysis of user comments is already a challenging task on its own. Having a computer program work its way through double negatives, sarcasm, jokes, and idioms - in a community that does these things to the fullest extent - is challenging to say the least. One example is a comment that goes like this:

"Buying put options on $GME is free money"

The sentiment of the comment is positive; however, the trading action of buying a put is negative (it means betting the stock will go down in price).

Extracting an accurate signal from this type of data is very tricky and time consuming, but I was able to come up with a quick solution to account for these comments: I monkey patched a sentiment analysis algorithm. I have used gevent monkey patching before in production, where I was facing certain constraints with async code. This time I wanted to learn something new and decided to do my own monkey patching. Now my algorithm knows the meaning of words in the context of trading.
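I haven't named the analyzer above, but to illustrate the idea, here's what that kind of patch can look like with a VADER-style analyzer (vaderSentiment), whose word-to-score lexicon is just a dict. The words and scores below are examples, not my actual tuning:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

# Override/extend the lexicon with trading-specific meanings (VADER scores run roughly -4..+4)
analyzer.lexicon.update({
    "put": -2.0, "puts": -2.0,    # buying puts is bearish, even when the tone is upbeat
    "call": 1.5, "calls": 1.5,    # buying calls is bullish
    "moon": 2.5,
    "bagholder": -2.0,
})

print(analyzer.polarity_scores("Buying put options on $GME is free money"))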

Another interesting problem that came up is the potential for fake accounts / bots to spread misinformation. Institutional investors have been keen on sentiment for trading for years now (the Bloomberg terminal has a section for Twitter sentiment) and similarly equipped groups could manipulate the impulsivity of /r/wallstreetbets to their advantage.

My next steps are to pull in sentiments about a company from a variety of other sources - Twitter has a simple API as well. There are pretty sophisticated financial APIs out there that I am researching. The most challenging thing won't be simply answering whether "this news is good news" or "this financial data is bad", but deciding what to do with that particular stock, since apparently stocks get dumped on good news.

There are a number of algorithmic trading libraries available in the Python ecosystem. One I want to use is backtrader. The simple interface that "gets out of your way" is what I am looking for. It allows you to define custom data sources, do backtesting, and trade live.


Learning Golang

06/16/2021

I decided it was a good time to learn another programming language. So I chose Go. Why did I choose Go? I don't know, somebody gave a presentation about it once and it seemed kind of interesting. I see other people use Go. It is used in many network and devops projects (ones that I use, like Docker). Why not see what it's about?

I created some simple web apps (see: hget.org) to learn it. I used the Gin web framework ^0 because even though the standard http and net libraries are great, I did not want to spend time writing this code.

Instead I wanted to write something more like this:

	r := gin.Default()

	r.LoadHTMLGlob("templates/*")

	r.GET("/", func(c *gin.Context) {
		c.HTML(http.StatusOK, "base.tmpl", gin.H{
			"title":       "IP",
			"curl":        "curl -H \"Content-Type: application/json\" ip.hget.org/api/",
			"curl_result": "{\"ip\":[\"127.0.0.1\"],\"port\":[\"55636\"]}",
		})
	})

That way I could reuse the same base template and then write some Javascript on the front end to build the main body of the page. Add an nginx config that defines a reverse proxy for different subdomains (roughly sketched below) and I could create 3+ websites for the cost of 1 (domains and server bills add up fast).
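The config looks roughly like this - ip.hget.org is real, but the second subdomain and the ports are made up for illustration:

# one server block per subdomain, each proxying to a different Go binary
server {
    listen 80;
    server_name ip.hget.org;

    location / {
        proxy_pass http://127.0.0.1:8081;
        proxy_set_header Host $host;
    }
}

server {
    listen 80;
    server_name headers.hget.org;

    location / {
        proxy_pass http://127.0.0.1:8082;
        proxy_set_header Host $host;
    }
}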

What did I like about Go? Coming from Python it was refreshing to be able to compile the app, scp it to the server, and execute it. No requirements file. No "pip install -r requirements.txt". No "version 5.3 of x is required but 4.1 is installed" errors. No WSGI. No mucking around with Python versions and virtualenvs (relevant). Because of static typing and a compiler that forces me to write

if err != nil { 
    panic(err) 
}

every time, there are very few surprises (I would handle errors better than this in a more serious project). Also, having written a fair amount of async code, I can appreciate Goroutines. Having a synchronous API for asynchronous code is, in my opinion and in most cases, preferable. I liked structs and how they are embeddable with field promotion. Interfaces are great tools for design. In my time poking around the standard lib source code, I found much of it to be extremely concise and readable. It reminded me of what I learned in Patterns of Enterprise Application Architecture. So much so that I question if Python really is the least complex and most readable language after all. On the whole, I feel like Go forces me to be a better programmer.


How to Make a Podcast Feed with Django

04/29/2021

Django comes with a syndication framework that lets you create RSS and Atom feeds easily^0. If you already have your model set up all you need to do is subclass the Feed class, add it to your urls.py, and map your model's fields to the XML fields:

from django.contrib.syndication.views import Feed


class PodcastFeed(Feed):
    copyright_text = "Copyright (C) 2021, In Shape Mind"
    title = "In Shape Mind Podcast"
    link = "/podcast/"
    description = "News articles, transcribed from text, from a variety of quality sources."

    def categories(self):
        return 'news', 'current events', 'trending'

    def feed_copyright(self):
        return self.copyright_text

    def items(self):
        return PodcastArticle.objects.order_by('-id')[:65]

    def item_title(self, item):
        return item.article.title

    def item_description(self, item):
        return item.article.summary

    def item_link(self, item):
        return item.mp3_file.url

    def item_author_name(self, item):
        return item.article.author

    def item_pubdate(self, item):
        return item.article.publish_date

    def item_categories(self, item):
        return item.article.keywords

Since my site^1 already has tons of quality textual data, I thought why not run it through a text-to-speech program and make it a podcast app as well? To get this to work I created the PodcastArticle model, which has a one-to-one relationship with my Article model. The only thing left to do after that was create the mp3 files from the article text. The most difficult part was realizing that making the storage target work in development as well as production is as easy as:

from django.conf import settings
from django.core.files.storage import FileSystemStorage
from django.db import models

if not settings.DEBUG:
    from inshapemind.storage_backends import PublicMediaStorage


def select_storage():
    return FileSystemStorage() if settings.DEBUG else PublicMediaStorage()


class PodcastArticle(models.Model):
    article = models.OneToOneField('djangonewspaper.NewspaperArticle', null=True, on_delete=models.SET_NULL)
    mp3_file = models.FileField(upload_to='podcasts/', storage=select_storage())

I needed to be selective about which articles I transcribed to audio to keep the quality high, so I just created a celery task and inserted it into the part of my application that searches articles for "trending" keywords (but also filters on completeness of data). So now my celery workers can create a corresponding PodcastArticle object & mp3 file (hosted on CDN) that is added to my Podcast RSS feed. I think I wrote ~ 100 lines of code in all. Cool!
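A rough sketch of that Celery task - the TTS helper and the article field names are placeholders, not my actual code:

from celery import shared_task
from django.core.files.base import ContentFile

from djangonewspaper.models import NewspaperArticle   # app name taken from the FK above
from .models import PodcastArticle


def text_to_speech(text):
    """Placeholder: swap in whatever TTS engine you use (gTTS, a cloud TTS API, etc.)."""
    raise NotImplementedError


@shared_task
def create_podcast_article(article_id):
    """Turn one trending article into an mp3 and a corresponding PodcastArticle row."""
    article = NewspaperArticle.objects.get(pk=article_id)
    audio = text_to_speech(article.text)
    podcast = PodcastArticle(article=article)
    # FileField.save() writes through whichever storage select_storage() picked
    podcast.mp3_file.save(f"{article_id}.mp3", ContentFile(audio))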


Bitcoin Orderbook with Nodejs and Vuejs

02/13/2021

For this project I wanted to combine several Bitcoin exchange APIs into one order book chart to give a more holistic impression of liquidity. An order book is a cumulative chart of bid/ask volume that slopes downward, then upward, from left to right. The inspiration came from wanting to see the overall order book, which was previously available on data.bitcoinity.org; however, their order book chart stopped working for me. So I decided to make one myself. Most exchanges have a public order book endpoint, and the data comes back in a similar structure. Just a little data transformation was needed to map the responses to a common format, and then I could wrangle it all into a chart.js component. I would rather have used Python for this part, but Javascript is nice too because of its map/filter/reduce functions. I actually preferred this to my usual Django or Flask apps because there is less overhead involved in getting an asynchronous server set up.
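The transformation is mostly map/sort/reduce. Here's a rough sketch of the normalize-and-accumulate step (the level data and shapes are simplified placeholders - each exchange's response differs slightly):

// example price levels merged from multiple exchanges (placeholder numbers)
const combinedBids = [['42000.5', '0.30'], ['41990.0', '1.20'], ['42010.0', '0.50']]
const combinedAsks = [['42020.0', '0.40'], ['42050.0', '2.00'], ['42030.0', '0.70']]

const normalize = (levels) =>
  levels.map(([price, amount]) => ({ price: Number(price), amount: Number(amount) }))

// bids accumulate from the best (highest) bid downward, asks from the best (lowest) ask upward
const cumulate = (levels, descending) =>
  [...levels]
    .sort((a, b) => (descending ? b.price - a.price : a.price - b.price))
    .reduce((acc, level) => {
      const total = (acc.length ? acc[acc.length - 1].total : 0) + level.amount
      return [...acc, { price: level.price, total }]
    }, [])

console.log(cumulate(normalize(combinedBids), true))
console.log(cumulate(normalize(combinedAsks), false))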

I chose to use vue-chartjs because I had used it at work before and had a good experience. A simple vue-chartjs component looks something like this:

Vue-Chartjs component with local data

import { Bar } from 'vue-chartjs'

export default {
  extends: Bar,
  data: () => ({
    chartdata: {
      labels: ['January', 'February'],
      datasets: [
        {
          label: 'Data One',
          backgroundColor: '#f87979',
          data: [40, 20]
        }
      ]
    },
    options: {
      responsive: true,
      maintainAspectRatio: false
    }
  }),

  mounted () {
    this.renderChart(this.chartdata, this.options)
  }
}

I wanted to serve the chart page with the aggregated data local to the chart definition, but it is kind of clunky to render JSON data within an HTML template (I toyed around with MustacheJS). Instead, I composed my Vue app so that it has an async mounted() method that calls back to the Node server for the aggregated data and then renders the chart; otherwise the chart renders with no data because the API call is still in flight. However, doing things this way cost me some flexibility in declaring options for my chart. The options I set to add axis labels and format the price into a more readable $xx,xxx format do not take effect. If I go against what the documentation says and pass the options in directly, rather than as a this.options reference, the axis labels and formatters work, but the rest of the chart breaks.
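The async variant I'm describing looks roughly like this (the endpoint path is an assumption):

import { Bar } from 'vue-chartjs'

export default {
  extends: Bar,
  data: () => ({
    chartdata: null,
    options: { responsive: true, maintainAspectRatio: false }
  }),

  async mounted () {
    // fetch the aggregated order book from the Node server, then render
    const res = await fetch('/api/orderbook')
    this.chartdata = await res.json()
    this.renderChart(this.chartdata, this.options) // documented form; see the options caveat above
  }
}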

I developed and tested with a local Docker image using docker-compose.yml. I mounted the project folder in the container and configured Nodejs to allow for hot reloading. That way I could rapidly develop my backend code and see my changes live. For the Vue app all I had to do was run npm run build and add the dist folder to the express.static paths and voilà, a Nodejs/Vuejs app.

Overall the project was fun to work on; I enjoyed getting more exposure to Javascript and its ecosystem. I liked using Nodejs for an API-centered project, as it's easier to work with JSON data compared to my experience in Python, where I would constantly be doing .get(key, {}).get(key2, None)... However, I can't help but feel I'm missing the Javascript equivalent of Pandas in this toolset. This makes me want to wrap a Vuejs app in a Python Flask backend for my next webapp.


Fullstack news aggregator webapp with Django, Postgres, and Docker

12/14/2020

Overview

In this post I will outline my use of the LEPP stack (Linux, Nginx, PostgreSQL, Python) to create a fairly complex web app (inshapemind.com, now defunct) with relatively little overhead. You may already be familiar with each component, but I will also point out some interesting patterns I've adopted. For those unfamiliar with the stack: Linux is used because it is free and open source, ubiquitous in the cloud, and extremely robust. Nginx is used for those same reasons, as well as being performant. PostgreSQL follows the open source trend and also has good security practices. Python is used for access to web frameworks and libraries that can save you a lot of time (Django, requests, pandas, and newspaper are some of my favorites).

Goal

My goal was to create a web app that could aggregate content from various news outlets and wrangle the important data into a model defined within Django's ORM. I did this by using what would probably be regarded as an obscene number of third-party libraries (don't reinvent the wheel) and by integrating them with Django's application patterns. Yes, I could probably remove half my dependencies by writing a handful of functions with the standard lib, but the goal is a working prototype first and foremost.

On the front end I wanted a responsive, crisp, and readable interface. The most important thing was to present the articles that people want to read, or make them otherwise searchable/browseable. My target was pages with 1 MB or less of data and load times in the 500-750ms range. My goal for the style of the interface was for it to be tolerable (I prefer backend development).

Component Choices

Django

It was either this or Flask. I chose Django since it has an excellent object-relational mapping (ORM), plenty of libraries like allauth, django-rest-framework, and mail/storage providers like anymail and storages. Not having to wrestle with user creation, password validation and storage, verification, forms, etc will literally save me hundreds of hours of headache and provide my users with a better experience and security.

To give you an idea of how simple querying the database in Django is, look at these typical examples:

>>> queryset = MyModel.objects.all()
# or
>>> queryset = MyModel.objects.filter(publish_date__range=[startdate, enddate], language='en', video=False).prefetch_related('author', 'domain').order_by('-publish_date')

The API covers 99% of SQL functionality^0

  • https://django-allauth.readthedocs.io/en/latest/
  • https://www.django-rest-framework.org/tutorial/quickstart/

Newspaper3k

This library allows you to "scrape" or "crawl" websites and extract articles. It has a simple API and makes use of multiprocessing to fetch pages quickly (but not so quickly as to overwhelm the host server). It works by abstracting the DOM, iterating over elements, and assigning them scores to decide which text is part of the article. There are parsers to extract data like title, authors, publish date, and so on. It is not always reliable since every site is different, so I had to write some additional parsers for cases where the data isn't extracted. I made a separate Django app based on newspaper, created models for the articles, authors, and domains, and made a management command to run a script that gathers articles from a defined list of websites. So adding more articles looks like:

docker-compose -f production.yml run django python manage.py cron

The management command allows for passing in lists of websites, overriding the default whitelist. That solves the content aspect of the site.
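Under the hood of a command like that, the core newspaper3k loop looks roughly like this (the site URL and article limit are placeholders):

import newspaper

# memoize_articles skips URLs that were already seen on previous runs
source = newspaper.build('https://example-news-site.com', memoize_articles=True)

for article in source.articles[:20]:
    article.download()
    article.parse()
    print(article.title, article.authors, article.publish_date)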

VueJS

I wanted users to be able to do simple things like follow sites and authors, and bookmark articles. I envisioned having a profile page where you could keep track of articles and get a customized feed based on your preferences. The issue with Django is that to add or remove an object you typically make a POST request, which gets handled by a view and redirects you to a GET request (another database hit). This didn't strike me as very efficient or modern, so I created a REST endpoint and added some VueJS to my templates. VueJS is amazing for applications with data - working with lists and API endpoints is a breeze ^1. You can create a web app entirely with VueJS and serve it as a SPA, or you can add the .js files to your page and make use of its functionality. I feel like I'm cheating a bit here and having my cake and eating it too, but it works for now.
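For a sense of what that endpoint looks like, here's a django-rest-framework sketch (the profile/bookmarks fields are assumptions, not my actual models):

from rest_framework import status
from rest_framework.decorators import api_view, permission_classes
from rest_framework.permissions import IsAuthenticated
from rest_framework.response import Response


@api_view(['POST', 'DELETE'])
@permission_classes([IsAuthenticated])
def bookmark(request, article_id):
    """Add or remove a bookmark; the Vue code calls this with fetch/axios, no page reload."""
    bookmarks = request.user.profile.bookmarks       # assumed M2M field on a profile model
    if request.method == 'POST':
        bookmarks.add(article_id)
        return Response(status=status.HTTP_201_CREATED)
    bookmarks.remove(article_id)
    return Response(status=status.HTTP_204_NO_CONTENT)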

PostgreSQL

Postgres was probably the easiest decision to make. It performs. It is secure. But most of all, it has a Django integration ^2. The full-text search is perfect for my text-heavy application.

Nginx

Nginx hasn't failed me yet. It acts as a reverse proxy, passing requests to the WSGI server (gunicorn). Here is a simplified version of my config:

upstream django {
    server django:5000;
}

server {
    listen 80;
    server_tokens off;
    server_name inshapemind.com;
    return 301 https://inshapemind.com$request_uri;
}

server {

    listen 443 ssl;
    server_name inshapemind.com;
    ssl_certificate /etc/nginx/certs/fullchain.pem;
    ssl_certificate_key /etc/nginx/certs/privkey.pem;

    location / {
        proxy_pass http://django;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_redirect off;
    }
}

It's good practice to put a traditional web server in front here because Nginx handles the raw connections (TLS, slow clients, lots of them at once) better than the WSGI server would on its own. Usually you would also have a line here for serving static assets, but I have a CDN set up for that.

Docker

Most people have heard of Docker by now, but I doubt everyone knows the full extent of what it has to offer. Yes, it allows you to run containerized applications, and yes, Kubernetes uses it. I like it because it allows you to manage multiple containers and their configuration in a YAML file with docker-compose. Furthermore, you can provision and manage remote servers with docker-machine*. For cloud providers like Digital Ocean that have an API, you can create an instance from the command line by passing in the right parameters. The Docker engine is then installed, and you can ssh into the instance simply by invoking docker-machine ssh myproject. Here is what my YAML configuration looks like:

version: '3'

services:
  django: &django
    build:
      context: .
      dockerfile: ./compose/production/django/Dockerfile
    image: inshapemind_production_django
    depends_on:
      - postgres
      - redis
    env_file:
      - ./.envs/.production/.django
      - ./.envs/.production/.postgres
    volumes:
      - newspaper_cache:/tmp/.newspaper_scraper
      - nltk:/root/nltk_data
    expose:
      - 5000
    command: /start

  postgres:
    build:
      context: .
      dockerfile: ./compose/production/postgres/Dockerfile
    image: inshapemind_production_postgres
    volumes:
      - production_postgres_data:/var/lib/postgresql/data
      - production_postgres_data_backups:/backups
    env_file:
      - ./.envs/.production/.postgres

  nginx:
    build:
      context: .
      dockerfile: ./compose/production/nginx/Dockerfile
    image: inshapemind_production_nginx
    depends_on:
      - django
    ports:
      - "0.0.0.0:80:80"
      - "0.0.0.0:443:443"
    volumes:
      - /etc/letsencrypt/live/inshapemind.com/fullchain.pem:/etc/nginx/certs/fullchain.pem:ro
      - /etc/letsencrypt/live/inshapemind.com/privkey.pem:/etc/nginx/certs/privkey.pem:ro

  redis:
    image: redis:5


volumes:
  production_postgres_data: {}
  production_postgres_data_backups: {}
  newspaper_cache: {}
  nltk: {}

Notice how easily I've added volumes for Django and Nginx: the cert files are bind-mounted from the host, and the named volumes persist across container rebuilds. The newspaper cache contains the memoization information so I don't hit the same URL multiple times. I have another container for acquiring a Letsencrypt cert, which is then mounted in the production Nginx container.

Summary

That's my LEPP stack. It is easy to develop and deploy. I hope you learned something new and/or will consider some of these technologies in your next project if you haven't already.

  • docker-machine has been deprecated. Try "Docker Desktop" instead