Talk is cheap, show me the code

Introduction

A common problem my friends and I have is receiving actionable emails of medium priority. They do require action of some sort ($\geq$ lowest priority), but the action is only due in the near future ($\leq$ highest priority). I tend to skim over these emails, get a gist of what is required of me, mark them as unread, and promise to get back to them later. Of course, this coming back only happens if there’s another reminder of the same task with an action that’s due immediately. You can imagine what happens when there’s no reminder of said task. I therefore wanted to create a tool which, upon opening an email, reads the email with me, then helps me add all actionable items from said email to my calendar, my one-stop shop for productivity. I figured we’re in the Chatbot Renaissance; this could be something a chatbot can actually help me with.

[Image: Rick and Morty’s “What is my purpose?” meme, captioned “Me”]

Since I don’t trust the chatbot’s judgement, I need a way to verify the actionable items before adding them to my calendar directly from my email context. Are all the folks working in security still reading?

ToDo Apps

Have I already confessed my love for todo list apps? Everyone shits on them, but if you ever want to learn the inner workings of how to do everything (a.k.a. how to CRUD) in any language or framework, todo list apps are one of the best ways to go. They’re always simple enough that you can complete one in a day (wink wink), but complex enough that you learn a lot about any system. Sorry, I digressed: we are building a todo app. A calendarGPT, if you want to make it fancy. I called up a friend to see if he was interested, and as soon as he said he wasn’t interested, just bored, we got to work. Two heads are better than one.

Platform

Since we all bought into the mail-service monopoly, the big G, it seemed like the only obvious way to get an MVP. Initially, we wanted to embed the todo list at the top of the email, the way they do gCal events, but big G wouldn’t let me do that. We were then forced to build an add-on. Of course, no good deed goes unpunished: I had to build using Apps Script and suffer through Google’s documentation, figuring out how to create a simple add-on, which APIs have not been deprecated, and which ones won’t be deprecated tomorrow. To be fair, compared to when I started in Android five-ish years ago, the experience was so much better this time. There was even a Google codelab to help. Still no helpful examples of most API usage, but with some digging I figured out how to build and configure a basic add-on. My friend was incredibly helpful: he jumped right into setting up the necessary prompting, modelling, and fetching, so that when the UI was done we could just piece everything together.
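
For the curious, the skeleton of a contextual Gmail add-on comes out roughly like the sketch below. The function name and card contents here are illustrative, not our exact code; the trigger function is whatever you declare under gmail.contextualTriggers in appsscript.json.

```javascript
// Entry point for a contextual Gmail add-on; the function name must match
// the onTriggerFunction declared in appsscript.json.
function onGmailMessageOpen(e) {
  // The trigger hands us a short-lived token for the currently open message.
  GmailApp.setCurrentMessageAccessToken(e.gmail.accessToken);
  const message = GmailApp.getMessageById(e.gmail.messageId);

  // Build a simple card; this is where the extracted todos would go.
  const section = CardService.newCardSection().addWidget(
    CardService.newTextParagraph().setText(
      'Action items for: ' + message.getSubject()
    )
  );

  return CardService.newCardBuilder()
    .setHeader(CardService.newCardHeader().setTitle('calendarGPT'))
    .addSection(section)
    .build();
}
```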

LLM Configuration

ChatGPT

Initially, we used closedAI’s ChatGPT, since it proved to be advanced in things like function calling and had the least friction to set up. It was useful for a proof of concept, but we knew we would replace it eventually, given the security nightmare it represents. In what world do we want to send closedAI our emails? Still, it was surprisingly easy to set up ChatGPT to return JSON objects (kudos to them, honestly), though it would often slightly modify our schema and return a valid JSON object with slightly modified property names. Turning the temperature down to about 0.001 seemed to fix it.
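
For flavour, the function-calling setup looked something like this. The model name, schema, and prompt below are illustrative sketches, not our exact config (Node 18+ assumed for the global fetch):

```javascript
// Sketch: force the chat completions API down the structured path by
// declaring a function schema and requiring the model to call it.
async function extractTasks(emailBody) {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'gpt-3.5-turbo',
      temperature: 0.001, // near-greedy decoding kept property names stable
      messages: [
        {
          role: 'system',
          content: 'Extract actionable items from the email. Respond ONLY ' +
            'via the provided function.',
        },
        { role: 'user', content: emailBody },
      ],
      functions: [{
        name: 'add_tasks',
        description: 'Add extracted tasks to the calendar',
        parameters: {
          type: 'object',
          properties: {
            tasks: {
              type: 'array',
              items: {
                type: 'object',
                properties: {
                  title: { type: 'string' },
                  due: { type: 'string', description: 'ISO 8601 datetime, UTC' },
                },
                required: ['title', 'due'],
              },
            },
          },
          required: ['tasks'],
        },
      }],
      function_call: { name: 'add_tasks' }, // don't let it answer in prose
    }),
  });
  const data = await res.json();
  // The arguments come back as a JSON string matching (usually) the schema.
  return JSON.parse(data.choices[0].message.function_call.arguments);
}
```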

Open-source LLMs

Thank God not all corporations are evil; some have paved the way in open-source LLMs for our kind of work. Mistral, in particular, has done good work here. Its instruct model proved to be as useful as ChatGPT, if not more useful. With the help of Ollama, I was able to configure mistral-instruct to do what I was asking by threatening it never to return anything other than a JSON object, under any circumstance. I ran a couple of tests on my local machine and it seemed to work well enough, though it would occasionally return a valid JSON object with slightly modified property names, messing up my parsing. Luckily, rerunning it would fix this issue, as it seemed to remember who was the true overlord!
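
The Ollama side was pleasantly boring. A rough sketch of the call (prompt and schema illustrative), leaning on Ollama’s format option to constrain the output to valid JSON, with the system prompt doing the threatening:

```javascript
// Sketch: prompting mistral-instruct through Ollama's local chat API.
async function extractTasksLocal(emailBody) {
  const res = await fetch('http://localhost:11434/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'mistral',
      stream: false,
      format: 'json', // constrains decoding to valid JSON
      messages: [
        {
          role: 'system',
          content: 'You are a task extractor. Respond with a JSON object of ' +
            'the form {"tasks": [{"title": string, "due": string}]}. Under ' +
            'NO circumstances return anything other than this JSON object.',
        },
        { role: 'user', content: emailBody },
      ],
    }),
  });
  const data = await res.json();
  // Valid JSON is guaranteed; the exact property names, as noted, are not.
  return JSON.parse(data.message.content);
}
```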

Hooking up localhost to Apps Script

I needed a way to call my local LLM, and since my add-on runs on Google’s cloud, good ol’ localhost wouldn’t suffice. Being on the school’s network, I saw two ways out: a) use NGINX to proxy to my localhost and fetch directly via my IP, or b) use ngrok to proxy to my localhost. I chose the latter, as I once misconfigured NGINX and came so close to losing my VPS. It was a valuable lesson, and a best-practice story for another day. Ngrok wasn’t particularly challenging; their documentation is straightforward, at least for what I wanted. Sign up for an account, get an API key, ngrok http http://localhost:port, and voilà. I should’ve known there was a catch. Since I was on the free plan, I couldn’t do any paths on it. This was a bummer, as Ollama uses http://localhost:port/api/chat for the LLM endpoint. I thus set up an Express server as a proxy for my model endpoint, before using ngrok to proxy my proxy. I added the ngrok URL to my Apps Script and woohoo. After some testing, everything was working well, although it was slow. Security experts, are you impressed?
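
The proxy-for-my-proxy is about as small as Express servers get. Roughly this, assuming Node 18+ for the built-in fetch and port 3000 (both illustrative):

```javascript
// Sketch: a tiny Express proxy so the ngrok tunnel only needs the root
// path; it forwards every request body to Ollama's chat endpoint.
const express = require('express');
const app = express();
app.use(express.json());

app.post('/', async (req, res) => {
  try {
    const upstream = await fetch('http://localhost:11434/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(req.body),
    });
    res.json(await upstream.json());
  } catch (err) {
    res.status(502).json({ error: String(err) });
  }
});

app.listen(3000, () => console.log('proxy on :3000')); // then: ngrok http 3000
```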

Reflections

Building this has been a fun, lighthearted way to experiment with whether it’s possible to rely completely on locally running LLM models. I think it’s possible with some experience, and with further LLM breakthroughs I’m sure it will be the way to go in a couple of years. Ollama has done a wonderful job of getting everything running smoothly. I was massively impressed.

Pain points
  1. Working with Google APIs, though slightly better than my experience in the past (did Google get better, or did I get better?), is still far from enjoyable for total beginners. Though they have extensive documentation, the rapid deprecation of APIs and products is proving to be a bottleneck. I ended up in so many dead ends trying to configure the add-on UI. I tried to leverage GPT, but it often suggested deprecated APIs that just wouldn’t work. I’m sure the people at Google are doing their best, but the developer experience could be so much better.
  2. Timezones. I learnt from Tom Scott about the pain of working with timezones. I thought I was better, and that with some cleverness I’d be able to ask the LLM to give me UTC time, then easily convert it to local time using getTimezoneOffset() (see the sketch after this list). Oh boy!
  3. Structured output from LLMs. Setting up LLMs to output structured data is incredibly hard. This, in my opinion, is limiting us in extracting maximum value from LLMs as automation tools. The sooner and more efficiently we can do this, the more we can incorporate LLMs into our toolchains, pipelines, etc. My former manager told me that as programmers we add value by bringing order to a world full of chaos through programming languages. This, in my experience, rings true for LLMs too. They could add a lot more value if we could somehow get a working standard for structured output. If anyone is working in this space, or has resources focusing on this area, please reach out.
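
For pain point 2, the naive conversion I had in mind was something like the sketch below; the comments note where reality intervened:

```javascript
// Sketch of the naive plan: take the LLM's UTC timestamp and shift it
// by the host's offset to get local wall-clock time.
function utcToLocal(utcIso) {
  const utc = new Date(utcIso); // e.g. '2024-03-01T14:00:00Z'
  // getTimezoneOffset() is minutes *behind* UTC, so subtract it.
  return new Date(utc.getTime() - utc.getTimezoneOffset() * 60000);
}
// The catch: a Date already renders in local time when you format it,
// the offset shifts with DST, and in Apps Script "local" means the
// script's timezone, not the user's — hence the "Oh boy!".
```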

Concluding thoughts

Working on this has reinforced the idea that LLMs have incredible potential in automation. As you can imagine, adding a task to a calendar isn’t much of a hassle, but doing it for multiple tasks, across multiple emails, multiple times a day, is work better done by our wordy wizards.

Code

If you want to try this out, you can find the code here.