Integrating Letta with a recipe manager

This week's blog post is going to be technical.

I mentioned before that I was using Paprika Recipe Manager. I had issues with it, so I moved to Mealie, an open source recipe manager with a Swagger API.

Mealie is pretty great for me, and I might write a blog post about it later. However, the real fun is using Letta to manage it.

Letta and Mealie

One thing that Letta gets right is the function integration. Instead of having to use LangChain, I can go to the ADE and paste in some Python, then test it out until it works. There are three different views of the conversation, showing function calling behavior in more detail, so debugging functions is very easy to do.
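For anyone who hasn't used the ADE: a tool is just a plain Python function, and the type hints and docstring are what become the schema the model sees. A trivial, made-up example of the shape (not one of my actual tools):

def echo_message(message: str) -> str:
    """Echo a message back to the user.

    Args:
        message (str): The text to echo.

    Returns:
        str: The same text, unchanged.
    """
    return message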

Mealie has a POST /api/recipes/create/url endpoint, so Letta can search with Tavily and then hand the resulting URL to Mealie to create the recipe.

Add Recipe From URL
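Stripped down, that tool is a single HTTP call. Here's a rough sketch of what it might look like, assuming MEALIE_URL and MEALIE_API_TOKEN environment variables (my names, not Mealie's) and that the endpoint responds with the new recipe's slug; imports are inside the function so the sketch is self-contained:

def add_recipe_from_url(url: str) -> str:
    """Import a recipe into Mealie from a web page URL.

    Args:
        url (str): The URL of the recipe page to import.

    Returns:
        str: The slug of the newly created recipe.
    """
    import os
    import requests

    response = requests.post(
        f"{os.environ['MEALIE_URL']}/api/recipes/create/url",
        headers={"Authorization": f"Bearer {os.environ['MEALIE_API_TOKEN']}"},
        json={"url": url},
        timeout=30,
    )
    response.raise_for_status()
    # Mealie responds with the slug of the recipe it just created
    return response.json()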

Then I wanted it to be able to find recipes directly from Mealie. This is two calls: first, query for recipes matching a general description:

Find Recipes

and then get the recipe using the slug:

Get Recipe
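Sketched out, those two tools might look something like this. The function names are mine, and I'm assuming the search and perPage query parameters on Mealie's GET /api/recipes, plus GET /api/recipes/{slug} for the full recipe:

def find_recipes(search: str) -> str:
    """Search Mealie for recipes matching a description.

    Args:
        search (str): Free-text search term, for example "chili".

    Returns:
        str: A plain-text list of matching recipe names and slugs.
    """
    import os
    import requests

    response = requests.get(
        f"{os.environ['MEALIE_URL']}/api/recipes",
        headers={"Authorization": f"Bearer {os.environ['MEALIE_API_TOKEN']}"},
        params={"search": search, "perPage": 10},
        timeout=30,
    )
    response.raise_for_status()
    items = response.json().get("items", [])
    # Plain text rather than JSON: the models handle it better
    return "\n".join(f"- {r['name']} (Slug: {r['slug']})" for r in items)


def get_recipe(slug: str) -> str:
    """Fetch the full recipe from Mealie by its slug.

    Args:
        slug (str): The recipe slug returned by find_recipes.

    Returns:
        str: The recipe body as returned by Mealie.
    """
    import os
    import requests

    response = requests.get(
        f"{os.environ['MEALIE_URL']}/api/recipes/{slug}",
        headers={"Authorization": f"Bearer {os.environ['MEALIE_API_TOKEN']}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.text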

After that, I started getting fancy. I noticed that imported recipes were not coming in with appropriate categories and tags, so I added a few more functions to update tags, update categories, and add notes.

Update Tags
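The update functions are thin wrappers too. Here's a sketch of the tags one, with the caveat that it assumes Mealie's PATCH /api/recipes/{slug} accepts a partial recipe body and that GET /api/organizers/tags lists the existing tag objects; treat both as assumptions:

def update_recipe_tags(slug: str, tags: list[str]) -> str:
    """Replace the tags on a Mealie recipe.

    Args:
        slug (str): The recipe slug.
        tags (list[str]): Tag names to apply, for example ["Thai", "Quick"].

    Returns:
        str: A short confirmation message.
    """
    import os
    import requests

    base = os.environ["MEALIE_URL"]
    headers = {"Authorization": f"Bearer {os.environ['MEALIE_API_TOKEN']}"}

    # Look up existing tag objects so the recipe can reference them by id
    existing = requests.get(f"{base}/api/organizers/tags", headers=headers, timeout=30)
    existing.raise_for_status()
    by_name = {t["name"].lower(): t for t in existing.json().get("items", [])}
    matched = [by_name[name.lower()] for name in tags if name.lower() in by_name]

    response = requests.patch(
        f"{base}/api/recipes/{slug}",
        headers=headers,
        json={"tags": matched},
        timeout=30,
    )
    response.raise_for_status()
    return f"Tagged {slug} with: " + ", ".join(t["name"] for t in matched)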

And finally, I wanted to be able to read the meal plans.

Read meal plans
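Reading the plan is one more GET. A sketch, assuming Mealie's GET /api/groups/mealplans endpoint with start_date and end_date filters; the entry field names here are also assumptions:

def get_meal_plans(days_ahead: int = 7) -> str:
    """Read the upcoming meal plan from Mealie.

    Args:
        days_ahead (int): How many days of the plan to fetch.

    Returns:
        str: One line per planned meal, as plain text.
    """
    import os
    from datetime import date, timedelta
    import requests

    start = date.today()
    end = start + timedelta(days=days_ahead)
    response = requests.get(
        f"{os.environ['MEALIE_URL']}/api/groups/mealplans",
        headers={"Authorization": f"Bearer {os.environ['MEALIE_API_TOKEN']}"},
        params={"start_date": start.isoformat(), "end_date": end.isoformat()},
        timeout=30,
    )
    response.raise_for_status()
    entries = response.json().get("items", [])
    lines = []
    for e in entries:
        recipe = e.get("recipe") or {}
        name = recipe.get("name") or e.get("title") or ""
        lines.append(f"{e.get('date')}: {e.get('entryType')}: {name}")
    return "\n".join(lines)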

This was a pretty good week.

Letta's Stateful Providers

I had a conversation on Letta's Discord about the problems I was having using LiteLLM as a provider for Letta. Unsurprisingly, adding state to a stateless system is complicated.

totally agreed that it's standard, however it's important to note that it's very much an open question of how you implement it properly because letta is stateful - all the agents retain state and memory
and the openai chat completions endpoint is stateless and you're expected to pass in the entire conversation history each time
so you have a bunch of implementation details that you need to handle, like should you expect the client is stateless and passes everything every time?
that means then on the server side, you need to intelligently dedupe the inputs
or do you assume that the client knows the chatcompletions endpoint isn't true chatcompletions, and is stateful? if so, then you need to hijack one of the inputs on the endpoint to serve as a "cookie" / session ID
to understand which session / state object the request is coming from
(just trying to explain a bit more why you can't just slap a chatcompletions endpoint on a letta server and call it a day; no matter what, you will need to modify the client application or server to handle the issue that these are fundamentally very different APIs, stateful vs stateless)
the letta api is intended to be as close as possible (at least the inputs) to the chatcompletions api for familiarity reasons
main thing to keep in mind when using something like litellm is how they remap off of openai style for tool calling. eg in our own AnthropicProvider code, we do things like inject the tag into content (for all past reasoning), and also inject an assistant prefill, but only for haiku and sonnet, not opus, because opus doesn't support it (when using tool calling). this is something that definitely will not happen for the litellm proxy version of anthropic models, but i haven't tested litellm anthropic heavily so i wouldn't be able to tell you if their version of remapping works OK or not. it's the same thing with openrouter to a large extent
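To make the stateful/stateless point concrete: a chat completions client owns the conversation and replays all of it on every request, while a Letta client sends only the new message and an agent ID, and the server owns the history. Roughly like this (the chat completions shape is standard; the Letta request is my sketch of its message endpoint, so treat the details as approximate):

import requests

# Stateless chat completions: the client owns the history and replays all of it
history = [
    {"role": "user", "content": "Find me a chili recipe."},
    {"role": "assistant", "content": "Here are three chili recipes..."},
    {"role": "user", "content": "Add the second one to Mealie."},
]
requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": "Bearer sk-..."},
    json={"model": "gpt-4o-mini", "messages": history},
    timeout=60,
)

# Stateful Letta agent: the server owns the history, the client sends only the new turn
requests.post(
    "http://localhost:8283/v1/agents/agent-123/messages",
    json={"messages": [{"role": "user", "content": "Add the second one to Mealie."}]},
    timeout=60,
)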

The documentation echoes this in its note on OpenAI proxy endpoints:

"OpenAI proxy endpoints are not officially supported and you are likely to encounter errors. We strongly recommend using providers directly instead of via proxy endpoints (for example, using the Anthropic API directly instead of Claude through OpenRouter)."

So the upshot is that models have to be added directly to Letta to be managed correctly. Using a proxy will result in something being lost in translation. And if a provider isn't listed, like Lambda Labs, it's very likely not to work at all.

Even Groq has streaming issues:

Mar 01 14:28:18 letta letta-server /app/letta/server/rest_api/utils.py:123: UserWarning: SSE stream generator failed: Error code: 500 - {'error': {'message': 'Internal Server Error', 'type': 'internal_server_error'}}

So choices are limited.

Context Window Issues

One surprising detail is that with the Anthropic provider, the context window is hardcoded to 200,000 tokens. That is the model's absolute maximum, but it turns out you can't use all of it: on a basic Tier 1 plan, if you send more than 40K input tokens at once, you'll run into the 40,000-tokens-per-minute rate limit.

Which, eventually, is what happened.

Feb 27 11:46:54 letta letta-server anthropic.RateLimitError: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'This request would exceed the rate limit for your organization (68ff39e7-c1fc-48a9-a73b-1bc6dda2daed) of 40,000 input tokens per minute. For details, refer to: https://docs.anthropic.com/en/api/rate-limits. You can see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}

I wasn't doing anything fancy when I hit that error message. I just hadn't flushed my message history in a while. In spite of Letta being stateful, it was still sending the entire message history on every request. This is apparently a common problem when using Jupyter Notebooks, but the upshot is that a large context window is expensive, and you should only make it as large as the job actually needs.

A large context window will also quickly exhaust your service limits if you're using a free service, like Groq's free tier, because the rate limits are extremely low.

Unfortunately, Letta only shows a generic error message when rate limits are exceeded, rather than a specific 429 Too Many Requests error. This is where Papertrail comes in very handy.

Ollama Issues

I had more success using Letta with the Ollama provider.

It is remarkable to see how quickly the models fail at function calling. Watching YouTube, you'd think that function calling was a solved problem, but I found it very hit or miss with a number of models. Even Claude was not fully compliant with function passing: if I had a categories: list[str] argument, Claude would still pass in "Thai, Main Dish, Quick" as a CSV string. A large number of models would fail to recognize that an argument was optional, and would explicitly pass in null for it. Likewise, models dislike structured output like JSON; they like working with text, especially Markdown. They're much happier with a straight Slug: {slug} than with a {"slug": slug}.
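The practical consequence is that tool code can't trust the arguments it receives. One defensive pattern (just a sketch, not what the screenshots above show) is to coerce the argument inside the tool: accept either a list or a comma-separated string, and treat an explicit null as empty:

def normalize_categories(categories) -> list[str]:
    """Coerce whatever the model passed into a clean list of category names.

    Handles the failure modes above: a CSV string instead of a list,
    and an explicit "null"/None for an optional argument.
    """
    if categories is None or categories == "null":
        return []
    if isinstance(categories, str):
        # "Thai, Main Dish, Quick" -> ["Thai", "Main Dish", "Quick"]
        return [c.strip() for c in categories.split(",") if c.strip()]
    return [str(c).strip() for c in categories]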

I alluded to the local models being a bit dumb, but I didn't provide any examples. I find it a bit frustrating that so many models love to tell you how capable they are and how highly they score on leaderboards, instead of showing you which straightforward tasks they can actually do.

So here's a test: can the model figure out that "find me a chili recipe from Mealie" means calling find_recipes_in_mealie with "%chili%" as the search term?

Hermes3-Llama3.2 called the conversation archives twice:

Hermes3 Llama 3.2

Letta-o1 got closer, calling Tavily:

Letta-o1

Hermes 3 Llama 3.1 got closer still, calling the right method, but passed the search term in single quotes instead of double quotes, which is still reasonable. Less reasonably, it then treated the empty text as a chili recipe.

Hermes3 Llama 3.1

Dolphin Mistral tried several things at once, and also didn't get the search term right:

Dolphin Mistral

Gemma 2 got the right function, but again couldn't deal with the SQL search term. I felt at this point I'd made the problem too hard, so I wrote the SQL term for it. It then managed to get the recipe with some prompting, and after that it was fine.

Gemma 2

It's tough to unit test LLMs, or hold them to a consistent standard, but generally I wouldn't trust these models to do anything I wasn't carefully monitoring. Local models are still very much at the "ask a five-year-old to set the table" stage of jobs they can do reliably: you have to be specific, you have to check their work, and you have to give them one thing to do.
