Hi—I have some free Azure credits and would like to use them to host a personal Lemmy instance. I know Lemmy is containerised, but is there a preferred choice for hosting in Azure—AKS, Azure Container Apps, Container Instances? Also, any guidance on appropriate PostgreSQL configuration—I know there are some options around that.

Also, can anyone point me at what resource utilisation will look like for a Lemmy instance? I imagine disk space is more of a concern than compute usage.

  • ubergeek77A
    1 year ago

    Will you only be supporting yourself and maybe a small subset of users? If you don’t need your instance to scale, you can (shameless self-plug) try my deployment script to get yourself running.

    It just uses the recommended Postgres configuration as seen in the deployment files in Lemmy’s official repo. The database would just live in a Docker volume on disk, so if you think you might scale in the future and want to use a managed Postgres service, I would not recommend using my script.

    I run an instance just for myself, and CPU usage is so low that pretty much anything you can get in the cloud will be good enough. Disk space is a much more important factor. In terms of just Lemmy-created data, my personal instance has stored about 6.2GB in the 10 days it has been running. 2.4GB of this is just thumbnails. Note that this does not include other things that consume disk space, such as my Docker images or my Docker build cache, which I clear manually.

    So, that is roughly 640MB of new data generated per day. Your experience will vary depending on how many communities you subscribe to, but that’s a good rough estimate. Round it up to 700MB to have a safer estimate. But remember, this is with Lemmy’s current rate of activity. If the number of posts and comments doubles or triples in the future, my storage requirements will likely go up considerably.
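
    The arithmetic above can be sketched as a small projection. This is only a rough linear extrapolation, not a measurement: it assumes 1GB = 1024MB (which gives about 635MB/day, in line with the "roughly 640MB" figure), assumes growth stays proportional to activity, and uses hypothetical function names chosen for illustration.

```python
# Rough linear storage projection from the figures quoted in this thread.
# Assumption: 1 GB = 1024 MB. With 1000 MB per GB the daily rate would be 620 MB.
MB_PER_GB = 1024


def daily_rate_mb(total_gb: float, days_observed: int) -> float:
    """Average new data per day, in MB, over the observed period."""
    if days_observed <= 0:
        raise ValueError("days_observed must be positive")
    return total_gb * MB_PER_GB / days_observed


def projected_gb(rate_mb_per_day: float, days: int, activity_factor: float = 1.0) -> float:
    """Storage after `days`, in GB, if activity scales by `activity_factor`."""
    return rate_mb_per_day * activity_factor * days / MB_PER_GB


observed_rate = daily_rate_mb(6.2, 10)  # 6.2GB stored over 10 days
padded_rate = 700.0                     # the "safer" rounded-up estimate

if __name__ == "__main__":
    print(f"observed rate: {observed_rate:.0f} MB/day")
    for factor in (1.0, 2.0, 3.0):  # current activity, doubled, tripled
        yearly = projected_gb(padded_rate, 365, factor)
        print(f"activity x{factor:.0f}: about {yearly:.0f} GB per year")
```

    At the padded 700MB/day this comes out to roughly 250GB per year at current activity, and proportionally more if activity doubles or triples. It ignores any pruning of thumbnails or database rows.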

    I am genuinely not sure what long-term Lemmy maintenance looks like in terms of releasing disk space. I can clear my thumbnail data and be fine, but I wonder what’s going to happen with the Postgres database. Is there some way to prune old data out of it to save space? Will my cloud storage costs become so unreasonable within a year that I’ll have to stop hosting Lemmy? These are the questions I don’t have answers to yet.

    If there is something clever you can do to plan ahead and save yourself disk space costs in the future (for example, are managed Postgres services cheaper than on-disk ones?), I’d recommend doing that.


    EDIT: Turns out ~90% of my Lemmy data is just for debugging and not needed:

    https://github.com/LemmyNet/lemmy/issues/3103#issuecomment-1631643416

    • r0bbbo@programming.devOP
      1 year ago

      Thanks for the great reply. I’ll take a look at your deployment script to see if it fits my needs. I only plan to use the instance for me and a handful of friends. Like you say, data retention is probably my biggest concern, so I’ll look at the most sensible way to budget for that in Azure. Are there any numbers available from the major Lemmy instances? The lack of consideration for retention policies seems like a bit of an oversight, so I might do some reading to see what the plan is here.

      • ubergeek77A
        1 year ago

        I’m not sure if the other instances have published their numbers; I can only see what my own Docker volumes look like.

        But if it helps you plan, you should know that federation only involves new data. When you set up a new instance, and federate with/subscribe to a community, it will only fetch an initial 20 posts (if that). From that point forward, you will receive a copy of all posts/comments posted to that community, but you will not have anything from before you federated. So you don’t have to worry about mirroring the entirety of a community’s history - I’d probably be out of disk space 3 times over if that were the case.

        There are ways for users to retrieve “old” posts, but it’s done on an individual basis, not in bulk.

    • embix@feddit.de
      1 year ago

      “In terms of just Lemmy-created data, my personal 10-day instance has stored about 6.2GB of data”

      260 GiB per year is an order of magnitude more than I anticipated.

      “Will my cloud storage costs become so unreasonable in a year, that I’ll have to stop hosting Lemmy?”

      Exactly what I’m thinking right now. And if I subscribe to a sub that gets really popular and my instance really fetches every post, I might be busy attaching spinning rust to a logical volume, since SSDs aren’t that cheap. OTOH, it only needs to store what I actually see on other instances; maybe that’s configurable. I haven’t looked into the actual code yet, but that would seem like a reasonable use case for a single-user instance.