I used to be the Security Team Lead for Web Applications at one of the largest government data centers in the world, but now I mostly do “source available” security, mainly focusing on BSD. I’m on GitHub, but I also run a self-hosted Gogs (which Gitea came from) git repo at Quadhelion Engineering Dev.

Well, on that server I tried to deny AI with Suricata, robots.txt, “NO AI” licenses, Human Intelligence (HI) License links in the software, and “NO AI” comments in posts everywhere on the Internet where my software was posted. Here is what I found today after correlating all my logs of git clones and scrapes and tracing them back to IP/Company/Server.
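For anyone who wants to try the same kind of log correlation, here is a minimal sketch in Python. It is not my actual tooling. It assumes the git server sits behind a reverse proxy that writes Combined Log Format; the `AI_AGENT_TOKENS` list is an illustrative, incomplete set of crawler user-agent substrings, and the `SAMPLE` lines are synthetic, using documentation-only IP ranges.

```python
import re
from collections import Counter

# Illustrative, incomplete list of crawler user-agent substrings (an assumption).
# Extend it from what actually shows up in your own logs.
AI_AGENT_TOKENS = ("GPTBot", "CCBot", "ClaudeBot", "PerplexityBot", "Bytespider")

# Combined Log Format: ip - user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)


def classify(line):
    """Return (ip, label) for one log line, or None if it does not parse."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    agent = match.group("agent").lower()
    for token in AI_AGENT_TOKENS:
        if token.lower() in agent:
            return match.group("ip"), token
    return match.group("ip"), "other"


def summarize(lines):
    """Count parsed requests per label, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        result = classify(line)
        if result is not None:
            counts[result[1]] += 1
    return counts


# Synthetic sample lines, not real traffic.
SAMPLE = [
    '203.0.113.7 - - [01/Jan/2024:00:00:01 +0000] "GET /repo/README.md HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.4 - - [01/Jan/2024:00:00:02 +0000] '
    '"GET /repo.git/info/refs?service=git-upload-pack HTTP/1.1" 200 300 "-" "git/2.43.0"',
    "not a log line",
]

summary = summarize(SAMPLE)
print(summary)
```

User agents are trivially spoofed, so a count like this is only a lower bound. Tracing the remaining “other” IPs back to their owners still has to be done separately, for example with whois lookups.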

Having formerly been loath to give even my thinking pattern to a potential enemy, I asked Perplexity AI questions specifically about BSD security, a very niche topic. Although there is a huge pool of data on this in general, going back many decades, my type of software is pretty unique. It is buried: it does not show up in the first two pages of a GitHub search for BSD Security, which is as far as most users will click. It is very recent compared to the “dead pool” of old knowledge, and it is fairly well received yet not generally popular, so GitHub Traffic Analysis is very useful.

The traceback and AI result analysis shows the following:

  1. GitHub cloning vs. visitor activity in the Traffic tab DOES NOT MATCH any useful pattern for me, the Engineer. Likelihood of AI training, as a rough estimate for my own repositories: 60% of clones are AI/Automata
  2. GitHub README.md is not licensable material and is a public document able to be trained on no matter what software license, copyright, statements, or technical measures are used to dissuade/defeat it. a. I’m trying to decide whether tracking down if any README.md, no matter the context, is trainable is a solvable engineering project given my life constraints.
  3. Plagiarism of technical writing: Probable
  4. Theft of programming “snippets” or perhaps “single lines of code” and overall logic design pattern for that solution: Probable
  5. Supremely interesting choice of datasets used vs. available in summary use, apparently also validated against other software and weighted by reputation factors: “Coq”-like proofing, GitHub “Stars”, employer history?
  6. Even though I can see my own writing and formatting right out of my README.md, the citation was to “Phoronix Forum”, and that isn’t true. That’s like attributing your post to “Tick Tock”. I wrote that; a real flesh-and-blood human being took comparatively massive amounts of time to do that. My birth name is there in the post 2 times [EDIT: post signature with my name no longer? Name not in “about” either hmm], in the repo, in the comments, all over the Internet.

[EDIT continued] Did it choose the Phoronix vector to that information because it was less attributable? It found my other repos in other ways. My Phoronix handle is the same as my GitHub username, and my handle is my name, easily inferable from either, with a biography link giving my full name in the about. [EDIT cont end]

You should test this out for yourself as I’m not going to take days or a week making a great presentation of a technical case. Check your own niche code, a specific code question of application, or make a mock repo with super niche stuff with lots of code in the README.md and then check it against AI every day until you see it.
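One way to make the mock-repo test easier to check is to plant a unique marker in the README and search AI answers for it later. This is a minimal sketch in Python under my own assumptions: `make_canary`, `contains_canary`, and the `MAGIC_TOKEN` setting are hypothetical names for illustration, and how you collect the AI responses is up to you.

```python
import secrets


def make_canary(prefix="canary"):
    """Return a unique, unguessable marker to embed in a mock README."""
    return f"{prefix}-{secrets.token_hex(8)}"


def contains_canary(text, canary):
    """True if the marker appears verbatim (ignoring case) in the text."""
    return canary.lower() in text.lower()


canary = make_canary()

# Hypothetical README body; MAGIC_TOKEN is a made-up setting that carries the marker.
readme = f"# mock-repo\n\nSet `MAGIC_TOKEN={canary}` before running.\n"
print(readme)
```

Keep the marker private and paste each day's AI answer into `contains_canary(answer, canary)`. A verbatim hit suggests the README was ingested or retrieved; a miss proves nothing either way.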

P.S. I pulled up TabNine and tried to write Ruby so complicated and magically mashed that AI could offer me nothing, just as an AI obfuscation/smartness test. You should try something similar to see what results you get.

  • Elias Griffin@lemmy.worldOP
    5 months ago

    Discussion Primer: From my perspective, and potentially that of millions of others, the readme is part of the software; it is delivered with the software whether by zip, tar, or git. Markdown is itself a specification, so the document can be considered software.

    In fact README is so integral to the software you cannot run the software without it.

    Conclusion: I think we all think of the readme, especially one with examples of your code in it, as code. I have evidence AI trains on your README even if you tell it specifically not to use the readme, block the readme, block markdown files; it still goes after it. Kinda scary?

    I want everyone else to have the evidence I have, Science.

    • catloaf@lemm.ee
      5 months ago

      I mean this in the best possible way, but have you ever had any mental health evaluations? I’m not sure if they’re still calling it paranoid schizophrenia, but the way you write makes me concerned.

      • Elias Griffin@lemmy.worldOP
        5 months ago

        I write as the smartest in the room: passionate, with wisdom and evidence. The way you defame someone like this makes me sure you are not afraid to attack someone’s character with no evidence of anything but your own stupidity and un-awareness.

        • subignition@fedia.io
          5 months ago

          I think your problem is here:

          You should test this out for yourself as I’m not going to take days or a week making a great presentation of a technical case.

          You’ve written a whole lot to try to be convincing but ultimately stopped short of actually proving what you’ve alleged. It looks to me you are frustrated that no one is taking you at your word and going down this rabbit hole themselves, when the various reputational elements you’re relying on are going to be important only to a minority of users. Burden of proof works how it always has, however.

        • catloaf@lemm.ee
          5 months ago

          This is out of genuine concern, my dude. Your other comment accusing me of not being a real person is positively alarming.

          • Elias Griffin@lemmy.worldOP
            5 months ago

            Your rapacious backwards insult of caring is gross and obvious. You called me “my dude” like a teenager who’s chill, and calm, and correct, but just …a child and wrong in the end. How old are you, child? My Lemmy profile is my name with my Seal, naturally born March 4th, 1974 as Elias Christopher Griffin. I’ve done more in my life than most people do in 10. My mental health is top 3%, as is my intellect.

            You are an un-named rando lemmy account named “catloaf” who averages 16 posts a day for the past 4 months with no original posts of your own because you aren’t original.

            I make only original posts. You seem nothing like a real person. Want to tell us who you are? What makes you special, outside of the mandated counseling you receive or data models you intake?

            You know what, no one takes what you say seriously loaf of cat, I certainly didn’t, don’t, and won’t. Here is space for your next hairball


            • subignition@fedia.io
              5 months ago

              I take back the benefit of the doubt I gave in my earlier reply. This reply is as unhinged as the Navy SEAL copypasta. You need mental health support.

            • DudeDudenson@lemmings.world
              5 months ago

              This really reads like copypasta. If someone told me you were an LLM configured to make anti-AI people look bad, I’d believe them.