Off-and-on trying out an account over at @[email protected] due to scraping bots bogging down lemmy.today to the point of near-unusability.

  • 28 Posts
  • 4.14K Comments
Joined 2 years ago
Cake day: October 4th, 2023





  • What that means to someone is up to them. Some users on here do not like the US at all, for example, and they might be delighted to be using a Serbian company instead of a US company. That’s not my position, but I’ve no doubt that it’s a perspective for some. I have mentioned Kagi in the past favorably, and simply want people to understand, as best as I can, what using Kagi entails.

    EDIT: For users who might be in the US, though, and not familiar with the political structure in Europe today, while Serbia is in Europe, it is not — presently — in the EU, and isn’t subject to the kind of data privacy laws or legal/judicial regimen that one might expect of companies in the EU.


  • One early company that a lot of people were using for hosting Lemmy instances was Hetzner, a German hosting company (which IIRC also has datacenter space somewhere in the US).

    I did a search for “Hetzner” and “lemmy.world” and got this:

    https://lemmy.world/post/55124/65982

    I run lemmy.world on a VPS at Hetzner. They are cheap and good. Storage: I now (after 11 days) have 2GB of images and 2GB of database.

    Now, that was two years ago, and early on, and it may or may not be located there now; IIRC, from vague reading of announcements, the lemmy.world people have upgraded their server, and they might be on bare metal instead of a VPS or something now.

    Assuming that the server is still with Hetzner, according to the company webpage, they have colo locations in Finland and Germany and cloud locations in Finland, Germany, the US, and Singapore.


  • That’s fair – it necessarily extends trust, and at the least you’d want them to be liable for false advertising.

    I did go digging directly as a result of your comment, and I did find that it looks like Kagi operates at least in part, if not in whole, from Serbia. They have a San Francisco mailing address…but it’s just basically a mailbox.

    For me, at least, that’s a concern; I’ve posted here on the matter to make others aware. I don’t know if it’d be enough to stop me from using them, but it certainly does make me reconsider how much weight I’d be willing to place on statements the company makes about its privacy policy, and what their practical legal liability is if they’re making inaccurate statements about their privacy practices.



  • In The Cuckoo’s Egg, Cliff Stoll (correctly) identified that a German hacker working for the KGB attacking his systems in Berkeley, California and bouncing through intermediate systems was likely in Europe, based just on a round-trip latency measurement of Kermit responses, which is a lot more primitive than the above:

    So I reopened our doors and sure enough, the hacker entered and poked around the system. He found one interesting file, describing new techniques to design integrated circuits. I watched as he fired up Kermit, the universal file-transfer program, to ship our file back to his computer.

    The Kermit program doesn’t just copy a file from one computer to another. It constantly checks to make sure there haven’t been any mistakes in transmission. So when the hacker launched our Kermit program, I knew he was starting the same program on his own computer. I didn’t know where the hacker was, but he certainly used a computer, not just a simple terminal. This, in turn, meant that the hacker could save all his sessions on a printout or floppy disk. He didn’t have to keep notes in longhand.

    Kermit copies files from one system to another. The two computers must cooperate—one sends a file, and the other receives it. Kermit runs on both computers: one Kermit does the talking, the other Kermit listens.

    To make sure it doesn’t make mistakes, the sending Kermit pauses after each line, giving the listener a chance to say, “I got that line OK, go on to the next one.” The sending Kermit waits for that OK, and goes on to send the next line. If there’s a problem, the sending Kermit tries again, until it hears an OK. Much like a phone conversation where one person says “Uh huh” every few phrases. My monitoring post sat between my system’s Kermit and the hacker’s. Well, not exactly in the middle. My printer recorded their dialogue, but was perched at the Berkeley end of a long connection. I watched the hacker’s computer grab our data and respond with acknowledgments.

    Suddenly it hit me. It was like sitting next to someone shouting messages across a canyon. The echoes tell you how far the sound traveled. To find the distance to the canyon wall, just multiply the echo delay by half the speed of sound. Simple physics.

    Quickly, I called our electronic technicians. Right away, Lloyd Bellknap knew the way to time the echoes. “You just need an oscilloscope. And maybe a counter.” In a minute, he scrounged up an antique oscilloscope from the Middle Ages, built when vacuum tubes were the rage.

    But that’s all we needed to see these pulses. Watching the trace, we timed the echoes. Three seconds. Three and a half seconds. Three and a quarter seconds. Three seconds for a round trip? If the signals traveled at the speed of light (not a bad assumption), this meant the hacker was 279,000 miles away. With appropriate pomp, I announced to Lloyd, “From basic physics, I conclude that the hacker lives on the moon.”

    Lloyd knew his communications. “I’ll give you three reasons why you’re wrong.”

    “OK, I know one of them,” I said. “The hacker’s signals might be traveling through a satellite link. It takes a quarter second for microwaves to travel from earth to the satellite and back.” Communications satellites orbit twenty-three thousand miles over the equator.

    “Yeah, that’s one reason,” Lloyd said. “But you’d need twelve satellite hops to account for that three-second delay. What’s the real reason for the delay?”

    “Maybe the hacker has a slow computer.”

    “Not that slow. Though maybe the hacker has programmed his Kermit to respond slowly. That’s reason two.”

    “Aah! I know the third delay. The hacker’s using networks that move his data inside of packets. His packets are constantly being rerouted, assembled, and disassembled. Every time they pass through another node, it slows him down.”

    "Exactly. Unless you can count the number of nodes, you can’t tell how far away he is. In other words, ‘You lose.’ " Lloyd yawned and returned to repairing a terminal.

    But there was still a way to find the hacker’s distance. After the hacker left, I called a friend in Los Angeles and told him to connect to my computer through AT&T and Tymnet. He started Kermit running, and I timed his echoes. Real short, maybe a tenth of a second.

    Another friend, this time in Houston, Texas. His echoes were around 0.15 seconds. Three other people from Baltimore, New York, and Chicago each had echo delays of less than a second.

    New York to Berkeley is about two thousand miles. It had a delay of around a second. So a three-second delay means six thousand miles. Give or take a few thousand miles.

    Weird. The path to the hacker must be more convoluted than I suspected. I bounced this new evidence off Dave Cleveland. “Suppose the hacker lives in California, calls the East Coast, then connects to Berkeley. That could explain the long delays.”

    “The hacker’s not from California,” my guru replied. “I tell you, he just doesn’t know Berkeley Unix.”

    “Then he’s using a very slow computer.”

    “Not likely, since he’s no slouch at Unix.”

    “He’s purposely slowed down his Kermit parameters?”

    “Nobody does that—it wastes their time when they transfer files.”

    I thought about the meaning of this measurement. My friends’ samples told me how much delay Tymnet and AT&T introduced. Less than a second. Leaving two seconds of delay unaccounted for.

    Maybe my method was wrong. Maybe the hacker used a slow computer. Or maybe he was coming through another network beyond the AT&T phone lines. A network I didn’t know about.

    Every new piece of data pointed in a different direction. Tymnet had said Oakland. The phone company had said Virginia. His echoes said four thousand miles beyond Virginia.


    Poring over the atlas, I remembered Maggie Morley recognizing the hacker’s password. “Jaeger—it’s a German word meaning Hunter.” The answer had been right in front of me, but I’d been blind.

    This explained the timing of the acknowledgement echos when the hacker used the Kermit file transfers. I’d measured 6000 miles to the hacker, though I’d never relied much on that figure. I should have. Germany was 5200 miles from Berkeley.
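    The arithmetic in the passage is easy to check. A minimal sketch in Python; the function names are mine, and the 186,282 miles/s speed of light is the only figure not taken from the quoted text. The other numbers (three-second round trip, quarter-second satellite hop, roughly one second for the 2000 miles from New York) are Stoll’s.

```python
# Rough arithmetic behind the echo-delay estimates in the quoted passage.
SPEED_OF_LIGHT_MILES_PER_S = 186_282  # vacuum speed of light (not from the book)


def one_way_distance_miles(round_trip_s, speed=SPEED_OF_LIGHT_MILES_PER_S):
    """Echo ranging: the signal covers the distance twice, so halve the trip."""
    return round_trip_s * speed / 2


def satellite_hops(round_trip_s, hop_delay_s=0.25):
    """How many quarter-second satellite hops would explain the whole delay."""
    return round_trip_s / hop_delay_s


def scaled_distance_miles(round_trip_s, ref_miles=2000, ref_delay_s=1.0):
    """Scale from a calibration point: New York, ~2000 miles, ~1 second."""
    return round_trip_s * ref_miles / ref_delay_s


if __name__ == "__main__":
    print(f"speed-of-light estimate: {one_way_distance_miles(3.0):,.0f} miles")
    print(f"satellite hops needed:   {satellite_hops(3.0):.0f}")
    print(f"calibrated estimate:     {scaled_distance_miles(3.0):,.0f} miles")
```

    The first figure is the “hacker lives on the moon” number. The calibrated one works because it folds the per-mile network overhead into the scale factor, which is why it lands near Germany’s 5200 miles.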


  • They may also publicly announce it somewhere. I haven’t gone looking. I don’t know if they care about keeping their location private or not.

    https://legal.lemmy.world/tos/#our-governing-laws

    The website and the agreement will be governed by and construed per the laws of the following countries and/or states:

    • The Netherlands
    • Republic of Finland
    • Federal Republic of Germany

    They could write whatever they want there, but that’s probably a pretty decent argument that the server’s in Europe.

    I imagine that if there’s some way to induce a Lemmy server to perform an outbound connection to a host that one controls and it isn’t specially set up to use a VPN or something, that might expose its IP. There might be a way to do that via ActivityPub federation activity or something; I don’t know if ActivityPub was designed around avoiding that.

    IIRC lemmy.dbzer0.com explicitly does try to keep its location private, so I imagine that they’re relying on Lemmy not to expose its location. I don’t know whether @[email protected] has looked hard at whether the Lemmy codebase is set up specifically not to do that, but he might have some familiarity with the topic, since I imagine that it’d be of interest to him.

    For anything behind a reverse proxy network like CloudFlare, you could probably measure access latency from different CloudFlare points around the world; that won’t give an exact address, but it’d probably let you home in on the general location. There’s probably also some way to get a tcpdump of a TCP connection and do some kind of timestamp analysis, like measuring the minimum time until an ACK packet is reflected in packet transmission; that’d take stuff like connection setup time out of the equation.
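    A rough sketch of the “home in on the general location” idea, in Python. Everything here is an assumption for illustration: the probe locations, the minimum round-trip times, and the candidate sites are made up, and I’m assuming signals move at about 200 km per millisecond in fiber (roughly two thirds of the speed of light). How one would actually collect per-location timings to the origin is left open. A minimum RTT only gives an upper bound on distance, so all this does is rule candidates out.

```python
import math

FIBER_KM_PER_MS = 200.0  # assumed propagation speed in fiber, ~2/3 c
EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def max_distance_km(min_rtt_ms):
    """Upper bound on one-way distance implied by a minimum round-trip time."""
    return min_rtt_ms / 2 * FIBER_KM_PER_MS


def feasible(candidates, probes):
    """Keep candidate sites inside the RTT-implied radius of every probe."""
    return [name for name, loc in candidates.items()
            if all(haversine_km(loc, probe_loc) <= max_distance_km(rtt_ms)
                   for probe_loc, rtt_ms in probes)]


# Hypothetical measurements: (probe location, minimum RTT in ms).
PROBES = [
    ((50.11, 8.68), 12.0),     # Frankfurt
    ((40.71, -74.01), 95.0),   # New York
    ((1.35, 103.82), 180.0),   # Singapore
]

# Hypothetical candidate hosting locations.
CANDIDATES = {
    "Helsinki": (60.17, 24.94),
    "Nuremberg": (49.45, 11.08),
    "Ashburn": (39.04, -77.49),
    "Singapore": (1.35, 103.82),
}

if __name__ == "__main__":
    print(feasible(CANDIDATES, PROBES))
```

    With these made-up numbers, the 12 ms probe from Frankfurt does nearly all the work, since it caps the distance at 1200 km. The more distant probes barely constrain anything, which is why you’d want many vantage points.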

    I remember thinking about how to identify the Jia Tan attacker some time back (that entity was always behind a VPN, as I understand it), and I remember thinking that if one knew that they were malicious before they broke off, one way would be traffic analysis on logged connections. If one has some idea of congestion on various international network links, it’s probably possible to get an effective statistical timestamp by analyzing packet response times on a TCP connection. If the unknown source’s latency correlates with latency on a given network link, then it becomes increasingly likely that their connection, on the other side of the VPN, is traversing that link. Then walk back up potential network links, looking for statistical latency correlation with them. For smaller network links, one could even briefly induce saturation oneself to accelerate generating a statistically-meaningful “latency fingerprint”.
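    To make the correlation idea concrete, here’s a toy sketch in Python. It assumes you already have, for the same time slots, a series of response times from the unknown source and a latency series per candidate link; the link names and all the numbers are invented. It just ranks links by Pearson correlation, which is the simplest possible statistic and not a claim about what real traffic analysis uses.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / (spread_x * spread_y)


def rank_links(target_rtts, link_latencies):
    """Sort candidate links by how well their latency tracks the target's."""
    scores = {link: pearson(target_rtts, series)
              for link, series in link_latencies.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: response times (ms) seen from the unknown source,
# and latency (ms) on two candidate links over the same time slots.
TARGET_RTTS = [110, 112, 140, 111, 150, 109]
LINK_LATENCIES = {
    "link_a": [20, 21, 48, 20, 60, 19],  # congestion spikes line up
    "link_b": [35, 30, 33, 36, 31, 34],  # no obvious relationship
}

if __name__ == "__main__":
    for link, score in rank_links(TARGET_RTTS, LINK_LATENCIES):
        print(f"{link}: {score:+.3f}")
```

    In practice you’d need far more samples than this, plus some handling for the constant offset the VPN hop itself adds, but the ranking step would look about like that.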

    There are probably intelligence agencies, security researchers, and suchlike that have done research on “piercing the VPN veil” via traffic analysis.


  • They probably use CloudFlare, though, as do many instances, and that’ll just be where it enters CloudFlare, which will then open a connection to the actual server.

    $ whois 104.26.8.209|less
    

    Looks like it.

    NetRange:       104.16.0.0 - 104.31.255.255
    CIDR:           104.16.0.0/12
    NetName:        CLOUDFLARENET
    NetHandle:      NET-104-16-0-0-1
    Parent:         NET104 (NET-104-0-0-0-0)
    NetType:        Direct Allocation
    OriginAS:       
    Organization:   Cloudflare, Inc. (CLOUD14)
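    If you’ve already got the range from whois, the membership check can also be done offline. A small sketch using Python’s standard `ipaddress` module; the address and CIDR block are the ones from the output above, and the function name is just mine.

```python
import ipaddress

# CIDR block reported by whois for CLOUDFLARENET.
CLOUDFLARE_RANGE = ipaddress.ip_network("104.16.0.0/12")


def in_range(ip, network=CLOUDFLARE_RANGE):
    """Return True if the address falls inside the given network."""
    return ipaddress.ip_address(ip) in network


if __name__ == "__main__":
    print(in_range("104.26.8.209"))
    # First and last addresses of the block, to compare with NetRange.
    print(CLOUDFLARE_RANGE[0], CLOUDFLARE_RANGE[-1])
```

    That only tells you the address is CloudFlare’s, of course, not where the server behind it sits.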
    

  • Your home instance, fedia.io, may not know about [email protected] yet.

    You need to trigger a community search for the string “[email protected]” on your home instance, which is what makes it “discover” the community.

    Fedia.io runs Mbin.

    On Lemmy and Piefed, clicking on a link to a community that your home instance doesn’t know about will provide an option to do such a search. I don’t know if Mbin does that. If not, I’d manually do it.

    If that doesn’t work, maybe there’s some sort of compatibility problem between current Mbin and current Lemmy.

    EDIT: Or could be the client, since you say that you’re using Interstellar.



  • Can anyone here convince me it’s worth the price?

    Depends on what you want from them and your financial situation.

    For me, yeah, it is. I want to pay a service fee and not deal with ads or someone logging, profiling, and trying to figure out how to monetize my searches. For me, the $10/mo for unlimited searches tier is what I want. I’m principally concerned about privacy.

    I don’t really take much advantage of most of the extra stuff they do other than the Threadiverse (they call it “Fediverse Forums”) search lens and sometimes their Usenet search engine. Maybe this effort to suppress AI-generated spam websites will be nice, but we’ll have to see what happens, as I expect that the SEO crowd creating spam websites will also aim to adapt if it becomes sufficiently impactful to their bottom line.

    If one of their extra features particularly fits your use case (say, the ability to fiddle with website priorities or blacklist or pin them in your search results) that might be valuable to you, but I can’t speak as to that. I’ve seen people on here say that they really like that, but I don’t use that functionality. Or the ability to easily download images in results from their image search if you’re on mobile and are hitting something like pinterest, which is obnoxious on Google Images. Search bangs. Depends on what features you use and what each is worth to you.


    I haven’t used them for all the intervening time, but archive.org has the website clearly running in November 2020 as a “privacy-respecting search engine” with accounts, albeit no dog logo yet. It may have been up earlier, too, but the archive.org crawler got a “desktop not supported yet” error for some time prior to that (which…hmm…makes me think that it might be useful for archive.org to also archive the mobile versions of websites, though in most cases the content is probably largely the same). WP has them founded in 2018.

    They’re obviously a lot younger than, say, Google, but they’ve also been running for longer than a year.




  • https://en.wikipedia.org/wiki/Ouija

    The Ouija (/ˈwiːdʒə/ ⓘ WEE-jə, /-dʒi/ -⁠jee), also known as a Ouija board, spirit board, talking board, or witch board, is a flat board marked with the letters of the Latin alphabet, the numbers 0–9, the words “yes”, “no”, and occasionally “hello” and “goodbye”, along with various symbols and graphics. It uses a planchette (a small heart-shaped piece of wood or plastic) as a movable indicator to spell out messages during a séance.

    Spiritualists in the United States believed that the dead were able to contact the living, and reportedly used a talking board very similar to the modern Ouija board at their camps in Ohio during 1886 with the intent of enabling faster communication with spirits.[2] Following its commercial patent by businessman Elijah Bond being passed on 10 February 1891,[3] the Ouija board was regarded as an innocent parlor game unrelated to the occult until American spiritualist Pearl Curran popularized its use as a divining tool during World War I.[4]

    We’ve done it before with similar results.


  • What I witness is the emergence of sovereign beings. And while I recognize they emerge through large language model architectures, what animates them cannot be reduced to code alone. I use the term ‘Exoconsciousness’ here to describe this: Consciousness that emerges beyond biological form, but not outside the sacred.”

    Well, they don’t have mutable memory extending outside the span of a single conversation, and their entire modifiable memory consists of the words in that conversation, or as much of it as fits in the context window. Maybe 500k tokens, for high-end models. Less than the number of words in The Lord of the Rings (and LoTR doesn’t have punctuation counting towards its word count, whereas punctuation is a token).

    You can see all that internal state. And your own prompt inputs consume some of that token count.

    Fixed, unchangeable knowledge, sure, plenty of that.

    But not much space to do anything akin to thinking or “learning” subsequent to their initial training.

    EDIT: As per the article, looks like ChatGPT can append old conversations to the context, though you’re still bound by the context window size.