The article barely mentions this, and I haven’t seen any comments here mentioning it, but a huge factor in determining identities is one’s writing style. In fact, analyzing the way people speak and write is its own science (forensic linguistics) and is also used by law enforcement (though it can realistically be done by anyone with OSINT skills and a basic understanding of individual linguistic patterns). Dead giveaways include consistently misspelling a certain word or using a certain emoticon or an uncommon phrase or word; it’s like a linguistic footprint. If Andy123 on Reddit and XxwhateverxX both spell appearance as appearence, both say booyah and both write :) as (:, then it is much easier to tell that they may be the same person. This is something you must be aware of, just like giving out personal information such as your country of origin, number of pets, place of work, etc.
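To make the “linguistic footprint” idea concrete, here is a rough Python sketch that checks which hand-picked quirks two accounts share. The marker list and the function names are made up for illustration (taken from the Andy123 example), not any real forensic tool; actual stylometry uses far richer features and much larger text samples.

```python
from __future__ import annotations

import re

# Hypothetical hand-picked quirks, taken from the example above: a habitual
# misspelling, a catchphrase and a reversed smiley.
MARKERS = ["appearence", "booyah", "(:"]


def markers_used(text: str) -> set[str]:
    """Return the subset of MARKERS that appear in the text (case-insensitive)."""
    lowered = text.lower()
    found = set()
    for marker in MARKERS:
        if marker.isalpha():
            # Whole-word match, so "booyah" does not fire inside "booyahs".
            if re.search(r"\b" + re.escape(marker) + r"\b", lowered):
                found.add(marker)
        elif marker in lowered:
            # Emoticons are punctuation, so a plain substring check is used.
            found.add(marker)
    return found


def shared_markers(text_a: str, text_b: str) -> set[str]:
    """Quirks that both writing samples have in common."""
    return markers_used(text_a) & markers_used(text_b)


andy123 = "Booyah! The new appearence of the site is great (:"
whatever = "booyah, nailed it. Love the appearence of this (:"
print(sorted(shared_markers(andy123, whatever)))  # ['(:', 'appearence', 'booyah']
```

A shared rare misspelling says much more than a shared common word, so a real comparison would also weight each marker by how rare it is among writers in general.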
Excellent point.
For a long time, I have thought vocabulary alone would be enough of a footprint to ID someone, given a large enough sample of their writing, of course. It’s like browser fingerprinting. The words you use, and how often you use them, form a fingerprint. As UnknowableNight points out, some patterns are so unusual that they are nearly enough on their own. Yet even without those, you have enough signals. Sentence length. Whether you spell colour or color. Regional expressions. Word frequency. Whether you bring in vocabulary used mostly in a certain profession, like medicine or law. Whether you write in paragraphs or in single lines. None of these alone is enough. All together, with the hundred other signals smart people can come up with? Probably enough.
In the past it would have been too much effort, only worth it for targeted cases. Today? Maybe you can do it dragnet-style, trying to ID every person who writes online.
I do not know if that happens today. Yet I do not see anything to stop it.
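The “many weak signals” idea can be sketched as turning each writing sample into a vector of features and comparing vectors with cosine similarity. This is a toy under loud assumptions: the ten function words, the sentence-length scaling and the colour/color flag are arbitrary choices for illustration, and real stylometric systems use hundreds of features, careful weighting and much longer samples.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical feature set: a few common function words whose relative
# frequencies differ from writer to writer.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "is", "but", "yet"]


def features(text: str) -> list[float]:
    """Turn a writing sample into a small numeric style vector."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    n_words = max(len(words), 1)  # avoid division by zero on empty input

    # Relative frequency of each function word.
    vec = [counts[w] / n_words for w in FUNCTION_WORDS]

    # Average sentence length in words, scaled down so it stays comparable
    # in size to the frequencies above.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg_sentence_len = len(words) / max(len(sentences), 1)
    vec.append(avg_sentence_len / 100)

    # Spelling variant: +1 if only "colour" appears, -1 if only "color",
    # 0 if both or neither.
    vec.append(float(counts["colour"] > 0) - float(counts["color"] > 0))
    return vec


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: 1.0 same direction, 0.0 unrelated, negative opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


sample_a = "Yet the colour is nice. Yet the colour is odd."
sample_b = "Yet the colour is grey. The sea is calm."
sample_c = "And the color is nice. And the color is odd."

fa, fb, fc = features(sample_a), features(sample_b), features(sample_c)
print("a vs b:", round(cosine(fa, fb), 3))
print("a vs c:", round(cosine(fa, fc), 3))
```

Here the two British-spelling samples score far higher than the American one, largely because the single spelling flag outweighs the small frequency values; deciding how to scale and weight each feature is exactly the part a real tool would have to tune.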
There was a dude on Reddit that bigot watch kept nailing because of that.
Dude used the same patterns and words, thinking that switching usernames would matter.
If I ever write a manifesto, I’m running it through a Jar Jar Binks and UwU filter first.
Damn :3 I am fucked :3
So run everything we say through an LLM and get it to reword it?
If you run it locally, this would be fine. Mathematically, the result would be more like a hash of your writing style: still unique, but difficult to trace back to its origin, y’know?
No, no, but try to recognise and change some of your writing habits.
But I use AI to write everything online 😀 how will they identify me then? 😂