@KingRandomGuy

KingRandomGuy@lemmy.world · 9 days ago

Yeah, you can certainly get it to reproduce some pieces (or fragments) of work exactly, but definitely not everything. Even a frontier LLM’s weights are far too small to memorize most of its training data.

KingRandomGuy@lemmy.world · 22 days ago

Most “50 MP” cameras are actually quad Bayer sensors (effectively lower resolution than the pixel count suggests) and are usually binned 2x2 down to approx. 12 MP.
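Here’s a rough illustration of the binning step (50 MP / 4 ≈ 12.5 MP) as a minimal NumPy sketch. It assumes a simple RGGB quad Bayer layout and models binning as a plain average of each 2x2 same-color block. Real sensors may bin differently (for example in the analog domain), and `quad_bayer_pattern`/`bin_2x2` are hypothetical helpers.

```python
import numpy as np

def quad_bayer_pattern(height, width):
    """Color label of each photosite on an (assumed) RGGB quad Bayer sensor."""
    bayer = np.array([["R", "G"], ["G", "B"]])
    # Each Bayer color becomes a 2x2 block of same-color photosites (a 4x4 tile).
    quad = np.repeat(np.repeat(bayer, 2, axis=0), 2, axis=1)
    return np.tile(quad, (height // 4, width // 4))

def bin_2x2(raw):
    """Average each non-overlapping 2x2 block, quartering the pixel count."""
    h, w = raw.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    return raw.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

pattern = quad_bayer_pattern(8, 8)
# After binning, each output pixel covers one same-color 2x2 block,
# so the result is an ordinary Bayer mosaic with 1/4 the pixel count.
binned_colors = pattern[::2, ::2]
raw = np.random.default_rng(0).random((8, 8))
print(binned_colors)
print(bin_2x2(raw).shape)  # (4, 4)
```

This is why the binned output behaves like a regular ~12 MP Bayer image rather than a true 50 MP one.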

The lens on your phone likely isn’t sharp enough to capture 50 MP of detail on a small sensor anyway, so the megapixel number ends up being more of a gimmick than anything.

KingRandomGuy@lemmy.world · 23 days ago

I work in an area adjacent to autonomous vehicles, and the primary reason has to do with data availability and stability of terrain. In the woods you’re naturally going to have worse coverage of typical behaviors just because the set of observations is much wider (“anomalies” are more common). The terrain being less maintained also makes planning and perception much more critical. So in some sense, cities are ideal.

Some companies are specifically targeting off-road AVs, but as you can guess, the primary use cases are military.

KingRandomGuy@lemmy.world · 27 days ago

Some apps only require ‘basic’ Play Integrity verification but now also check whether they were installed via the Play Store. They refuse to run if they were installed from an alternative source.

This has been a problem for GrapheneOS, since some apps filter themselves out of Play Store search results if you don’t pass strong Play Integrity, even though they don’t actually require it. Luckily, GrapheneOS now has a bypass for this.

KingRandomGuy@lemmy.world · 1 month ago

Yep, since this is using Gaussian Splatting, you’ll need multiple camera views with known poses and an initial point cloud. You get both for free from a video via COLMAP.
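For reference, here’s a minimal sketch of the video → COLMAP step, written as a Python wrapper around the `ffmpeg` and `colmap` command-line tools (assumed to be installed and on PATH). The workspace layout, the `fps=2` frame sampling rate, and the helper names are my own assumptions. I used `sequential_matcher` because video frames are ordered; `exhaustive_matcher` is the slower alternative for unordered images.

```python
import subprocess
from pathlib import Path

def colmap_commands(video, workspace, fps=2):
    """Build (but don't run) the commands for frames -> COLMAP sparse reconstruction."""
    ws = Path(workspace)
    images = ws / "images"
    database = ws / "database.db"
    sparse = ws / "sparse"
    return [
        # 1. Sample frames from the video (fps is an arbitrary choice here).
        ["ffmpeg", "-i", str(video), "-vf", f"fps={fps}", str(images / "%04d.png")],
        # 2. Detect keypoints and descriptors in every frame.
        ["colmap", "feature_extractor",
         "--database_path", str(database), "--image_path", str(images)],
        # 3. Match features between temporally neighboring frames.
        ["colmap", "sequential_matcher", "--database_path", str(database)],
        # 4. Structure-from-motion: recovers camera poses and a sparse 3D point cloud.
        ["colmap", "mapper", "--database_path", str(database),
         "--image_path", str(images), "--output_path", str(sparse)],
    ]

def run_pipeline(video, workspace, fps=2):
    ws = Path(workspace)
    # ffmpeg and the mapper both expect their output directories to exist.
    (ws / "images").mkdir(parents=True, exist_ok=True)
    (ws / "sparse").mkdir(parents=True, exist_ok=True)
    for cmd in colmap_commands(video, ws, fps):
        subprocess.run(cmd, check=True)
```

The mapper typically writes its model (cameras, images/poses, and 3D points) into a numbered subfolder such as `sparse/0`. That sparse point cloud is what Gaussian Splatting usually uses for initialization.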

KingRandomGuy@lemmy.world · 2 months ago

Yeah, in typical Google fashion they used to have two deep learning teams: Google Brain and DeepMind. Google Brain was Google’s in-house team, responsible for inventing the transformer. DeepMind focused more on RL agents than Google Brain, hence discoveries like AlphaZero and AlphaFold.

KingRandomGuy@lemmy.world · 2 months ago

The general framework for evolutionary methods/genetic algorithms is indeed old, but it’s extremely broad. What matters is how you actually mutate the algorithm being run given feedback. In this case, they’re using the same framework as genetic algorithms (iteratively building up solutions by repeatedly modifying an existing attempt after receiving feedback), but they use an LLM for two things:

Overall better sampling (the LLM has better heuristics for figuring out what to fix compared to handwritten techniques), meaning higher efficiency at finding a working solution.
“Open set” mutations: you don’t need to pre-define what changes can be made to the solution. The LLM can generate arbitrary mutations instead. In particular, AlphaEvolve can modify entire codebases as mutations, whereas prior work only modified single functions.
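To make the “same framework, different mutation operator” point concrete, here’s a toy genetic-algorithm loop in Python. The objective (matching a hidden digit sequence) and all names are hypothetical. The `mutate` argument is the pluggable part: below it’s a handwritten closed-set point mutation, and in an AlphaEvolve-style system an LLM proposing edits would sit in that slot. This is not their actual pipeline, which is far more involved.

```python
import random

TARGET = [3, 1, 4, 1, 5, 9, 2, 6]  # hypothetical hidden target ("a working solution")

def fitness(candidate):
    # Feedback signal: how many positions already match the target.
    return sum(a == b for a, b in zip(candidate, TARGET))

def random_point_mutation(candidate, rng):
    # Handwritten, closed-set mutation: overwrite one position with a random digit.
    child = list(candidate)
    child[rng.randrange(len(child))] = rng.randrange(10)
    return child

def evolve(mutate, generations=200, pop_size=8, seed=0):
    rng = random.Random(seed)
    population = [[rng.randrange(10) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        # Score every attempt and keep the better half (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # New attempts are modifications of existing ones; an LLM-based
        # mutation would replace `mutate` here.
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

best = evolve(random_point_mutation)
print(best, fitness(best))
```

Because the best parents are always kept, the best score never decreases. A better mutation operator mostly changes how quickly it climbs.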

The “Related Work” section (Section 5) of their whitepaper is probably what you’re looking for; see here.

KingRandomGuy@lemmy.world · 2 months ago

30 series to 40 series was a pretty tangible jump IMO (the 4090 gets something like 30-50% more performance than the 3090 Ti in most tasks at the same TDP), in part thanks to the much larger L2 cache plus a newer process node.

The 50 series was very poor though, probably because it’s on the same process node as the 40 series.

KingRandomGuy@lemmy.world · 3 months ago

Unfortunately, proprietary professional software suites are still usually better than their FOSS counterparts, for instance Altium Designer vs KiCad for ECAD, and SolidWorks vs FreeCAD. That’s not to say the open source tools are bad; I use them myself all the time. But the proprietary tools are usually more robust (for instance, it’s fairly easy to break models in FreeCAD if you aren’t careful) and have better workflows for creating really complex designs.

I’ll also add that Lightroom is still better than Darktable and RawTherapee for me. Both open source options are still good, but Lightroom has better denoising in my experience. It’s also better at supporting new cameras and lenses.

With time I’m sure the open source solutions will improve and catch up to the proprietary ones. KiCad and FreeCAD are already good enough for my needs, but that might not be true if I were working on very complex projects.

KingRandomGuy@lemmy.world · 3 months ago

Thanks for the respectful discussion! I work in ML (not LLMs, but computer vision), so of course I’m biased. But I think it’s understandable to dislike ML/AI stuff considering that there are unfortunately many unsavory practices taking place (potential copyright infringement, very high power consumption, etc.).

KingRandomGuy@lemmy.world · 3 months ago

It appears like reasoning because the LLM is iterating over material that has been previously reasoned out. An LLM can’t reason through a problem that it hasn’t previously seen

This also isn’t an accurate characterization IMO. LLMs and ML algorithms in general can generalize to unseen problems, even if they aren’t perfect at this; for instance, you’ll find that LLMs can produce commands to control robot locomotion, even on different robot types.

“Reasoning” here is based on chains of thought, where the model generates intermediate steps that then help it produce more accurate results. You can fairly argue that this isn’t reasoning, but it’s not like it’s traversing a fixed knowledge graph or something.

KingRandomGuy@lemmy.world · 3 months ago

All of the “AI” garbage that is getting jammed into everything is merely scaled up from what has been before. Scaling up is not advancement.

I disagree. Scaling might seem trivial now, but the state-of-the-art architectures for NLP a decade ago (LSTMs) would not be able to scale to the degree that our current methods can. Designing new architectures that perform better on GPUs (such as Attention and Mamba) is a legitimate advancement. Furthermore, the viability of this level of scaling wasn’t really understood until phenomena like double descent (in which test error surprisingly goes down, rather than up, after increasing model complexity past a certain point) were discovered.
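Double descent is easy to reproduce in a toy setting. Below is a minimal NumPy sketch using linear regression on Gaussian features, fit with minimum-norm least squares while varying the number of features `p` actually used. All sizes, the noise level, and the setup itself are illustrative assumptions; this is the classic linear-model version, not the deep-network setting. Test error typically spikes near the interpolation threshold (`p == n_train`) and then comes back down as `p` keeps growing.

```python
import numpy as np

def min_norm_test_error(p, n_train=30, d_total=100, noise=0.5,
                        n_test=500, trials=20, seed=0):
    """Average test MSE when fitting with only the first p of d_total features."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(trials):
        beta = rng.standard_normal(d_total) / np.sqrt(d_total)  # true weights
        X = rng.standard_normal((n_train, d_total))
        X_test = rng.standard_normal((n_test, d_total))
        y = X @ beta + noise * rng.standard_normal(n_train)
        y_test = X_test @ beta  # noiseless targets for evaluation
        # lstsq returns the minimum-norm solution when p > n_train
        # (more parameters than training points).
        w, *_ = np.linalg.lstsq(X[:, :p], y, rcond=None)
        errors.append(np.mean((X_test[:, :p] @ w - y_test) ** 2))
    return float(np.mean(errors))

for p in (5, 15, 25, 30, 35, 50, 100):
    print(p, round(min_norm_test_error(p), 3))
```

The classical U-shaped picture would predict that going from `p = 30` to `p = 100` only makes things worse. Here the error drops again past the threshold.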

Furthermore, lots of advancements were necessary to train deep networks at all. Better optimizers like Adam instead of plain SGD, and tricks like residual connections and batch normalization, were all necessary to scale even small ConvNets, working around issues such as vanishing gradients and covariate shift that tend to appear when naively training deep networks.
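As a small illustration of the vanishing-gradient issue and why residual connections help, here’s a NumPy sketch. It accumulates the input-to-output Jacobian of a deep tanh MLP with and without skip connections (`x + f(x)`). The depth, width, and weight scale are arbitrary assumptions chosen to make the effect obvious, and this toy doesn’t model Adam or batch norm.

```python
import numpy as np

def input_gradient_norm(depth=50, width=64, weight_scale=0.5, residual=False, seed=0):
    """Spectral norm of d(output)/d(input) for a deep tanh MLP (toy setup)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)
    jac = np.eye(width)  # accumulated Jacobian, built layer by layer (chain rule)
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * weight_scale / np.sqrt(width)
        pre = W @ x
        # Jacobian of tanh(W x) with respect to x: diag(1 - tanh^2) @ W
        layer_jac = (1.0 - np.tanh(pre) ** 2)[:, None] * W
        if residual:
            # Skip connection: d/dx [x + tanh(W x)] = I + layer_jac
            layer_jac = np.eye(width) + layer_jac
            x = x + np.tanh(pre)
        else:
            x = np.tanh(pre)
        jac = layer_jac @ jac
    return float(np.linalg.norm(jac, 2))

print("plain:   ", input_gradient_norm(residual=False))
print("residual:", input_gradient_norm(residual=True))
```

Without skip connections, the gradient reaching the early layers shrinks roughly geometrically with depth. The identity term in `I + layer_jac` gives the gradient a path that doesn’t decay.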

KingRandomGuy@lemmy.world · 3 months ago

I agree that pickle works well for storing arbitrary metadata, but my main gripe is that there’s no agreed-upon standard for how the metadata should be formatted. With FITS, for example, there are keywords for metadata such as the row order, CFA matrices, etc. that FITS processing and display programs need to follow to read the image properly. So to make working with multispectral data easier, it’d definitely be helpful to have a standard set of keywords and an encoding format.
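To show what “standard keywords” look like at the byte level, here’s a minimal pure-Python sketch of FITS header card formatting. Each card is an 80-character ASCII record with the keyword in columns 1-8, `= ` in columns 9-10, and a fixed-format value in columns 11-30. The `fits_card` helper is hypothetical and only covers simple cases; real code should use a library such as astropy. `ROWORDER` and `BAYERPAT` are, as far as I know, conventions used by astrophotography software rather than keywords defined by the core FITS standard.

```python
def fits_card(keyword, value, comment=""):
    """Format one fixed-format FITS header card (simplified, hypothetical helper)."""
    key = keyword.upper()
    if len(key) > 8:
        raise ValueError("standard FITS keywords are at most 8 characters")
    if isinstance(value, str):
        # Strings: single-quoted starting in column 11, padded to at least
        # 8 characters inside the quotes; embedded quotes are doubled.
        val = ("'" + value.replace("'", "''").ljust(8) + "'").ljust(20)
    elif isinstance(value, bool):  # check bool before int (bool is an int subclass)
        val = ("T" if value else "F").rjust(20)
    else:
        # Numbers: right-justified so they end in column 30.
        val = str(value).rjust(20)
    card = f"{key:<8}= {val}"
    if comment:
        card += f" / {comment}"
    if len(card) > 80:
        raise ValueError("card exceeds 80 characters")
    return card.ljust(80)

header = [
    fits_card("NAXIS", 2, "number of data axes"),
    fits_card("ROWORDER", "TOP-DOWN", "row order convention"),
    fits_card("BAYERPAT", "RGGB", "CFA pattern convention"),
]
for card in header:
    print(card.rstrip())
```

Because every reader agrees on these fixed positions and keyword names, any FITS tool can find the row order or CFA pattern without guessing. That’s the kind of agreement that is missing for pickled metadata.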

It would be interesting to see if photo editing software will pick up multichannel JPEG. As of right now there are very few sources of multispectral imagery for consumers, so I’m not sure what the target use case would be though. The closest thing I can think of is narrowband imaging in astrophotography, but normally you process those in dedicated astronomy software (e.g. Siril, PixInsight), though you can also recombine different wavelengths in traditional image editors.

I’ll also add that HDF5 and Zarr are good options for storing arrays in Python if standardized metadata isn’t a big deal. Both support user-specified chunk sizes, so they work well for workloads like ML training, where you access samples in random order.
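For the HDF5 case, here’s a minimal `h5py` sketch of per-sample chunking, which makes reading a random batch cheap. The dataset name, the `(sample, band, y, x)` layout, the `axis_order` attribute, and the helper names are illustrative assumptions. The attribute is free-form metadata, not a standard keyword.

```python
import h5py
import numpy as np

def write_chunked(path, samples, axis_order):
    """Store samples with one chunk per sample so random reads touch few chunks."""
    with h5py.File(path, "w") as f:
        dset = f.create_dataset(
            "samples",
            data=samples,
            chunks=(1,) + samples.shape[1:],  # one chunk = one full sample
            compression="gzip",
        )
        dset.attrs["axis_order"] = axis_order  # free-form, not standardized

def read_batch(path, indices):
    """Read an arbitrary subset of samples (h5py needs increasing, unique indices)."""
    with h5py.File(path, "r") as f:
        return f["samples"][sorted(indices)]
```

Chunk shape is the main knob here. One sample per chunk suits random-access training; larger chunks favor sequential scans and better compression.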

KingRandomGuy@lemmy.world · 3 months ago

I guess part of the reason is to have a standardized method for multispectral and hyperspectral images, especially for storing things like metadata. Simply storing a NumPy array may not be ideal if you don’t keep metadata on what is being stored and in what order (e.g. axis order, which channel corresponds to each frequency band, etc.). Plus it seems like they extend lossy compression to this modality, which could be useful in some circumstances (though for scientific use you’d probably want lossless).
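As a point of comparison, here’s a minimal lossless, NumPy-only sketch that keeps the array and its metadata together in one `.npz` file. The metadata field names (`axis_order`, `band_centers_nm`) and the JSON encoding are ad hoc assumptions. That’s exactly the problem: without a standard, every project invents its own.

```python
import json
import numpy as np

def save_cube(path, cube, axis_order, band_centers_nm):
    """Save a spectral cube plus ad hoc metadata (hypothetical schema)."""
    if len(axis_order) != cube.ndim:
        raise ValueError("axis_order must name every axis")
    band_axis = list(axis_order).index("band")
    if cube.shape[band_axis] != len(band_centers_nm):
        raise ValueError("need one band center per channel")
    meta = {"axis_order": list(axis_order), "band_centers_nm": list(band_centers_nm)}
    # The JSON string is stored as a 0-d unicode array, so no pickling is needed.
    np.savez_compressed(path, cube=cube, meta=np.array(json.dumps(meta)))

def load_cube(path):
    with np.load(path) as data:
        cube = data["cube"]
        meta = json.loads(data["meta"].item())
    return cube, meta
```

Any reader of this file has to know this particular schema. A format with agreed-upon keywords removes that guesswork.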

If compression isn’t the concern, certainly other formats could work to store metadata in a standardized way. FITS, the image format used in astronomy, comes to mind.

KingRandomGuy@lemmy.world · 3 months ago

I guess you’d measure whose GenAI models perform best on benchmarks (currently generally OpenAI, though top models from China are not crazy far behind), as well as metrics like the number of publications at top venues (NeurIPS, ICML, and ICLR for ML; CVPR, ICCV, and ECCV for vision; etc.).

A lot of great papers come out of Chinese institutions, so I’m not sure who would be ahead on that metric either, though.

KingRandomGuy@lemmy.world · 4 months ago

In fairness, if you really needed to, you could rent this kind of compute via a service like vast.ai; it’d probably still be cheaper than paying a ransom.

KingRandomGuy@lemmy.world · 4 months ago

This type of thing is mostly used for inference with extremely large models, where a single GPU will have far too little VRAM to even load a model into memory. I doubt people are expecting this to perform particularly fast, they just want to get a model to run at all.
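A quick back-of-the-envelope calculation shows why a single GPU can’t even load these models. The 70B-parameter model, 16-bit weights, 24 GiB card, and the 1.2x overhead factor for KV cache/activations are all illustrative assumptions. Real overhead depends heavily on context length and batch size.

```python
import math

def weights_gib(n_params, bytes_per_param):
    # Memory for the weights alone, in GiB.
    return n_params * bytes_per_param / 2**30

def gpus_needed(n_params, bytes_per_param, vram_gib, overhead=1.2):
    # `overhead` is a rough, assumed multiplier for KV cache, activations,
    # and framework buffers on top of the raw weights.
    return math.ceil(weights_gib(n_params, bytes_per_param) * overhead / vram_gib)

print(f"{weights_gib(70e9, 2):.1f} GiB")  # 130.4 GiB for 70B params at 16 bits
print(gpus_needed(70e9, 2, vram_gib=24))  # 7
```

Even before any compute happens, the weights alone are several times larger than one consumer card’s VRAM. That’s why a single machine with a huge memory pool is attractive even if it’s slow.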

KingRandomGuy@lemmy.world · 4 months ago

I’m a researcher in ML, and LLMs absolutely fall under ML. The “learning” in “machine learning” just means fitting the parameters of a model, so it’s fundamentally an optimization problem. In the case of an LLM, this means fitting the parameters of the transformer.

A model doesn’t have to be intelligent to fall under the umbrella of ML. Linear least squares is considered ML; in fact, it’s probably the first thing you’ll do if you take an ML course at a university. Decision trees, nearest neighbor classifiers, and linear models all are machine learning models, despite the fact that nobody would consider them to be intelligent.
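To make the “learning is just optimization” point concrete, here’s a small NumPy sketch that fits linear least squares two ways. It uses the closed-form solver and plain gradient descent on the same mean-squared-error objective, and both land on the same weights. The synthetic data, learning rate, and step count are arbitrary assumptions.

```python
import numpy as np

def fit_least_squares(n=100, lr=0.1, steps=500, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, 3))
    true_w = np.array([2.0, -1.0, 0.5])  # synthetic ground-truth weights
    y = X @ true_w + 0.1 * rng.standard_normal(n)

    # Closed form: directly minimize ||Xw - y||^2.
    w_closed, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Same objective, minimized iteratively ("training").
    w_gd = np.zeros(3)
    for _ in range(steps):
        grad = (2.0 / n) * X.T @ (X @ w_gd - y)  # gradient of the mean squared error
        w_gd -= lr * grad
    return w_closed, w_gd

w_closed, w_gd = fit_least_squares()
print(w_closed, w_gd)
```

Training an LLM follows the same recipe (define a loss, follow its gradient), just with vastly more parameters and a fancier optimizer.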

KingRandomGuy@lemmy.world · 4 months ago

Yeah, I agree that it does help for some approaches that do require a lot of VRAM. If you’re not on a tight schedule, this type of thing might be good enough to just get a model running.

I don’t personally do anything that large; even the diffusion methods I’ve developed were able to fit on a 24GB card. But I know that with the hype around multimodal models, VRAM needs can be pretty high.

I suspect this machine will be popular with hobbyists for running really large open weight LLMs.

KingRandomGuy@lemmy.world · 4 months ago

Useless is a strong term. I do a fair amount of research on a single 4090. Lots of problems can fit in <32 GB of VRAM. Even my 3060 is good enough to run small scale tests locally.

I’m in CV, and even with enterprise-grade hardware, most folks I know are limited to 48GB (A40 and L40S, which are substantially cheaper and more accessible than A100/H100/H200). My advisor would always say that you should really try to set up a problem where you can iterate within a few days on a single GPU, and lots of problems are still approachable that way. Of course you’re not going to make the next SOTA VLM on a 5090, but not every problem is that big.