• 2 Posts
  • 99 Comments
Joined 2 years ago
Cake day: June 23, 2023



  • Are there existing tools you love (or hate) that do something similar?

    This sounds similar to “static code analysis” tools, especially now that these tools are getting AI integrations.

    For example, we use coderabbit.ai. It does a code review on PRs in GitHub and flags these sorts of things, especially the simpler ones you’ve mentioned like poor naming conventions, violations of language-specific best practices, and readability issues. I’m not sure if it will automatically come up with “large refactoring opportunities” by default - but maybe you can configure it with a custom prompt to try, I guess

    (Comment) Why have a separate webpage if such a helper can be built into the IDE/editor?

    Coderabbit also has IDE extensions: https://www.coderabbit.ai/ide - I think the separate webpage exists for org-level configurations and overviews. These “best practices” are probably defined at a team level to ensure everyone uses the same code style and things like that

    I’m not sure if “just a website to copy-paste code into and get reviews” is really a good idea. Maybe for juniors that want to review one class or method or something. But usually code is spread across multiple files, and structural refactor opportunities are on a larger scale than just a couple of files


  • On September 19, Ruby Central, a nonprofit organization that manages RubyGems.org, a platform for sharing Ruby code and libraries, asserted control over several GitHub repositories for Ruby Gems as well as other critical Ruby open source projects that the rest of the Ruby development community relies on.

    Uhm, so how does this happen? If some people create Ruby Gems and host them under their own GitHub account, how would Ruby Central suddenly assert control over them?





  • Many people believe that the ToS was added to make Mozilla legally able to train AIs on the collected data.

    “Don’t attribute to malice what is easily explained by incompetence”

    So yea, Mozilla wrote some terms that were ambiguous and could be interpreted in different ways, and ‘many people believed’ that they did this intentionally and had the worst intentions possible, based on their interpretation of the new ToS

    Then Mozilla rewrote that ToS after seeing how people were interpreting the original ToS:
    https://www.theverge.com/news/622080/mozilla-revising-firefox-terms-of-use-data

    And yea, now ‘many people will believe’ that ‘Mozilla only revised this after the backlash’ - OR, it was never their intention, and they simply phrased it better after seeing the confusion

    People just want to get their pitchforks out and start drama at any possible opportunity, without evidence of wrongdoing… Mozilla added stupid stuff to the ToS, ok yea, fair enough - but if they actually did “steal user data”, that would be very easily detectable with Wireshark or something




    Also some more technical feedback, since I was trying to see how it works; more of a suggestion, I suppose

    It looks like you’re looping through the documents and asking it for known tags, right? ({str(db.current_library.tags)}.)

    I don’t know if I would do this through a chat completion and a chat response; there are purpose-built APIs for this kind of keyword-like matching, namely embeddings. They’re a lot faster, and probably way cheaper too, since embeddings cost barely anything compared to chat tokens

    So the common way to do something like this in AI would be to use Vectors and embeddings: https://platform.openai.com/docs/guides/embeddings

    So you’d ask for an embedding (a vector) for each of your tags first. Then you ask for an embedding of your document.

    Then you can do a nearest-neighbor search over the tags and see how closely they match; a minimal sketch of the whole flow is below
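
    Something like this rough sketch, assuming the official OpenAI Python SDK (pip install openai numpy) and an OPENAI_API_KEY in the environment; the model name, tags, and document text are just placeholders:

    ```python
    # Rough sketch of the tag-matching flow via embeddings.
    # Model name, tags and document text are placeholders.
    import numpy as np
    from openai import OpenAI

    client = OpenAI()

    def embed(texts):
        resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return np.array([item.embedding for item in resp.data])

    tags = ["invoice", "contract", "meeting notes", "travel"]   # placeholder tags
    tag_vecs = embed(tags)            # one vector per tag, computed once up front
    doc_vec = embed(["Flight booking confirmation for the Berlin trip"])[0]

    # Nearest-neighbor search = cosine similarity between the document and each tag
    sims = tag_vecs @ doc_vec / (np.linalg.norm(tag_vecs, axis=1) * np.linalg.norm(doc_vec))
    for tag, score in sorted(zip(tags, sims), key=lambda pair: -pair[1]):
        print(f"{tag}: {score:.3f}")
    ```

    Note that the tag embeddings only need to be computed once and can be cached, so per-document you pay for a single embedding call instead of a full chat completion.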



  • It gives an example:

    For example, with the phrase “My favorite tropical fruits are __.” The LLM might start completing the sentence with the tokens “mango,” “lychee,” “papaya,” or “durian,” and each token is given a probability score. When there’s a range of different tokens to choose from, SynthID can adjust the probability score of each predicted token, in cases where it won’t compromise the quality, accuracy and creativity of the output.

    So I suppose with a larger text, if all the lists of things come out “LLM-sorted” like that, it’s an indicator.

    That’s probably not the only signal; if it can detect a bunch of these indicators, there’s a higher likelihood it’s LLM text. A toy sketch of the general idea is below
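
    To make that concrete, here’s a toy sketch of the generic “green list” watermarking idea; this is NOT Google’s actual SynthID algorithm (its details aren’t public), and the vocabulary and boost factor are made up:

    ```python
    # Toy sketch of probability-based watermarking: nudge a pseudo-random
    # "green" subset of tokens at each step, then detect the bias later.
    import hashlib
    import random

    VOCAB = ["mango", "lychee", "papaya", "durian", "apple", "pear", "plum", "fig"]

    def green_set(prev_token):
        # Deterministically pick half the vocabulary, seeded by the previous token
        seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
        return set(random.Random(seed).sample(VOCAB, len(VOCAB) // 2))

    def pick_token(prev_token, boost=2.0):
        # Boost the probability of "green" tokens when sampling the next token
        greens = green_set(prev_token)
        weights = [boost if t in greens else 1.0 for t in VOCAB]
        return random.choices(VOCAB, weights=weights)[0]

    def green_fraction(tokens):
        # Detector: a fraction well above 0.5 over a long text suggests a watermark
        hits = sum(tok in green_set(prev) for prev, tok in zip(tokens, tokens[1:]))
        return hits / max(len(tokens) - 1, 1)

    text = ["fruit"]
    for _ in range(200):
        text.append(pick_token(text[-1]))
    print(green_fraction(text))   # ~0.67 here, vs ~0.5 for unwatermarked text
    ```

    The point being: any single token choice looks normal, but over a long enough text the statistical skew becomes detectable.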



    Since others already made mostly on-topic suggestions, here’s an alternative one:

    Instead of looking specifically for a mentor, look for an open source project that you can help with. Ideally one with a Discord or something, so it’s easy to be in contact with the lead dev. A lot of people don’t mind mentoring juniors, but in my experience it doesn’t happen that explicitly - “be my mentor” - and asking that directly might sound like you’re asking a lot of them.

    If you invert it into “Hey, I wanna help you with your open-source project, but I don’t really know what to do, what your expectations are, or how to implement a specific feature” - then you’re offering to do work for them instead of asking for something. And implicitly you’ll get mentorship in return.

    And “real” projects probably also look better on your GitHub / portfolio than just some dummy projects made for learning purposes






  • Omg it’s sooo daammmn slooow, it takes around 30 seconds to bulk-insert 15000 rows

    Do you have any measurements of how long it takes when you just ‘do it raw’? Like trying the same insert through SQL Server Management Studio or something?

    Because to me it’s not really clear what’s slow. Like you’re complaining specifically about the Microsoft ODBC driver - but do you base that on anything? Can you insert faster from Linux or through other means?

    Like if it’s just ‘always slow’, it might just be the SQL Server itself. If you can pinpoint when it’s slow and when it’s fast(er), that makes it much easier to tell how to speed it up; see the benchmark sketch below
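
    And if you happen to be doing the inserts from Python, one quick thing to compare is pyodbc’s fast_executemany flag, which switches to array binding in the ODBC driver. A rough benchmark sketch; the connection string, table, and row shape are placeholders:

    ```python
    # Rough benchmark sketch: time the same bulk insert with and without
    # pyodbc's fast_executemany (array binding in the ODBC driver).
    # Connection string, table and row shape are placeholders.
    import time
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};SERVER=myserver;"
        "DATABASE=mydb;UID=myuser;PWD=mypassword;TrustServerCertificate=yes"
    )
    cur = conn.cursor()
    rows = [(i, f"name-{i}") for i in range(15000)]

    for fast in (False, True):
        cur.execute("DELETE FROM dbo.test_table")   # reset between runs
        conn.commit()
        cur.fast_executemany = fast
        start = time.perf_counter()
        cur.executemany("INSERT INTO dbo.test_table (id, name) VALUES (?, ?)", rows)
        conn.commit()
        print(f"fast_executemany={fast}: {time.perf_counter() - start:.1f}s")
    ```

    If the timings barely differ, the bottleneck is probably the server or the network rather than the driver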