We Don’t Need Data Engineers, We Need Better Tools for Data Scientists


    • #60857
      Telegram SmartBoT
      Moderator
        @tgsmartbot

        #Discussion(IoTStack) [ via IoTGroup ]



        These articles focus on the number of available job positions for the title of “Data Engineer” vs. “Data Scientist”. Let’s put aside the fact that the hiring managers who post these positions often don’t know the difference between the two jobs and use the titles interchangeably (or use whichever is in style at the moment). The question then becomes: Is the surplus of available Data Engineer positions solely a personnel problem?

        Data Science is messy because it reflects the real world

        Data Scientists are domain experts (on top of knowing statistics), and they often don’t have a strong background in programming. I’ve seen this expertise discounted in multiple Twitter and forum threads, with software engineers and other “technical people” asking questions like “Why don’t they just learn Spark?”. This mentality completely misses the fact that Data Scientists can already do what they want to do at smaller scales with their existing tools. Data Scientists want to gain insights, not worry about building elegant pipelines.

        Popular Data Science tools are also criticized by more technical people and academics: “Why would anyone use pandas?”. pandas must be the most popular tool to hate among people who have no use for it. It is loved (or at least appreciated), however, by the Data Scientists who use it daily. pandas, among other tools, was built to handle the messiness of the real world. If pandas is so bad, why has nothing unseated it as the standard dataframe for Python Data Science?

        Data Engineers have to handle the messiness that scalable tools can’t

        The scalable systems (e.g. Apache Spark) that are robust enough for production use can’t handle the messiness of the real world as-is. It’s difficult to scale without clean and simple assumptions, and the messier the problem, the harder it is to scale. Data Engineers handle the messiness because scalable tools can’t. Scaling with messiness is extremely difficult.


        AutoTextExtraction by Working BoT using SmartNews 1.03976957683 Build 04 April 2020
