In this special Christmas season we have more gifts for you :) One of them is an interview with Agnieszka Słowik, a passionate expert in Data Science. We took the chance to explore the field with her. We hope it will inspire you and encourage you to learn more. Dive into the adventure called Data Science that we prepared especially for you. Have fun :)
Ewelina Wołoszyn: Hi Agnieszka, it is great to have the opportunity to talk with you at Christmas time :) Let’s start. Can you tell us what Data Science is?
Agnieszka Słowik: Hi, the pleasure is all mine :) Data Science is a strongly interdisciplinary field and there is no exact definition. “Technology that learns from data to predict the future in order to drive better decisions” is one of the nicer explanations because it shows that data science is both technical and applicable to real-life problems. Currently, it is also a buzzword, and people use it for different technical areas, from parts of front-end development (data visualization) to scientific computing, but generally it is about using data in a smart way.
Ewelina Wołoszyn: So where can we use it?
Agnieszka Słowik: There are many applications, including finance, marketing, healthcare and the social sciences. Basically, any field that takes advantage of data-driven insight. It’s useful for any customer-oriented business because it enables more personalised services. For example, say an appliance company wants to increase sales of its new hi-tech perfect milkshake blender. Traditionally, the marketing department would send an advertising email to all the customers, and most of them wouldn’t even open it because they don’t need a new milkshake blender, they never open ads, or the company’s products are too expensive for them. Thanks to data science, the company is able to use this kind of information to run a cheaper and more effective ad campaign. Predictive analytics can be used directly as a product too. My team at Architech built a flight delay prediction model with a user interface, so that travellers are able to check in advance whether a flight is going to be delayed.
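To make the blender example a bit more concrete, here is a minimal Python sketch of the idea of mailing only the customers who are likely to care. Everything in it is an assumption made for illustration: the `Customer` fields, the `should_receive_ad` rule and the toy customers are hypothetical, and the hand-written rule merely stands in for a model that a real team would learn from historical data.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    opens_ads: bool          # has opened marketing emails in the past
    avg_order_value: float   # typical spend, in arbitrary currency units
    owns_blender: bool       # already owns a recent blender


def should_receive_ad(c: Customer, blender_price: float) -> bool:
    """Hypothetical hand-written rule standing in for a trained model."""
    can_afford = c.avg_order_value >= 0.5 * blender_price
    return c.opens_ads and not c.owns_blender and can_afford


# Made-up customers, for illustration only.
customers = [
    Customer("A", opens_ads=True, avg_order_value=120.0, owns_blender=False),
    Customer("B", opens_ads=False, avg_order_value=300.0, owns_blender=False),
    Customer("C", opens_ads=True, avg_order_value=20.0, owns_blender=False),
    Customer("D", opens_ads=True, avg_order_value=150.0, owns_blender=True),
]

targets = [c.name for c in customers if should_receive_ad(c, blender_price=200.0)]
print(targets)
```

In this toy setup only one of the four customers gets the email, which is the whole point: fewer messages sent, aimed at the people most likely to respond.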
Ewelina Wołoszyn: And it is becoming more and more popular, isn’t it?
Agnieszka Słowik: Data is growing at a faster rate than ever before. In “The Unreasonable Effectiveness of Data”, a paper by Google’s AI experts Alon Halevy, Peter Norvig and Fernando Pereira, there is a statement: “simple models and a lot of data trump more elaborate models based on less data”. Indeed, data growth has made data science easier – while creating challenges in other areas like scalability and cost management. These are the most exciting times for data hackers. Besides, machine learning and data science are going to play a big part in future workforce automation.
Ewelina Wołoszyn: Many people would like to know how to get into this field. Can you tell us more about how we can start our adventure with Data Science?
Agnieszka Słowik: My path into data science was pretty traditional – I’ve always liked maths, studied computer science, got some programming experience and then decided I wanted to do something that involves research, coding and people skills. It doesn’t have to be this way though! There are so many MOOCs (Massive Open Online Courses) and free tutorials for beginners. Having said that, nothing can beat practical experience. You can get it by participating in Kaggle competitions or by using open source data to answer questions related to the fields you are interested in. For example, if you are interested in films, you can use the IMDB Movie Dataset from Kaggle to investigate 5000 reviews. The more fun the topic is, the more likely you are to keep working on it in your free time. Another great way to start is to attend local meetups and workshops. Women in Technology organize one-day workshops in data science with Python – the next one is coming after the Christmas break!
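As a small taste of what answering a question with data can look like, here is a minimal Python sketch that computes the average score per genre. It uses only the standard library and a tiny made-up table; the column names (`title`, `genre`, `year`, `score`) and the rows are hypothetical and are not taken from the Kaggle dataset mentioned above.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration; a real project would read a downloaded CSV file.
TOY_CSV = """title,genre,year,score
Film A,Comedy,2010,6.5
Film B,Drama,2012,8.0
Film C,Comedy,2015,7.5
Film D,Drama,2016,7.0
"""


def average_score_by_genre(csv_text):
    """Group the rows by genre and return the mean score of each group."""
    scores = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        scores[row["genre"]].append(float(row["score"]))
    return {genre: mean(values) for genre, values in scores.items()}


averages = average_score_by_genre(TOY_CSV)
for genre, avg in sorted(averages.items()):
    print(f"{genre}: {avg:.2f}")
```

The same pattern (load, group, summarise) carries over to larger datasets, where libraries such as pandas make each step shorter.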
Ewelina Wołoszyn: How can we develop further in Data Science? Can you point us to some valuable resources we can use to learn about it?
Agnieszka Słowik: I read KDnuggets, Reddit threads and Quora questions related to machine learning, a couple of blogs and Cross Validated. If you study computer science at the moment (like I do), there is a chance some of your professors are interested in data science as well. I try to make my obligatory university projects involve as much machine learning and data science as possible. Usually you can choose your Bachelor’s or Master’s thesis topic, so that’s another chance to sneak in some data science experience. If you don’t study computer science or another quantitative field, you can still attend workshops like the ones organized by Women in Technology, do MOOCs and read books such as “Data Science from Scratch” by Joel Grus.
Ewelina Wołoszyn: Agnieszka, can you tell us more about the technologies? What should we choose to work in Data Science?
Agnieszka Słowik: You can start with any programming language. The most popular paths are Python and R. They both have extensive libraries that make data processing, machine learning and visualization much easier. Each of these languages has its advantages – R is slightly richer in visualization tools, while Python code is more readable and easier to debug. I’m in the Python camp so far, but I’m learning R as well. There is also a huge big data technology stack – Apache software (Spark, Hadoop, Cassandra) to begin with. Some of these technologies are crucial for data science in production, but you can learn them later, after getting basic experience with Python or R.
May the new year bring you joy and laughter. We are all special and unique. May your Christmas be as special and unique as you are :) Best wishes from Agnieszka, Ewelina and the Women in Technology team :)