Some Thoughts about Data Science
What do you think being a data scientist is about?
A data scientist uncovers insights that guide an organization’s decision making and strategic planning. To do this, they explore the organization’s data using mathematical skills, statistical knowledge, programming, and advanced analytical techniques such as machine learning and AI. This data is often “big data”: large in volume, varied in type (text, pictures, videos, etc.), and frequently updated in real time.
What do you see as the major duties and/or knowledge areas?
A data scientist’s major duties are:
- Identifying data-analytics problems and determining valuable data sources
- Preprocessing large sets of structured and unstructured data from disparate sources
- Cleaning and validating the data to ensure accuracy, completeness, and uniformity
- Analyzing the data to identify trends and patterns
- Interpreting the data to discover solutions and opportunities by building predictive models and applying machine-learning algorithms
- Presenting information using data visualization techniques
- Collaborating with engineering and product development teams
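To make the cleaning and validation duty more concrete, here is a minimal sketch in Python using only the standard library. The records, field names, and the 0 to 120 age limit are hypothetical, chosen only for illustration. The sketch drops exact duplicates, normalizes categorical codes (uniformity), flags missing values (completeness), and applies a range check (accuracy).

```python
from statistics import mean

# Hypothetical raw records from disparate sources; field names are illustrative only.
raw_records = [
    {"subject_id": "001", "age": "34", "sex": "F"},
    {"subject_id": "002", "age": "", "sex": "m"},
    {"subject_id": "003", "age": "290", "sex": "M"},
    {"subject_id": "001", "age": "34", "sex": "F"},  # exact duplicate row
]


def clean(records):
    """Drop exact duplicates, standardize values, and collect validation issues."""
    seen = set()
    cleaned, issues = [], []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key in seen:
            issues.append((rec["subject_id"], "duplicate record"))
            continue
        seen.add(key)
        # Uniformity: normalize categorical codes to upper case.
        sex = rec["sex"].strip().upper()
        # Completeness: treat empty strings as missing values.
        age = int(rec["age"]) if rec["age"].strip() else None
        if age is None:
            issues.append((rec["subject_id"], "missing age"))
        elif not 0 <= age <= 120:
            # Accuracy: range check against an assumed plausible limit of 0 to 120.
            issues.append((rec["subject_id"], f"age out of range: {age}"))
            age = None
        cleaned.append({"subject_id": rec["subject_id"], "age": age, "sex": sex})
    return cleaned, issues


cleaned, issues = clean(raw_records)
valid_ages = [r["age"] for r in cleaned if r["age"] is not None]
print(issues)
print(mean(valid_ages))
```

In practice this work is usually done with tools such as pandas, R, SQL, or SAS, but the checks follow the same pattern: standardize, deduplicate, and flag rather than silently discard questionable values.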
Their knowledge areas are:
- Mathematics and statistics
- Programming (Python, R, SQL, etc.)
- Advanced analytics (machine learning, data mining, artificial intelligence)
- Business intelligence tools (e.g., Tableau) and big-data frameworks (e.g., Hadoop)
What differences/similarities do you see between data scientists and statisticians?
Both data scientists and statisticians have strong mathematical and statistical knowledge and are proficient in programming languages such as R and Python. Both investigate data by building models and present results using visualization.
A data scientist typically has stronger computer science skills, while a statistician focuses more on mathematical concepts. Data scientists are in high demand across a wide variety of industries, while statisticians tend to be concentrated in academia, government, research institutions, and the pharmaceutical industry. Data scientists often take a more exploratory approach to analyzing big data, while statisticians usually analyze smaller datasets with a theoretical, hypothesis-driven approach.
How do you view yourself in relation to these two areas?
I currently work as a statistical SAS programmer for a pharmaceutical consulting company while also overseeing the programming team for a clinical research study at Harvard Medical School. My statistics and SAS programming skills help me in my daily work, and I am finishing my master’s degree in statistics in order to become a statistician. As more and more clinical research studies and clinical trials collect data through electronic data capture (EDC) rather than on paper, SQL is becoming an essential tool for clinical programming.