Research Areas

The world is being transformed by data, and data-driven analysis is rapidly becoming an integral part of science and society. Stanford Data Science is a collaborative effort across many departments in all seven schools. We strive to unite existing data science research initiatives and create interdisciplinary collaborations, connecting data scientists and related methodologists with disciplines that are being transformed by data science and computation.

Our work supports research in a variety of fields where incredible advances are being made by facilitating meaningful collaborations between domain researchers, who have deep expertise in societal and fundamental research challenges, and methods researchers, who are developing next-generation computational tools and techniques. These fields include:

Data Science for Wildland Fire Research

In recent years, wildfire has gone from an infrequent and distant news item to a center-stage issue spanning many consecutive weeks for urban and suburban communities. Frequent wildfires are changing everyday life in California in numerous ways -- from public safety power shutoffs to hazardous air quality -- that seemed inconceivable as recently as 2015. Moreover, elevated wildfire risk in the western United States (and similar climates globally) is here to stay for the foreseeable future. There is a plethora of problems that need solutions in the wildland fire arena; many of them are well suited to a data-driven approach.

Seminar Series

Data Science for Physics

Astrophysicists and particle physicists at Stanford and at the SLAC National Accelerator Laboratory are deeply engaged in studying the Universe at both the largest and smallest scales, with state-of-the-art instrumentation at telescopes and accelerator facilities.

Data Science for Economics

Many of the most pressing questions in empirical economics concern causal questions, such as the impact, both short and long run, of educational choices on labor market outcomes, and of economic policies on distributions of outcomes. This makes them conceptually quite different from the predictive type of questions that many of the recently developed methods in machine learning are primarily designed for.

Data Science for Education

Educational data spans K-12 school and district records, digital archives of instructional materials and gradebooks, as well as student responses on course surveys. Data science applied to actual classroom interaction is also of increasing interest and is increasingly a reality.

Data Science for Human Health

It is clear that data science will be a driving force in transitioning the world’s healthcare systems from reactive “sick-based” care to proactive, preventive care.

Data Science for Humanity

Our modern era is characterized by massive amounts of data documenting the behaviors of individuals, groups, organizations, cultures, and indeed entire societies. This wealth of data on modern humanity is accompanied by massive digitization of historical data, both textual and numeric, in the form of historic newspapers, literary and linguistic corpora, economic data, censuses, and other government data, gathered and preserved over centuries, and newly digitized, acquired, and provisioned by libraries, scholars, and commercial entities.

Data Science for Linguistics

The impact of data science on linguistics has been profound. All areas of the field depend on having a rich picture of the true range of variation, within dialects, across dialects, and among different languages. The subfield of corpus linguistics is arguably as old as the field itself and, with the advent of computers, gave rise to many core techniques in data science.

Data Science for Nature and Sustainability

Many key sustainability issues translate into decision and optimization problems and could greatly benefit from data-driven decision making tools. In fact, the impact of modern information technology has been highly uneven, mainly benefiting large firms in profitable sectors, with little or no benefit in terms of the environment. Our vision is that data-driven methods can — and should — play a key role in increasing the efficiency and effectiveness of the way we manage and allocate our natural resources.

Ethics and Data Science

With the emergence of new techniques of machine learning, and the possibility of using algorithms to perform tasks previously done by human beings, as well as to generate new knowledge, we again face a set of new ethical questions.

The Science of Data Science

The practice of data analysis has changed enormously. Data science needs to find new inferential paradigms that allow data exploration prior to the formulation of hypotheses.

Luddy School of Informatics, Computing, and Engineering

Data Science Program

Conduct research that touches every corner of the world

When properly leveraged, data can serve as a powerful change agent. For communities. Entire industries. Even countries.

In the Data Science Program, we partner with organizations and interdisciplinary researchers from around the world to examine how data is currently being used—and, more importantly, how it can be used—to lead to effective, accurate, and transformative decision-making.

Data Science research in the news

IU expanding Luddy School of Informatics, Computing and Engineering name to Indianapolis

Cutting-edge study embeds students into robots for sci-fi-like educational opportunity

Salesforce, Luddy announce establishment of the Salesforce Faculty Fellows

Luddy data science students partner with IU faculty to advance research projects

Luddy Pre-College Summer Program allows students to explore tech in virtual realm

New $35 million Luddy Center for Artificial Intelligence dedicated

Connelly named Provost Professor

Luddy School mourns the passing of founding dean J. Michael Dunn

Luddy faculty earn prestigious recognition from Association for Computing Machinery

Menczer named ACM Fellow

What our students and faculty are researching

From biomedicine and astronomy to environmental policy, business development, and everything in between, there isn’t an industry that our students and faculty aren’t examining through the lens of data science.

Our approach to research is dynamic. Visit one of our labs or lectures, and you’ll likely see us using a range of best-in-class data science tools, tactics, and methodologies.

These include:

It’s impressive to realize how quickly data visualization is evolving and expanding, and getting to see how the whole process works is extremely useful. Esmé Middaugh, M.S. in Data Science—Residential

Data science research centers

Data to Insight Center (D2I)

Focuses on drawing new insights from vast data sets by developing innovative tools and examining the full life cycle of digital data.

Digital Science Center

Advances cloud computing and network science, with a focus on developing human-centered interfaces to cyberinfrastructure.

Center for Bioinformatics Research

Carries out research in evolution and comparative genomics, precision medicine, predictive functional genomics, structural bioinformatics, and more.

Center for Complex Networks and Systems Research

Fosters interdisciplinary research in all areas related to complex networks and systems, computational social science, and data science.

Center for Machine Learning

Cyberinfrastructure for Network Science Center

Advances datasets, tools, and services for the study of biomedical, social and behavioral science, physics, and other networks.

Network Science Institute

Develops foundational network science theories, methods, and analytical tools to understand and improve the complex challenges of our world.

How to join a research project

To work with a faculty member on their research, or to complete an independent study with a particular faculty member, we encourage you to contact that faculty member directly.

If you have additional questions about research in data science, you can contact Haixu Tang, director of the data science graduate program.

Learn more about data science faculty

Data Science Researchers

Gregory Banyay

Assistant Research Professor University Park

Data science can significantly benefit multiple domains of engineering mechanics, particularly with respect to modeling and simulation. My focus lies primarily in the area of development, deployment, verification & validation, and uncertainty quantification, of digital twins, in support of both industrial and academic endeavors.

Methodologies: Bayesian Methods, Experimental Design, Decision Science, Dynamical Models, Machine Learning, Optimization, Predictive Modeling, Statistical Inference, Statistical Modeling, Time Series Analysis Applications: Environmental Sciences, Industrial Engineering, Music

Website: https://www.arl.psu.edu/content/fluid-dynamics-acoustics Email: [email protected]

Guido Cervone

Professor of Geography, Meteorology and Atmospheric Science University Park

My formal background is in Computational Science and Remote Sensing, and my research focuses on the development and application of computational algorithms for the analysis of spatio-temporal remote sensing; numerical modeling; and social media “Big Data” related to environmental hazards and renewable energy. I focus on problems related to the fusion of heterogeneous data at different temporal and spatial scales.

Methodologies: Artificial Intelligence, Computational Tools for Data Science, Data Mining, Deep Learning, High-Dimensional Data Analysis, Machine Learning, Spatio-Temporal Data Analysis Applications: Environmental Sciences

Website: Email: [email protected]

Enrique del Castillo

Distinguished Professor of Industrial Engineering and Professor of Statistics University Park

My broad interests are in Statistics and Machine Learning methods and their application to all of Engineering and to some areas in Science. The “big data” revolution has resulted not only in larger datasets but in data that have a more complex structure. The revolution has been driven by better and faster non-contact sensors in industry, by micro-arrays, better optics, and increasingly more powerful mass spectrometers in science, and by better remote sensing and optical equipment in geophysics and astronomy. In industry, while the traditional paradigm in statistics developed by Fisher, “Student” and Neyman, characterized by small samples obtained in expensive experiments, is very powerful and still of great application today, there is a considerable number of fields in both engineering and science where a response of interest is made of thousands of inexpensive observations, given the wide availability of different types of sensors and scanners.

My research over the years has focused on how to control or optimize an industrial process where large heterogeneous datasets are available. I am interested in building data-based statistical models for the control and optimization of engineering systems or that provide helpful information for scientists. This includes diverse problems in process control (Statistical and Time Series Control), Experimental Design, and Response Surface Optimization methods. In recent years I have worked in these areas dealing with complex, large geometrical (or geometrical-spatial) datasets, specifically, functional, shape and surface data (i.e., data that occur in 1D or 2D manifolds), image data (2D and 3D) and general high-dimensional data that may be concentrated in lower-dimensional manifolds.

Methodologies: Bayesian Methods, Causal Inference, Experimental Design, Dynamical Models, Machine Learning, Spatio-Temporal Data Analysis, Time Series Analysis Applications: Astronomy and Cosmology, Biological Sciences, Industrial Engineering, Manufacturing, Production

Website: https://sites.google.com/view/ecastillo/home#h.n74gqnov8h53 Email: [email protected]

Eric B. Ford

Professor of Astronomy & Astrophysics University Park

Ford’s research integrates planet formation theory and astronomical observations to improve our understanding of planet formation & evolution, both in our Solar System and in general. The Ford group develops, adapts, and applies Bayesian methods to: (1) improve the detection and characterization of exoplanets, (2) characterize exoplanet populations, and (3) improve the design and efficiency of exoplanet surveys. For example, the Ford group is characterizing the population of planetary architectures based on data from NASA’s Kepler mission by combining Hierarchical Bayesian Modeling, Approximate Bayesian Computing, and Gaussian Process emulators. As another example, the Ford group is researching how radial velocity surveys can distinguish planets from intrinsic stellar variability by applying machine learning to time series of high-resolution stellar spectra. Ford created a graduate class on High-Performance Scientific Computing for Astrophysics (Astro 528), contributes to advanced summer schools run by the Penn State Center for Astrostatistics, and maintains a mailing list for Julia Language Users at Penn State. Ford is an Institute for CyberScience co-hire, a co-PI for the CyberLAMP cluster, and has served on Penn State’s Data Sciences Major Management Committee.

Methodologies: Bayesian Methods, Computational Tools for Data Science, High-Dimensional Data Analysis, Machine Learning, Predictive Modeling, Statistical Modeling, Time Series Analysis Applications: Astronomy and Cosmology

Website: http://personal.psu.edu/~ebf11/ Email: [email protected]

David Reese Professor of Information Sciences and Technology University Park

My research involves the creation and development of various novel search engines and digital libraries that utilize machine learning and information retrieval techniques.

Methodologies: Deep Learning, Information Retrieval, Machine Learning, Natural Language Processing Applications: Computer Vision, Education

Website: https://clgiles.ist.psu.edu/ Email: [email protected]

Terry P. Harrison

Professor of Supply Chain and Information Systems University Park

I use optimization to look at large scale production-distribution systems. I also have a focus on the use of optimization to explore the tradeoffs between additive and subtractive manufacturing. Lastly, I am examining the use of blockchain as a method to create more robust and efficient supply chains.

Methodologies: Algorithms, Decision Science, Network Analysis, Optimization Applications: Business Analytics, Environmental Sciences, Supply chain management

Website: Email: [email protected]

Pete Hatemi

Distinguished Professor University Park

Pete Hatemi is Distinguished Professor of Political Science, Co-fund Microbiology and Biochemistry at Penn State University. He conducts research in the fields of individual differences in preferences, decision-making, and social behaviors on a wide range of topics, including: political behaviors and attitudes, addiction, political violence and terrorism, public health, gender identification, religion, mate selection, and the nature of interpersonal relationships. In so doing he advocates theoretical and methodological pluralism, including but not limited to behavioral experiments, endocrinology, genetics, physiology, neuroscience, and social learning approaches. He works on policy, health care and national defense in the government, private and public sectors.

Methodologies: Experimental Design, Sparse Data Analysis, Biostatistics, quantitative genetics Applications: Behavioral Science, Biological Sciences, Health Sciences, Political Science, Psychology, Social Sciences

Website: https://scholar.google.com/citations?user=Ci8Ix08AAAAJ&hl=en Email: [email protected]

Louisa Holmes

Assistant Professor of Geography and Demography University Park

I am a health geographer and demographer with additional training in public health and public policy. My research focuses on three areas: (1) health disparities and the socio-spatial determinants of health; (2) tobacco control and substance use; and (3) quantitative and geospatial research methods, particularly representative survey research and area-level observational studies. In my interdisciplinary work, I seek to understand contexts of health and place as foundational to perpetuating health disparities, as well as opportune for promoting health, through social engagement, built and natural environments, and multi-level policy infrastructures. In recent years, I have increasingly approached my research through the lens of sustainability; sustainable communities are those with equitable access to environments optimal for promoting health and preventing disease.

I have designed and implemented numerous probabilistic household surveys and environmental data collection projects, with which data I have published on topics such as tobacco control, cannabis use, migrant health and biological risk profiles in the context of urban neighborhoods. Presently, I am completing the second wave of a panel study of young adult substance use in the San Francisco Bay Area, which also includes tobacco, vape and cannabis retail data collection, and neighborhood assessments.

At Penn State, I also teach Intro to Spatial Methods and Advanced Spatial Methods, along with special topics courses in Health Geography.

Methodologies: Data Visualization, Spatio-Temporal Data Analysis, Statistical Modeling Applications: Geographic Information Systems, Health Sciences, Social Sciences

Website: https://www.geog.psu.edu/directory/louisa-m-holmes Email: [email protected]

Vasant Honavar

Professor and Edward Frymoyer Chair of Information Sciences and Technology, Director, Artificial Intelligence Research Laboratory University Park

My most recent work in Data Sciences has focused on (i) Scalable algorithms for building predictive models from large, distributed, semantically disparate data (big data), including, more recently, linked open data; (ii) Algorithms for constructing predictive models from sequence, image, text, multi-relational, graph-structured data; (iii) New approaches to selective sharing of knowledge across autonomous knowledge bases (including knowledge base federation, secrecy-preserving query answering); (iv) Theoretically sound yet practically useful approaches to functional and non-functional specification driven composition of complex services from components; (v) Expressive languages for representing, and model checking approaches to reasoning with, qualitative preferences; (vi) Algorithms for eliciting causal effects from disparate sources of observational and experimental data; (vii) Scalable algorithms and software for comparative analyses of large bio-molecular networks; and (viii) Machine learning approaches to analysis and prediction of macromolecular interactions and interfaces (including in particular, the first algorithm for partner-specific prediction of protein-protein interface sites and state-of-the-art sequence based protein-RNA interface predictors) that have resulted in several widely used web servers for analysis and prediction of protein-protein, protein-DNA, and protein-RNA interactions and interfaces, B-cell and T-cell epitopes.

My current research focuses on (1) Computational abstractions of scientific artifacts (e.g., data, knowledge, hypotheses), and universes of scientific discourse (e.g., biology), and scientific processes (e.g., hypothesis generation, predictive modeling, experimentation, simulation, and hypothesis testing), cognitive tools that augment and extend human intellect; and human-machine infrastructure (including data and computational infrastructure and organizational structures and processes) to accelerate science; (2) Design and analysis of algorithms for predictive modeling from very large, high dimensional, richly structured, multi-modal, longitudinal data; (3) Elucidation of causal relationships from disparate experimental and observational studies; (4) Elucidation of causal relationships from relational, temporal, and temporal-relational data; (5) Design and analyses of accountable, explainable, and fair AI systems; (6) Analysis and prediction of macromolecular interactions, elucidation of complex biological pathways, e.g., those involved in immune response, development, and disease; (7) Predictive and causal modeling of individual and population health outcomes from behavioral, biomedical, clinical, environmental, socio-demographic data; (8) Predictive and causal modeling of behavioral and cognitive systems in naturalistic settings; (9) Accelerating materials discovery using machine learning; (10) Modeling the structure, activity, and function of brain networks from fMRI and other types of data.

Methodologies: Artificial Intelligence, Causal Inference, Data Mining, Deep Learning, Machine Learning, Network Analysis, Spatio-Temporal Data Analysis Applications: Bioinformatics, Computer Science, Cyber Security, Health Sciences, Industrial Engineering, Materials Science, Networks, Neuroscience

Website: http://ailab.ist.psu.edu Email: [email protected]

Sharon Xiaolei Huang

Associate Professor, College of Information Sciences and Technology University Park

Dr. Sharon Huang is a data scientist who works with multimedia data, especially image and video data in the biomedical domain. She focuses on image analysis, machine learning and visual analytics methods for object recognition, image and video segmentation, image and video synthesis, computer-assisted diagnosis and intervention, image registration/matching, and motion tracking. Her broader interests include data science for healthcare, artificial intelligence for medicine, biomedical informatics, computer vision, computer graphics, and human-computer interaction.

Methodologies: Algorithms, Artificial Intelligence, Bayesian Methods, Computational Tools for Data Science, Data Visualization, Deep Learning, High-Dimensional Data Analysis, Image Data Processing and Analysis, Machine Learning, Predictive Modeling, Spatio-Temporal Data Analysis Applications: Bioinformatics, Biological Sciences, Climate Research, Computer Science, Computer Vision, Health Sciences, Materials Science

Website: sharon-huang.ist.psu.edu Email: [email protected]

David Hunter

Professor of Statistics University Park

My work in statistical optimization algorithms includes coining and helping to popularize the term “MM algorithms,” which is a class of algorithms that contains the well-known EM algorithms. I also work on statistical models for networks and am a co-creator of the “statnet” suite of packages for network analysis in R. Finally, I work on the theory and computational practice of unsupervised clustering using nonparametric finite mixture models.

Methodologies: Algorithms, Network Analysis Applications: Networks

Website: http://personal.psu.edu/drh20/ Email: [email protected]

Jia Li’s research interests include statistical/machine learning, probabilistic graph models, image analysis with applications in a variety of disciplines. She has developed fundamental methods and algorithms for machine learning as well as real-time AI systems for image annotation, classification, and composition analysis.

Methodologies: Algorithms, Artificial Intelligence, Computational Tools for Data Science, Data Mining, Data Visualization, Deep Learning, High-Dimensional Data Analysis, Image Data Processing and Analysis, Information Retrieval, Machine Learning, Real-time Data Processing, Spatio-Temporal Data Analysis, Statistical Modeling, Time Series Analysis Applications: Bioinformatics, Biological Sciences, Climate Research, Computer Science, Computer Vision, Digital Humanities, Electrical Engineering, Materials Science, Psychology

Website: stat.psu.edu/~jiali Email: [email protected]

Shaun Mahony

Assistant Professor of Biochemistry & Molecular Biology University Park

My lab develops machine learning applications for understanding gene regulation. We are particularly interested in regulatory proteins called transcription factors, which recognize particular DNA binding sites in the genome and thereby regulate the cell-specific activities of genes. We develop machine learning approaches to understand the DNA sequence and chromatin patterns that determine transcription factor regulatory events within a given cell type.

Methodologies: Deep Learning, Machine Learning Applications: Bioinformatics, Biological Sciences

Website: http://mahonylab.org/ Email: [email protected]

Paul Medvedev

Associate Professor University Park

Paul Medvedev’s research focus is on developing computer science techniques for analysis of biological data and on answering fundamental biological questions using such methods.

Methodologies: Algorithms, Artificial Intelligence, Computational Tools for Data Science, Machine Learning Applications: Bioinformatics, Biological Sciences, Computer Science

Website: http://medvedevgroup.com Email: [email protected]

Kevin Munger

Assistant Professor of Political Science and Social Data Analytics University Park

I do large-scale quantitative analysis of social media trace data, with a focus on video-focused platforms like YouTube and TikTok. I also conduct randomized “field” experiments on social media using Twitter bots. My primary methods are webscraping and quantitative text analysis.

Methodologies: Causal Inference, Computational Tools for Data Science, Experimental Design, Machine Learning Applications: Political Science

Website: https://polisci.la.psu.edu/people/kmm7999 Email: [email protected]

Rebecca Napolitano

Assistant Professor of Architectural Engineering University Park

My research group focuses on hybrid analytics, which lies at the intersection of architectural engineering, data science, and historic preservation. Hybrid analytics, a nascent field, is the combination of physics-based modeling and data-driven modeling for the end goal of making real-time predictions and monitoring in the context of Digital Twins a reality. This new field combines the decipherability and clear-box nature of physics-based modeling with the accuracy and pattern recognition techniques of data-driven machine learning algorithms. More specifically, our research at the intersection with data science focuses on the following aspects for preservation and adaptive reuse of existing and historic structures as a sustainable infrastructure solution: 1) eye tracking and knowledge graphs to analyze bias during a visual inspection, 2) pattern recognition for damage detection and model generation, 3) sensor modality and location optimization, 4) feature learning from monitoring data, 5) predictive modeling of infrastructure using physics-based models, 6) adaptive design of experiments for new construction/repair materials.

Methodologies: Artificial Intelligence, Bayesian Methods, Experimental Design, Data Mining, High-Dimensional Data Analysis, Machine Learning, Predictive Modeling, Real-time Data Processing Applications: Civic Infrastructure, Materials Science

Website: https://sites.psu.edu/thebeamlab/research/ Email: [email protected]

Becky Passonneau

Professor of Computer Science and Engineering University Park

My area of research is natural language processing (NLP), with a focus on semantics and pragmatics. I investigate how the same combinations of words have different meanings in different contexts, in spoken or written language. Recently I have been working on NLP applied to educational technology to support reading and writing skills, and on novel adaptive dialogue policies for agents that learn from people through text-based multi-modal dialogue. In the past I have worked on a wide range of topics including summarization of textual and quantitative data, exploration of knowledge graphs, causal models of failures on the electrical grid based on mining structured and unstructured (textual) data, and text forecasting from financial news.

Methodologies: Artificial Intelligence, Causal Inference, Data Mining, Deep Learning, Natural Language Processing Applications: Computational Linguistics

Website: https://www.nlplab.psu.edu/ Email: [email protected]

Wesley Reinhart

Assistant Professor of Materials Science & Engineering University Park

My research platform takes advantage of lessons learned from the traditional materials design approach to develop efficient and robust inverse design workflows based on both physics-based modeling and data-driven paradigms, including GPU accelerated computing, hybrid simulation methods accelerated by machine learning, and generative models which require no simulation at all. We seek to capitalize on advances in both data science and machine learning, including the increasingly popular deep learning but also methods based on Gaussian Process, Optimal Transport, and other related methods, as well as high-performance physics simulation to predict the thermodynamic, electromagnetic, and mechanical responses of materials. The increasingly close coupling of these topics with materials synthesis and characterization will undoubtedly unlock new and improved functionalities in a wide variety of materials applications.

Methodologies: Artificial Intelligence, Data Mining, Deep Learning, High-Dimensional Data Analysis, Machine Learning, Optimization, Predictive Modeling Applications: Chemistry and Chemical Engineering, Materials Science, Nanotechnology

Website: https://sites.psu.edu/reinhartgroup/ Email: [email protected]

Shomir Wilson

Assistant Professor of Information Sciences and Technology University Park

My research brings together natural language processing (NLP), privacy, and artificial intelligence. I direct the Human Language Technologies Lab at Penn State.

I am interested in solving problems to enable computers to do meaningful work with large volumes of natural language text. My lab develops new methods for NLP and applies them to a variety of domains, including privacy, online social networks, web science, and digital libraries. I am particularly interested in breaking down technology’s “walls of text”, i.e., situations where a human user or decision-maker is expected to consume a large quantity of text to take action while lacking sufficient resources (time, expertise) to properly understand what they have been given. I have applied this paradigm to privacy policies, scholarly manuscripts, documents from the world wide web, and historical texts, and I am always interested in new domains to work with.

Methodologies: Artificial Intelligence, Data Security and Privacy, Decision Science, Deep Learning, Machine Learning, Natural Language Processing Applications: Behavioral Science, Business Analytics, Computational Linguistics, Computer Science, Cyber Security, Digital Humanities

Website: https://shomir.net Email: [email protected]

Lingzhou Xue

Associate Professor of Statistics University Park

My research focuses on the development and application of advanced statistical methods, theory, and computational algorithms for analyzing complex, high-dimensional data, with a special emphasis on variable selection, network analysis, high-dimensional hypothesis testing, and nonconvex statistical learning.

Methodologies: Deep Learning, High-Dimensional Data Analysis, Machine Learning, Network Analysis, Optimization, Statistical Inference, Statistical Modeling

Applications: Bioinformatics, Biological Sciences, Business Analytics, Environmental Sciences, Finance Research, Networks

Website: https://stat.psu.edu/people/lingzhou-xue Email: [email protected]


Christopher Zorn

Liberal Arts Professor of Political Science University Park

Christopher Zorn is the Liberal Arts Professor of Political Science, Professor of Sociology and Criminology (by courtesy), and Affiliate Professor of Law at Pennsylvania State University. He holds a Ph.D. in political science from Ohio State University (1997) and a B.A. in political science and philosophy from Truman State University (1991). Prior to coming to Penn State, he was Professor of Political Science at the University of South Carolina (2005-2007), a Visiting Scientist and Program Director for the Law and Social Science Program at the National Science Foundation (2003-2005), and Winship Distinguished Research Professor of Political Science at Emory University, where he taught from 1996 to 2003. His research focuses on judicial politics and on statistics for the social and behavioral sciences. Professor Zorn is the recipient of eight grants from the NSF, as well as numerous other fellowships and awards. His current research interests include unsupervised learning methods for text, measurement models and data reduction, and data visualization for group decision making.

Methodologies: Data Mining, Data Visualization, Decision Science, Natural Language Processing, Spatio-Temporal Data Analysis, Statistical Inference, Statistical Modeling

Applications: Behavioral Science, Business Analytics, Law, Political Science, Social Sciences

Website: http://goo.gl/20mBf/ Email: [email protected]

DSI facilitates the kind of breakthrough discoveries that would have been unthinkable just a few years ago.

We make it possible for Columbia’s scholars and students to extract value from the vast reservoirs of data that are being generated today. DSI-affiliated researchers work in a wide range of disciplines – from business to medicine, social work to literature, history to natural science – and collaborate in interdisciplinary teams to gather and interpret data and address urgent problems facing our society.

Focus Areas

We have accelerated the pace of discovery by working on five of society’s most challenging problems.

Cybersecurity

Data, media and society, business and finance, financial and business analytics, smart cities, computing systems for data-driven science, sense, collect and move data, health care, health analytics, foundations of data science.

Research Centers

Our centers are engines of translational research and education in the data sciences, and a source of technology with high commercialization potential.

We develop, monitor, and improve infrastructure, buildings, transportation routes, the power supply, and everyday activities in crowded, urban environments. 


We study the physical aspects of sensing, generating, collecting, storing, transporting, and processing large data sets. 


We work to improve the health of individuals and the health care system through data-driven methods and understanding of health processes. 


We conduct core research on problems that cut across the data sciences and engineering.


We develop analytical and computational tools to manage risk and to support decisions using the growing volume and variety of data available. 



We use data generated by people and data about people to understand human behavior.


We develop the capacity to keep data secure and private throughout its lifetime. 


We explore the design, analysis, and application of massive-scale computing systems for processing data.


Explore More

Working groups.

We address challenges posed by our data-rich society through clusters of multidisciplinary researchers.


Faculty Recruitment Program

We support faculty hires at all levels in any field with an interest in data science.


Postdoctoral Fellows

We seek recent Ph.D. graduates with explicit interests in advancing and/or applying data science to other domains.


Northeast Big Data Innovation Hub

We build and strengthen partnerships across industry, academia, nonprofits, and government.


Funding Opportunities

We support research collaborations between data scientists and domain experts.


37 Research Topics In Data Science To Stay On Top Of

As a data scientist, staying on top of the latest research in your field is essential.


The data science landscape changes rapidly, and new techniques and tools are constantly being developed.

To keep up with the competition, you need to be aware of the latest trends and topics in data science research.

In this article, we will provide an overview of 37 hot research topics in data science.

These topics could be an idea for a thesis or simply topics you can research independently.

Stay tuned – this is one blog post you don’t want to miss!

37 Research Topics in Data Science

1.) Predictive Modeling

Simply put, it is the process of using historical data to build models that can predict future outcomes.

Predictive modeling has many applications, from marketing and sales to financial forecasting and risk management.

As businesses increasingly rely on data to make decisions, predictive modeling is becoming more and more important.

While it can be complex, predictive modeling is a powerful tool that gives businesses a competitive advantage.

2.) Big Data Analytics

And with good reason – organizations of all sizes are sitting on mountains of data, and they’re increasingly turning to data scientists to help them make sense of it all.

But what exactly is big data? And what does it mean for data science?

Big data typically refers to datasets of a few terabytes or more.

Given the enormity of big data, it’s not surprising that organizations are struggling to make sense of it all.

That’s where data science comes in.

By harnessing the power of big data analytics, they can improve their decision-making, better understand their customers, and develop new products and services.

3.) Auto Machine Learning

Auto machine learning is a research topic in data science concerned with developing algorithms that can automatically learn from data without intervention.

This area of research is vital because it allows data scientists to automate the process of writing code for every dataset.

This allows us to focus on other tasks, such as model selection and validation.

This makes auto machine learning tools valuable for data scientists who either lack the skills to do their own analysis or are struggling with it.

4.) Text Mining

Text mining techniques can extract information from text data, such as keywords, sentiments, and relationships.

5.) Natural Language Processing

Natural language processing is a data science research topic that analyzes human language data.

This area of research is important because it allows us to understand and make sense of the vast amount of text data available today.

6.) Recommender Systems

Businesses can better understand their customers and their needs by using recommender systems.

This, in turn, allows them to develop better products and services that meet the needs of their customers.

Think about Netflix, for example, always knowing what you want to watch!

7.) Deep Learning

Deep learning is a research topic in data science that deals with artificial neural networks.

These networks are composed of multiple layers, and each layer is formed from various nodes.

The deep learning network has become very popular in recent years because of its ability to achieve state-of-the-art results on various tasks.

There seems to be a new SOTA deep learning algorithm research paper on  https://arxiv.org/  every single day!

8.) Reinforcement Learning

Reinforcement learning is a research topic in data science that deals with algorithms that can learn on multiple levels from interactions with their environment.

9.) Data Visualization

Data visualization techniques can be used to create charts, graphs, and other visual representations of data.

This allows us to share our findings with others in a way that is easy to understand.

10.) Predictive Maintenance

This is done using data analytics to predict when a failure will occur.

This allows us to take corrective action before the failure actually happens.

11.) Financial Analysis

Current researchers are focused on analyzing macroeconomic data to make better financial decisions.

This is done by analyzing the data to identify trends and patterns.

Financial analysis is also used to predict future economic trends.

12.) Image Recognition

This is done using artificial intelligence algorithms that can learn from data and understand what objects you’re looking for.

Think about security, identification, routing, traffic, etc.

13.) Fraud Detection

This is done by analyzing data to look for patterns and trends that may be associated with the fraud.

Fraud detection is a valuable tool for anyone who wants to protect themselves from potential fraudulent activity.

14.) Web Scraping

Web scraping is a controversial topic in data science because it allows us to collect data from the web, which is usually data you do not own.

This is done by extracting data from websites using scraping tools that are usually custom-programmed.

For obvious reasons, web scraping is a unique tool – giving you data your competitors would have no chance of getting.

15.) Social Media Analysis

However, it is still a great data science research topic because it allows us to understand how people interact on social media.

For example, if we know that a particular demographic prefers a specific type of content, we can create more content that appeals to them.

This allows businesses to understand better what their customers want and need.

Overall, social media analysis is valuable for anyone who wants to improve their marketing efforts or understand how customers interact with brands.

16.) GPU Computing

Due to how GPUs are made, they’re incredibly proficient at intense matrix operations, outperforming traditional CPUs by very high margins.

17.) Quantum Computing

Quantum computing is a new research topic in data science and physics because it allows us to process data much faster than traditional computers.

For example, if you wanted to understand how a single atom moved around, a classical computer couldn’t handle this problem.

You could be too.

18.) Genomics

Genomics may be the only research topic that can compete with quantum computing regarding the “number of top researchers working on it.”

Genomics is a fantastic intersection of data science because it allows us to understand how genes work.

Genomics is also used to study the evolution of different species.

Genomics is the future and a field begging for new and exciting research professionals to take it to the next step.

19.) Location-based services

Since GPS and 4G cell phone reception became a thing, we’ve been trying to stay informed about how humans interact with their environment.

Location-based services are used to understand the user, something every business could always use a little bit more of.

20.) Smart City Applications

Smart city applications are all the rage in data science research right now.

In short, they are systems that use data to improve city infrastructure and services.

This can include anything from traffic management and energy use to waste management and public safety.

It is then analyzed to identify tendencies and habits.

21.) Internet Of Things (IoT)

The Internet of Things, or IoT, is an exciting new data science and sustainability research topic.

IoT is a network of physical objects embedded with sensors and connected to the internet.

That means that they can share data with computers.

They’re also using IoT data to predict when an appliance will break down or when a road will be congested.

22.) Cybersecurity

Cybersecurity is a relatively new research topic in data science and in general, but it’s already garnering a lot of attention from businesses and organizations.

While most of cybersecurity focuses on infrastructure, data scientists can leverage historical events to find potential exploits to protect their companies.

As a result, cybersecurity is a crucial data science research area and one that will only become more important in the years to come.

23.) Blockchain

Did someone say transmitting data?

Finally, blockchain is still in its early stages of development, so there is much room for research and innovation.

24.) Sustainability

Datasets on sustainability are constantly growing and changing, making it an exciting challenge for data scientists.

25.) Educational Data

Besides, data science can be used to develop educational interventions tailored to individual students’ needs.

26.) Politics

By analyzing large data sets, political scientists (data scientists with a cooler name) can gain valuable insights into voting patterns, campaign strategies, and more.

27.) Cloud Technologies

This lets organizations save money on hardware and maintenance costs while providing employees access to the latest and greatest software and applications.

By researching cloud technologies, data scientists can help organizations to make the most of this new and exciting technology.

28.) Robotics

Robotics has recently become a household name, and it’s for a good reason.

29.) HealthCare

Hospitals, clinics, and health insurance companies generate a tremendous amount of data daily.

30.) Remote Work

In today’s global economy, more and more businesses are allowing their employees to work from home or anywhere else they can get a stable internet connection.

31.) Data-Driven Journalism

Data-driven journalism is an exciting new field of research that combines the best of both worlds: the rigor of data science with the creativity of journalism.

It is an exciting new topic and research field for data scientists to explore.

32.) Data Engineering

Data engineering is a staple in data science, focusing on efficiently managing data.

If you are looking for a challenging research topic that would have an immediate worldwide impact, then improving an existing approach or innovating a new one in data engineering would be a good start.

33.) Data Curation

Data curation is a vital part of data science. In recent years, there has been an increasing focus on data curation, as it has become clear that it is essential for ensuring data quality.

34.) Meta-Learning

So, if you can learn how to learn, you can learn anything much faster.

For example, if you have a bunch of different models that all solve the same problem, then you can use meta-learning to share the knowledge between them to improve the group’s overall performance.

35.) Data Warehousing

This data type can be used to create reports and perform statistical analysis.

They also help to improve the accuracy of reports and provide a complete picture of the organization’s performance.

36.) Business Intelligence

Business intelligence aims to collect, process, and analyze data to help businesses make better decisions.

Data science is the perfect tool for business intelligence because it combines statistics, computer science, and machine learning.

37.) Crowdsourcing

This can be done for various purposes, such as gathering data, developing new algorithms, or even just for fun (think: online quizzes and surveys).

Final Thoughts, Are These Research Topics In Data Science For You?

If not, don’t worry – there are plenty of other great topics to explore.

Other Data Science Articles


Towards Data Science

Dr. Sunil Kumar Vuppala

Jun 27, 2020


Top 20 Latest Research Problems in Big Data and Data Science

Problem statements in 5 categories, research methodology and research labs to follow.

Even though Big data is in the mainstream of operations as of 2020, there are still potential issues or challenges the researchers can address. Some of these issues overlap with the data science field. In this article, the top 20 interesting latest research problems in the combination of big data and data science are covered based on my personal experience (with due respect to the Intellectual Property of my organizations) and the latest trends in these domains [1,2]. These problems are covered under 5 different categories, namely

Core Big data area to handle the scale

Handling Noise and Uncertainty in the data

Security and Privacy aspects

Data Engineering

Intersection of Big data and Data science

The article also covers a research methodology to solve specified problems and top research labs to follow which are working in these areas.

I encourage researchers to solve applied research problems which will have more impact on society at large. The reason to stress this point is that we are hardly analyzing 1% of the available data. On the other hand, we are generating terabytes of data every day. These problems are not very specific to a domain and can be applied across the domains.

Let me first introduce 8 V’s of Big data (based on an interesting article from Elena), namely Volume, Value, Veracity, Visualization, Variety, Velocity, Viscosity, and Virality. If we closely look at the questions on individual V’s in Fig 1, they trigger interesting points for the researchers. Even though they are business questions, there are underlying research problems. For instance, 02-Value: “Can you find it when you most need it?” qualifies for analyzing the available data and giving context-sensitive answers when needed.

Having understood the 8V’s of big data, let us look into details of research problems to be addressed. General big data research topics [3] are in the lines of:

Next, let me cover some of the specific research problems across the five listed categories mentioned above. The problems related to core big data area of handling the scale:-

Environments such as Hadoop or Spark are used for offline or online processing of data. The industry is looking for scalable architectures to carry out parallel data processing of big data. There has been a lot of progress in recent years; however, there is still huge potential to improve performance.

2. Handling real-time video analytics in a distributed cloud:

With the increased accessibility of the internet, even in developing countries, videos have become a common medium of data exchange. Telecom infrastructure, operators, the deployment of the Internet of Things (IoT), and CCTVs all play a role in this regard. Can the existing systems be enhanced with lower latency and more accuracy? Once the real-time video data is available, the questions are how the data can be transferred to the cloud and how it can be processed efficiently both at the edge and in a distributed cloud.

3. Efficient graph processing at scale:

Social media analytics is one such area that demands efficient graph processing. The role of graph databases in big data analytics is covered extensively in the reference article [4]. Handling efficient graph processing at a large scale is still a fascinating problem to work on.

The research problems to handle noise and uncertainty in the data:-

4. Identify fake news in near real-time:

Handling fake news in real time and at scale is a pressing issue, as fake news spreads like a virus in a bursty way. The data may come from Twitter, fake URLs, or WhatsApp. Sometimes it may appear to come from an authenticated source yet still be fake, which makes the problem more interesting to solve.

5. Dimensionality Reduction approaches for large scale data:

One can extend the existing approaches of dimensionality reduction to handle large scale data or propose new approaches. This also includes visualization aspects. One can use existing open-source contributions to start with and contribute back to the open-source.

6. Training / Inference in noisy environments and incomplete data :

Sometimes, one may not get a complete distribution of the input data, or data may be lost due to a noisy environment. Can the data be augmented in a meaningful way by oversampling, for example with the Synthetic Minority Oversampling Technique (SMOTE), or by using Generative Adversarial Networks (GANs)? Can the augmentation help in improving the performance? How to train and infer under these conditions is the challenge to be addressed.
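To make the oversampling idea concrete, here is a minimal sketch of the interpolation step at the heart of SMOTE, written with the Python standard library only. It is an illustration under simplifying assumptions, not the reference algorithm: the function name, the toy points, and the parameter defaults are hypothetical, and a real project would more likely rely on an established library.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE's core idea).

    Assumes `minority` holds at least two points of equal dimension."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class samples (two features each)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(minority, n_new=6)
```

Each synthetic point lies on the segment between two real minority samples, so the augmented data stays inside the region the minority class already occupies. Whether that actually improves model performance is the open question the paragraph above raises.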

7. Handling uncertainty in big data processing:

There are multiple ways to handle the uncertainty in big data processing [4]. This includes sub-topics such as how to learn from low-veracity, incomplete, or imprecise training data, and how to handle uncertainty with unlabeled data when the volume is high. We can try to use active learning, distributed learning, deep learning, and fuzzy logic theory to solve these sets of problems.

The research problems in the security and privacy [5] area:-

8. Anomaly Detection in Very Large Scale Systems:

Anomaly detection is a standard problem, but it is not trivial at large scale and in real time. The range of application domains includes health care, telecom, and finance.
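As a small illustration of real-time anomaly detection, the sketch below flags a reading whose z-score against the values seen so far exceeds a threshold. It keeps only a running mean and variance (Welford's method), so memory use does not grow with the stream. The class name, threshold, warm-up length, and sample readings are all assumptions chosen for the example, and a single-machine detector like this says nothing about the distributed, large-scale setting that makes the research problem hard.

```python
import math


class StreamingZScoreDetector:
    """Flags values that deviate strongly from the running mean."""

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # z-score above which a value is flagged
        self.warmup = warmup        # observations required before flagging
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, x):
        """Return True if x looks anomalous versus earlier values, then absorb x."""
        is_anomaly = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's online update of mean and squared deviations
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        return is_anomaly


# Hypothetical sensor readings with one obvious spike at the end
readings = [10.0, 10.2, 9.8, 10.1, 9.9] * 4 + [25.0]
detector = StreamingZScoreDetector()
flags = [detector.update(r) for r in readings]
```

In this sketch an anomalous value is still absorbed into the running statistics. A production system might exclude flagged values or use a sliding window instead, which is a design choice left open here.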

9. Effective anonymization of sensitive fields in the large scale systems :

Let me take an example from healthcare systems. A chest X-ray image may contain PHR (Personal Health Record) information. How can one anonymize the sensitive fields to preserve privacy in a large scale system in near real-time? The same approach can be applied to other fields as well, primarily to preserve privacy.
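One simple way to picture field-level anonymization is keyed pseudonymization of record metadata, sketched below with Python's standard `hmac` and `hashlib` modules. The field names, the record, and the 16-character truncation are hypothetical choices for illustration. This handles only structured fields; text burned into the pixels of an image would need a separate treatment that is not shown.

```python
import hashlib
import hmac

# Hypothetical names of the fields considered sensitive
SENSITIVE_FIELDS = {"patient_name", "patient_id"}


def anonymize(record, secret_key, sensitive=SENSITIVE_FIELDS):
    """Replace sensitive field values with a keyed hash (pseudonym).

    The same input and key always give the same pseudonym, so records can
    still be linked, while the original value is not stored in the output."""
    cleaned = {}
    for field, value in record.items():
        if field in sensitive:
            digest = hmac.new(
                secret_key, str(value).encode("utf-8"), hashlib.sha256
            )
            cleaned[field] = digest.hexdigest()[:16]
        else:
            cleaned[field] = value
    return cleaned


# Hypothetical metadata attached to a chest X-ray
record = {"patient_name": "Jane Doe", "patient_id": 1234, "finding": "clear"}
safe_record = anonymize(record, secret_key=b"replace-with-a-real-secret")
```

Pseudonymization of this kind is weaker than full anonymization, because anyone holding the key can recompute the mapping. Whether it satisfies a given regulation is outside the scope of this sketch.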

10. Secure federated learning with real-world applications:

Federated learning enables model training on decentralized data. It can be adopted where the data cannot be shared due to regulatory or privacy issues, but models still need to be built locally and then shared across the boundaries. Whether federated learning can be made to work at scale, and made secure with standard software/hardware-level security, is the next challenge to be addressed. Interested researchers can explore further information from RISELab of UCB in this regard.
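The "train locally, share only the model" idea can be sketched with a toy version of federated averaging. Each client below fits a one-variable linear model on its own private data, and a coordinator averages the resulting weights in proportion to each client's data size. Everything here (function names, learning rate, the two toy clients) is an assumption for illustration, and none of the security aspects discussed above, such as secure aggregation or encrypted updates, are covered.

```python
def local_update(weights, data, lr=0.1, epochs=20):
    """One client's training on private data: fit y ~ w*x + b by gradient descent."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Average client models, weighting each by its number of samples."""
    total = sum(client_sizes)
    return tuple(
        sum(wts[i] * n for wts, n in zip(client_weights, client_sizes)) / total
        for i in range(len(client_weights[0]))
    )


# Two hypothetical clients; both hold points from y = 2x + 1 and never share them
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (0.5, 2.0), (1.5, 4.0)],
]

global_weights = (0.0, 0.0)
for _ in range(50):  # communication rounds
    updates = [local_update(global_weights, data) for data in clients]
    global_weights = federated_average(updates, [len(d) for d in clients])
```

Only the `(w, b)` pairs cross the client boundary. The raw `(x, y)` samples stay where they were generated, which is the property that makes the approach attractive under data-sharing restrictions.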

11. Scalable privacy preservation on big data:

Privacy preservation for large scale data is a challenging research problem to work on, as the range of applications varies from text and images to videos. Differences in country- and region-level privacy regulations make the problem even more challenging to handle.

The research problems related to data engineering aspects:-

12. Lightweight Big Data analytics as a Service:

Offering everything as a service, such as Software as a Service (SaaS), is a new trend in the industry. Can we work towards providing lightweight big data analytics as a service?

13. Auto conversion of algorithms to MapReduce problems:

MapReduce is a well-known programming model in Big data. It is not just a pair of map and reduce functions; it also provides scalability and fault tolerance to applications. However, not many algorithms support MapReduce directly. Can we build a library to automatically convert standard algorithms to support MapReduce?
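For readers new to the model, the classic word-count example below shows the three stages (map, shuffle, reduce) as plain Python functions running in a single process. The function names are illustrative, and the sketch deliberately omits the distribution, scalability, and fault tolerance that a real MapReduce framework provides.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values collected for one key."""
    return (key, sum(values))


def word_count(documents):
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = word_count(["big data big ideas", "data science"])
# counts == {"big": 2, "data": 2, "ideas": 1, "science": 1}
```

An algorithm fits this model when its work can be split into independent per-record map calls and per-key reduce calls. Many iterative algorithms do not decompose that cleanly, which is why automatic conversion remains an open problem.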

14. Automated Deployment of Spark Clusters:

A lot of progress has been made in the usage of Spark clusters in recent times, but they are not completely ready for automated deployment. This is yet another challenging problem to explore further.

The research problems in the intersection of big data with data science:-

15. Approaches to make the models learn with fewer data samples:

In the last 10 years, the complexity of deep learning models has increased with the availability of more data and compute power. Some researchers proudly claim that they solved a complex problem with hundreds of layers in deep learning. For instance, image segmentation may need a 100-layer network to solve the segmentation problem. However, the recent trend asks whether the same problem can be solved with a smaller amount of relevant data and with less complexity. The motivation is to run the models on edge devices, not only in cloud environments using GPUs/TPUs. For instance, deep learning models trained on big data might need deployment in CCTV cameras or drones for real-time usage. This is fundamentally changing the approach to solving complex problems. You may work on challenging problems in this sub-topic.

16. Neural Machine Translation to Local languages:

One can use Google Translate for neural machine translation (NMT) activities. However, there is a lot of research in local universities on neural machine translation for local languages, with support from governments. The latest advances in Bidirectional Encoder Representations from Transformers (BERT) are changing the way these problems are solved. One can collaborate with those efforts to solve real-world problems.

17. Handling Data and Model drift for real-world applications:

Do we need to run the model on inference data if we know that the data pattern is changing and the performance of the model will drop? Can we identify the drift in the data distribution even before passing the data to the model? If the drift can be identified, why pass the data to the model for inference and waste compute power? This is a compelling research problem to solve at scale in the real world. Active learning and online learning are some of the approaches to solve the model drift problem.
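One possible way to check for drift before inference is to compare the distribution of an incoming batch against a reference sample from training. The sketch below uses the two-sample Kolmogorov-Smirnov statistic for a single numeric feature, implemented in plain Python. The threshold of 0.3, the function names, and the synthetic batches are assumptions for illustration; a real system would choose the test and threshold from its own data.

```python
def ks_statistic(reference, incoming):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of the two samples (0 = identical, 1 = fully separated)."""
    ref, inc = sorted(reference), sorted(incoming)
    i = j = 0
    max_gap = 0.0
    while i < len(ref) and j < len(inc):
        value = min(ref[i], inc[j])
        # Advance both samples past every element <= the current value
        while i < len(ref) and ref[i] <= value:
            i += 1
        while j < len(inc) and inc[j] <= value:
            j += 1
        max_gap = max(max_gap, abs(i / len(ref) - j / len(inc)))
    return max_gap


def should_run_inference(reference, incoming, threshold=0.3):
    """Skip inference when the incoming batch has drifted from the reference."""
    return ks_statistic(reference, incoming) <= threshold


# Hypothetical feature values seen at training time and in two new batches
training = [0.1 * k for k in range(100)]
similar_batch = [0.1 * k + 0.05 for k in range(100)]
shifted_batch = [0.1 * k + 5.0 for k in range(100)]

run_similar = should_run_inference(training, similar_batch)
run_shifted = should_run_inference(training, shifted_batch)
```

The check costs a sort per batch and runs before the model is touched, which matches the goal of saving compute on data the model is unlikely to handle well. It covers only one feature at a time and does not detect changes in the relationship between inputs and labels.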

18. Handling interpretability of deep learning models in real-time applications:

Explainable AI is the recent buzzword. Interpretability is a subset of explainability. Machine learning and deep learning models need no longer be black-box models. A few models, such as Decision Trees, are interpretable. However, as complexity increases, the base model itself may not be useful for interpreting the results. We may need to depend on surrogate methods such as Local Interpretable Model-agnostic Explanations (LIME) or SHapley Additive exPlanations (SHAP) to interpret them. This can help decision-makers justify the results produced, for instance, the rejection of a loan application or the classification of a chest X-ray as COVID-19 positive. Can the interpretable models handle large scale real-time applications?

19. Building context-sensitive large scale systems:

Building a large scale context-sensitive system is the latest trend. There are some open-source efforts to start from. However, it requires a lot of effort in collecting the right set of data and building context-sensitive systems to improve search capability. You can choose a research problem in this topic if you have a background in search, knowledge graphs, and Natural Language Processing (NLP). This is applicable across domains.

20. Building large scale generative based conversational systems (Chatbot frameworks):

One specific area gaining momentum is building conversational systems such as Q&A and generative chatbot systems. A lot of chatbot frameworks are available. Making them generative and preparing summaries of real-time conversations are still challenging problems. The complexity of the problem increases as the scale increases. A lot of research is going on in this area. This requires a good understanding of Natural Language Processing and the latest advances such as Bidirectional Encoder Representations from Transformers (BERT) to expand the scope of what conversational systems can solve at scale.

Research Methodology:

I hope you can frame specific problems with your domain and technical expertise from the topics highlighted above. Let me recommend a methodology to solve any of these problems. Some points may look obvious to researchers; however, let me cover them in the interest of a larger audience:

Identify your core strengths, whether they are in theory, implementation, tools, security, or a specific domain. You can acquire other new skills while doing the research. Identifying the right research problem with suitable data gets you about 50% of the way to the milestone. The problem may overlap with other technology areas such as the Internet of Things (IoT), Artificial Intelligence (AI), and Cloud. Your passion for research will determine how far you can go in solving that problem. The trend is towards interdisciplinary research problems across departments. So, one may choose a specific domain in which to apply the skills of big data and data science.

Literature survey: I strongly recommend following only authenticated publications such as IEEE, ACM, Springer, Elsevier, ScienceDirect, etc. Do not fall into the trap of “International journal …” titles that publish without peer review. Please do not limit the literature survey to IEEE/ACM papers only. A lot of interesting papers are available on arxiv.org and paperswithcode. One needs to follow the top research labs in industry and academia for the shortlisted topic. That gives the latest research updates and helps to identify the gaps to fill in.

Lab ecosystem: Create a good lab environment to carry out strong research. This can be your research lab with professors, post-docs, Ph.D. scholars, and master's and bachelor's students in an academic setup, or with senior and junior researchers in an industry setup. Having the right partnership is the key to collaboration, and you may try virtual groups as well. A good ecosystem boosts the results, as members can challenge each other's approaches to improve the results further.

Publish at the right avenues: As mentioned in the literature survey, publish research papers in the right forum, where you will receive peer reviews from experts around the world. You may face obstacles in this process in the form of rejections. However, as long as you receive constructive feedback, you should be thankful to the anonymous reviewers. You may see an opportunity to patent the ideas if the approach is novel, non-obvious, and inventive. The recent trend is to open source the code while publishing the paper. If your institution permits open sourcing, you may do so by uploading the relevant code to GitHub with appropriate licensing terms and conditions.

Top Research labs to follow:

Some of these research areas are active in the top research centers around the world. I request you to follow them and identify further gaps to continue the work. Here are some of the top research centers around the world to follow in big data + data science area:

RISE Lab at the University of California, Berkeley, USA

Doctoral Research Centre in Data Science, The University of Edinburgh, United Kingdom

Data Science Institute, Columbia University, USA

The Institute of Data-Intensive Engineering and Science, Johns Hopkins University, USA

Facebook Data Science research

Big Data Institute, University of Oxford, United Kingdom

Center for Big Data Analytics, The University of Texas at Austin, USA

Center for Data Science and Big Data Analytics, Oakland University, USA

Institute for Machine Learning, ETH Zurich, Switzerland

The Alan Turing Institute, United Kingdom

IISc Computational and Data Sciences Research

Data Lab, Carnegie Mellon University, USA

If you wish to continue your learning in big data, here are my recommendations:

Coursera Big Data Specialization

Big data course from the University of California San Diego

Top 10 books based on your need can be picked up from the summary article in Analytics India Magazine.

Data Challenges:

In the process of solving the real-world problems, one may come across these challenges related to data:

Conclusion:

In this article, I briefly introduced the big data research issues in general and listed the top 20 latest research problems in big data and data science in 2020. These problems are further divided and presented in 5 categories so that researchers can pick a problem based on their interests and skill set. This list is by no means exhaustive. However, I hope these inputs can excite some of you to solve the real problems in big data and data science. I covered these points along with some background on big data in a webinar for your reference [7]. You may refer to my other article, which lists the problems to solve with data science amid Covid-19 [8]. Let us come together to build a better world with technology.

References:

[1] https://www.gartner.com/en/newsroom/press-releases/2019-10-02-gartner-reveals-five-major-trends-shaping-the-evoluti

[2] https://www.forbes.com/sites/louiscolumbus/2019/09/25/whats-new-in-gartners-hype-cycle-for-ai-2019/#d3edc37547bb

[3] https://arxiv.org/ftp/arxiv/papers/1705/1705.04928.pdf

[4] https://www.xenonstack.com/insights/graph-databases-big-data/

[5] https://journalofbigdata.springeropen.com/articles/10.1186/s40537-019-0206-3

[6] https://www.rd-alliance.org/group/big-data-ig-data-security-and-trust-wg/wiki/big-data-security-issues-challenges-tech-concerns

[7] https://www.youtube.com/watch?v=maZonSZorGI

[8] https://medium.com/@sunil.vuppala/ds4covid-19-what-problems-to-solve-with-data-science-amid-covid-19-a997ebaadaa6

Choose the right research problem and apply your skills to solve it. All the very best. Please share your feedback in the comments section. Feel free to add if you come across further topics in this area.

Dr. Sunil Kumar Vuppala

Dr. Sunil is a Director of Data Science at Ericsson. 16+ years of experience in ML/DL, IoT, Analytics; Inventor, Speaker, Thought leader. Top Data Scientist in India.

Data Science

Measuring news consumption in a digital era

As news outlets morph and multiply, both surveys and passive data collection tools face challenges.

What is machine learning, and how does it work?

How does a computer ‘see’ gender?

All Data Science Publications

10 facts about Americans and Twitter

23% of U.S. adults say they use Twitter. The share of Americans who use the platform has remained consistent over the past several years.

In Their Own Words, Americans Describe the Struggles and Silver Linings of the COVID-19 Pandemic

The outbreak has dramatically changed Americans’ lives and relationships over the past year. We asked people to tell us about their experiences – good and bad – in living through this moment in history.

Our latest Methods 101 video explains the basics of machine learning and how it allows our researchers to analyze data on a large scale.

Q&A: Why we studied American sermons and how we did it

Dennis Quinn, computational social scientist, explains how our analysis of sermons came together and the challenges that arise when religion meets big data.

The challenges of using machine learning to identify gender in images

This essay on the lessons we learned about deep learning systems and gender recognition is one part of a three-part examination of issues relating to machine vision technology.

A computer can be trained to predict whether an image shows a man or a woman. Can you identify which parts of the face are most essential to the computer’s decision?

How we examined public attitudes about the tone of U.S. political debate

We explored how Americans feel about the tenor of debate in the country in a recent major survey about U.S. political discourse. Here's how we did it.

Use of election forecasts in campaign coverage can confuse voters and may lower turnout

Probability forecasts have gained prominence in recent years. But these forecasts may confuse potential voters and may even lower the likelihood that they vote.

Our Response to Concerns Raised About Our Analysis of the FCC’s Net Neutrality Public Comments

By Lee Rainie

Pew Research Center released a report on Nov. 29 analyzing the 21.7 million comments submitted online during the U.S. Federal Communications Commission’s open public comment period on net neutrality. Fight for the Future has raised concerns about some aspects of our report, two of which point out inaccuracies that do not change the overall findings of the study. We believe, however, that Fight for the Future’s other points mischaracterize our report and our nonpartisan, non-advocacy mission. The first correction we have made concerns the total number of comments made during a 2014 FCC campaign to solicit public […]

About Pew Research Center: Pew Research Center is a nonpartisan fact tank that informs the public about the issues, attitudes and trends shaping the world. It conducts public opinion polling, demographic research, media content analysis and other empirical social science research. Pew Research Center does not take policy positions. It is a subsidiary of The Pew Charitable Trusts.

Research Fellow, Statistics and Data Science

National University of Singapore

Job Description

The successful candidate will work with Dr. Michael Choi and be co-supervised by Dr. Wenqian Chen on stochastic algorithms with applications in bioinformatics and protein folding under a project on MAPLE: Mechanistic Accelerated Prediction of Protein Secondary Structure via LangEvin Monte Carlo, supported by a Ministry of Education Tier 1 grant under the Data for Science and Science for Data collaboration scheme.

The main responsibilities of the position include:

Qualifications

Covid-19 Message

At NUS, the health and safety of our staff and students is one of our utmost priorities, and COVID-19 vaccination supports our commitment to ensuring the safety of our community and making NUS as safe and welcoming as possible. Many of our roles require a significant amount of physical interaction with students, staff, and members of the public. Even for job roles that may be performed remotely, there will be instances where on-campus presence is required.

Taking into consideration the health and well-being of our staff and students and to better protect everyone in the campus, applicants are strongly encouraged to have themselves fully COVID-19 vaccinated to secure successful employment with NUS.

More Information

Location: Kent Ridge Campus
Organization: Science
Department: Statistics and Data Science
Employee Referral Eligible: No
Job requisition ID: 18761

The Role of Data Science in Research

The training was very relevant. I am about to start a project that aims to predict phenotype based on genetic data, which I plan to approach using machine learning. I really enjoyed the discussions on pitfalls of machine learning, what makes them effective, what can be expected of them and what can’t be expected of them.

Eric Lucas, Post Doctoral Research Associate, Liverpool School of Tropical Medicine.

In academia, new applications of Machine Learning are emerging that improve the accuracy and efficiency of processes, and open the way for disruptive data-driven solutions. For example, the implementation of Data Science in Biomedicine is helping to accelerate patient diagnoses and create personalised medicine based on biomarkers.

Aligned with these advancements, we have received growing interest from professionals in academic disciplines outside of computer science, who want to know which Data Science tools and techniques they need to prepare for the future and what the relevant applications are in their area of specialisation.

Working with Liverpool School of Tropical Medicine (LSTM), we set out to address these questions and upskill their Department of Vector Biology in Data Science using Python. Our goal was to provide PhD students and postdoctoral researchers with transferable knowledge and Data Science skills they can apply to their research in Epidemiology and Bioinformatics.

In this article we will provide an overview of:

Applications of Data Science in Epidemiology

Liverpool School of Tropical Medicine Cambridge Spark Case Study

It’s worth noting that this Data Science training strategy can be applied in any field. Cambridge Spark Data Science and Machine Learning training programmes are designed to equip individuals with the skills to gather, analyse and interpret structured and unstructured data, in just two days.

Get in touch with us to learn more about the course!

An Introduction to Data Science in Python

The essential Data Science techniques researchers need to know about

To build data science capabilities, the first step is to upskill researchers and subject-matter experts in the foundations of Data Science using Python. Widely-used techniques to start learning are:

Data Science Essentials

Unsupervised Learning and Supervised Learning

Unsupervised Learning

Supervised Learning

Ensemble Models

How researchers can make use of Machine Learning

Current research initiatives are using Machine Learning to detect health threats and improve diagnostic accuracy and efficiency, with a positive impact on patient outcomes. Examples include:

A training plan for researchers at the Liverpool School of Tropical Medicine

“The course was intended to improve the data science capability of our department, though each student had their own motivation for signing up. Personally, I was looking for an overview of machine learning tools, the necessary considerations when applying them, and indications about how to implement them,” said Eric Lucas, Post Doctoral Research Associate, Liverpool School of Tropical Medicine. Aligned with these technical specifications and learning objectives, Cambridge Spark delivered a three-day Introduction to Data Science using Python training session, on-site, at the Department of Vector Biology.

“I enjoyed learning about how the different machine learning tools work, their strengths and weaknesses. I do a lot of data analysis already (using a lot of tools that overlap strongly with machine learning, such as logistic regressions, PCA, clustering analysis) and I generally get a kick out of thinking about data,” said Eric Lucas, Post Doctoral Research Associate, Liverpool School of Tropical Medicine. “I was actively searching for organisations that could provide in-house machine learning courses, and the course which Raoul proposed matched very closely with what I envisaged.”

Interested in training for your teams?

Whether you're looking to train 5 people or 100 people, we have a variety of scalable training solutions to help you address a wide spectrum of training needs within the fields of Data Science, Artificial Intelligence, or Software Engineering.

Please contact us with your details and any known requirements. We'll then get in touch and guide you through every step of the way.

Northeastern University Graduate Programs

11 Data Science Careers Shaping Our Future

For four years in a row, data scientist has been named the number one job in the U.S. by Glassdoor. What’s more, the U.S. Bureau of Labor Statistics reports that the demand for data science skills will drive a 27.9 percent rise in employment in the field through 2026. Not only is there a huge demand, but there is also a noticeable shortage of qualified data scientists.

Daniel Gutierrez, managing editor of insideBIGDATA, told Forbes, “The word on the street is there’s definitely a shortage of people who can do data science.” If you have a passion for computers, math, and discovering answers through data analysis, then earning an advanced degree in data science or data analytics might be your next step.

What is Data Science?

Martin Schedlbauer, PhD and data science professor at Northeastern University, says that data science is used by “computing professionals who have the skills for collecting, shaping, storing, managing, and analyzing data [as an] important resource for organizations to allow for data-driven decision making.” Almost every interaction with technology includes data: your Amazon purchases, Facebook feed, Netflix recommendations, and even the facial recognition required to sign in to your phone.

Amazon is a prime example of just how helpful data collection can be for the average shopper. Amazon’s data sets remember what you’ve purchased, what you’ve paid, and what you’ve searched. This allows Amazon to customize its subsequent homepage views to fit your needs. For example, if you search camping gear, baby items, and groceries, Amazon will not spam you with ads or product recommendations for geriatric vitamins. Instead, you are going to see items that may actually benefit you, such as a compact camping high chair for infants.

Learn More: Data Analytics vs. Data Science: A Breakdown

Similarly, data science can be useful for reminding you of habitual purchases. If you order diapers every month, for example, you might see a strategically placed coupon or deal around the same time each month. This use of data is meant to act as a trigger, prompting you to think, “I just remembered I need to buy diapers, and I should buy them now because they are on sale.”

Data science benefits both companies and consumers alike. McKinsey Global Institute found that big data can increase a retailer’s profit margin by 60 percent, and “services enabled by personal-location data can allow consumers to capture $600 billion in economic surplus,” meaning they are able to purchase a good or service for less than they were expecting. For example, if you budgeted $7,500 to purchase a jacuzzi and then found the exact model you wanted for $6,000, your economic surplus would be $1,500. Data science can simultaneously increase retailer profitability and save consumers money, which is a win-win for a healthy economy.
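The jacuzzi arithmetic above can be written out as a short sketch. The function name `consumer_surplus` and its parameter names are illustrative assumptions, not terms from the McKinsey report; the code only restates the subtraction described in the paragraph.

```python
def consumer_surplus(budgeted_price: float, price_paid: float) -> float:
    """Return the gap between what a buyer budgeted and what they actually paid."""
    return budgeted_price - price_paid


# Figures taken from the jacuzzi example in the text.
surplus = consumer_surplus(budgeted_price=7500, price_paid=6000)
print(f"Economic surplus: ${surplus:,.0f}")  # prints "Economic surplus: $1,500"
```

A negative result would simply mean the buyer paid more than they had budgeted.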

Why is Data Science Important? 

Data science enables retailers to influence our purchasing habits, but the importance of gathering data extends much further.

Data science can improve public health through wearable trackers that motivate individuals to adopt healthier habits and can alert people to potentially critical health issues. Data can also improve diagnostic accuracy, accelerate finding cures for specific diseases, or even stop the spread of a virus. When the Ebola virus outbreak hit West Africa in 2014, scientists were able to track the spread of the disease and predict the areas most vulnerable to the illness. This data helped health officials get in front of the outbreak and prevent it from becoming a worldwide epidemic.

Data science has critical applications across most industries. For example, data is used by farmers for efficient food growth and delivery, by food suppliers to cut down on food waste, and by nonprofit organizations to boost fundraising efforts and predict funding needs.

In a 2015 speech, economist and Freakonomics author Steven Levitt said that CEOs know they are missing out on the importance of Big Data, but they do not have teams in place with the right skills. He says, “I really do believe still that the combination of collaborations with firms’ big data and randomization […] is absolutely going to be at the center of what economics is and what other social sciences are going forward.”

Pursuing a career in data science is a smart move, not just because it is trendy and pays well, but because data very well may be the pivot point on which the entire economy turns.

In-Demand Data Science Careers

Data science experts are needed in virtually every job sector, not just in technology. In fact, the five biggest tech companies (Google, Amazon, Apple, Microsoft, and Facebook) employ only one-half of one percent of U.S. employees. However, an advanced education is generally required to break into these high-paying, in-demand roles.

“Data scientists are highly educated–88 percent have at least a master’s degree and 46 percent have PhDs–and while there are notable exceptions, a very strong educational background is usually required to develop the depth of knowledge necessary to be a data scientist,” reports KDnuggets, a leading site on Big Data.

Here are some of the leading data science careers you can break into with an advanced degree.

1. Data Scientist

Average Salary: $117,212

Typical Job Requirements: Find, clean, and organize data for companies. Data scientists will need to be able to analyze large amounts of complex raw and processed information to find patterns that will benefit an organization and help drive strategic business decisions. Compared to data analysts, data scientists are much more technical.

Learn more: What Does a Data Scientist Do?

2. Machine Learning Engineer

Average Salary: $131,001

Typical Job Requirements: Machine learning engineers create data funnels and deliver software solutions. They typically need strong statistics and programming skills, as well as a knowledge of software engineering. In addition to designing and building machine learning systems, they are also responsible for running tests and experiments to monitor the performance and functionality of such systems.

3. Machine Learning Scientist

Average Salary: $137,053

Typical Job Requirements: Research new data approaches and algorithms to be used in adaptive systems including supervised, unsupervised, and deep learning techniques. Machine learning scientists often go by titles like Research Scientist or Research Engineer.

4. Applications Architect

Average Salary: $129,000

Typical Job Requirements: Track the behavior of applications used within a business and how they interact with each other and with users. Applications architects are focused on designing the architecture of applications as well, including building components like user interface and infrastructure.

5. Enterprise Architect

Average Salary: $150,782

Typical Job Requirements: An enterprise architect is responsible for aligning an organization’s strategy with the technology needed to execute its objectives. To do so, they must have a complete understanding of the business and its technology needs in order to design the systems architecture required to meet those needs.

6. Data Architect

Average Salary: $118,868

Typical Job Requirements: Ensure data solutions are built for performance and design analytics applications for multiple platforms. In addition to creating new database systems, data architects often find ways to improve the performance and functionality of existing systems, as well as working to provide access to database administrators and analysts.

7. Infrastructure Architect

Average Salary: $127,676

Typical Job Requirements: Ensure that all business systems are working optimally and can support the development of new technologies and system requirements. A similar job title is Cloud Infrastructure Architect, a role that oversees a company’s cloud computing strategy.

8. Data Engineer

Average Salary: $112,493

Typical Job Requirements: Perform batch processing or real-time processing on gathered and stored data. Data engineers are also responsible for building and maintaining data pipelines that create a robust and interconnected data ecosystem within an organization, making information accessible for data scientists.

9. Business Intelligence (BI) Developer

Average Salary: $92,013

Typical Job Requirements: BI developers design and develop strategies to assist business users in quickly finding the information they need to make better business decisions. Extremely data-savvy, they use BI tools or develop custom BI analytic applications to facilitate the end-users’ understanding of their systems.

10. Statistician

Average Salary: $88,989

Typical Job Requirements: Statisticians work to collect, analyze, and interpret data in order to identify trends and relationships which can be used to inform organizational decision-making. Additionally, the daily responsibilities of statisticians often include designing data collection processes, communicating findings to stakeholders, and advising organizational strategy.

Learn More: What Do Statisticians Do?

11. Data Analyst

Average Salary: $69,517

Typical Job Requirements: Transform and manipulate large data sets to suit the desired analysis for companies. For many companies, this role can also include tracking web analytics and analyzing A/B testing. Data analysts also aid in the decision-making process by preparing reports for organizational leaders which effectively communicate trends and insights gleaned from their analysis.

Learn More: What Does a Data Analyst Do?

Data Scientists Are in Constant Demand

Schedlbauer concludes that while some data science work will likely be automated within the next 10 years, “there is a clear need for professionals who understand a business need, can devise a data-oriented solution, and then implement that solution.”

Data science experts are needed in almost every field, from government security to dating apps. Millions of businesses and government departments rely on big data to succeed and better serve their customers. Data science careers are in high demand and this trend will not be slowing down any time soon, if ever.

Breaking Into the Field

If you want to break into the field of data science, there are a number of ways you can prepare yourself to take on these challenging yet exciting roles. Perhaps most importantly, you will need to impress future employers by demonstrating your expertise and previous work experience. One way you can build those skills and experience is to pursue an advanced degree program in your area of interest.

Northeastern University, for example, offers master’s degree programs in both data science and data analytics which are designed to develop the skills that employers are seeking. Both programs also provide students with the opportunity to participate in co-ops and experiential learning experiences, allowing them to build hands-on experience prior to graduating. Once you have considered factors like your personal background, interests, and career aspirations, you will be able to determine which degree program is right for you and take the next step towards achieving your goals.

About Kelsey Miller

Data Science: Recently Published Documents

Assessing the effects of fuel energy consumption, foreign direct investment and GDP on CO2 emission: New data science evidence from Europe & Central Asia

Documentation Matters: Human-Centered AI System to Assist Data Science Code Documentation in Computational Notebooks

Computational notebooks allow data scientists to express their ideas through a combination of code and documentation. However, data scientists often pay attention only to the code, and neglect creating or updating their documentation during quick iterations. Inspired by human documentation practices learned from 80 highly-voted Kaggle notebooks, we design and implement Themisto, an automated documentation generation system to explore how human-centered AI systems can support human data scientists in the machine learning code documentation scenario. Themisto facilitates the creation of documentation via three approaches: a deep-learning-based approach to generate documentation for source code, a query-based approach to retrieve online API documentation for source code, and a user prompt approach to nudge users to write documentation. We evaluated Themisto in a within-subjects experiment with 24 data science practitioners, and found that automated documentation generation techniques reduced the time for writing documentation, reminded participants to document code they would have ignored, and improved participants’ satisfaction with their computational notebook.

Data science in the business environment: Insight management for an Executive MBA

Adventures in Financial Data Science

GeCoAgent: A Conversational Agent for Empowering Genomic Data Extraction and Analysis

With the availability of reliable and low-cost DNA sequencing, human genomics is relevant to a growing number of end-users, including biologists and clinicians. Typical interactions require applying comparative data analysis to huge repositories of genomic information for building new knowledge, taking advantage of the latest findings in applied genomics for healthcare. Powerful technology for data extraction and analysis is available, but broad use of the technology is hampered by the complexity of accessing such methods and tools. This work presents GeCoAgent, a big-data service for clinicians and biologists. GeCoAgent uses a dialogic interface, animated by a chatbot, for supporting the end-users’ interaction with computational tools accompanied by multi-modal support. While the dialogue progresses, the user is accompanied in extracting the relevant data from repositories and then performing data analysis, which often requires the use of statistical methods or machine learning. Results are returned using simple representations (spreadsheets and graphics), while at the end of a session the dialogue is summarized in textual format. The innovation presented in this article is concerned with not only the delivery of a new tool but also our novel approach to conversational technologies, potentially extensible to other healthcare domains or to general data science.

Differentially Private Medical Texts Generation Using Generative Neural Networks

Technological advancements in data science have offered us affordable storage and efficient algorithms to query a large volume of data. Our health records are a significant part of this data, which is pivotal for healthcare providers and can be utilized in our well-being. The clinical note in electronic health records is one such category that collects a patient’s complete medical information during different timesteps of patient care available in the form of free-texts. Thus, these unstructured textual notes contain events from a patient’s admission to discharge, which can prove to be significant for future medical decisions. However, since these texts also contain sensitive information about the patient and the attending medical professionals, such notes cannot be shared publicly. This privacy issue has thwarted timely discoveries on this plethora of untapped information. Therefore, in this work, we intend to generate synthetic medical texts from a private or sanitized (de-identified) clinical text corpus and analyze their utility rigorously in different metrics and levels. Experimental results promote the applicability of our generated data as it achieves more than 80% accuracy in different pragmatic classification problems and matches (or outperforms) the original text data.

Impact on Stock Market across Covid-19 Outbreak

Abstract: This paper analyses the impact of the pandemic on the global stock exchange. Stock listing values are determined by a variety of factors, including seasonal changes, catastrophic calamities, pandemics, fiscal year changes and many more. This paper provides an analysis of the variation in listing prices during the worldwide outbreak of the novel coronavirus. The key reason to focus on this outbreak was to provide a notion of the underlying regulation of stock exchanges. Daily closing prices of the stock indices from January 2017 to January 2022 have been utilized for the analysis. The predominant feature of the research is to analyse whether a global economic downturn impacts the financial stock exchange. Keywords: Stock Exchange, Matplotlib, Streamlit, Data Science, Web scraping.

Information Resilience: the nexus of responsible and agile approaches to information use

Abstract: The appetite for effective use of information assets has been steadily rising in both public and private sector organisations. However, whether the information is used for social good or commercial gain, there is a growing recognition of the complex socio-technical challenges associated with balancing the diverse demands of regulatory compliance and data privacy, social expectations and ethical use, business process agility and value creation, and scarcity of data science talent. In this vision paper, we present a series of case studies that highlight these interconnected challenges, across a range of application areas. We use the insights from the case studies to introduce Information Resilience, as a scaffold within which the competing requirements of responsible and agile approaches to information use can be positioned. The aim of this paper is to develop and present a manifesto for Information Resilience that can serve as a reference for future research and development in relevant areas of responsible data management.

qEEG Analysis in the Diagnosis of Alzheimer's Disease; a Comparison of Functional Connectivity and Spectral Analysis

Alzheimer's disease (AD) is a brain disorder that is mainly characterized by a progressive degeneration of neurons in the brain, causing a decline in cognitive abilities and difficulties in engaging in day-to-day activities. This study compares an FFT-based spectral analysis against a functional connectivity analysis based on phase synchronization, for finding known differences between AD patients and Healthy Control (HC) subjects. Both of these quantitative analysis methods were applied to a dataset comprising bipolar EEG montage values from 20 diagnosed AD patients and 20 age-matched HC subjects. Additionally, an attempt was made to localize the identified AD-induced brain activity effects in AD patients. The obtained results showed the advantage of the functional connectivity analysis method compared to a simple spectral analysis. Specifically, while spectral analysis could not find any significant differences between the AD and HC groups, the functional connectivity analysis showed statistically higher synchronization levels in the AD group in the lower frequency bands (delta and theta), suggesting that the AD patients' brains are in a phase-locked state. Further comparison of functional connectivity between the homotopic regions confirmed that the traits of AD were localized in the centro-parietal and centro-temporal areas in the theta frequency band (4-8 Hz). The contribution of this study is that it applies a neural metric for Alzheimer's detection from a data science perspective rather than from a neuroscience one. The study shows that the combination of bipolar derivations with phase synchronization yields similar results to comparable studies employing alternative analysis methods.
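To make the phase synchronization idea concrete, here is a minimal Python sketch of one common synchronization measure, the phase-locking value (PLV), computed across epochs at a single frequency. The abstract does not say which estimator the study used, so the function names, the single-frequency DFT phase estimate, the 128 Hz sampling rate, and the synthetic 6 Hz (theta band) signals are all illustrative assumptions and not the authors' pipeline.

```python
import cmath
import math

def phase_at(signal, freq_hz, sample_rate_hz):
    """Phase of the signal's DFT component at freq_hz (an assumed estimator)."""
    coeff = sum(
        value * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate_hz)
        for n, value in enumerate(signal)
    )
    return cmath.phase(coeff)

def phase_locking_value(epochs_a, epochs_b, freq_hz, sample_rate_hz):
    """PLV across epochs: 1.0 means a constant phase lag, near 0 means no locking."""
    vectors = [
        cmath.exp(1j * (phase_at(a, freq_hz, sample_rate_hz)
                        - phase_at(b, freq_hz, sample_rate_hz)))
        for a, b in zip(epochs_a, epochs_b)
    ]
    return abs(sum(vectors) / len(vectors))

def sine_epoch(freq_hz, phase, sample_rate_hz=128, n_samples=128):
    """Synthetic one-second epoch (hypothetical stand-in for an EEG channel)."""
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate_hz + phase)
        for n in range(n_samples)
    ]

THETA_HZ = 6.0                      # inside the 4-8 Hz theta band
starts = [0.0, 0.7, 1.9, 3.1]       # arbitrary start phase of each epoch
channel_a = [sine_epoch(THETA_HZ, p) for p in starts]
# Channel B keeps a fixed 0.5 rad lag behind A in every epoch: phase-locked.
locked_b = [sine_epoch(THETA_HZ, p + 0.5) for p in starts]
# Channel B's lag changes by a quarter cycle per epoch: not phase-locked.
drifting_b = [sine_epoch(THETA_HZ, p + k * math.pi / 2) for k, p in enumerate(starts)]

plv_locked = phase_locking_value(channel_a, locked_b, THETA_HZ, 128)
plv_drifting = phase_locking_value(channel_a, drifting_b, THETA_HZ, 128)
print(round(plv_locked, 3), round(plv_drifting, 3))
```

A PLV close to 1 for a channel pair indicates a stable phase relationship across epochs, which is the kind of elevated synchronization the abstract reports for the AD group in the lower frequency bands.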

Big Data Analytics for Long-Term Meteorological Observations at Hanford Site

A growing number of physical objects with embedded sensors with typically high volume and frequently updated data sets has accentuated the need to develop methodologies to extract useful information from big data for supporting decision making. This study applies a suite of data analytics and core principles of data science to characterize near real-time meteorological data with a focus on extreme weather events. To highlight the applicability of this work and make it more accessible from a risk management perspective, a foundation for a software platform with an intuitive Graphical User Interface (GUI) was developed to access and analyze data from a decommissioned nuclear production complex operated by the U.S. Department of Energy (DOE, Richland, USA). Exploratory data analysis (EDA), involving classical non-parametric statistics, and machine learning (ML) techniques, were used to develop statistical summaries and learn characteristic features of key weather patterns and signatures. The new approach and GUI provide key insights into using big data and ML to assist site operation related to safety management strategies for extreme weather events. Specifically, this work offers a practical guide to analyzing long-term meteorological data and highlights the integration of ML and classical statistics to applied risk and decision science.
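As an illustration of how a classical non-parametric statistic can flag extreme weather readings, the following Python sketch applies Tukey's interquartile-range fences to a small series. The abstract does not name the specific statistics used in the study, so the choice of Tukey fences, the function names, and the wind-gust values are hypothetical.

```python
from statistics import quantiles

def tukey_fences(values, k=1.5):
    """Return (low, high) fences at k interquartile ranges beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_extremes(values, k=1.5):
    """Keep only the observations that fall outside the Tukey fences."""
    low, high = tukey_fences(values, k)
    return [v for v in values if v < low or v > high]

# Hypothetical wind-gust readings in metres per second, with one storm spike.
wind_gusts_mps = [10, 12, 11, 13, 12, 11, 40, 12, 10, 13, 11, 12]
extremes = flag_extremes(wind_gusts_mps)
print(extremes)
```

Because the fences depend only on quartiles, the rule makes no assumption about the distribution of the readings, which is why rank-based summaries are a common first step in exploratory data analysis.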

Your Guide to Data Science Careers (+ How to Get Started)

Careers in data science are in-demand. Step into the world of big data and machine learning.

Data science continues to rise as one of the most in-demand career paths in technology today. Beyond data analysis, mining, and programming, data scientists write code and combine it with statistics to transform data into insights. These insights can help businesses derive return on investment (ROI) or organizations measure their social impact.

The data science field is interdisciplinary and integral to society’s basic functions, such as restocking grocery stores, tracking political campaigns, and keeping medical records. It can be a fascinating and fulfilling career to participate in this growing field.

There are many career opportunities within data science. Here’s a guide to what data science is, the skills required, job types, and how to get there.

What is data science? Definition, skills, and job outlook

Data science grew out of statistics and data mining. It is a relatively new specialty, with the Data Science Journal making its debut only in 2002. It sits at the intersection of software development, machine learning, and research, while drawing on computer science, business, and statistics. Data professionals create algorithms to translate data patterns into research that informs government agencies, companies, and other organizations.

Data science exists because information technology is evolving at a rapid pace, and there is a need to make sense of it all. 

Skills required in data science

In a field like data science, there are a number of technical skills that are helpful to have before diving in, such as:

Deep knowledge and familiarity with statistical analysis

Machine learning

Deep learning

Data visualization

Mathematics

Programming

Ability to manage unstructured data

Familiarity with SAS, Hadoop, Spark, Python, R, and other data analysis tools

Big data processes, systems, and networks

Software engineering

A career in data science is not limited to technical knowledge. You’ll work on teams with other engineers, developers, coders, analysts, and business managers. These workplace skills will help take you farther:

Communication skills

Storytelling

Critical thinking and logic

Business acumen

Data science job outlook

The future is bright for aspiring data science professionals. IBM predicted that by 2020 there would be 2.7 million open jobs across data science and related careers and that there would be a 39 percent growth in employer demand for data scientists and data engineers [ 1 ].

For data scientists specifically, the US Bureau of Labor Statistics projects employment to grow by 22 percent by 2030 [ 2 ]. It is considered the third-best job in the US (as of March 2022), according to Glassdoor [ 3 ].

Data science job roles

There are plenty of data science jobs to choose from. All of them are integral to making key business decisions. Often, several of the job types below will work together on the same team.

Data scientist

Data scientists build models using programming languages such as Python. They then transform these models into applications. Often working as part of a team, for example, with a business analyst, a data engineer, and a data (or IT) architect, they help solve complex problems by analyzing data and making predictions about the future. This role is typically considered an advanced version of a data analyst.

Average US salary: $99,138 [ 4 ]

Skills needed: Statistics, mathematics, machine and deep learning, programming skills, data analysis, big data processes, and tools like Hadoop, SQL, and more.

Education: Bachelor’s degree in a related field, although increasingly data science bootcamps, master’s programs, and professional certificates can help career switchers reach their goals. According to a Burtch Works study of data scientists and salaries, more than 94 percent of data scientists held a master’s or doctorate degree [ 5 ].

Data analyst

Data analysts, unlike data scientists, use structured data to solve business problems. Using tools such as SQL, Python, and R, statistical analysis, and data visualization, data analysts acquire, clean, and reorganize data for analysis to spot trends that can be turned into business insights. They tend to bridge the gap between data scientists and business analysts.

Average US salary: $63,344 [ 6 ]

Skills needed: Programming languages (SQL, Python, R, SAS), statistics and math, data visualization

Education: Bachelor’s degree in mathematics, computer science, finance, statistics, or a related field

Data architect

Data architects create the blueprints for data management systems, designing plans to integrate and maintain all types of data sources. They oversee the underlying processes and infrastructure. Their main goal is to enable employees to gain access to information when they need it. 

Average US salary: $115,196 [ 7 ]

Skills needed: Coding languages such as Python and Java, data mining and management, machine learning, SQL, and data modeling

Education: A bachelor’s degree in data, computer science, or a related field. If you are switching careers, a bootcamp or professional certificate can help develop your skills in data management.

Data engineer

Data engineers prepare and manage large amounts of data. They also develop and optimize data pipelines and infrastructure, getting the data ready for data scientists and business analysts to work with. Data Engineers make the data accessible so businesses can optimize their performance.

Average US salary: $97,727 [ 8 ]

Skills needed: Programming languages such as Java, understanding of NoSQL databases (MongoDB), and frameworks like Apache Hadoop

Education: A bachelor’s degree in math, science, or a business-related field is helpful. Professional certificates and bootcamps are also an option to brush up on skills.

Machine learning engineer

This role is not an entry-level position, but one you can build toward as a data scientist or engineer. Machine learning uses algorithms that replicate how humans learn and act, to interpret data and build accuracy over time. As part of a data science team, machine learning engineers research, build, and design artificial intelligence that facilitates machine learning. They also serve as a liaison between data scientists, data architects, and more. 

Average US salary: $100,066 [ 9 ]

Skills needed: Knowledge of tools such as Spark, Hadoop, R, Apache Kafka, TensorFlow, Google Cloud Machine Learning Engine, and more. An understanding of data structures and modeling, quantitative analysis, and computer science basics is also helpful.

Education: Often a master’s degree or even a PhD in computer science or related fields is expected. Gain an introduction to this field by enrolling in one of Coursera’s most popular courses, Machine Learning.

Business analyst

As a business analyst, you’ll use data to form business insights and make recommendations for companies and organizations to improve their systems and processes. Business analysts identify issues in any part of the organization, including staff development and organizational structures, so that businesses can increase efficiency and cut costs.

Average US salary: $74,366 [ 10 ]

Skills needed: Using SQL and Excel, data visualization, financial modeling, data and financial analysis, business acumen

Education: Bachelor’s degree in economics, finance, computer science, statistics, business, or a related field

The path to a data science career

With so many exciting options in data science, you may be wondering where to begin. Whether you are just starting your career or switching from another one, here are the steps you can take to build toward your future in big data or machine learning.

Education: What should I learn?

To get started in any data science role, earning a degree or certificate can be a great entry point.

Bachelor’s degree: For many, a bachelor’s degree in data science, business, economics, statistics, math, information technology, or a related field can help you gain leverage as an applicant. From these programs, you’ll learn how to analyze data, and use numbers, systems, and tools to solve problems. 

But if your bachelor’s degree is in the arts or humanities, don’t fret. Your ability to think critically and creatively is not lost in a data science career. If you don’t have a degree at all, there are several options for you too.

Online courses and professional certificates: Whether or not you have earned a bachelor’s degree, an online course or professional certificate can be helpful when applying for data science-related jobs.

You can list these courses on your resume or LinkedIn for additional credibility. Typically, these courses take a few months to complete (on a part-time basis) and will set you up for at least an entry-level position.

If you are interested in diving into a data science career, you might consider IBM's Data Science Professional Certificate or Stanford University’s Machine Learning Course.

Bootcamps: If you are willing to spend a few weeks or months pursuing a bootcamp, there are plenty of options to pivot and gain the necessary skills for a data science career. Some bootcamps are in-person over a few weeks or months with a cohort, while others are completed online or at your own pace. The benefits of an in-person bootcamp are the community and network you’ll have access to upon completion.

Some popular options include:

General Assembly offers an online data science course, an online data science bootcamp, as well as a data science immersive bootcamp in New York and other cities. Plus, the community-driven network model could help you land a job more quickly.

Flatiron School is a similar model that also offers full- and part-time data science bootcamps online and in New York City.

Brainstation offers full- and part-time data science bootcamps online or in one of its cities (NYC, Toronto, Miami, London, or Vancouver).

Clarusway has bootcamps for data science, data analytics, and machine learning.

Skills needed

You’ll need a combination of technical and workplace skills to succeed in a data science career. Here are some of the most common skills employers look for in data science roles, whether you're aiming to become a data scientist or a machine learning engineer:

Technical skills

Machine and deep learning

Data wrangling

Data engineering

Cloud computing

Workplace skills

Communication

Adaptability and flexibility

Critical thinking

Problem solving

Experience: How do I get a job?

Once you’ve completed a course or certificate and gained the necessary skills, you’ll want to get some work experience.

Entry-level job or internship: To land your first job or internship, focus on applying to jobs that specifically cater to those starting out in the data science field. That way, you can feel supported as you prove your worth, develop your skills, and move up in your career.

Some job seekers report applying for hundreds of jobs before obtaining an interview. But don’t be discouraged, because data science roles are also in demand. Your hard work will pay off.

Interviews: Once you’ve secured an interview, practice communicating with a non-technical friend about your process. Pretend that your interviewer has no idea about your project, so you can talk through your decisions about which tools you choose and why you coded an algorithm in a certain way. You’ll want to prove that you are familiar with the languages and systems you’ll be using on the job.

Explore data science with Coursera

Boost your career in data science by enrolling in IBM’s Data Science professional certificate program. You’ll learn how to analyze data and communicate results to inform data-driven decisions in 11 months or less, all at your own pace.

Article sources 

1. IBM. “ The Quant Crunch: How the Demand for Data Science Skills is Disrupting the Job Market , https://www.ibm.com/downloads/cas/3RL3VXGA.” Accessed June 26, 2022.

2. US Bureau of Labor Statistics. “ Computer and Information Research Scientists , https://www.bls.gov/ooh/computer-and-information-technology/computer-and-information-research-scientists.htm.” Accessed June 26, 2022.

3. Glassdoor. “ 50 Best Jobs in America for 2022 , https://www.glassdoor.com/List/Best-Jobs-in-America-LST_KQ0,20.htm.” Accessed June 26, 2022.

4. Glassdoor. “ How much does a Data Scientist make? , https://www.glassdoor.com/Salaries/data-scientist-salary-SRCH_KO0,14.htm.” Accessed June 26, 2022.

5. Burtch Works. “ The Burtch Works Study Salaries of Data Scientists & Predictive Analytics Professionals , https://www.burtchworks.com/wp-content/uploads/2020/08/Burtch-Works-Study_DS-PAP-2020.pdf.” Accessed June 26, 2022.

6. Glassdoor. “ How much does a Data Analyst make? , https://www.glassdoor.com/Salaries/data-analyst-salary-SRCH_KO0,12.htm.” Accessed June 26, 2022.

7. Glassdoor. “ How much does a Data Architect make? , https://www.glassdoor.com/Salaries/data-architect-salary-SRCH_KO0,14.htm.” Accessed June 26, 2022.

8. Glassdoor. “ How much does a Data Engineer make? , https://www.glassdoor.com/Salaries/data-engineer-salary-SRCH_KO0,13.htm.” Accessed June 26, 2022.

9. Glassdoor. “ How much does a Machine Learning Engineer make? , https://www.glassdoor.com/Salaries/machine-learning-engineer-salary-SRCH_KO0,25.htm.” Accessed June 26, 2022.

10. Glassdoor. “ How much does a Business Analyst make? , https://www.glassdoor.com/Salaries/business-analyst-salary-SRCH_KO0,16.htm.” Accessed June 26, 2022.

This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.

What is data science?

Discover what a data scientist does and how to become a successful data scientist

What is a data scientist?

A data scientist leads research projects to extract valuable information from big data and is skilled in technology, mathematics, business, and communications. Organizations use this information to make better decisions, solve complex problems, and improve their operations. By revealing actionable insights hidden in large datasets, a data scientist can significantly improve his or her company’s ability to achieve its goals. That's why data scientists are in high demand and even considered "rock stars" in the business world.

Introduction to data science

Data science is the scientific study of data to gain knowledge. This field combines multiple disciplines to extract knowledge from massive datasets for the purpose of making informed decisions and predictions. Data scientists, data analysts, data architects, data engineers, statisticians, database administrators, and business analysts all work in the data science field.

The need for data science is growing rapidly as the amount of data increases exponentially and companies depend more heavily on analytics to drive revenue and innovation. For example, as business interactions become more digital, more data is created, presenting new opportunities to derive insights into how to better personalize experiences, improve service and customer satisfaction, develop new and enhanced products, and increase sales. Additionally, in the business world and beyond, data science has the potential to help solve some of the world's most difficult challenges.

What does a data scientist do?

A data scientist collects, analyzes, and interprets big data to uncover patterns and insights, make predictions, and create actionable plans. Big data can be defined as datasets that have greater variety, volume, and velocity than earlier methods of data management were equipped to handle. Data scientists work with many types of big data, including:

Additionally, the characteristics of the dataset can be described as quantitative, structured numerical data, or qualitative or categorical data, which is not represented through numerical values and can be grouped based on categories. It's important for data scientists to know the type of data they're working with, as it directly impacts the type of analyses they perform and the types of graphs they can use to visualize the data.
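The effect of data type on the choice of analysis can be sketched in a few lines of Python. This is a minimal illustration only: the `summarize` function and the sample columns are hypothetical, and real projects usually rely on a data analysis library for this step.

```python
from collections import Counter
from statistics import mean, median

def summarize(values):
    """Pick a summary that suits the data type of a column."""
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if is_numeric:
        # Quantitative data supports arithmetic summaries.
        return {"kind": "quantitative", "mean": mean(values), "median": median(values)}
    # Categorical data is grouped and counted instead.
    return {"kind": "categorical", "counts": dict(Counter(values))}

delay_minutes = [2, 4, 9]                      # hypothetical numeric column
airline = ["North", "South", "North"]          # hypothetical categorical column

numeric_summary = summarize(delay_minutes)
category_summary = summarize(airline)
print(numeric_summary)
print(category_summary)
```

The same split typically drives visualization choices: histograms or scatter plots for quantitative columns, bar charts of counts for categorical ones.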

To gain knowledge from all these data types, data scientists utilize their skills in:

There's another skill that's critical to the question "What does a data scientist do?" Effectively communicating the results of their analyses to managers, executives, and other stakeholders is one of the most important parts of the job. Data scientists need to make their findings easy to understand for a non-technical audience, so they can use the insights to make informed decisions. Therefore, data scientists need to be skilled in:

Data science processes and deliverables

Data science processes.

Data scientists follow a similar process to complete their projects:

Once the data is cleaned, a data scientist explores the data and applies statistical analytical techniques to reveal relationships among the data features, and between those features and the value they predict (known as a label). The predicted label can be a quantitative value, like the financial value of something in the future, or the duration of a flight delay in minutes.

Exploration and preparation typically involve a great deal of interactive data analysis and visualization—usually using languages such as Python and R in interactive tools and environments that are specifically designed for this task. The scripts used to explore the data are typically hosted in specialized environments such as Jupyter Notebooks. These tools enable data scientists to explore the data programmatically while documenting and sharing the insights they find.

The data scientist builds and trains prescriptive or descriptive models, then tests and evaluates the model to make sure it answers the question or addresses the business problem. At its simplest, a model is a piece of code that takes an input and produces output. Creating a machine learning model involves selecting an algorithm, providing it with data, and tuning hyperparameters. Hyperparameters are adjustable parameters that let data scientists control the model training process. For example, with neural networks, the data scientist decides the number of hidden layers and the number of nodes in each layer. Hyperparameter tuning, also called hyperparameter optimization, is the process of finding the configuration of hyperparameters that results in the best performance.
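Hyperparameter tuning can be illustrated with a small grid search. The sketch below is an assumption-laden toy, not Azure Machine Learning's tuning API: it swaps the neural network from the text for a one-dimensional k-nearest-neighbours classifier, treats k as the only hyperparameter, and uses made-up training and validation points.

```python
def knn_predict(train, x, k):
    """Predict the majority label (0 or 1) among the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes_for_one = sum(label for _, label in nearest)
    return 1 if votes_for_one * 2 > k else 0

def accuracy(train, validation, k):
    """Fraction of validation points that the k-NN model labels correctly."""
    correct = sum(knn_predict(train, x, k) == label for x, label in validation)
    return correct / len(validation)

def grid_search(train, validation, candidate_ks):
    """Score every candidate hyperparameter value and return the best one."""
    scores = {k: accuracy(train, validation, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Hypothetical (feature, label) pairs; (1.4, 1) is a deliberately noisy point.
train = [(0.0, 0), (1.0, 0), (2.0, 0), (2.6, 1), (3.0, 1), (4.0, 1), (5.0, 1), (1.4, 1)]
validation = [(0.5, 0), (1.5, 0), (3.5, 1), (4.5, 1)]

best_k, scores = grid_search(train, validation, candidate_ks=[1, 3, 5])
print(best_k, scores)
```

With k = 1 the model copies the noisy training point and misclassifies one validation example, while larger values of k smooth it out. The search loop itself never looks inside the model, which is why the same pattern carries over to settings such as the number of hidden layers.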

A common question is "Which machine learning algorithm should I use?" A machine learning algorithm turns a dataset into a model. The algorithm the data scientist selects depends primarily on two different aspects of the data science scenario:

To help answer these questions, Azure Machine Learning provides a comprehensive portfolio of algorithms, such as Multiclass Decision Forest, Recommendation systems, Neural Network Regression, Multiclass Neural Network, and K-Means Clustering. Each algorithm is designed to address a different type of machine learning problem. In addition, the Azure Machine Learning Algorithm Cheat Sheet helps data scientists choose the right algorithm to answer the business question.

Data scientists might also use web-based data science notebooks, such as Zeppelin Notebooks, throughout much of the process for data ingestion, discovery, analytics, visualization, and collaboration.

Data science methods

Data scientists use statistical methods such as hypothesis testing, factor analysis, regression analysis and clustering to unearth statistically sound insights.
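As one example of these methods, simple regression analysis can be sketched with ordinary least squares on a single feature. The function name and the sample data below are hypothetical, and the snippet uses only arithmetic so that each step of the calculation is visible.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations that lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept   # extrapolate to an unseen x
print(slope, intercept, prediction)
```

On real data the points would not fall exactly on a line, and the fit would be followed by a check such as a hypothesis test on the slope to judge whether the relationship is statistically sound.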

Data science documentation

Although data science documentation varies by project and industry, it generally includes documentation that shows where the data comes from and how it was modified. This helps other members of the data team effectively use the data moving forward. For example, documentation helps business analysts use visualization tools to interpret the dataset.

Types of data science documentation include:

How to become a data scientist

There are multiple paths to becoming a data scientist. Requirements usually include a degree in information technology or computer science. However, some IT professionals learn data science by taking bootcamps and online courses, and others earn a data science master's degree or certification.

To learn how to be a data scientist, take advantage of these Microsoft training resources designed to help you:

Get your data scientist certification

Certifications are a great way to demonstrate your data science qualifications and jumpstart your career. Microsoft certified professionals are in high demand and there are jobs available for Azure data scientists right now. Explore the data scientist certifications most sought after by employers:

Differences between data analysts and data scientists

Like data scientists, data analysts work with large datasets to uncover trends in data. However, data scientists are typically more technical team members with more expertise and responsibility such as initiating and leading data science projects, building and training machine learning models, and presenting their findings to executives and at conferences. Some data scientists perform all of these tasks and others focus on specific ones, like training algorithms or building models. Many data scientists began their careers as data analysts and data analysts can be promoted to data scientist positions within a few years.

Data scientists lead research projects to extract valuable information and actionable insights from big data. This includes defining the problem to be solved, writing queries to pull the right data from databases, cleaning and sorting the data, building and training machine learning models, and using data visualization techniques to effectively communicate the findings to stakeholders.

Although data science documentation varies by project and industry, it generally includes project plans, user stories, model documentation, and supporting systems documentation such as user guides.

Rice University REU

Computer and Data Science

Rice University is now accepting applications for a 10-week summer undergraduate research program in the general area of Computer and Data Science, generously funded by a gift from Google, LLC.

Program participants will be assigned to a Rice faculty mentor and will work closely with a Rice graduate student or Postdoctoral researcher to perform cutting-edge research in Computer Systems or Data Science. In the summer of 2023, we will be partnering with REUs at Texas A&M University in College Station and Prairie View A&M University. The summer will start with a joint kick-off program and end with a shared research symposium. During the summer there will be workshops on conducting research, reading papers, preparing scientific posters, as well as research, career, and technical talks.

The program will begin May 22 and end July 28. The program will require a full-time commitment. Participants will be given a $6500 stipend, $500 for travel expenses (if traveling from outside of Houston), and complimentary on-campus accommodations.

What is Data Science?

Data science is an interdisciplinary field of study, encompassing sub-areas of computer science, statistics, electrical engineering, and applied mathematics. It is the science of extracting actionable knowledge from large and complex data repositories, where “complex” may refer to the modality of the data (images, time series, text, as well as traditional tabular data) or other facets of the data in question (data can be complex because they are geographically distributed, or characterized by the ubiquity of missing or inaccurate values). Data Science has quickly become a critical enabling capability in many different fields: science, healthcare, energy, manufacturing, finance, and many others. Data can be used to train algorithms that are more accurate than experienced doctors in recognizing early-stage tumors. It can also be used to predict when it is time to do preventative maintenance on a multi-million dollar machine, before a catastrophic failure. Data can be used to detect fraudulent activity in credit card transactions. Furthermore, it is becoming a core enabler for detecting and preventing cyberattacks.

Research in Data Science could include designing and implementing new machine learning algorithms, applying machine learning algorithms to solve specific problems, or developing methods to manage huge data sets.

What are systems projects?

Systems projects could include designing and implementing new networking algorithms, new computer architectures, or software systems.

What background is required for participation?

Most applicants will have completed their sophomore or junior year by the time the program begins. Most applicants will be majors in computer science, statistics, electrical engineering, or applied/pure mathematics. Some knowledge of computer programming in a language such as Python or Java is required.

Why come to Rice?

Boasting a 300-acre tree-lined campus in Houston, Rice University is ranked among the nation’s top 20 universities by U.S. News & World Report. Rice has a 6-to-1 undergraduate student-to-faculty ratio. Rice students and faculty discover, create and innovate, rising to challenges and solving real-world problems that make a measurable global impact.

MORE INFORMATION: Email Beth Rivera ( [email protected] )

Key faculty

12th and 13th International Meeting on Visualizing Biological Data (VIZBI 2022 & VIZBI 2023)

About this Research Topic

This Research Topic collects work presented during the 12th International Meeting on Visualizing Biological Data (VIZBI 2022) and the 13th International Meeting on Visualizing Biological Data (VIZBI 2023) conferences. The conferences feature talks from 21 world-leading researchers showcasing visualizations transforming how life scientists view data, and driving key advances in molecular biology, systems biology, biomedical science, and ecology. In addition, the conferences include work presented as posters and lightning talks, contributed by the other conference attendees. All conference speakers and participants have the opportunity to disseminate work presented during the meeting as peer-reviewed publications, through this Research Topic. Manuscripts will be handled by an editorial board formed from VIZBI session chairs, along with eminent speakers from past meetings.

Manuscript Types. Manuscripts must be formatted to match one of the following article types: Original Research, Systematic Reviews, Methods, Review, Mini Review, Perspective, Data Report, Brief Research Report, Opinion. For more information, see the description of acceptable Article Types for the Frontiers in Bioinformatics journal.

Biological Data. Manuscripts for this Research Topic should describe advances in visualization methods that are of direct relevance to data from any area of the life sciences, including:
● Genomics and epigenetics datasets
● RNA or transcriptome data
● Protein and proteomics datasets
● Cellular systems and cellular-scale data
● Tissue-scale data
● Populations or ecosystem data

Visualization Methods. All accepted submissions in this Research Topic will appear in the Data Visualization section of the Frontiers in Bioinformatics journal. Within scope are all visualization advances aimed at improving how life scientists explore and interpret data, or how they communicate their insights. This encompasses novel visual methods, software prototypes or tools, as well as user studies or design studies. We especially welcome contributions from interdisciplinary teams, including bioinformaticians, data scientists, computer scientists, and experimentalists, as well as medical illustrators, graphic designers, and graphic artists. Out of scope would be work on generic data visualization concepts that do not have direct application to specific biological datasets.

Keywords : bioinformatics, data visualization, visual analytics, computational biology, data science, biomedical data science

Important Note : All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.

Mallory Career Guide for Australia

13 Data Science Careers That Are Exploding Now

Thinking about a career in data science? Let's take a look at your options. Here is a guide to the biggest career opportunities in data science, including job paths and education requirements.

Data science is one of the fastest growing and most in-demand careers today. Learning skills in this area is an exciting way to grow or change your career.

What exactly do data scientists do?

Data scientists analyse information. They take a multidisciplinary perspective, drawing from areas such as programming, machine learning, statistics, software engineering, human behaviour analysis, linear algebra, experimental science and data intuition. Data scientists solve problems and find new insights into how an objective can be achieved.

“Data is the new oil” said Clive Humby, a renowned mathematician. In a world where 2.5 quintillion bytes of data are created each day, it’s no surprise that data science jobs are the new hot commodity. ~ Ritika Pradhan

After asking questions related to a fundamental problem, data scientists will work with raw data, collecting, organising and analysing it. They create and use algorithms for the identification of patterns and trends in the work of answering questions.

Then, after answering the questions at hand, data scientists use the analysed data to create visualisations. This is an important part of the task of presenting data analysis and findings. Insights must be shown in a way that is accessible for colleagues who aren't trained or knowledgeable in technology.

Is data science right for you?

Successful data scientists have aptitude in the fields of maths, programming and statistics. Data scientists collect information and data, sorting and analysing it. They use different kinds of data sources when problem-solving and addressing questions. Heard of algorithms? You'd be creating these as a data scientist.

You’ll use statistics in your analysis and employ complex concepts, including data visualisation and machine learning. If you enjoy identifying patterns and trends, data science may well be the perfect field for you.

Combining computer science, modeling, statistics, analytics, and math skills—along with sound business sense—data scientists uncover the answers to major questions that help organizations make objective decisions. ~ Leslie Doyle

Computer programming, AI, and even areas such as human behaviour are fields where proficiency will be key to your success as a data scientist. If you have a curious mind and are a keen problem-solver, data science is certainly a field you may want to train for.

Perhaps it will surprise you that creativity is also an important part of this scientific field. Keeping an open mind and being willing to follow your instincts (after education and training, of course) are key.

1. Data Scientist

Most highly trained data science professionals call themselves a data scientist or similar. The job is to take large amounts of data and transform that into insights on which a business or organisation can take useful action. A data scientist is an extremely important addition to a company. This professional provides the information needed for a business or organisation to make decisions.

Data scientists are employed across many industries, including large companies and government agencies. There is huge demand for these professionals. So you should find the job search process less challenging than in many other careers, especially if you are even more highly skilled than your competitors.

As a data scientist, you examine data to achieve insights and present these insights to other professionals. This must be done in a way that people without a technical background can understand. Data scientists need to have skills in areas such as computer science, analytics, statistics, modelling and maths. Depending on your organisation and its goals, you may also need a reasonable or high degree of business knowledge and sense.

The position of data scientist is usually ranked a bit higher than that of data analyst. For example, a data scientist may create a complex data model that a data analyst may then use on a daily basis to produce business reports. Data scientists are usually fluent in programming languages such as SQL, Python and R.

A data scientist designs processes for data modelling. These processes are needed to create predictive models and algorithms, as well as custom analysis. This professional must work with business stakeholders and reach conclusions about how data should be utilised to reach objectives and goals.

Job titles : data science lead, data scientist (advanced analytics), data science lead – digital platforms, data science manager, data scientist, data scientist – government, data scientist – machine learning / computer vision / NLP, data scientist (computer vision and deep learning), data scientist (maintenance), data scientist / analyst, data scientist / engineer, head of data science, junior data scientist, junior quantitative researcher, machine learning data scientist, principal data scientist, quantitative researcher, research assistant, research coordinator, research development coordinator, research fellow, senior analyst – data scientist, senior data consultant, senior data scientist, senior manager – data science.

2. Data Analyst

As a data analyst, your responsibilities include not only the analysis of data but its interpretation as well. This combination of skills makes you indispensable to organisations in their decision-making processes. Employers hire data analysts to find new opportunities for increasing revenue and driving down costs.

In the data collection and analysis process, data analysts utilise specific methods. They collect statistics and transform them into information that a business can readily understand and harness for its benefit. Data analysts report their findings to businesses. It’s common for data analysts in Australia to make between $100K and $120K annually.

A few of the data analyst’s job duties include tasks such as:

To become a data analyst, you should get a bachelor’s degree in data analytics or a related subject such as data science or big data management. Some employers like to see a master’s degree too.

Once you’ve done your degree, think about doing an internship. An entry-level job (for example, as a statistical assistant or technician) is also an effective way to get a foot in the door. Data analysts usually know Microsoft Excel, SQL, Tableau and Python.

Job titles : data analyst, data analyst – energy fleet analysis, data analyst / junior data scientist, data analytics consultant, data analytics project manager, data and analytics senior analyst, data infrastructure analyst, data quality analyst, economic data analyst, junior data analyst / scientist, lead data analyst, problem solver — risk analytics, research and data analytics advisor, security intelligence analyst, senior data analyst, senior data insights analyst, senior auditor (data analytics), senior data analyst / scientist, senior data and insights analyst.

3. Data Manager

Data managers must have a much greater awareness of the business side of things than data scientists. They are key to the achievement of important business goals, and they’re responsible for data flow, processes, and even people coordination wherever relevant.

An effective data manager must be knowledgeable in areas such as:

A data manager is responsible for the data of a domain, or perhaps of an entire department or enterprise. You must ensure data integrity throughout the lifecycle, making sure that people who need to use the data can access it in an efficient way.

Job titles : customer success manager – data centre, data centre facility manager, data centre management, data centre operations manager, data insight manager, data project manager, database administrator, data engineering manager, data manager, investment data manager – analytics, manager – climate data science, manager – data management and delivery, manager – data modernisation, product data manager, programmatic trader, reporting and data manager, senior data manager – informatics and data quality, senior manager – data governance, senior management (data governance), spatial data officer.

4. Data Architect

Data architects are the professionals specifically responsible for the design, implementation, and management of an organisation’s data architecture. The position of data architect is more senior than some other career tracks in data science. Entry-level jobs almost never have this job title. Getting a master’s degree in data science or computer science is an excellent idea if becoming a data architect is your ultimate goal.

A career path here is to first get a bachelor’s degree and usually at least between three and five years of experience. Start your career in database administration or programming and then continue strengthening your skills in data warehousing, data modelling, data management, data development, and database design.

Data architects work in industries such as education, finance, insurance, and business. Two of the most significant employers of data architects are software companies and technology manufacturers. These professionals are needed in organisations that deal with enormous quantities of client data.

Job titles : big data architect, cloud solution architect – data science AI, data warehouse architect, data warehouse / business intelligence architect, digital solution architect, information architect, IT / data architect, IT solutions architect, lead solutions architect, senior data architect, senior services architect, solution architect, solution architect – data and analytics, solution architect – PEGA, senior architect – data and integration, senior data solutions architect.

5. Data Engineer

Data engineers work at a more fundamental level than data scientists. In other words, these professionals are the ones who work with the data in its rawer form. It is the work of the data engineer that makes data ready for data scientists to do additional processing. Data engineers may be proficient in several languages and technologies, such as SQL, NoSQL databases, Apache Spark and Hadoop, as well as Python, R, Java and C++.

A data engineer must work with raw data that has machine, instrument, or human errors. As a data engineer, you work with data that may have problematic records or not be properly validated. It is more challenging to work with because it is unformatted and will have codes that are specific to particular systems.
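To make this concrete, here is a minimal sketch in Python of the kind of validation step a data engineer might write. Everything in it is a hypothetical illustration rather than a real system: the `Reading` type, the `sensor_id` and `temp` field names, and the plausibility range of -90 to 60 degrees Celsius are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    """A validated record (hypothetical structure for this example)."""
    sensor_id: str
    temperature_c: float


def parse_reading(raw: dict) -> Optional[Reading]:
    """Return a clean Reading, or None if the raw record is unusable."""
    sensor_id = str(raw.get("sensor_id", "")).strip()
    if not sensor_id:
        return None  # missing identifier, e.g. a human entry error
    try:
        temperature = float(raw.get("temp"))
    except (TypeError, ValueError):
        return None  # missing value or a system-specific code such as "N/A"
    # Assumed plausibility range; anything outside is treated as an
    # instrument error.
    if not -90.0 <= temperature <= 60.0:
        return None
    return Reading(sensor_id, temperature)


def clean(raw_records):
    """Split raw records into validated readings and rejected records."""
    good, rejected = [], []
    for raw in raw_records:
        reading = parse_reading(raw)
        if reading is None:
            rejected.append(raw)
        else:
            good.append(reading)
    return good, rejected


raw_records = [
    {"sensor_id": "A1", "temp": "21.5"},
    {"sensor_id": "", "temp": "19.0"},    # no identifier
    {"sensor_id": "B2", "temp": "N/A"},   # system-specific code
    {"sensor_id": "C3", "temp": 999},     # implausible value
    {"sensor_id": " D4 ", "temp": 18},    # stray whitespace, still usable
]
good, rejected = clean(raw_records)
print(len(good), "clean records,", len(rejected), "rejected")
```

Keeping the rejected records rather than silently dropping them is a deliberate choice in this sketch, since it lets someone investigate where the errors came from.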

Data engineers are masters in the field of data science, able to create innovative methods for storing and accessing enormous collections of data. These tech professionals design and create data architecture and tools. They must test them thoroughly in the process. The tools that data engineers build are intended to make the interpretation of a business’s data easier to accomplish. Data engineers are well-paid, making on average between $120K and $140K annually in Australia.

Data engineers construct data architecture. They’re instrumental in the maintenance of these elements. Another duty will be analysing and interpreting enormous sets of data. To be a data engineer, you need an exceptionally advanced understanding of many different data analysis tools and programming languages. Data engineers often work for technology companies and the IT departments of businesses and other organisations.

Data engineers develop architecture, constructing it as well as testing and maintaining it. This architecture can include large-scale processing systems and databases. The data scientist’s duties are different in that he or she is the person responsible for cleaning and organising data.

One of the data engineer’s most important goals is that of improving the efficiency of the business, thus helping it more effectively accomplish its goals. Data engineers test and launch especially advanced tools for data analysis, as well as techniques including machine learning and algorithms.

Job titles : data engineer, data engineer consultant, data engineer – data warehouse, data engineer – machine learning, data engineer – processing and analytics, data science engineer, data engineering manager, junior data engineer, junior integration engineer, lead data engineer, platform engineer – data, senior data engineer.

6. Business Analyst

A business analyst examines and analyses business processes. This professional finds efficiencies and takes on a leadership position when it comes to project teams. The business analyst provides necessary technical information for the business.

Information technology is the most common sector where business analysts are found. Business analysts also work in a range of other business departments. Some of the most common duties and tasks include:

To become a business analyst, you need a bachelor’s degree in a field such as information systems, finance, business administration or another closely related discipline. You can also get a master’s in Business Analytics, Business Administration or Information Systems to make yourself more competitive in the job search.

Job titles : analyst customer communication insights, analyst – primary market research, business analyst, business analyst – marketing, business consultant principal (data management), business intelligence analyst, business intelligence (BI) & data warehouse developer, business intelligence specialist,  customer segment analyst, customer strategy specialist, customer success manager – data centre, insights analyst, insights consultant – data scientist, junior business analyst, performance and quality advisor.

7. Software Engineer

Software engineers differ from data scientists in that their territory centres much more on end-user functionality, as well as application development and feature creation. Their focus is designing and developing software systems. Software engineers are also instrumental in the maintenance of these systems.

Software engineers create applications that generate data that may be used by data scientists. Both professions require strong programming skills.

The types of systems you’d work on can vary widely, encompassing everything from simple applications to intricate online platforms. Software engineers usually play a role in every phase of software development. After release of a product, the software engineer will frequently be responsible for maintenance.

Software engineer salaries in Australia can vary significantly between different cities and regions. The average salary for software engineers in Australia is about $95,000. Software engineers in certain cities can make even higher salaries.

Job titles : associate software engineer, backend software engineer, embedded software engineer,  frontend software engineer, graduate software engineer, graduate software engineer – prototype development, junior back end software engineer, junior developer, junior software engineer, lead software engineer, lead software engineer – platform, PHP software engineer, senior software engineer, software engineer, software engineer specialist, software engineer – site reliability engineering, software integration engineer, software quality assurance engineer, software developer, software development engineer intern, software development internship.

8. Machine Learning Engineer

To be a machine learning engineer, you need both data science and software engineering expertise. The objectives and goals of a machine learning engineer are different than those of a data scientist.

A machine learning engineer creates working software. This is different than data scientists and their objective of visualisations and analysis. Just a few of the skills you need as a machine learning engineer include statistics and probability; data evaluation and modelling; system design and software engineering; computer science and programming; and application of machine learning algorithms.

As a machine learning engineer, you’ll develop AI (artificial intelligence) systems and machines. These systems and machines not only learn but apply their knowledge. To do this, you must be highly skilled with sophisticated algorithms and data sets.

Job titles : computer vision engineer – machine learning image processing, machine learning engineer, machine learning solutions lead, machine learning team lead.

9. Statistician

A statistician differs from a data scientist in that he or she focuses only on statistics rather than on all the other disciplines that are part of data science. To be a statistician, you need a university degree (or more than one degree) in statistics or mathematics.

As a statistician, you’ll establish and utilise statistical techniques and theory for the collection, analysis and interpretation of numerical data. This is essential for reaching decisions and creating policy in an organisation. Some of the fields and industries in which you may find work as a statistician include, for example, business, medicine, government, science, and education.

Statisticians in Australia can make between $120K and $140K annually.

Job titles : senior biostatistician, senior clinical statistician, senior statistician, statistician, trainee biostatistician.

10. Data Modeller

The work of the data modeller is essential for data scientists to be able to do their work. Data modellers build the blueprints for databases. These databases are the storage places for the data used by data scientists.
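As a rough illustration of what such a blueprint can look like, here is a small sketch using Python's built-in sqlite3 module. The `customer` and `purchase` tables, their columns, and the sample rows are hypothetical and exist only for this example. The point is that the model itself states the rules: every purchase must belong to a known customer, and amounts cannot be negative.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")

# The "blueprint": two hypothetical tables and the rules linking them.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount_aud  REAL NOT NULL CHECK (amount_aud >= 0)
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Example Pty Ltd')")
conn.executemany(
    "INSERT INTO purchase (customer_id, amount_aud) VALUES (?, ?)",
    [(1, 120.0), (1, 80.0)],
)

# A question a data scientist might later ask of the modelled data.
total = conn.execute(
    "SELECT SUM(amount_aud) FROM purchase WHERE customer_id = ?", (1,)
).fetchone()[0]


def rejects_orphan(connection):
    """Check that a purchase for an unknown customer is refused."""
    try:
        connection.execute(
            "INSERT INTO purchase (customer_id, amount_aud) VALUES (99, 10.0)"
        )
    except sqlite3.IntegrityError:
        return True
    return False


orphan_rejected = rejects_orphan(conn)
print(total, orphan_rejected)
```

Because the constraints live in the schema, bad records are stopped when they are written, before anyone tries to analyse them.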

Like data scientists, data modellers are essential for a business to gain useful information from raw data and then use this information for business decisions. Job responsibilities include:

You need a bachelor’s degree if you want a career as a data modeller. The average salary for data modellers in Australia is $108,000.

Job titles : data modeller, data modeller / data analyst, credit risk modeller, solution designer / data modeller, modelling geologist, senior credit risk modeller.

11. Freelance Data Scientist

Going freelance as a data scientist is an increasingly popular option. If you decide to go this route, be aware that you must be almost as skilled with practical business concerns as you are with data science. Remember that you’ll have many administrative tasks that need to be done.

To be successful as a freelance data scientist, you must become extremely adept at finding clients. Securing freelance work in data science can be more challenging than in other fields. When you apply for work with an organisation, be aware that only a few individuals within it will be in the position to hire you. This includes the software engineering manager, the CTO, and the CEO, as well as the head of the department where an important project is being completed.

One of the most effective steps you can take to become a successful freelance data scientist is to focus on a specific niche. Whatever niche you choose, it will centre on a single industry and a particular area of data science. It’s wise to select a specialised area within data science with which you’re especially highly skilled and experienced. When choosing an industry, you should consider a range of factors including, for example, demand for freelancers and your own personal interest.

Choosing a niche will make it easier for you to establish a marketing plan that will work for you. Remember that when an organisation hires a data scientist, it is seeking to solve a problem. You must show that you’re the best person to do this.

12. Clinical Data Manager

A career as a clinical data manager is the perfect way to combine experience and expertise that you have in both the IT and health care arenas. As a clinical data manager, you deal with every part of the collection and dissemination of data.

You’ll probably have a leadership role in decision-making, when determining the methods that will be used for data collection. Project management and a variety of technical duties will be significant parts of your job.

To become a clinical data manager, you need education and expertise in both technology and the scientific research or healthcare industries.

Job titles : clinical research assistant, clinical research project lead / senior clinical research coordinator, clinical trial data coordinator, clinical trial coordinator / senior clinical trial coordinator, senior clinical research associate, senior clinical trials coordinator.

13. Marketing Analyst

Marketing analysts require strong software and statistics skills. But they are much more business-focused than data scientists. A marketing analyst provides marketing insights into a business’s products and services. This professional’s work is important in helping the business reach its marketing goals.

As a marketing analyst, you analyse data, create a marketing plan, and offer a variety of solutions for clients to improve marketing campaigns. Marketing analysts establish metrics and create strategies for performance testing and improvement.

You’ll usually need a degree in marketing, business, or a related field for a career as a marketing analyst. Knowledge and skill in the use of reporting software and business intelligence are additional requirements.

Job titles : digital marketing insights analyst, marketing analyst, marketing and business analyst, marketing automation analyst, senior marketing analyst.

Education for Data Science

If you decide to pursue a career in data science, the first step is to study programming and linear algebra. You must have strength in these areas to start and excel in data science studies. Probability and statistics are two other essential areas. Data scientists often have bachelor degrees in technology, programming or mathematics fields.

Once you have competency in these areas, you should enrol in a data science program. The main options for data science education and training are a Graduate Certificate in Data Science, Graduate Diploma in Data Science or a full Master of Data Science. Employers may prefer job applicants with a Master of Data Science degree.

Once you've finished your formal data science education, you can consider launching your career with an internship in the field. If internships are hard to find or secure, you can volunteer instead to get the job experience. The idea is just to get started with applied learning, making connections and working your way up the career ladder.

Keep a portfolio of completed projects that you can show to potential employers and clients. This can include practice problems and university assignments. It's best to keep a GitHub account for this purpose. If you feel your portfolio is a little on the sparse side, ask family and friends if they'd like you to do some projects for them. It can also be helpful to join data science communities online.

You can enter data science competitions as well. You can find such competitions online. Taking part in these competitions will give you the chance to practice your skills, learn from other people and even connect with potential employers.

Graduate Certificate in Data Science

Data science is a field you can break into by completing a postgraduate course. A popular pathway is to do an online graduate certificate, at least as a starting point. The courses are available 100% online from Australian universities.

University of Adelaide

Graduate Certificate in Data Science (Applied)

Recommended by Mallory

James Cook University

Graduate Certificate of Data Science

UNE

Graduate Certificate in Data and Cyber Management

Graduate Diploma in Data Science

In terms of length, a graduate diploma in data science sits about halfway between a graduate certificate course and a masters degree. Graduate diploma courses allow you to build a solid skills base or specialise in a particular field.

Graduate Diploma of Data Science (Internet of Things)

Internships in Data Science

Doing an internship in data science can be a great way to launch your career in the field. You could, for example, gain a research assistant position or a software engineering internship. To secure your place, put together a portfolio project to showcase data science skills.

If you’ve done an undergraduate degree in data science, you may have already done such a project at university. Make sure that you put your project on GitHub. Also ensure that you properly research all companies you apply to for an internship.

If a company you apply to indicates that they’re not looking for interns at the moment, politely ask if they could keep in contact and let you know if any opportunities arise. If possible, try to find referrals already within the company too. When a vacancy arises, management often first ask around within the company to see if anyone knows someone qualified who they can recommend.

Data Science Masters Degrees

Online masters degrees in data science provide an excellent platform for a long and successful career in the field. The programs are designed to be convenient and compatible with full-time work.

Master of Data Science (Applied)

Master of Data Science

MBA (Data & Cyber Management)

Resources : Information Technology Jobs and Descriptions

Elsevier

Research in Organizational Behavior

Reprint of: To thrive or not to thrive: Pathways for sustaining thriving at work ☆

Thriving, the psychological experience of both vitality (or energy) and learning, is often elusive. Rather than growing, developing, and feeling energized, workers report stagnation and depletion. While much of the research on thriving at work has focused on what managers can do to promote thriving amongst workers, we highlight the means by which people are empowered to take control of their well-being. Workers can sustain their own thriving through three pathways: (1) by engaging in self-care, (2) creating and maintaining high quality relationships, and (3) building community within and outside the organization. We show that these three pathways are particularly important given the changing nature of more temporary and flexible work arrangements, increases in remote work, and the larger need for community embeddedness to address the many grand societal challenges that confront us.

Data availability

No data was used for the research described in the article.

This article is a reprint of a previously published article. For citation purposes, please use the original publication details: Research in Organizational Behavior 42, 100176.

War in Ukraine prompts shifts in thinking about international cooperation in science

One year on, and as Russia continues waging war on Ukraine, the research community is holding its breath to see how geopolitical fractures will impact global science cooperation

European Commission President Ursula von der Leyen (left) with Ukrainian President Volodymyr Zelenskyy (centre) and President of the European Council Charles Michel. The EU quickly moved to support Ukraine after Russia’s full-scale invasion in February 2022, but the future of Ukraine’s research community remains uncertain. Photo: Dati Bendo / European Union

A year ago, Russia’s full-scale invasion of Ukraine redefined geopolitics in a shockwave that is still reverberating through the science world. The EU research community was quick to cut ties with Russia and lend Ukraine a helping hand – but now it is grappling with resulting instability and uncertainty as the war climbs into its second year.

Lucian Brujan, programme director for international relations and science diplomacy at the German National Academy of Sciences Leopoldina, says it’s too early to say what the long-term impact will be on research and innovation - and urges patience.

“I think many in the community are waiting to see how the political problems will be solved and how this war will end; and after that, we’ll need to have a discussion,” Brujan says. “We have to be honest with ourselves in the scientific community. We are dealing with political and security uncertainty.”

But what is clear already is the shift in discourse on international cooperation. While it’s hard to judge the effect in hard terms, conversation has shifted away from blanket arguments in favour of openness, towards a more careful attitude, observes Thomas Jørgensen, director for policy coordination and foresight at the European University Association.

“Before the war in Ukraine, even if we found that we have hardened transatlantic blocks, we would still argue for openness and do under the radar science diplomacy,” says Jørgensen. “We will see later if we’ve seen less co-authorships with countries that are not like-minded, but it has certainly changed the way we talk about these things. The Ukraine war gives good arguments for those that are more diligent and technology sovereignty oriented.”

Some frame the shift as a loss of innocence. Russia’s invasion was an attack on the EU’s fundamental values, and it shocked many. Before the war, the EU’s stance on science cooperation rested on the ‘as open as possible and as closed as necessary’ principle, and while this remains the rule, one former diplomat says, “We can see a certain shift towards the second part of this principle.”

Eyes on China

As global tensions intensify, eyes turn to China, which in recent weeks has been deliberating whether to send weapons to Russia.

That the EU has a complex relationship with China isn’t new, but the tense geopolitical situation and China’s equivocation on the war in Ukraine have added a new dimension to it, says Lidia Borrell-Damian, secretary general of Science Europe.

The complexity isn’t just about big politics but “comes from different legislative approaches in the EU and China regarding open access, open science, treatment of data, the outcomes of research. The difficulties of research collaboration with China have been there for years now,” adds Borrell-Damian.

The EU cut off all research ties with Russia as the war broke out, and now the big question is whether it was a one-off extreme measure, or a realisation about the complexity of the world we live in that will lead to reappraisal of research ties with others, including China.

Jørgensen believes it was the latter, and the tone is changing. “In Brussels, there are those that have started talking about a more transactional approach: arguing we should only work with China if there is a clear benefit,” says Jørgensen.

It's not about cutting ties but being smart about cooperation. The geopolitical shift comes at a time when Europe needs to strengthen its global standing, including in research and innovation. It cannot go it alone, but the recipe for ‘smart’ cooperation is yet to be concocted. “We have this global competition of systems, and in this changed world science needs to position itself in a new way to a certain extent,” one diplomat says. “The overall goal should be smart cooperation, not less cooperation.”

Right now, there is a waiting game to see which direction China will move in, whether it will help Russia’s war effort or not. In the meantime, governments are thinking strategy. Germany is to set out a new China strategy by summer, which will reference aims to become more strategically independent and diversify supply chains.

Solidarity with Ukraine

While tensions rise in the east, Europe’s research community has been stepping up its support for Ukraine, which has seen many of its universities and research institutions destroyed, and researchers and academics displaced.

In Europe, universities and research organisations welcomed refugee scientists, mobilised grant support, set up collaborative initiatives and helped fast-track Ukraine’s engagement in EU research frameworks. “I think that the European research community has been exemplary in supporting research in Ukraine,” says Borrell-Damian. “If we look at all these actions from a science diplomacy angle, it’s a good outcome. We can’t stop the war, but if we look at what we have done from a science diplomacy angle, we have indeed taken a stance.”

Jørgensen notes the knowhow is there from previous crises, as he praises the unprecedented response. “It builds on the solidarity of the many experiences of universities when we had an influx of refugees in 2015, in particular from Syria. There’s a knowhow and it’s been put to good use to help Ukrainian scientists,” he tells Science|Business.

But the big danger now is brain drain, and Ukrainians are acutely aware of it. To deal with this, Brujan says it’s important to keep research and education at the top of priorities when the country is rebuilding, and to keep those that fled Ukraine engaged. But these are future worries: up to now, the EU’s support has largely focused on immediate relief for the country’s scientists, and long-term planning remains nearly impossible in wartime. “In a war situation, it’s not about big research but surviving day to day,” Brujan says.

Once the war is over, he adds, it will be important to involve Ukrainians in the discussions on the support they receive and their involvement in the European Research Area.

The Ukrainian research community is ready to put in the work. “They rise up to the challenge,” says Borrell-Damian.

Researchers confused

For researchers on the ground, the shift in geopolitics raises practical questions. In France, researchers are inquiring how it will affect their work and host institutions are raising awareness about the terms on which scientific collaboration can and cannot continue.

For laboratories that deal with sensitive issues, there’s a dedicated framework. “They ask us to provide guidelines – they know that there are issues and risks that come with their work, in particular in specific scientific domains,” a source in the French government says.

At the same time, foreign interference has become a focal point of discussions around cooperation. “It was there before, but in this new geopolitical context it’s an even more pressing issue. We have to talk about it when we talk about cooperation,” they add.

Overall, Brujan notes, “The war has shown one clear trend: scientific organisations and even scientists are way more careful than they have been before. Various aspects are being reconsidered, from security to fundamentals to practicalities. The scientific community will see how this prudence is going to affect cooperation globally.”

“This is a normal reaction to navigating an unstable environment,” he adds. “We need to see how this prudence is going to affect scientific cooperation. It doesn’t mean we have to give up our way of doing: my message is to have patience and observe sharply what’s going on.”

The important thing now is to keep discussions going, at all levels. Talks under the European Research Area (ERA) framework have been fruitful, but “what we don’t know, and where there can be a mismatch, is between high-level diplomatic geopolitics discussions and research policy discussions,” said Jørgensen.

For this discussion to happen, scientists and policymakers will need to learn to speak each other’s languages, and take a more practical approach, Brujan notes.

COMMENTS

  1. Research Areas

    Stanford Data Science is a collaborative effort across many departments in all seven schools. We strive to unite existing data science research initiatives and create interdisciplinary collaborations, connecting the data science and related methodologists with disciplines that are being transformed by data science and computation.

  2. Research: Data Science Program: Indiana University

    How to join a research project. To work with a faculty member on their research, or to complete an independent study with a particular faculty member, we encourage you to contact that faculty member directly. If you have additional questions about research at data science, you can contact Haixu Tang, director of the data science graduate program.

  3. Ten Research Challenge Areas in Data Science

    Data science is a field of study: one can get a degree in data science, get a job as a data scientist, and get funded to do data science research. But is data science a discipline, or will it evolve to be one, distinct from other disciplines? Here are a few meta-questions about data science as a discipline.

  4. Data Science Researchers

    Data science can significantly benefit multiple domains of engineering mechanics, particularly with respect to modeling and simulation. ... My most recent work in Data Sciences has focused on (i) Scalable algorithms for building predictive models from large, distributed, semantically disparate data (big data), including more recently, linked ...

  5. Research

    DSI-affiliated researchers work in a wide range of disciplines - from business to medicine, social work to literature, history to natural science - and collaborate in interdisciplinary teams to gather and interpret data and address urgent problems facing our society. Contact Clifford Stein Data Science Institute Interim Director

  6. A Guide to Data Science Research Projects

    in Pipeline: A Data Engineering Resource 3 Data Science Projects That Got Me 12 Interviews. And 1 That Got Me in Trouble. Zach Quinn in Pipeline: A Data Engineering Resource Creating The Dashboard That Got Me A Data Analyst Job Offer Terence Shin All Machine Learning Algorithms You Should Know for 2023 Moklesur Rahman

  7. What Does a Data Scientist Do?

    While each project is different, the process for gathering and analyzing data generally follows the below path: 1. Ask the right questions to begin the discovery process 2. Acquire data 3. Process and clean the data 4. Integrate and store data 5. Initial data investigation and exploratory data analysis 6.

  8. 37 Research Topics In Data Science To Stay On Top Of

    37 Research Topics in Data Science 1.) Predictive modeling Predictive modeling is a significant portion of data science and a topic you must be aware of. Simply put, it is the process of using historical data to build models that can predict future outcomes.

  9. Top 20 Latest Research Problems in Big Data and Data Science

    The research problems to handle noise and uncertainty in the data:- 4. Identify fake news in near real-time: This is a very pressing issue to handle the fake news in real-time and at scale as the fake news spread like a virus in a bursty way. The data may come from Twitter or fake URLs or WhatsApp.

  10. Research Data Scientist Jobs, Employment

    Data Scientist. Federal Bureau of Investigation 4.3. Washington, DC 20535 (Penn Quarter area) Pennsylvania Ave NW + 10th St NW. Estimated $97.3K - $123K a year. Developing and utilizing existing data systems to acquire and prepare data for research purposes, exploration, or statistical analysis.

  11. Data Science

    About Pew Research Center Pew Research Center is a nonpartisan fact tank that informs the public about the issues, attitudes and trends shaping the world. It conducts public opinion polling, demographic research, media content analysis and other empirical social science research. Pew Research Center does not take policy positions.

  12. Research Fellow, Statistics and Data Science job with NATIONAL

    Job Description. The successful candidate will work with Dr. Michael Choi and co-supervised by Dr. Wenqian Chen on stochastic algorithms with applications in bioinformatics and protein folding under a project on MAPLE: Mechanistic Accelerated Prediction of Protein Secondary Structure via LangEvin Monte Carlo, supported by a Ministry of Education Tier 1 grant under the Data for Science and ...

  13. The Role of Data Science in Research

    The essential Data Science techniques researchers need to know about To build data science capabilities, the first step is to upskill researchers and subject-matter experts in the foundations of Data Science using Python. Widely-used techniques to start learning are: Data Science Essentials Working with Jupyter notebooks

  14. Top 20 Data Science Research Topics and Areas For the 2020-2030 Decade

    Top 20 Data Science Research Topics and Areas For the 2020-2030 Decade April 2020 Authors: Joab O. Odhiambo University of Nairobi Stanley Sewe Abstract In this decade, Data science seems to...

  15. 11 Data Science Careers That Are Shaping the Future

    Data science experts are needed in virtually every job sector—not just in technology. In fact, the five biggest tech companies—Google, Amazon, Apple, Microsoft, and Facebook—only employ one-half of one percent of U.S. employees. However—in order to break into these high-paying, in-demand roles—an advanced education is generally required.

  16. Ten Research Challenge Areas in Data Science

    In this article we enumerate 10 areas of research in which to make progress to advance the field of data science. Our goal is to start a discussion on what could constitute a basis for a research agenda in data science, while recognizing that the field of data science is still evolving.

  17. data science Latest Research Papers

    Data Science . Information Use . Regulatory Compliance . Future Research . Public And Private . Social Good . Public And Private Sector . Effective Use. Abstract: The appetite for effective use of information assets has been steadily rising in both public and private sector organisations.

  18. Your Guide to Data Science Careers (+ How to Get Started)

    A career in data science is not limited to technical knowledge. You'll work on teams with other engineers, developers, coders, analysts, and business managers. These workplace skills will help take you farther: Communication skills Storytelling Critical thinking and logic Business acumen Data science job outlook

  19. What is Data Science?

    Data science is the scientific study of data to gain knowledge. This field combines multiple disciplines to extract knowledge from massive datasets for the purpose of making informed decisions and predictions. Data scientists, data analysts, data architects, data engineers, statisticians, database administrators, and business analysts all work ...

  20. Computer and Data Science

    Rice University is now accepting applications for a 10-week summer undergraduate research program in the general area of Computer and Data Science, generously funded by a gift from Google, LLC. Program participants will be assigned to a Rice faculty mentor and will work closely with a Rice graduate student or Postdoctoral researcher to perform ...

  21. 12th and 13th International Meeting on Visualizing Biological Data

    This Research Topic collects work presented during the 12th International Meeting on Visualizing Biological Data (VIZBI 2022) and the 13th International Meeting on Visualizing Biological Data (VIZBI 2023) conferences. The conferences feature talks from 21 world-leading researchers showcasing visualizations transforming how life scientists view data, and driving key advances in molecular ...

  22. 13 Data Science Careers That Are Exploding Now

    After asking questions related to a fundamental problem, data scientists will work with raw data, collecting, organising and analysing it. They create and use algorithms for the identification of patterns and trends in the work of answering questions.

  23. Study on zinc oxide‐creatinine hybrid catalyst for efficient lactide

    Funding information: National Natural Science Foundation of China, Grant/Award Number: U20A20148; Tianjin Science and Technology Committee Development Project, Grant/Award Number: 21ZYQCSY00050; Liaoning Yingkou Science and Technology Program, Grant/Award Number: 2020103; Foundation of Tianjin Key Laboratory of Brine Chemical Engineering and Resource Eco-utilization, Grant/Award Number ...

  24. Reprint of: To thrive or not to thrive: Pathways for sustaining

    For example, a recent meta-analysis of 65 articles examining thriving at work concluded that this research "supports Spreitzer and colleagues' model and underscores the importance of thriving in the work context" (Kleine et al., 2019). They found that individual characteristics such as psychological capital, proactive personality, more general positive affect, and work engagement were ...

  25. War in Ukraine prompts shifts in thinking about international

    A year ago, Russia's full-scale invasion of Ukraine redefined geopolitics in a shockwave that is still reverberating through the science world. The EU research community was quick to cut ties with Russia and lend Ukraine a helping hand - but now it is grappling with resulting instability and uncertainty as the war climbs into its second year.

  26. Cara Joos

    I am a passionate data scientist interested in using my skills to answer complex questions with appropriate data. Design, plan and implement programs that address stakeholder research needs. ...