Computer Scientist

Job Profile

As a Computational Scientist, you will (i) build computational and simulation models and (ii) develop and write algorithms, using advanced mathematical and computational techniques, that can be applied to solve complex scientific, technological, and engineering problems.

Computational Scientists are called in when studying a natural system (physical, chemical, biological, environmental, etc.) or solving a complex scientific problem (such as finding which one of 2,000 probable genes is responsible for a certain disease) requires dealing with a massive amount of data throughput or, putting it simply, a massive amount of data input and analysis. This kind of data analysis cannot be done with the simple computing that we do using data analytics software on our own computers; it requires high-performance computing (HPC) or high-throughput computing (HTC).

This is high-end science and, unless you are a computer geek, you might already have found it slightly difficult to understand what has been written so far.

Firstly, understand a complex natural system that Computational Scientists deal with

Take, for example, daily weather forecasting. This involves analysis of a humongous amount of data obtained from satellite images, remote sensing, and various other weather observation tools. Understand that this data flow is continuous and that, every minute, thousands of gigabytes of data need to be analyzed. Weather depends on a very large number of factors, and data on all of them need to be analyzed to forecast the weather.

So, this is a complex system and forecasting is a complex task. Computational Scientists pitch in by using complex mathematical techniques to build computational models and write algorithms which, when run on high-performance computing systems (which can carry out a massive number of computations simultaneously), can accurately forecast daily weather.

What is a computational model? And what is an algorithm?

A computational model comprises one or more mathematical formulas and functions that describe how, given a set of input data, the output information will be computed (or calculated). Remember the basic algebraic formula (a + b)² = a² + 2ab + b²? Well, computational models have similar but much more complex formulas for computing outputs from a given set of input data.
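
To make this concrete, here is a minimal sketch in Python of a toy computational model built on the algebraic formula mentioned above. The function names (square_of_sum, expanded_form, run_model) are illustrative assumptions and do not come from any real forecasting or sequencing system; a real model would use far more complex formulas.

```python
def square_of_sum(a: float, b: float) -> float:
    """Left-hand side of the identity: (a + b) squared."""
    return (a + b) ** 2


def expanded_form(a: float, b: float) -> float:
    """Right-hand side of the identity: a^2 + 2ab + b^2."""
    return a**2 + 2 * a * b + b**2


def run_model(inputs: list[tuple[float, float]]) -> list[float]:
    """Toy 'model': compute one output for every (a, b) input pair."""
    return [square_of_sum(a, b) for a, b in inputs]


if __name__ == "__main__":
    # Both sides of the identity give the same output for the same inputs.
    print(square_of_sum(3, 4), expanded_form(3, 4))  # 49 49
    print(run_model([(1, 2), (3, 4)]))               # [9, 49]
```

The point is only that a model maps a set of inputs to outputs through fixed formulas, so the same inputs always produce the same outputs.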

An algorithm is a sequential set of instructions that can be executed by a computer. More precisely, it is a set of well-defined specifications, arranged in sequence, that can be used to write a computer program for performing a task such as a calculation, data processing, or data analysis.
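
As a small illustration, the following Python sketch writes out a very simple algorithm, step by step, for finding the highest value in a list of readings. The function name highest_reading and the temperature values are hypothetical and chosen only for this example.

```python
def highest_reading(readings: list[float]) -> float:
    """Return the largest value in a non-empty list of readings."""
    # Step 1: refuse to run on empty input, since there is no answer.
    if not readings:
        raise ValueError("readings must not be empty")
    # Step 2: assume the first reading is the highest seen so far.
    highest = readings[0]
    # Step 3: look at every remaining reading, one after another.
    for value in readings[1:]:
        # Step 4: if this reading is higher, remember it instead.
        if value > highest:
            highest = value
    # Step 5: after the last reading, report the highest value found.
    return highest


if __name__ == "__main__":
    print(highest_reading([21.5, 30.1, 28.0]))  # 30.1
```

Each comment marks one well-defined instruction, and the instructions are carried out in a fixed order, which is what makes this an algorithm.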

Secondly, understand a complex problem that Computational Scientists deal with

Take, for example, identifying the gene responsible for a specific disease. This task may require sequencing 2,000 probable genes and finding the exact sequence or gene which might be responsible for producing a protein that in turn may trigger a disease.

Sequencing genes means laying out the sequence of the nitrogenous bases (Adenine, Cytosine, Guanine, and Thymine, also called nucleotide bases) in the strands of DNA, right? Basic Biology. DNA (deoxyribonucleic acid) has a double helix structure made up of two spiral chains or strands. These two strands are held together by the nucleotide bases bonded in pairs (Adenine or A bonds with Thymine or T, and Guanine or G bonds with Cytosine or C). The outer sides of the strands are made up of deoxyribose sugar and phosphate.

Now, in each of the strands, the nucleotide bases appear in a sequence. A gene is a specific sequence of the bases which can produce a protein. If the sequence of the bases is laid out on paper, it may look like GATTGTACATGT and so on. It could be a very long sequence. Now, multiply this by 2,000 for the example task in hand.

For processing such a massive volume of data, Computational Scientists are called in. They build computational models and write algorithms which can carry out the sequencing tasks in the fastest possible time using high-performance computing.
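
The base-pairing rule and the idea of searching a sequence can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the helper names (complementary_strand, find_pattern) are hypothetical, the example sequence is the short one from the text, and real sequence analysis works on vastly longer data with specialized tools.

```python
# Pairing rule from the text: A bonds with T, and G bonds with C.
BASE_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(sequence: str) -> str:
    """Return the bases that pair with each base of the given strand."""
    return "".join(BASE_PAIRS[base] for base in sequence)


def find_pattern(sequence: str, pattern: str) -> list[int]:
    """Return every start position (0-based) where pattern occurs."""
    positions = []
    for start in range(len(sequence) - len(pattern) + 1):
        if sequence[start:start + len(pattern)] == pattern:
            positions.append(start)
    return positions


if __name__ == "__main__":
    strand = "GATTGTACATGT"
    print(complementary_strand(strand))   # CTAACATGTACA
    print(find_pattern(strand, "TGT"))    # [3, 9]
```

Scanning one 12-base sequence is instant; repeating such searches over thousands of long sequences is where the data volume becomes a problem.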

Getting the idea?

An idea of the volume of high-throughput data that Computational Scientists might be dealing with

Let us stick to the gene sequencing task for an example. To give you a basic idea, the data from around 250,000 human gene sequences that are available to scientists now could be equal to about 25 petabytes (YouTube generates about 100 petabytes of data annually now).

1 petabyte (PB) = 1,000 terabytes (TB)

1 terabyte (TB) = 1,000 gigabytes (GB)

Scientists are predicting that, within the next decade, the amount of human gene sequence data generated will be approximately 10 to 40 exabytes a year.

1 exabyte (EB) = 1,000 petabytes (PB)
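
To get a feel for these units, the conversions above can be worked through in a short Python sketch. The constant and function names are assumptions made for this example; the figures (25 petabytes, 10 to 40 exabytes) are the ones quoted in the text.

```python
# Decimal conversion factors as given in the text.
GB_PER_TB = 1000
TB_PER_PB = 1000
PB_PER_EB = 1000


def petabytes_to_gigabytes(petabytes: int) -> int:
    """Convert petabytes to gigabytes (PB -> TB -> GB)."""
    return petabytes * TB_PER_PB * GB_PER_TB


def exabytes_to_petabytes(exabytes: int) -> int:
    """Convert exabytes to petabytes."""
    return exabytes * PB_PER_EB


if __name__ == "__main__":
    # 25 PB of gene sequence data expressed in gigabytes.
    print(petabytes_to_gigabytes(25))   # 25000000
    # The predicted 10 to 40 EB per year expressed in petabytes.
    print(exabytes_to_petabytes(10), exabytes_to_petabytes(40))  # 10000 40000
```

In other words, 25 petabytes is 25 million gigabytes, and the predicted yearly volume is 400 to 1,600 times larger than that.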

Now you can imagine that processing this large volume of data or even a fraction of this large volume of data requires complex computational models and high-end computer processing power.

So, what will you do as a Computational Scientist?

First,

You will study and understand the system, environment, or conceptual framework to which a complex task or problem belongs. For example, you would study climate and weather conditions for the weather forecasting problem, Molecular Biology and Genetics for the gene sequencing task, or the Physics, Chemistry, or technology behind another scientific or technical problem.

Second,

Understand the task and the problem as given to you, or frame and conceptualize them yourself.

Third,

Develop a computational model or, in some cases, a simulation model, using complex mathematics and computer science concepts.

Simulation means approximate imitation of a real-life system or process. Nowadays, simulation is used when the real system is too complex or takes too long to observe for data to be collected, when the real system could be too dangerous to engage with, or when a system is being built and many forms of it need to be tested. Examples include chemical analysis of a very large number of compounds to identify a molecule which may treat a disease, simulation of car crashes to understand what safety features may be useful, and simulation of living conditions on Mars to design the right kind of bodysuit.
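
A very small simulation can show the idea. The Python sketch below imitates an object dropped from a height by advancing time in tiny steps, then compares the result with the known textbook formula. Everything here is an assumption chosen for illustration (the function names, the 20 m height, the step size, and ignoring air resistance); it is not how crash or climate simulations are actually built.

```python
import math

GRAVITY = 9.81  # acceleration due to gravity in m/s^2 (assumed constant)


def simulate_fall_time(height_m: float, dt: float = 0.001) -> float:
    """Step through time until a dropped object reaches the ground."""
    elapsed = 0.0
    velocity = 0.0
    height = height_m
    while height > 0:
        velocity += GRAVITY * dt   # speed increases a little each step
        height -= velocity * dt    # object moves down by speed * time
        elapsed += dt
    return elapsed


def analytic_fall_time(height_m: float) -> float:
    """Exact answer from physics, t = sqrt(2h / g), used as a check."""
    return math.sqrt(2 * height_m / GRAVITY)


if __name__ == "__main__":
    print(simulate_fall_time(20.0))
    print(analytic_fall_time(20.0))
```

The simulated and exact times agree closely, and a smaller time step brings them closer at the cost of more computation. That trade-off between accuracy and computing effort is one reason large simulations need powerful computers.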

Fourth,

Write the required algorithms for computers to execute the computational or simulation models.

Fifth,

Decide upon the right kind of computing power (such as high-performance computing, high-throughput computing, or distributed and parallel computing). Heard of supercomputers, right? Supercomputers provide high-performance computing power. Distributed and parallel computing engages a very large number of processors simultaneously.
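
The basic pattern behind parallel computing is to split a big job into chunks, hand the chunks to several workers, and combine the partial results. The Python sketch below shows that pattern with a small thread pool from the standard library. The function names and the tiny example sequence are assumptions for illustration. In standard CPython, threads will not speed up this kind of calculation; real high-performance work would spread the chunks across many processes or machines, but the split, compute, and combine structure is the same.

```python
from concurrent.futures import ThreadPoolExecutor


def count_gc(fragment: str) -> int:
    """Count how many bases in a fragment are G or C."""
    return sum(1 for base in fragment if base in "GC")


def split_into_chunks(sequence: str, chunk_size: int) -> list[str]:
    """Cut the sequence into consecutive pieces of at most chunk_size."""
    return [sequence[i:i + chunk_size]
            for i in range(0, len(sequence), chunk_size)]


def parallel_gc_count(sequence: str, chunk_size: int = 4,
                      workers: int = 4) -> int:
    """Count G and C bases by giving each chunk to a pool worker."""
    chunks = split_into_chunks(sequence, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_gc, chunks))
    # Combine the partial results into the final answer.
    return sum(partial_counts)


if __name__ == "__main__":
    print(parallel_gc_count("GATTGTACATGT"))  # 4
```

Because each chunk is counted independently, the combined result matches what a single worker would compute on the whole sequence.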

Sixth,

Analyze the outputs from the computations and validate the models.
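
One common way to validate a model is to compare its predictions with what was actually observed and compute an error measure. The Python sketch below uses the mean absolute error for this. The function name is an assumption, and the temperature figures are made-up numbers for illustration, not real measurements.

```python
def mean_absolute_error(predicted: list[float],
                        observed: list[float]) -> float:
    """Average size of the gap between predictions and observations."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    total_gap = sum(abs(p - o) for p, o in zip(predicted, observed))
    return total_gap / len(predicted)


if __name__ == "__main__":
    # Hypothetical forecast vs. hypothetical measured temperatures (deg C).
    forecast = [30.0, 31.5, 29.0]
    measured = [31.0, 31.5, 27.0]
    print(mean_absolute_error(forecast, measured))  # 1.0
```

A small error suggests that the model describes the system well; a large one sends the scientist back to revise the model.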

Remember that

Computational Scientists are not Computer Scientists or Computer Engineers. A Computer Scientist or Computer Engineer may find work in Computational Science and, with a good number of years of experience in the field, may call herself a Computational Scientist.

The fundamental difference is that Computer Scientists and Computer Engineers are involved in designing, developing, installing, testing, and maintaining computer hardware and software. They may sometimes use basic computational techniques in software development, but they will not be able to do the advanced computational modeling and algorithm development that a Computational Scientist can do.

So, Computational Scientists are not involved in hardware or software development. They may do a lot of programming, but their primary purpose is to build computational models, simulation models, and algorithms for solving complex scientific, engineering, and other problems.

Key Roles and Responsibilities

As a Computational Scientist you will be responsible for one or more of the following roles or associated tasks:

  1. You will analyze and interpret high-throughput data. High-throughput technology means automating experiments so that large-scale repetition becomes feasible, i.e., any technology or instrument that generates large datasets (more than 10,000 data points) by performing repetitive tasks and lets you work on the data directly.
  2. You will study and understand relevant datasets, both those generated internally in your organization and those from the public domain.
  3. You will select, design & oversee appropriate in vitro (synthetic / chemical medium) and in vivo (biological medium) studies to support model development and validate computational predictions.
  4. You will develop and apply novel computational techniques for interpretation of high-throughput biological/ physical/ chemical/ technological/ engineering data.
  5. You will develop algorithms and computational methodologies applicable to high-performance computation.
  6. You will contribute subject matter expertise to software development teams for supporting the creation of machine learning algorithms & advanced analytical methodologies required to provide high-confidence predictive information.

You may also have to use your programming skills in popular programming and scripting languages such as Fortran, C/C++, Python, Java/Scala, Mathematica, and R, together with analytical or scientific software relevant to your industry such as SAS, BioPerl, ClustalW, ENSEMBL, GenBank, GenePattern, Illumina LIMS, SOLiD, Vector NTI, NCBI RefSeq, ChemStation, Minitab, CALACO, Chem 4-D, Benfield ReMetrica, SigmaStat, etc., and one or more machine learning libraries such as scikit-learn, MLlib, TensorFlow, PyTorch, Keras, Caffe, or Theano.

  7. You will employ computational tools to run analyses for various compound sets, compile and interpret results, and develop summary reports.
  8. You will develop technical documentation that includes methods, procedures and analytical data including interpretation of results and a thorough impact analysis.
  9. You will prepare reports for communicating data analysis and modeling results to other scientists, technologists, and engineers.

Core Competencies

  • You should have an interest in Investigative Occupations. Investigative occupations involve working with ideas and quite a lot of thinking, often abstract or conceptual thinking. These involve learning about facts and figures, and involve the use of data analysis, assessment of situations, decision making, and problem-solving.
  • You should have an interest in Enterprising Occupations. Enterprising occupations involve taking initiatives, initiating actions, and planning to achieve goals, often business goals. These involve gathering resources and leading people to get things done. These require decision making, risk-taking, and action orientation.
  • You should have an interest in Realistic Occupations. Realistic occupations often involve physical activities for getting things done using various tools and equipment.

Knowledge

  • You should have knowledge of Computers - Knowledge of computer hardware and software, computer programming, computer networks, and computer and mobile applications.
  • You should have knowledge of analytical or scientific software relevant to your industry such as SAS, BioPerl, ClustalW, ENSEMBL, GenBank, GenePattern, Illumina Laboratory Information Management System (LIMS), Life Technologies SOLiD, Life Technologies Vector NTI, NCBI RefSeq, Agilent ChemStation, Minitab, Vogel Scientific Software Group CALACO, ChemInnovation Software Chem 4-D, Benfield ReMetrica, Systat Software SigmaStat, etc.
  • You should have experience in the broad application of one or more higher-level programming languages such as Python, Java/Scala, MATLAB, R, or C/C++.
  • You may need experience with one or more machine learning libraries such as scikit-learn, MLlib, TensorFlow, PyTorch, Keras, Caffe, or Theano, and with at least one data-analysis or scripting language (e.g. The MathWorks MATLAB, Mathematica, Python, R).

Skills

  • You should have Scientific Skills - in using various scientific rules and methods to get things done or solve problems.
  • You should have Technical Skills - using various technologies and technical methods to get things done or solve problems.
  • You should have Quality Control Analysis Skills - conducting tests and inspections of products, services, or processes to evaluate quality or performance.
  • You should have experience working independently under general direction within the scope of an assignment and using sound judgment in determining methods, techniques, and evaluation criteria.
  • You should have Systems Analysis Skills - determining how a system should work and how changes in conditions, operations, or the environment will affect outcomes.
  • You should have the verbal and written communication skills needed to collaborate effectively in a team environment and present technical ideas/results.
  • You should have Critical Thinking Skills - Skills in the analysis of complex situations, using logic and reasoning to understand the situations and take appropriate actions or make interpretations and inferences.
  • You should have Judgment and Decision Making Skills - considering pros and cons of various decision alternatives; considering costs and benefits; taking appropriate and suitable decisions.
  • You should have Problem Solving Skills - Skills in analysis and understanding of problems, evaluating various options to solve the problems and using the best option to solve the problems.
  • You may need Programming Skills - writing computer programs for various applications, installation of computer programs and troubleshooting of problems in computer programs or software.
  • You should have Systems Evaluation Skills - identifying measures or indicators of system performance and the actions needed to improve or correct performance, relative to the goals of the system.

Ability

  • You should have Deductive Reasoning Ability - apply general rules and common logic to specific problems to produce answers that are logical and make sense. For example, understanding the reasons behind an event or a situation using general rules and common logic.
  • You should have Problem Sensitivity - The ability to tell when something is wrong or is likely to go wrong. It does not involve solving the problem, only recognizing there is a problem.
  • You should have Inductive Reasoning Ability - to combine pieces of information from various sources, concepts, and theories to form general rules or conclusions. For example, analyzing various events or situations to come out with a set of rules or conclusions.
  • You should have Information Ordering Ability - to arrange things or actions in a certain order or pattern according to a specific rule or set of rules (e.g., patterns of numbers, letters, words, pictures, mathematical operations).
  • You should have Oral Comprehension Ability - listen to and understand information and ideas presented through spoken words and sentences.
  • You should have Oral Expression Ability - communicate information and ideas in speaking so others will understand.
  • You should have Fluency of Ideas - The ability to come up with several ideas about a topic (the number of ideas is important, not their quality, correctness, or creativity).

Personality Traits

  • You are imaginative, always or in most situations.
  • You are always or mostly careful about your actions and behavior.
  • You are always or mostly disciplined in your action and behavior.
  • You are always calm or generally remain calm in most situations.
  • You can always act independently or could do so in most situations.
  • You always or mostly prefer to try new things and have new experiences.

Career Path

Example from the Field

Geetha Manjunath is an entrepreneur and Computer Scientist. She is the founder and CEO of NIRAMAI Health Analytix, a Bengaluru-based start-up that provides non-invasive, radiation-free breast cancer screening through AI. She has a BE in Computer Science and Engineering and a PhD in Data Mining, Semantic Web from IISc. She was awarded the Computer Society of India Gold Medal, TR Shamanna State Award from Karnataka
