Let's Talk Data
The modern world depends heavily on data. Before we can identify useful patterns or relationships within data, we first need to understand what data truly is. This understanding is the foundation for studying data science, which focuses on extracting insights from data, and machine learning, which aims to uncover hidden patterns. Both fields are increasingly important for the future.
What Is Data?
The word data originates from the Latin word datum, meaning 'a thing given.' Data is its plural, so it literally translates to 'things given.'
In statistics, data refers to facts or information collected through observation or experimentation. In other words, data is whatever we are 'given' when we observe or measure something, and it can be gathered in many ways.
This raises an important question: Is all data the same? For example, is data about the heights of students in a class the same kind of data as their names? To answer this, we broadly classify data into two major types:
- Qualitative (Categorical) Data
- Quantitative (Numerical) Data
Qualitative (Categorical) Data
Qualitative data, as the name suggests, describes qualities. These qualities cannot be measured numerically but can be categorized based on characteristics or traits.
Example: Consider the values ‘male,’ ‘female,’ and ‘non-binary.’ These are not numbers, but they still provide valuable information. When collected in large quantities, this data can reveal important insights. For simplicity, we group these values under a single category called ‘Gender.’ Referring to this category means we are referring to these three labels. These values act as labels and can be very useful when analyzed using appropriate statistical methods.
However, qualitative data raises another question: Can there be a ranking within a category? Can one value be considered more important or higher than another?
To answer this, qualitative data is further divided into two types:
- Nominal Data
- Ordinal Data
Nominal Data
Nominal data consists of purely categorical values with no order or ranking.
The gender example (male, female, non-binary) falls into this category because there is no meaningful ranking among these labels.
Ordinal Data
Ordinal data also consists of categorical values, but these have an inherent order or ranking. However, the intervals between these values are not equal or precisely known.
Example: Education level (High School, Bachelor’s, Master’s, Ph.D.). These labels follow an order: you generally cannot pursue a master’s degree before finishing a bachelor’s, and you cannot earn a bachelor’s without completing high school. However, we cannot say precisely how large the gap is between these levels.
Real-world data, however, is often a mix of categories and numbers. For numerical values, we have another classification: quantitative data.
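One way to make the idea of order concrete is to attach an explicit rank to each label. The following is a minimal sketch, not a standard recipe: the `EDUCATION_ORDER` list, the `RANK` mapping, and the sample responses are all made up for illustration.

```python
# Hypothetical ordering of education levels, lowest to highest.
EDUCATION_ORDER = ["High School", "Bachelor's", "Master's", "Ph.D."]

# Map each label to its rank (its position in the ordering).
RANK = {level: position for position, level in enumerate(EDUCATION_ORDER)}

# Made-up sample of survey responses.
responses = ["Master's", "High School", "Ph.D.", "Bachelor's", "High School"]

# Sorting by rank is meaningful for ordinal data.
sorted_responses = sorted(responses, key=RANK.get)
print(sorted_responses)

# Comparisons are meaningful too: a Master's ranks above a Bachelor's.
print(RANK["Master's"] > RANK["Bachelor's"])
```

Note that the ranks 0 to 3 only encode order. Subtracting them would wrongly suggest that every step between levels is the same size.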
Quantitative Data
Quantitative data is numerical data that can be measured or counted. It deals with numbers and values that can be measured objectively.
Numbers can be broadly divided into whole numbers (integers) and numbers with fractional parts (decimals). Quantitative data follows a similar split, based on whether values are counted or measured:
- Discrete Data
- Continuous Data
Discrete Data
Discrete data consists of countable, distinct values. These values are often integers.
Example: The number of cars in a parking lot (There cannot be 4.5 cars; there are either 4 or 5 cars.)
Continuous Data
Continuous data can take any value within a range.
Example: Between 0 and 1, there are infinitely many numbers. Continuous data can take any of these values (e.g., 0.1, 0.11, 0.111, and so on).
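The contrast between the two kinds of quantitative data can be sketched in a few lines of Python. The measurements below are invented for illustration, and floating-point numbers are only a finite approximation of truly continuous values.

```python
# Made-up measurements for illustration.
cars_per_day = [4, 5, 7, 5]            # discrete: counts of whole cars
heights_m = [1.62, 1.755, 1.8, 1.681]  # continuous: any value in a range

# Every count is a whole number; 4.5 cars would make no sense.
all_counts_whole = all(isinstance(count, int) for count in cars_per_day)

# Between any two continuous values there is always another valid value.
low, high = 0.1, 0.11
midpoint = (low + high) / 2

print(all_counts_whole)
print(low < midpoint < high)
```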
Now that we understand what data is and its types, it’s important to also grasp how data is measured or categorized. This is where scales of measurement come into play.
Scales of Measurement
Scales of Measurement describe the level at which data is measured and determine what kinds of mathematical operations can be meaningfully performed on that data.
The term reflects the idea that data can be measured at different levels. Just as the markings on a ruler determine how precisely a length can be read, the scale of measurement determines how much information a value carries and how it can be compared with other values.
Each scale builds upon the previous one and holds its predecessor’s properties along with its own.
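This cumulative structure can be sketched as sets of operations for the four scales covered in the sections that follow. The operation names in `ADDED_OPERATIONS` are informal shorthand chosen for this example, not official terminology.

```python
# The four scales, from least to most informative.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Operations that each scale adds on top of the previous one.
# These labels are informal shorthand chosen for this sketch.
ADDED_OPERATIONS = {
    "nominal": {"equality", "counting"},
    "ordinal": {"ordering"},
    "interval": {"differences"},
    "ratio": {"ratios"},
}


def allowed_operations(scale: str) -> set[str]:
    """Return every operation that is meaningful at the given scale."""
    position = SCALES.index(scale)
    operations: set[str] = set()
    # A scale inherits everything from the scales before it.
    for name in SCALES[: position + 1]:
        operations |= ADDED_OPERATIONS[name]
    return operations


print(sorted(allowed_operations("ordinal")))
print(sorted(allowed_operations("ratio")))
```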
Nominal Scale
The nominal scale is the most basic level of measurement. It classifies data into distinct categories that have no order or ranking.
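Arithmetic operations such as addition or averaging are not meaningful because this kind of data is purely categorical. We can only check whether two values are equal and count how often each category occurs.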
Example: Our previous example of gender (male, female, non-binary) falls into this category.
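Although arithmetic on nominal labels is not meaningful, counting them is. A minimal sketch using Python's standard `collections.Counter`, with a made-up sample of labels:

```python
from collections import Counter

# Made-up sample of nominal labels.
genders = ["female", "male", "non-binary", "female", "male", "female"]

# Counting how often each label occurs is meaningful.
counts = Counter(genders)
print(counts)

# The mode (most frequent label) is a sensible summary; a mean is not.
mode_label, mode_count = counts.most_common(1)[0]
print(mode_label, mode_count)
```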
Ordinal Scale
The ordinal scale classifies data into categories that can be ordered or ranked, but the intervals between ranks are not uniform or measurable.
Values can be compared as higher or lower, but arithmetic operations like addition or subtraction are still not meaningful because the gaps between labels are unknown. (Adding or subtracting labels isn’t meaningful!)
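Because ordinal values can be ranked, a median is a reasonable summary even though a mean is not. The sketch below assumes a hypothetical `LEVELS` ordering and invented responses, and uses `statistics.median_low` so that the result is always one of the observed ranks rather than an average of two.

```python
import statistics

# Hypothetical ordered labels, lowest to highest.
LEVELS = ["High School", "Bachelor's", "Master's", "Ph.D."]

# Made-up sample of responses.
responses = [
    "Bachelor's", "High School", "Ph.D.",
    "Bachelor's", "Master's", "High School",
]

# Convert labels to ranks so they can be ordered.
ranks = [LEVELS.index(response) for response in responses]

# median_low returns an actual data point, so no ranks are averaged.
median_rank = statistics.median_low(ranks)
median_label = LEVELS[median_rank]
print(median_label)
```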
Interval Scale
The interval scale has all the features of the ordinal scale, but the intervals between values are equal and meaningful, that is, the data is ordered and evenly spaced.
However, it lacks a true zero point (meaning zero does not represent the absence of the quantity).
Mathematical operations like addition and subtraction are meaningful, but ratios are not.
Example: 40°C is not twice as hot as 20°C because zero on the Celsius scale does not mean “no heat.” It is simply a reference point set at the freezing point of water. Without a true zero, ratios are meaningless.
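The temperature example can be checked numerically by converting to Kelvin, a scale whose zero is absolute zero. This is a small sketch; the helper name `celsius_to_kelvin` is chosen for this example.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius to Kelvin, whose zero point is absolute zero."""
    return celsius + 273.15


cold_c, hot_c = 20.0, 40.0

# Differences are meaningful on an interval scale.
difference_c = hot_c - cold_c

# The naive Celsius ratio suggests "twice as hot", which is misleading.
naive_ratio = hot_c / cold_c

# On a scale with a true zero, the ratio is far smaller.
kelvin_ratio = celsius_to_kelvin(hot_c) / celsius_to_kelvin(cold_c)

print(difference_c, naive_ratio, round(kelvin_ratio, 3))
```

The Celsius ratio is 2.0, while the Kelvin ratio is only about 1.07, which is why ratios of interval-scale values should not be interpreted.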
Ratio Scale
The ratio scale includes all features of the interval scale plus a true zero point. This allows for the full range of mathematical operations, including meaningful ratios.
Example: Income. An income of zero truly means no money earned, so there is a true zero point, and someone earning 2,000 earns twice as much as someone earning 1,000.
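A quick way to see why ratios are meaningful on a ratio scale is that they survive a change of units. The incomes below are made-up values in an unspecified currency.

```python
# Made-up monthly incomes in the same currency unit.
income_a = 3000.0
income_b = 1500.0

# With a true zero, "twice as much" is a meaningful statement.
ratio = income_a / income_b

# Rescaling the unit (here, to thousands) leaves the ratio unchanged.
ratio_in_thousands = (income_a / 1000) / (income_b / 1000)

print(ratio, ratio_in_thousands)
```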
Conclusion
Understanding what data really is gives us a strong foundation for learning statistics, data science, and machine learning. Data comes in different forms, sometimes categories and sometimes numbers, and knowing how to tell them apart and measure them correctly helps us analyze information the right way.
When we understand the types of data and the scales they follow, we can choose the right tools and avoid mistakes that come from treating data the wrong way.
As you move forward with data, keeping these basics in mind will help you make smarter, clearer decisions based on what the data is really telling you.