kodē

Let's Talk Data

The modern world depends heavily on data. Before we can identify useful patterns or relationships within data, we first need to understand what data truly is. This understanding forms the foundation for studying data science, which focuses on extracting insights from data, and machine learning, which aims to uncover hidden patterns. Both fields are increasingly important for the future.

What Is Data?

The word data is the plural of the Latin word datum, meaning 'a thing given.' Data therefore literally translates to 'things given.'

In statistics, data refers to facts or information collected through observation or experimentation. In other words, data are the 'things given' to us by whatever we observe or measure, and they can be gathered in various ways.

This raises an important question: Is all data the same? For example, is the data about students’ heights in a class the same as their names? To answer this, we broadly classify data into two major types:

  1. Qualitative (Categorical) Data
  2. Quantitative (Numerical) Data

Qualitative (Categorical) Data

Qualitative data, as the name suggests, describes qualities. These qualities cannot be measured numerically but can be categorized based on characteristics or traits.

Example: Consider the values ‘male,’ ‘female,’ and ‘non-binary.’ These are not numbers, but they still carry information, and when collected in large quantities they can reveal important insights. We group these values under a single category called ‘Gender,’ and each value acts as a label within it. Such labels can be very useful when analyzed with appropriate statistical methods.

However, qualitative data raises another question: Can there be a ranking within a category? Can one value be considered more important or higher than another?

To answer this, qualitative data is further divided into two types:

  1. Nominal Data
  2. Ordinal Data

Nominal Data

Nominal data consists of purely categorical values with no order or ranking.

The gender example (male, female, non-binary) falls into this category because there is no meaningful ranking among these labels.

Ordinal Data

Ordinal data also consists of categorical values, but these have an inherent order or ranking. However, the intervals between these values are not equal or precisely known.

Example: Education level (High School, Bachelor’s, Master’s, Ph.D.). We know these labels follow an order: you cannot pursue a master’s degree before finishing a bachelor’s, and you cannot earn a bachelor’s without completing high school. However, we cannot say precisely how large the gap is between these levels.
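The difference between nominal and ordinal labels can be sketched in a few lines of Python. This is only an illustration built on the examples above; the names `EDUCATION_ORDER` and `rank` are assumptions made for this sketch, not part of any library.

```python
# Ordinal labels: the position in this list encodes the ranking.
# (The list itself is an assumption taken from the education example.)
EDUCATION_ORDER = ["High School", "Bachelor's", "Master's", "Ph.D."]


def rank(level):
    """Return the position of an education level in the ordering."""
    return EDUCATION_ORDER.index(level)


people = ["Master's", "High School", "Ph.D.", "Bachelor's"]

# Sorting by rank is meaningful for ordinal data.
sorted_levels = sorted(people, key=rank)

# So is asking whether one label ranks above another.
masters_above_bachelors = rank("Master's") > rank("Bachelor's")

# For nominal labels, only equality checks make sense; there is no rank.
same_label = "female" == "female"
```

Note that the rank numbers only say which label comes first. The gap between rank 0 and rank 1 is not claimed to equal the gap between rank 2 and rank 3.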


That covers categorical data. Real-world data, however, is often a mix of categories and numbers. Numerical values have their own classification: quantitative data.

Quantitative Data

Quantitative data is numerical data: values that can be counted or measured objectively.

Numerical values are either counted, which gives whole numbers (integers), or measured, which can give numbers with fractional parts. Formally, these are classified as:

  1. Discrete Data
  2. Continuous Data

Discrete Data

Discrete data consists of countable, distinct values. These values are often integers.

Example: The number of cars in a parking lot. There cannot be 4.5 cars; there are either 4 or 5.

Continuous Data

Continuous data can take any value within a range.

Example: Between 0 and 1, there are infinitely many numbers. Continuous data can take any of these values (e.g., 0.1, 0.11, 0.111, and so on).
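As a rough sketch of the same distinction in Python: a count is obtained by counting distinct items and is always a whole number, while between any two continuous values we can always find another one. The parking-lot list below is made up for illustration.

```python
# Discrete: counting distinct items always yields a whole number.
parked_cars = ["car_a", "car_b", "car_c", "car_d"]  # hypothetical parking lot
car_count = len(parked_cars)

# Continuous: between any two values there is always another value.
low, high = 0.1, 0.11
midpoint = (low + high) / 2  # lies strictly between low and high
```

Floating-point numbers in a computer are themselves a finite approximation, so this only mimics true continuity, but the idea carries over.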

Figure: Types of Data


Now that we understand what data is and what types it comes in, we also need to understand how data is measured or categorized. This is where scales of measurement come into play.

Scales of Measurement

Scales of Measurement describe the level at which data is measured and determine what kinds of mathematical operations can be meaningfully performed on that data.

The term reflects the idea that data can be measured at different levels, or 'scales.' Just as a ruler has markings that let us measure length precisely, scales of measurement provide levels for classifying data according to how it is measured or categorized.

Each scale builds upon the previous one and holds its predecessor’s properties along with its own.
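The "each scale builds on the previous one" idea can be sketched as a small lookup. The grouping of operations below is a simplified summary made for illustration, and the names `SCALES`, `NEW_OPERATIONS`, and `allowed_operations` are assumptions of this sketch.

```python
# The four scales, from least to most informative.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Operations that become meaningful at each scale (simplified summary).
NEW_OPERATIONS = {
    "nominal": ["equality", "counting"],
    "ordinal": ["ordering"],
    "interval": ["addition", "subtraction"],
    "ratio": ["multiplication", "division"],
}


def allowed_operations(scale):
    """Collect the operations of a scale and of every scale below it."""
    ops = []
    for name in SCALES[: SCALES.index(scale) + 1]:
        ops.extend(NEW_OPERATIONS[name])
    return ops


interval_ops = allowed_operations("interval")
```

Here `interval_ops` contains everything from the nominal and ordinal scales plus addition and subtraction, but not division, which only becomes meaningful on the ratio scale.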

Figure: Scales of Measurement

Nominal Scale

The nominal scale is the most basic level of measurement. It classifies data into distinct categories that have no order or ranking.

Arithmetic operations are not meaningful because this kind of data is purely categorical. We can only check whether two values are equal and count how often each category occurs.

Example: Our previous example of gender (male, female, non-binary) falls into this category.
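Even without arithmetic, nominal data can still be summarized by counting. The sketch below uses Python's standard `collections.Counter` on a made-up list of responses to find the most frequent label (the mode).

```python
from collections import Counter

# Made-up survey responses, for illustration only.
responses = ["female", "male", "non-binary", "female", "male", "female"]

# Count how often each label occurs.
counts = Counter(responses)

# The most frequent label is the mode of the data.
mode_label, mode_count = counts.most_common(1)[0]
```

With this sample, `mode_label` is "female" and `mode_count` is 3. Averaging the labels, by contrast, would make no sense.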

Ordinal Scale

The ordinal scale classifies data into categories that can be ordered or ranked, but the intervals between ranks are not uniform or measurable.

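Mathematical operations like addition or subtraction are still not meaningful because the values are labels, not numbers. We can, however, compare values and sort them by rank.

Example: Education level (High School, Bachelor’s, Master’s, Ph.D.), our earlier example of ordinal data, is measured on this scale.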
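Although ordinal labels cannot be added, their order is enough to find a middle value. A minimal sketch, assuming the education ordering used earlier and a made-up survey, with `statistics.median_low` from the standard library applied to the rank positions:

```python
from statistics import median_low

# Assumed ordering, taken from the education example.
EDUCATION_ORDER = ["High School", "Bachelor's", "Master's", "Ph.D."]

# Made-up survey responses, for illustration only.
survey = ["Bachelor's", "High School", "Master's", "Bachelor's", "Ph.D."]

# Convert each label to its rank position, then take the low median,
# which always returns a value actually present in the data.
ranks = [EDUCATION_ORDER.index(level) for level in survey]
median_level = EDUCATION_ORDER[median_low(ranks)]
```

For this sample the median level is "Bachelor's". A mean of the ranks would not be meaningful, because the gaps between levels are not known to be equal.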

Interval Scale

The interval scale has all the features of the ordinal scale, and in addition the intervals between values are equal and meaningful. In other words, the data is ordered and evenly spaced.

However, it lacks a true zero point (meaning zero does not represent the absence of the quantity).

Mathematical operations like addition and subtraction are meaningful, but ratios are not.

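Example: Temperature in Celsius. The difference between 20°C and 40°C is the same as the difference between 40°C and 60°C, but 40°C is not twice as hot as 20°C, because zero on the Celsius scale does not mean “no heat.” It is simply the point at which water freezes. Without a true zero, ratios are meaningless.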
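The temperature example can be checked with a little arithmetic. The sketch below converts Celsius to Kelvin, a temperature scale that does have a true zero, and compares the ratios; the helper name `celsius_to_kelvin` is an assumption made for illustration.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin (true zero at absolute zero)."""
    return celsius + 273.15


# Naive ratio on the Celsius scale: looks like "twice as hot".
naive_ratio = 40 / 20

# The same two temperatures on a scale with a true zero.
kelvin_ratio = celsius_to_kelvin(40) / celsius_to_kelvin(20)

# Differences, unlike ratios, are the same on both scales.
diff_celsius = 40 - 20
diff_kelvin = celsius_to_kelvin(40) - celsius_to_kelvin(20)
```

The naive ratio is 2.0, but the Kelvin ratio is only about 1.07, while the 20-degree difference is preserved. This is why subtraction is meaningful on an interval scale and division is not.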

Ratio Scale

The ratio scale includes all features of the interval scale plus a true zero point. This allows for the full range of mathematical operations, including meaningful ratios.

Example: Income. An income of zero truly means no money earned, so there is a true zero point, and someone earning 40,000 really does earn twice as much as someone earning 20,000.
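One way to see what a true zero buys us: on a ratio scale, ratios stay the same when the unit changes. A small sketch with made-up incomes, expressed first in dollars and then in cents:

```python
# Made-up annual incomes in dollars, for illustration only.
incomes_usd = {"person_a": 40_000, "person_b": 20_000}

# The same incomes expressed in cents (a change of unit, zero stays zero).
incomes_cents = {name: usd * 100 for name, usd in incomes_usd.items()}

# The ratio is identical in both units.
ratio_usd = incomes_usd["person_a"] / incomes_usd["person_b"]
ratio_cents = incomes_cents["person_a"] / incomes_cents["person_b"]
```

Both ratios equal 2.0, so "twice as much" is a statement about the incomes themselves, not about the unit chosen to record them.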


Conclusion

Understanding what data really is gives us a strong foundation for learning statistics, data science, and machine learning. Data comes in different forms, sometimes categories and sometimes numbers, and knowing how to tell them apart and measure them correctly helps us analyze information the right way.

When we understand the types of data and the scales they follow, we can choose the right tools and avoid mistakes that come from treating data the wrong way.

As you move forward with data, keeping these basics in mind will help you make smarter, clearer decisions based on what the data is really telling you.
