
Personal Information

It is critical for privacy engineers to thoroughly understand how personal information is defined and how its definition evolves and shifts over time. Personal information is the asset protected by privacy rules, processes, and technologies. Traditionally, personal information has been defined as information that directly identifies or, in combination with other data, allows for the identification of an individual (basic examples are an individual's name, address, phone number, or national or tax identification number), or any otherwise-anonymous information that, when combined, can describe only a single person. An example of this would be “the CPO of Sun Microsystems in 2005,” because there is only one person who fits this description. An example of anonymous information would be “three of the thousand engineers carry laptops,” because the characterization fits more than one person and, therefore, does not identify anyone in particular.

Traditionally, the term for these data elements has been personally identifiable information (PII) or, alternatively, personal information (PI). Using different nomenclature can create confusion by implying distinctions that do not matter. The real issue is whether the data alone, or in combination with other data, identifies a single individual. The term PII is useful, however, for determining which elements make a collection of information personal, or for identifying which data elements need to be removed to depersonalize or deidentify it. We will use PI as our convention throughout the rest of the book.

Some forms of PI are additionally considered “sensitive,” either culturally, under the law, or both (e.g., the type of information that can be used to embarrass, harm, or discriminate against someone). Different cultures consider different categories of PI as sensitive PI, but the following are fairly common:

• Information about an individual's medical or health conditions

• Financial information

• Racial or ethnic origin

• Political opinions

• Religious or philosophical beliefs

• Trade union membership

• Sexual orientation

• Information related to offenses or criminal convictions

Largely due to the explosion of the Internet, mobile computing, and telecommunications technology, the definition of PI is evolving to include unique device and network identifiers such as the universally unique identifier (UUID) and Internet protocol (IP) addresses. The Federal Trade Commission effectively redefined PI to include certain types of what used to be considered machine data, such as device IDs and IP addresses, when it stated in its 2010 report, “Protecting Consumer Privacy in an Era of Rapid Change,” that “the proposed framework is not limited to those who collect personally identifiable information (‘PII’). Rather, it applies to those commercial entities that collect data that can be reasonably linked to a specific consumer, computer, or other device.” [1]


Not every device ID or IP address should automatically be considered PI. Some devices, just as some IP addresses, are not associated with an identifiable person or personal system.

One way to reduce risk or potential harm in processing personal information is to use only what is needed. One strategy for this is to deidentify or anonymize the data before using it.

Anonymizing or deidentifying data begins when deciding what to collect or use. If personal information is not needed, then it is better not to collect or use it.

Always ask (1) is the information needed to serve the purpose of the processing; and

(2) what is the minimum amount of information that is needed?

Example: Birth date: Is only the day and month of birth needed, or the full birth date (day, month, year)? If the purpose is to automate a birthday salutation, then the day and month of birth should be sufficient. If the requirement is to ascertain age as part of authorizing access to content on a web site, just ask for the month and year, or the age, or better yet, ask for the age in ranges of 5 years.
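The birth date example can be sketched in a few lines of Python. This is only an illustration of storing less than the full date; the function names `birthday_key` and `age_range` are hypothetical, and the 5-year width simply mirrors the ranges suggested above.

```python
from datetime import date
from typing import Tuple


def birthday_key(birth: date) -> Tuple[int, int]:
    """Keep only (month, day): enough to send a birthday salutation."""
    return (birth.month, birth.day)


def age_range(age: int, width: int = 5) -> str:
    """Map an exact age to a coarse range label such as '30-34'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# The year is discarded for the greeting use case.
print(birthday_key(date(1980, 7, 4)))
# An exact age of 32 is stored only as a 5-year range.
print(age_range(32))
```

With this approach the system never holds the year of birth for greetings, and never holds the exact age for access checks.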

Example: Geographic location: If the requirement is geographic, is GPS needed, or will a street address, ZIP code, or just city and state meet the need?

The second part of the discussion has to do with uses of the data. Some of the uses of the data may require the elements that make it personal information; others may not. So it becomes important to think about how to anonymize or deidentify data.

Does PI – P = I? In other words, if one removes the personal, is what is left just information? Well, technically yes; but this is something you may not want to be right about merely on a technicality.

Consider the number of people in the data pool. For instance, although the information may be anonymous (because the personal identifiers have been removed), the data may still be so distinct and the pool of possibilities so small that it effectively reflects only three or four people. So, although the information does not truly identify a single person, the group is so small that an educated guess can easily be made as to who is in it. You could say there are different levels of anonymization. One in 10 is different from one in 10,000.
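One rough way to check the size of the data pool is to count how many records share each combination of the remaining descriptive fields. The sketch below is an assumption-laden illustration, not a complete anonymity test: the function name `smallest_group_size`, the field names, and the sample records are all hypothetical.

```python
from collections import Counter


def smallest_group_size(records, fields):
    """Return the size of the smallest group of records that share
    the same values for the given fields (0 if there are no records)."""
    groups = Counter(tuple(r[f] for f in fields) for r in records)
    return min(groups.values()) if groups else 0


# Hypothetical records with names already removed.
records = [
    {"dept": "eng", "age_range": "30-34"},
    {"dept": "eng", "age_range": "30-34"},
    {"dept": "eng", "age_range": "30-34"},
    {"dept": "legal", "age_range": "50-54"},
]

# A result of 1 means at least one person is alone in their group,
# so an educated guess about who they are is easy.
print(smallest_group_size(records, ["dept", "age_range"]))
```

A larger minimum group size corresponds to the "one in 10,000" end of the scale described above; a small one corresponds to "one in 10" or worse.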

Another vector to be considered is the methodology. How was the data anonymized? Were the unique identifiers removed completely from the dataset, or were they merely replaced with a pseudonym?

If they were replaced with a pseudonym, does the pseudonym pass a reidentification test? Or can the data still be used to take action or contact a person? If it doesn't pass the reidentification test, or it can still be used to contact a person or be reasonably linked to a system, then it cannot truly be called “anonymized”; perhaps deidentified, but not anonymized.
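The difference between removing an identifier and replacing it with a pseudonym can be illustrated as follows. This is a minimal sketch under stated assumptions: the helper names `drop_identifier` and `pseudonymize` are hypothetical, and a keyed hash (HMAC-SHA256 from the Python standard library) is just one possible way to generate a pseudonym, not a method the text prescribes.

```python
import hashlib
import hmac


def drop_identifier(record: dict, field: str) -> dict:
    """Return a copy of the record with the identifier removed entirely."""
    return {k: v for k, v in record.items() if k != field}


def pseudonymize(record: dict, field: str, key: bytes) -> dict:
    """Return a copy of the record with the identifier replaced by a
    keyed hash. The same input and key always yield the same token."""
    out = dict(record)
    value = str(record[field]).encode("utf-8")
    out[field] = hmac.new(key, value, hashlib.sha256).hexdigest()
    return out


# Hypothetical record and key, for illustration only.
record = {"email": "pat@example.com", "plan": "pro"}
secret_key = b"example-key-kept-separately"

removed = drop_identifier(record, "email")
pseudo = pseudonymize(record, "email", secret_key)
```

Because the token is stable, records for the same person remain linkable, and anyone holding the key can recompute the token from a known e-mail address. In the terms used above, such data is at best deidentified, not anonymized.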

A third vector to consider is whether specific data elements are needed or whether ranges or categories suffice. For example, in an executive income report, one can remove names and titles, but even in large organizations, the actual income may be unique enough that it identifies an individual even though all other descriptors have been removed or genericized.
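Replacing an exact income with a range might look like the sketch below. The function name `income_band`, the band width, and the top cutoff are all hypothetical choices for illustration; grouping every value above the cutoff into one open-ended band is an added assumption, intended to keep the highest (and most unique) incomes from standing out.

```python
def income_band(amount: int, width: int = 50_000, top: int = 500_000) -> str:
    """Map an exact whole-number income to a band label.

    Values at or above `top` share a single open-ended band.
    """
    if amount < 0:
        raise ValueError("amount must be non-negative")
    if amount >= top:
        return f"{top}+"
    low = (amount // width) * width
    return f"{low}-{low + width - 1}"


print(income_band(123_456))    # falls in the 100000-149999 band
print(income_band(2_000_000))  # falls in the open-ended top band
```

Whether the chosen bands are wide enough depends on how many people land in each one, so the band width is something to tune against the actual data.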

Finally, if the decision is to aggregate data, make sure it is anonymized as well. Aggregate data about a single individual is not necessarily anonymized.

[1] Federal Trade Commission, “Protecting Consumer Privacy in an Era of Rapid Change: Recommendations for Businesses and Policymakers,” p. 43.
