4. METHODOLOGY

This system concentrates on anonymization, which is used to provide privacy to a dataset so
that an attacker cannot gain sensitive information about the individuals in it. Health care and
financial data are especially sensitive. Several methods can provide privacy to a dataset,
including randomization and perturbation; this work adopts anonymization in preference to them.
Anonymization can be carried out in many ways, and several tools are available to perform it.
The objective of this system is to run the k-anonymity method.
Consider a hospital dataset that contains patient information with the attributes Patient id,
Patient Name, Age, Sex and Disease, as shown in Table 1. In this table, Name is the personal
identifier and Disease is the sensitive attribute. To protect the privacy of the dataset, the
Name attribute is removed, which gives Table 2.

Table 1: Patient dataset.

Zipcode  Age  Sex  Disease
47677    29   M    Ovarian Cancer
47678    22   M    Ovarian Cancer
47602    27   M    Prostate Cancer
47909    43   M    Flu
47905    32   F    Heart Disease
47906    47   M    Heart Disease

Table 2: Patient dataset after removing Name attribute

Removing the personal identification information alone does not provide complete privacy,
because the remaining attributes (such as Zipcode, Age and Sex) can still be combined to single
out an individual. Attributes of this kind are called quasi-identifiers. To provide privacy to
the dataset, we first remove the personal identification information and then anonymize the
quasi-identifiers. The sensitive attributes are released unchanged because researchers need this
information. Different privacy preserving methods have been proposed; this system uses
k-anonymity to anonymize the quasi-identifiers.
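The following minimal Python sketch illustrates why removing the Name attribute is not enough. It counts how many rows of Table 2 share each (Zipcode, Age, Sex) combination. The function name and the tuple layout are illustrative assumptions, not part of the described system.

```python
from collections import Counter

# Rows of Table 2: (zipcode, age, sex, disease)
TABLE_2 = [
    ("47677", 29, "M", "Ovarian Cancer"),
    ("47678", 22, "M", "Ovarian Cancer"),
    ("47602", 27, "M", "Prostate Cancer"),
    ("47909", 43, "M", "Flu"),
    ("47905", 32, "F", "Heart Disease"),
    ("47906", 47, "M", "Heart Disease"),
]


def quasi_identifier_counts(rows):
    """Count how many rows share each (zipcode, age, sex) combination."""
    return Counter(row[:3] for row in rows)


counts = quasi_identifier_counts(TABLE_2)
# A combination that occurs once points at exactly one patient.
unique_rows = [qi for qi, n in counts.items() if n == 1]
print(len(unique_rows), "of", len(TABLE_2), "rows are unique")  # 6 of 6 rows are unique
```

Every row is unique on its quasi-identifiers, so anyone who knows a patient's zipcode, age and sex can read off the disease.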

4.1 K-ANONYMITY

K-anonymity requires that the information for each person contained in the released dataset
cannot be distinguished from that of at least k-1 other individuals whose information also
appears in the data. For example, if an attacker who knows only a person's birth date and gender
tries to identify that person in the released dataset, there must be at least k persons in the
table with the same birth date and gender. In other words, every combination of quasi-identifier
values present in the released table must appear in at least k records, so that each record is
indistinguishable from at least k-1 other records. The records that share the same
quasi-identifier values form an equivalence class.
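The definition above can be checked mechanically. The sketch below is one possible formulation in Python: it groups rows by their quasi-identifier values and verifies that every group has at least k members. The function name, the column indexes and the sample rows (written in the generalized style used in this section) are assumptions made for illustration.

```python
from collections import Counter


def is_k_anonymous(rows, qi_indexes, k):
    """Return True if every quasi-identifier combination occurs in at least k rows."""
    classes = Counter(tuple(row[i] for i in qi_indexes) for row in rows)
    return all(size >= k for size in classes.values())


# Generalized rows: (zipcode, age, sex, disease)
released = [
    ("476**", "2*", "M", "Ovarian Cancer"),
    ("476**", "2*", "M", "Ovarian Cancer"),
    ("476**", "2*", "M", "Prostate Cancer"),
    ("479**", "3*", "F", "Heart Disease"),
    ("479**", "4*", "M", "Flu"),
    ("479**", "4*", "M", "Heart Disease"),
]
QI = (0, 1, 2)  # zipcode, age, sex

print(is_k_anonymous(released, QI, 2))  # False: the single F record stands alone

without_singleton = [r for r in released if r[:3] != ("479**", "3*", "F")]
print(is_k_anonymous(without_singleton, QI, 2))  # True
```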

Zipcode  Age  Sex  Disease
476**    2*   M    Ovarian Cancer
476**    2*   M    Ovarian Cancer
476**    2*   M    Prostate Cancer
479**    3*   F    Heart Disease
479**    4*   M    Flu
479**    4*   M    Heart Disease

K-anonymity uses generalization and suppression. With generalization, specific quasi-identifier
values are replaced with less specific values until at least k records share identical values.
Suppression is used for outliers, that is, records for which generalization would cause too much
information loss. Table 1 has 3 quasi-identifiers, which can be generalized as shown in
Figure 1.

Figure 1: Generalization on Quasi-identifiers like patient id, age and sex
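As a rough sketch of generalization, the Python functions below mask the last digits of a zipcode and reduce an age to its decade, which reproduces the generalized values used in the tables of this section. The function names and the choice of two masked digits are assumptions for illustration; a real system would pick the generalization level per attribute.

```python
def generalize_zipcode(zipcode: str, masked_digits: int = 2) -> str:
    """Replace the last `masked_digits` characters of the zipcode with '*'."""
    keep = len(zipcode) - masked_digits
    return zipcode[:keep] + "*" * masked_digits


def generalize_age(age: int) -> str:
    """Keep only the decade of the age, e.g. 29 -> '2*'."""
    return f"{age // 10}*"


def generalize_row(row):
    """Generalize zipcode and age; sex and disease are left unchanged."""
    zipcode, age, sex, disease = row
    return (generalize_zipcode(zipcode), generalize_age(age), sex, disease)


print(generalize_row(("47677", 29, "M", "Ovarian Cancer")))
# ('476**', '2*', 'M', 'Ovarian Cancer')
```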

Applying k-anonymity with k=2 to the quasi-identifiers {Zipcode, Age, Sex} of Table 2 gives
Table 3. Comparing Table 2 with Table 3, it is now difficult for an outsider to find the
sensitive information, because three people share the same generalized Zipcode and Age. In
Table 3 the first three records form one equivalence class and the last two records form
another.

Table 3: k-anonymity on table 2

Zipcode  Age  Sex  Disease
476**    2*   M    Ovarian Cancer
476**    2*   M    Ovarian Cancer
476**    2*   M    Prostate Cancer
*        *    *    *
479**    4*   M    Flu
479**    4*   M    Heart Disease

Any record that does not fall into an equivalence class of at least k records should be
suppressed. Record 4 shares its quasi-identifier values with no other record, so it is
suppressed. Applying generalization and suppression to Table 3 in this way results in Table 4.

(Records 1-3: equivalence class; record 4: suppressed record; records 5-6: equivalence class)

Table 4: Generalization and suppression
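The suppression step described above can be sketched in Python as follows. Rows whose equivalence class has fewer than k members are replaced entirely by '*'. The function name and the tuple layout are illustrative assumptions.

```python
from collections import Counter


def suppress_small_classes(rows, qi_count, k):
    """Replace every row whose equivalence class has fewer than k members with '*'."""
    sizes = Counter(row[:qi_count] for row in rows)
    return [
        row if sizes[row[:qi_count]] >= k else ("*",) * len(row)
        for row in rows
    ]


# Generalized rows before suppression: (zipcode, age, sex, disease)
generalized = [
    ("476**", "2*", "M", "Ovarian Cancer"),
    ("476**", "2*", "M", "Ovarian Cancer"),
    ("476**", "2*", "M", "Prostate Cancer"),
    ("479**", "3*", "F", "Heart Disease"),
    ("479**", "4*", "M", "Flu"),
    ("479**", "4*", "M", "Heart Disease"),
]

suppressed = suppress_small_classes(generalized, qi_count=3, k=2)
print(suppressed[3])  # ('*', '*', '*', '*')
```

Only the fourth record changes, because it is the only one whose quasi-identifier values are not shared with at least one other record.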

The problem with k-anonymity is that it does not provide privacy if the sensitive values in an
equivalence class lack diversity, or if the attacker has background knowledge. Consider Table 4.
The first three records form an equivalence class in which every sensitive value is a cancer
diagnosis, so an attacker who can place a person in that class learns that the person has cancer
without identifying the exact record. In the last equivalence class, an attacker with some
background knowledge about the person (for example, that the person's father is a heart patient)
may use it to infer the sensitive information.
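One simple way to see how much an equivalence class still reveals is to compute how often its most common sensitive value occurs. The Python sketch below does this for the non-suppressed rows of the anonymized table. The function name and the confidence measure are assumptions chosen for illustration, not a method prescribed by this system.

```python
from collections import Counter, defaultdict


def best_guess_confidence(rows, qi_count):
    """For each equivalence class, return (most common sensitive value, its share)."""
    by_class = defaultdict(Counter)
    for row in rows:
        by_class[row[:qi_count]][row[-1]] += 1
    result = {}
    for qi, diseases in by_class.items():
        value, count = diseases.most_common(1)[0]
        result[qi] = (value, count / sum(diseases.values()))
    return result


anonymized = [
    ("476**", "2*", "M", "Ovarian Cancer"),
    ("476**", "2*", "M", "Ovarian Cancer"),
    ("476**", "2*", "M", "Prostate Cancer"),
    ("479**", "4*", "M", "Flu"),
    ("479**", "4*", "M", "Heart Disease"),
]

for qi, (value, share) in best_guess_confidence(anonymized, 3).items():
    print(qi, value, round(share, 2))
```

In the first class an attacker who guesses the most common value is right for two of the three records, even though the table satisfies k=2.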

4.2 Triple DES Algorithm

Triple DES is another mode of DES operation. It takes three 64-bit keys, for an overall key
length of 192 bits. In Stealth, you simply type in the entire 192-bit (24-character) key rather
than entering each of the three keys individually. The Triple DES DLL then breaks the
user-provided key into three subkeys, padding the keys if necessary so that they are each 64
bits long.
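A minimal sketch of the key-splitting step, in Python, is given below. The text does not say how the padding is done, so padding the whole key with zero bytes up to 24 bytes and the ASCII encoding are assumptions, as is the function name.

```python
def split_triple_des_key(passphrase: str, pad_byte: bytes = b"\x00"):
    """Split a key of up to 24 characters into three 8-byte (64-bit) subkeys.

    Assumption: short keys are right-padded with `pad_byte`; the actual
    padding rule of the DLL is not specified in the text.
    """
    raw = passphrase.encode("ascii")
    if len(raw) > 24:
        raise ValueError("key must be at most 24 characters (192 bits)")
    raw = raw.ljust(24, pad_byte)
    return raw[0:8], raw[8:16], raw[16:24]


k1, k2, k3 = split_triple_des_key("0123456789abcdefghijklmn")
print(k1, k2, k3)  # b'01234567' b'89abcdef' b'ghijklmn'
```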

The procedure for encryption is exactly the same as regular DES, but it is repeated three
times, hence the name Triple DES. The data is encrypted with the first key, decrypted with the
second key, and finally encrypted again with the third key.
Triple DES runs about three times slower than DES, but is much more secure if used properly.
Decryption follows the same procedure as encryption, executed in reverse. Like DES, Triple DES
encrypts and decrypts data in 64-bit blocks.
Although the input key for DES is 64 bits long, the actual key used by DES is only 56 bits in
length. The least significant (right-most) bit in each byte is a parity bit, and should be set
so that there is always an odd number of 1s in every byte. These parity bits are ignored, so
only the seven most significant bits of each byte are used, resulting in a key length of 56
bits. This means that the effective key length for Triple DES is 168 bits (3 x 56), because each
of the three keys contains 8 parity bits that are not used during the encryption process. Note
that meet-in-the-middle attacks reduce the practical security of three-key Triple DES to about
112 bits.
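The parity rule and the key-length arithmetic above can be illustrated with a short Python sketch. The function names are hypothetical; the code only sets the parity bit of each key byte and counts the bits that remain.

```python
def set_odd_parity(key: bytes) -> bytes:
    """Set the least significant bit of each byte so the byte has an odd number of 1s."""
    out = bytearray()
    for byte in key:
        high7 = byte & 0xFE                # the seven bits DES actually uses
        ones = bin(high7).count("1")
        parity_bit = 1 if ones % 2 == 0 else 0
        out.append(high7 | parity_bit)
    return bytes(out)


def effective_key_bits(num_keys: int) -> int:
    """Each 64-bit DES key carries 8 parity bits, leaving 56 usable bits."""
    return num_keys * (64 - 8)


print(set_odd_parity(b"\x00\xff").hex())  # 01fe
print(effective_key_bits(1), effective_key_bits(3))  # 56 168
```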

The process of encryption is as follows:
1. Encrypt the data using the DES algorithm with the first key.
2. Decrypt the output of the first step using the DES algorithm with the second key.
3. Encrypt the output of the second step using the DES algorithm with the third key.

Decrypting a cipher text that was encrypted with the Triple DES algorithm is the reverse of the
encryption process:
1. Decrypt the cipher text using the DES algorithm with the third key.
2. Encrypt the output of the first step using the DES algorithm with the second key.
3. Decrypt the output of the second step using the DES algorithm with the first key.

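The encrypt-decrypt-encrypt (EDE) sequence is what makes Triple DES harder to break than a
single DES pass. The three keys may all be the same, or two of them may be the same, but if all
three are identical the first two steps cancel out and the result is ordinary single DES. It is
therefore recommended to use three different keys.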

4.3 SYSTEM SPECIFICATION

Hardware Specification

Processor  : Intel Pentium i3
RAM        : 4 GB
Hard drive : 500 GB
Monitor    : 17″ Flat L.G color SVGA
Keyboard   : Multimedia keyboard
Mouse      : Optical scroll mouse

Software Specification

Operating System   : Windows XP and above
Front-End          : ASP.Net 2010
Database Server    : Microsoft SQL Server
Application Server : IIS
