Menu
Home
Log in / Register
 
Home arrow Business & Finance arrow Fraud and fraud detection
< Prev   CONTENTS   Next >

PREPARATION FOR DATA ANALYSIS

Most of the techniques and examples in the rest of this book will be demonstrated using the IDEA Version 9.1 software. The concepts can be applied to other data analytic software.

Arbutus Analyzer ODBC Data Import

FIGURE 3.1 Arbutus Analyzer ODBC Data Import

Source: Arbutus Software Inc.

Arbutus Analyzer Data Import by Arbutus Software Inc.

FIGURE 3.2 Arbutus Analyzer Data Import by Arbutus Software Inc.

Source: Arbutus Software Inc.

IDEA Data Import Assistant by CaseWare IDEA Inc.

FIGURE 3.3 IDEA Data Import Assistant by CaseWare IDEA Inc.

Once you have good data in your data analytic software, you are almost ready to start your analysis. Foremost, you have to be familiar with your data. You need to know what the data contains in order to decide what tests can be applied or formulated. It is important to know whether critical fields are populated and to what extent. At times, the data is not laid out well for you to perform desired analysis techniques and needs further work. You can rearrange and reorganize the data within the data analytical software.

Data Familiarization

Know your data! You cannot effectively use the data if you are not aware of what it contains. Knowledge of the data at summary and statistical overview levels will give a good sense of what is in the data. There are a number of features and options that can be employed to help you profile the data. Available features in IDEA are Control Total, Field Statistics, Summarization, Stratiication, and Pivot Table. Figure 3.4 shows payment tender data opened in the IDEA software.

This data file is from the point-of-sale system (POS) of a fast-food, take-out restaurant. The restaurant is located in a strip mall with a high school nearby. It is open:

Monday to Thursdays 11:00 a.m. to 10:00 p.m.

Friday 11:00 a.m. to 11:00 p.m.

Saturday 12:00 noon to 11:00 p.m.

Sunday 12:00 noon to 10:00 p.m.

Payment Tender Type File Example in IDEA

FIGURE 3.4 Payment Tender Type File Example in IDEA

Numeric, date, and time fields are displayed in field statistics in Figure 3.5. The only numeric field that is of use is the pay amount field, which we will look at closer in Figure 3.6.

Field Statistics of the Payment Tender File

FIGURE 3.5 Field Statistics of the Payment Tender File

The net value is the total of the field or column, which equals the control total. By putting your cursor over items it turns into a hand icon, and those items can be clicked on to drill down to detail levels and display the records pertaining to those items. There are seven records with zero value. Note that the average payment for the sales is $19.48. Both the minimum value and maximum value amount should be examined in detail. A discussion of the sample and population standard deviation values follows in Chapter 4.

The statistics for the date field shows the date of the earliest and latest record so you know the date range for the data you will be analyzing. The most frequent or common day of the week is Friday and the most common month is September. Therefore, the

Isolating the Field Statistics of the Payment Amount

FIGURE 3.6 Isolating the Field Statistics of the Payment Amount

most frequent payments and sales occur on Fridays and September was the best month for the client. The monthly and daily number of transactions are also displayed in field statistics, as shown in Figure 3.7.

The payment time also provides valuable information in Figure 3.8. The most payments are processed after the noon hour, with 80,609 or 95 percent of transactions taking place. This makes perfect sense, since the business opens either at 11:00 a.m. or at noon, depending on the day of the week. If you were to apply sampling techniques, you would want to pull more samples from after the noon hour.

The tender or payment type can be summarized to identify the number of transactions for each category and total the amounts as shown in Figure 3.9.

We can see in Figure 3.10, that debit cards are used most frequently, followed by cash payments. Visa, MasterCard, and American Express (AME) are accepted but are not as popular with their customers.

Now that you are aware that the average payment is $19.48, you can stratify the amounts to give you a range of how many transactions occurred in each range as displayed in Figure 3.11. The ranges can be at a fixed increment or, better yet, you can set the explicit ranges to produce results that are more meaningful to the context of the data. You can group the output if you wish. In this case, we selected the tender payment field to group so that cash, debit, MasterCard, Visa, and so on amounts will be displayed

Date Field Statistics

FIGURE 3.7 Date Field Statistics

in ranges. We determined that there were only these six tender or payment types when we summarized the data.

Stratifying the debit card payments shows that 9 7 percent of the transactions and 90 percent of the total dollar values are $50 dollars and below. There are no negative amounts for debit card transactions as seen in Figure 3.12. Similarly, there are no negatives for Visa, MasterCard, and AME (not shown).

The stratification of cash payments shows one negative amount in Figure 3.13 . It is also interesting to note that neither the percentage of number of records nor the percentage of the payment amounts follows the same pattern as the debit card payments.

Time Field Statistics

FIGURE 3.8 Time Field Statistics

Summarization of Payment Tender Types

FIGURE 3.9 Summarization of Payment Tender Types

Summarization of Tender Type Results

FIGURE 3.10 Summarization of Tender Type Results

Stratification of Payment Amounts by Tender Types

FIGURE 3.11 Stratification of Payment Amounts by Tender Types

Results of the Stratification for Debit Card Payments

FIGURE 3.12 Results of the Stratification for Debit Card Payments

The year field was previously added, so now we can perform additional data profiling using the year field.

Pivot Table allows data to be displayed, organized, and summarized in different views (Figure 3.14). It allows for a better overall picture to analyze data. In this example, we display by row the tender payment type and by column the year. The amounts are

Results of the Stratification for Cash Payments

FIGURE 3.13 Results of the Stratification for Cash Payments

Creating a Pivot Table View and Results

FIGURE 3.14 Creating a Pivot Table View and Results

cross-tabulated both by rows and columns. This provides an understanding of how payments fluctuate from year to year. For instance, cash payments dropped significantly in 2009 and 2010 before increasing again in 2011. This is in contrast to debit card payments that remained consistent. Recall from the field statistics that the 2007 year had approximately 6 months' worth of transactions, while 2011 had only the first 10 months of transactions.

 
Found a mistake? Please highlight the word and press Shift + Enter  
< Prev   CONTENTS   Next >
 
Subjects
Accounting
Business & Finance
Communication
Computer Science
Economics
Education
Engineering
Environment
Geography
Health
History
Language & Literature
Law
Management
Marketing
Philosophy
Political science
Psychology
Religion
Sociology
Travel