salkind_midterm_prep

I thought I'd put the answers to the study guide here -- use them at your own risk (and feel free to correct me!)


 * **Chapter** || **Topic** || **Page** ||
 * 1 || Table 1.1

Characteristics of the four levels of measurement:

 * **Nominal**
 * Assigns names
 * Measures differences in quality (not quantity)
 * Variables are categorical or discrete in nature
 * Outcomes can only be put into ONE category
 * The only distinction that can be made is that variables differ in the category in which they are placed
 * **Ordinal**
 * Assigns rank
 * Describes how variables can be ordered on some sort of continuum
 * Outcomes are placed in categories, but they have an order or rank to them as well (e.g. stronger/weaker, faster/slower, etc.)
 * **Interval**
 * Assigns a value to an outcome based on some underlying continuum that has equal intervals
 * If there is an underlying continuum, definite statements can be made about someone's score along that continuum relative to another person's score
 * The big advantage of interval over ordinal or nominal is information
 * **Ratio**
 * Has the same characteristics as all the other levels, but assumes an absolute zero that corresponds to the absence of the trait or characteristic
 * It is rare to find a trait or characteristic in the behavioral or social sciences of which an individual has a complete absence || 27 ||
 * **Achievement tests** measure level of knowledge in a particular domain
 * **Personality tests** measure a unique set of characteristics, traits, or attitudes
 * **Aptitude tests** measure potential to succeed
 * **Ability or Intelligence tests** measure skill or competence
 * **Performance tests** measure basic performance of particular tasks
 * **Vocational or career tests** measure job-related interests || 10 ||
 * || Why we test
 * 1) Selection
 * 2) Placement
 * 3) Diagnosis
 * 4) Hypothesis testing
 * 5) Classification || 11 ||
 * || Some important reminders
 * 1) Some behaviors can be observed more closely and more precisely than others
 * 2) Our understanding of a behavior is only as good as the tools we use to measure it || 12 ||
 * 2 || The four horsemen (or levels) of measurement
 * 1) **Nominal level of measurement**
 * 2) **Ordinal level of measurement**
 * 3) **Interval level of measurement**
 * 4) **Ratio level of measurement**
 * || Table 2.1

(This is actually the chapter title - not sure if we need to know the exps they give in the book or the concepts...)
 * 1) **Are you measuring most of the available information?**
 * **Ratio:** Most
 * **Interval:** More
 * **Ordinal:** Less
 * **Nominal:** Least
 * 2) **Can you assign a name to the variable being measured?**
 * **Ratio:** Yes
 * **Interval:** Yes
 * **Ordinal:** Yes
 * **Nominal:** Yes
 * 3) **Can you assign an order to the variable being measured?**
 * **Ratio:** Yes
 * **Interval:** Yes
 * **Ordinal:** Yes
 * **Nominal:** No
 * 4) **Can you assign an underlying quantitative scale to the variable being measured?**
 * **Ratio:** Yes
 * **Interval:** Yes
 * **Ordinal:** No
 * **Nominal:** No
 * 5) **Can you assign an absolute zero to the variable being measured?**
 * **Ratio:** Yes
 * **Interval:** No
 * **Ordinal:** No
 * **Nominal:** No || 32 ||
 * 3 || Getting it right every time


 * **Reliability:**
 * measures something consistently
 * the consistency of scores for the same set of people


 * **Observed Score:**
 * The score one actually gets on a test
 * **True Score:**
 * A true, 100% representation of what a person actually knows


 * **Error Score:**
 * The difference between the true and observed score
 * **Observed score = true score + error score**
 * Where error consists of 2 types:
 * Trait error: sources of error that reside within the test taker
 * Gina's editorial here: where do disabilities lie? Is it a trait error? If so, the "excuses excuses" comment is not appropriate
 * Method error: error that resides in the testing situation

Reliability = True Score / (True Score + Error Score)


 * As the Error Score gets smaller, the reliability gets larger
 * In a perfect world, if there is no error score, reliability is perfect because the observed score equals the test taker's true score
 * Goal of making tests is to reduce the sources of error as much as possible to increase the reliability of the test (the observed score will more closely equal the true score). || 37 ||
 * || Table 3.1


 * 1) **Test-retest Reliability**
 * **When used:** To determine if a test is reliable over time
 * **How used:** Correlate scores from a test given at //Time 1// with the same test given at //Time 2//
 * **What can be said when it's done:** The _ test is reliable over time.
 * 2) **Parallel Forms Reliability**
 * **When used:** To determine if several different forms of a test are equivalent or reliable
 * **How used:** Correlate scores from one form of a test with scores from a second form of the same test covering the same content
 * **What can be said when it's done:** The two forms of the _ test are equivalent to one another and have shown parallel forms reliability
 * 3) **Internal Consistency Reliability**
 * **When used:** To determine if a test assesses one and only one dimension
 * **How used:** Correlate each individual item score with the total score
 * **What can be said when it's done:** All the items on the _ test assess the same construct
 * 4) **Interrater Reliability**
 * **When used:** To determine if there is consistency in the rating of some outcome
 * **How used:** Examine the percentage of agreement between raters
 * **What can be said when it's done:** The interrater reliability for the _ was _, indicating a high degree of agreement between the judges. || 43 ||
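The test-retest and parallel-forms rows above both boil down to correlating two sets of scores from the same people. A minimal sketch of that Pearson correlation in Python; the scores are made up for illustration:

```python
# Test-retest reliability sketch: correlate the same people's scores
# from Time 1 and Time 2. A high r suggests the test is stable over time.

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [85, 72, 90, 64, 78]  # hypothetical scores at Time 1
time2 = [83, 70, 92, 66, 75]  # the same people's scores at Time 2
print(round(pearson_r(time1, time2), 2))  # 0.97
```

The same function applies to parallel forms: just correlate Form A scores with Form B scores.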
 * || How big is big? Interpreting reliability coefficients


 * Reliability coefficients should be positive
 * Reliability coefficients should be as large as possible (between +.00 and +1.00)
 * Generally speaking, a reliability coefficient of .70 is acceptable, although .80 is better
 * For interrater reliability, accept nothing less than 90% agreement || 57 ||
 * || Things to remember

Reasons for lack of information about reliability in a study:


 * Test is so well-known & popular that it is considered common knowledge
 * Original test designers never collected the data they needed to make a judgment about the reliability of the test || 58 ||
 * || Just one more thing (and it’s a big one)

The first step in creating or using an instrument with sound psychometric properties is to establish its reliability, because if the test is unreliable, you can't say whether one thing has affected another || 59 ||
 * 4 || The truth, the whole truth, and nothing but the truth


 * **Validity:** the property of an assessment tool that indicates the tool does what it says it does. || 63 ||
 * || Tech talk

 * **Technical definition of validity:** The extent to which inferences made from a test are appropriate, meaningful, and useful || 64 ||
 * || Reliability and validity: very close cousins

Validity asks //what// is being tested; reliability asks //how consistently// it is being tested || 65 ||
 * || Table 4.1


 * **Content Validity**
 * **When to use:** To determine if a sample of items reflects an entire universe of items in a certain topic
 * **How to use:** Examine the content to be sure it's an accurate sample of what is being tested
 * **What can be said when you are done:** "The quiz fairly assesses the chapter's content"
 * **Criterion Validity**
 * **When to use:** To determine if test scores are systematically related to other criteria, indicating that the test taker is competent in a certain area
 * **How to use:** Correlate scores from the test with some other measure that is already valid and assesses the same set of abilities
 * **What can be said when you are done:** The test has been shown to be correlated with being a _ after 2 years of school. (Predictive validity)
 * **Construct Validity**
 * **When to use:** To determine if a test measures some underlying psychological construct
 * **How to use:** Correlate test scores with some theorized outcome that reflects the construct for which the test is being designed
 * **What can be said when you are done:** || 66 ||
 * || Tech talk: more reliability – validity going-on

The maximum level of validity is equal to the square root of the reliability coefficient.

The validity of a test is constrained by how reliable it is. || 77 ||
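A quick numeric check of that ceiling; the reliability value here is hypothetical:

```python
# Maximum possible validity is the square root of the reliability
# coefficient, so an unreliable test caps how valid it can be.
import math

reliability = 0.81          # hypothetical reliability coefficient
max_validity = math.sqrt(reliability)
print(round(max_validity, 2))  # 0.9
```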
 * 5 || The basics: raw (scores) to the bone!


 * **Raw Score:** The observed score.

Raw scores:
 * Form the basis of other scores, such as:
 * Percentiles
 * Standard scores


 * **Norm-Referenced Scores:** Norms are used to evaluate one's relative performance


 * **Criterion-Referenced Scores:** Compare scores to a certain absolute criterion


 * **Norms:** A set of scores that represents a collection of individual performances. They are developed by administering a test to a large group of test takers. The complete group of scores is the measure to which individual scores are compared. || 83 ||
 * || Percentiles or percentile ranks


 * **Percentile/Percentile Rank:** The point in the distribution of scores below which a given percentage of scores fall. The terms are used interchangeably.

E.g., the 45th percentile is the score below which 45% of the other scores fall

Most often used score for reporting test results

The lower the percentile, the lower the person's rank in the group.


 * **Percentage vs. Percentile:**
 * Percentile is a location along a continuum from 0 to 99.
 * Percentage is a form of raw score; it reflects a proportion (% correct)


 * **Formula for computing the percentile for any raw score in a set of scores:**

Pr = (B / N) x 100

Pr = the percentile; B = the number of observations with lower values; N = the total number of observations


 * **Procedure to compute a percentile given a raw score**
 * 1) Rank all scores, beginning with the lowest value
 * 2) Count the number of scores below the score of interest (B)
 * 3) Count the total number of scores (N)
 * 4) Plug the values into the formula

There will probably never be a percentile of 100 for a score on a test, because you can beat everyone else on the test but you can't beat yourself. || 84 ||
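The formula and procedure above can be sketched in a few lines of Python; the score list is made up for illustration. Note that even the top scorer can't reach the 100th percentile:

```python
# Percentile rank: Pr = (B / N) * 100, where B = scores below the raw
# score of interest and N = total number of scores.

def percentile_rank(scores, raw):
    below = sum(1 for s in scores if s < raw)  # B
    return below / len(scores) * 100           # (B / N) * 100

scores = [55, 60, 62, 70, 75, 78, 80, 85, 90, 95]
print(percentile_rank(scores, 78))  # 5 of 10 scores fall below 78 -> 50.0
print(percentile_rank(scores, 95))  # top score: 9 below, so 90.0, not 100
```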
 * || Percentiles: the sequel


 * **50th Percentile:** Q2, or the median. The score at which 50% of the scores fall below and the remainder above.


 * **Median:** The point at which 50% of the cases in a distribution fall below and 50% fall above.


 * **25th Percentile:** Q1, or the first quartile


 * **75th Percentile:** Q3, or the third quartile


 * **Decile:** A span of ten percentile ranks (the first decile is the first 10 percentile ranks) || 88 ||
 * || Stanines (or stanine scores)


 * **Stanine:** One of nine segments in a normal distribution

Each of the nine stanines represents one half of a standard deviation, except Stanine 1 and Stanine 9, which are a bit different (they cover the tails). || 91 ||
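One common way to convert a z score into a stanine is round(2z + 5), clamped to the 1-9 range; this conversion is an assumption here, not spelled out in the text:

```python
# Stanine from a z score (assumed conversion): stanine = round(2z + 5),
# clamped to 1..9 so the tails collapse into stanines 1 and 9.

def stanine(z):
    return max(1, min(9, round(2 * z + 5)))

print(stanine(0.0))   # a score at the mean lands in stanine 5
print(stanine(1.3))   # well above the mean -> stanine 8
print(stanine(-3.0))  # deep in the lower tail clamps to stanine 1
```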
 * || The standard (fare) scores


 * **Standard Score:** A type of score that uses a common metric

Standard scores:


 * are directly comparable to one another
 * give a clear picture of one's relative standing in a distribution
 * make it possible to compare scores across different distributions


 * **Z Score:** The number of standard deviations between a raw score and the mean

Results from dividing the amount that a raw score differs from the mean of a set of scores by the standard deviation.

 * **Equation:**

z = (X - X̄) / s

z = the z score; X = the individual raw score; X̄ = the mean of the set of test scores; s = the standard deviation of the distribution

A raw score above the mean will have a corresponding positive z score. A raw score below the mean will have a corresponding negative z score.

Disadvantage of z scores: the less-than-outstanding value placed on any negative score.
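The z-score equation can be checked with a small made-up score set; the population standard deviation is assumed here:

```python
# z = (X - mean) / s, using the standard library's statistics module.
from statistics import mean, pstdev  # pstdev = population std deviation

scores = [70, 74, 78, 82, 86]  # hypothetical test scores
x = 86                         # raw score of interest
z = (x - mean(scores)) / pstdev(scores)
print(round(z, 2))  # 1.41 -> about 1.4 SDs above the mean
```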

 * **Normalized Standard Scores:** Scores that belong to a normal distribution, or z scores that have been forced into a distribution with the characteristics of a normal curve

 * **T Scores:** A standard score that has a mean of 50 and a standard deviation of 10

T = 50 + 10z

T = the T score; z = the z score

Eliminates negative numbers and fractional scores || 95 ||
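A quick worked example of T = 50 + 10z, using a hypothetical z score:

```python
# T scores shift and stretch z scores onto a mean-50, SD-10 scale,
# so a negative z becomes a positive T.
z = -1.5            # hypothetical z score, 1.5 SDs below the mean
T = 50 + 10 * z
print(T)  # 35.0 -- no negative sign to alarm anyone
```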
 * || Tech talk

Z and T scores are similar:
 * both transformed scores
 * comparable across different distributions

Z and T scores are different:
 * Z scores use the standard deviation as the metric
 * T scores use the Z score as the metric
 * A set of Z scores generated from a distribution of raw scores has a mean of 0 and a standard deviation of 1
 * A set of T scores generated from the same distribution has a mean of 50 and a standard deviation of 10 || 101 ||
 * || Standing on your own: criterion-referenced tests

 * **Criterion-based or Criterion-referenced test:** One where there is a predefined level of performance used for evaluation. || 101 ||
 * || The standard error of measurement

 * **Standard Error of Measurement (SEM):** A simple measure of how much observed scores vary from the true score; how much a test score varies for an individual from time to time.

This value gives an estimate of the accuracy of any one test score

It is the standard deviation of repeated test scores

SEM = s x √(1 - r)

SEM = the standard error of measurement; s = the standard deviation of the set of test scores; r = the reliability coefficient of the test

It is a measure of how much variability can be expected around any one individual's score on repeated testing.


 * 1) If there is no standard error of measurement, then raw scores equal true scores and the test is a good one
 * 2) The smaller the SEM, the more reliable the test. A large SEM represents a less reliable test.
 * 3) Goal is to minimize the SEM and make the test a more accurate measure of what is to be assessed. || 102 ||
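The SEM formula with hypothetical values plugged in:

```python
# SEM = s * sqrt(1 - r): the more reliable the test (r near 1),
# the less an individual's score should bounce around on retesting.
import math

s = 10     # standard deviation of the test scores (hypothetical)
r = 0.91   # reliability coefficient (hypothetical)
sem = s * math.sqrt(1 - r)
print(round(sem, 1))  # 3.0 -> scores vary about +/- 3 points
```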
 * 6 || The good and the bad


 * **Short Answer and Completion Items**


 * **Good**
 * Flexible
 * Minimize Guessing
 * Lend themselves to computational items
 * Easy to write
 * Allow for lots of items to be used
 * **Bad**
 * Machine scoring is difficult
 * Scoring can be subjective
 * A limited scope of cognitive skills is assessed
 * One-answer questions are tough to create || 113 ||
 * || Why completion and short answer items are good
 * 1) **They are flexible**
 * Can be used to assess any content area
 * 2) **Guessing is minimized**
 * 3) **Short answers are good for computational items**
 * 4) **They allow for increased item sampling**
 * They are easy to write, and test time is usually limited, so they're a good choice when you want a broader sample of items assessed. || 114 ||
 * || Why completion and short answer items are not so good
 * 1) **Can't score by machine**
 * 2) **Scoring can be subjective**
 * 3) **Does not test advanced thinking skills**
 * These item types are best when learning objectives are basic and focus on memorization & understanding of simple ideas
 * 4) **It's tough to write questions where there is only one correct answer.** || 114 ||
 * 7 || The good and the bad

Essay Questions


 * **Good**
 * Help find out how ideas are related to one another
 * Increase security
 * Provide increased flexibility in item design
 * Relatively easy to construct
 * **Bad**
 * Emphasize Writing
 * Difficult to write
 * Provide an inadequate sampling of the subject matter
 * Hard to score
 * emphasizes writing skills over content || 124 ||
 * || Why essay items are good


 * 1) **Essay questions show whether learners understand ideas and can relate them to one another.**
 * 2) **Increases security since it is harder to plagiarize**
 * 3) **Very flexible**
 * 4) **Easy to construct -** 4 essay questions are faster to create than 100 multiple choice items || 124 ||
 * || Why essay items are not so good


 * 1) **Emphasize writing**
 * 2) **Tough to write -** hard to create questions that test learners' knowledge of the learning objectives
 * 3) **Precision in sampling -** tough to adequately sample the entire universe of what the learner actually learned.
 * 4) **Writing may become more important than content -** students may be able to bluff their way through based on good writing ability || 125 ||
 * || How to score essay items


 * 1) **Allow for plenty of time to score each item**
 * Read the item once for a general overview
 * Read it again for a detailed analysis (assess content, assess writing skills)
 * 2) **Take time**
 * Grade in batches, take breaks, etc.
 * 3) **Use a correct model to compare against the papers you are grading**
 * 4) **Score each question across all test takers**
 * Do question 1 for everyone, then question 2, etc.
 * 5) **If possible, grade without knowing the test taker's identity** || 126 ||
 * || Essay items and unreliability

You have to try to control what you can when grading:
 * Try to ensure anonymity
 * Use a model for scoring
 * Standardize conditions whenever possible || 128 ||
 * 8 || The good and the bad


 * **Multiple-Choice Items**
 * **Good**
 * Can be used to measure learning outcomes at almost any level
 * Easy to understand (if well written)
 * Deemphasize writing skills
 * Minimize guessing
 * Easy to score
 * Easily analyzed for effectiveness
 * **Bad**
 * Take a long time to write
 * Good ones are very hard to write
 * They limit creativity
 * May have more than one correct answer || 140 ||
 * || Why multiple-choice are good


 * Can be used to measure learning outcomes at almost any level
 * Clear and straightforward (if they are well written)
 * Eliminate differences between test takers based on their writing skills
 * Allow time for more questions
 * Minimize guessing (again, if well written)
 * Easy to score, and scoring is reliable
 * Multiple-choice items lend themselves to item analysis || 140 ||
 * || Why multiple-choice are not so good


 * They take a long time to write
 * Good ones are not easy to write
 * They do not allow for creative or unique responses
 * The best test takers know more than you (this is about only having one correct answer) || 141 ||
 * || Multiple-choice items: more than just “which one is correct”

 * **Best-answer multiple-choice items:** There may be more than one correct answer, but only one is the best of the correct answers
 * **Rearrangement multiple-choice items:** Test takers arrange a set of items in sequential order
 * **Interpretive multiple-choice items:** The test taker reads through a passage and then selects a response, where the alternatives are based on the same passage (but this places a premium on reading)
 * **Substitution multiple-choice items:** Similar to short answer or completion items, but there are alternatives from which to select || 143 ||
 * || Computing the difficulty index

D = (Nh + Nl) / T

D = the difficulty level; Nh = the number of correct responses in the high group; Nl = the number of correct responses in the low group; T = the total responses to the item || 148 ||
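Plugging made-up counts into the difficulty formula:

```python
# Difficulty index: D = (Nh + Nl) / T. Higher D means an easier item,
# since more test takers answered it correctly.
n_high = 18   # correct responses in the high-scoring group (hypothetical)
n_low = 12    # correct responses in the low-scoring group (hypothetical)
total = 40    # total responses to the item
D = (n_high + n_low) / total
print(D)  # 0.75 -> 75% answered correctly, a fairly easy item
```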
 * || Computing the discrimination index

d = (Nh - Nl) / (0.5 x T)

d = the discrimination level; Nh = the number of correct responses in the high group; Nl = the number of correct responses in the low group; T = the total responses to the item || 149 ||
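Plugging made-up counts into the discrimination formula:

```python
# Discrimination index: d = (Nh - Nl) / (0.5 * T). A positive d means
# high scorers got the item right more often than low scorers did.
n_high = 18   # correct responses in the high-scoring group (hypothetical)
n_low = 12    # correct responses in the low-scoring group (hypothetical)
total = 40    # total responses to the item
d = (n_high - n_low) / (0.5 * total)
print(d)  # 0.3 -> the item modestly separates high from low scorers
```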
 * 9 || The good and the bad

Matching Items


 * **Good**
 * Straightforward and clear
 * Easy to administer
 * Allow for comparison of ideas and facts
 * Responses are short and easy to read
 * Value of guessing is decreased
 * **Bad**
 * Level of knowledge tested is limited
 * Tests can be difficult to machine score
 * Emphasize memory || 162 ||
 * 10 || How to write ‘em: the guidelines (for true false questions)


 * Always state items as declarative sentences
 * Alternative choices can be true/false, right/wrong, yes/no, like/dislike, but they must be clear choices
 * Focus each item on one and only one idea, concept, or specific topic
 * Be careful about using statements of opinion
 * Do not use double negatives
 * Be wary of using qualifiers (always, never, sometimes, unquestionable, none, best, etc)
 * Do not give clues to the answer in the item
 * Don't test more complex materials with true/false questions || 170 ||
 * || The good and the bad


 * **Good**
 * Convenient to Write
 * Easy to Score
 * **Bad**
 * True is not always true
 * Items emphasize memorization
 * Items are easy to guess || 173 ||
 * 11 || What’s a good portfolio?


 * **Both summative and formative in nature**
 * Formative - evaluation is continuous; efforts and accomplishments are evaluated as the portfolio is being created
 * Summative - there is a final evaluation
 * **Reflects the multidimensional nature of both the task and the area**
 * helps student be expressive & think big
 * **They allow students to participate directly in their own growth and learning**
 * Student participates in the process of creating each element
 * Student receives feedback as [s]he is creating
 * Student becomes part of the direction the process will take
 * **Allow teachers to become involved in the process of designing and implementing the curriculum**
 * What is being taught becomes part of every day activity
 * This results in tight integration of activities and materials || 183 ||
 * 12 || How to do ‘em: the guidelines

Interviews


 * **Before you begin, explain nature of interview to the respondent**
 * How long it will take
 * What kinds of questions will be asked
 * **Practice**
 * **Ensure confidentiality**
 * **Choose your setting wisely**
 * **Allow more time than you think you will need**
 * **Take Notes**
 * **Stay in touch**
 * **Get & keep them talking**
 * **Put on a happy face**
 * **Use transitions to keep them going**
 * **Wrap up the interview** || 193 ||