Purpose is to gain information
ex. blood or eye test
Test Consumer
Person or group that makes informed decisions on test results.
ex. UCO using ACT results
Collection of related tests to measure one attribute
Form of Test
Rating Scales
Shows degree of attribute being graded
Form of Test
Rating scale to aid observation
Form of Test
Obtains information on an affective attribute
-Quantitative: Amount (#’s)
-Qualitative: descriptive terms
Uses both quantitative & qualitative
Interpreting the result
3 types of evaluation
-Norm reference
-Criterion reference
-Self reference
Norm Reference
Compared to others
ex. ACT scores at college
Criterion Reference
Specific set criteria. You have to get this score to be admitted.
ex. 90%=A in class
Self Reference
Compared against yourself
Pre test then post test
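A minimal sketch of the three reference types, assuming made-up class scores and a made-up 90-point cutoff. The function names are hypothetical and only illustrate the comparison each type makes.

```python
from bisect import bisect_left


def percentile_rank(score, group_scores):
    """Norm reference: percent of the comparison group scoring below `score`."""
    ordered = sorted(group_scores)
    below = bisect_left(ordered, score)  # count of scores strictly below
    return 100.0 * below / len(ordered)


def meets_criterion(score, cutoff):
    """Criterion reference: compare the score against a fixed standard."""
    return score >= cutoff


def gain(pre_score, post_score):
    """Self reference: change from the person's own pre-test to post-test."""
    return post_score - pre_score


# Hypothetical data, for illustration only.
class_scores = [62, 70, 75, 81, 88, 90, 93, 97]

print(percentile_rank(88, class_scores))  # 50.0 (4 of 8 scores are below 88)
print(meets_criterion(88, 90))            # False (below the 90 cutoff)
print(gain(74, 88))                       # 14
```

The same score of 88 reads differently under each type: middle of the group, below the cutoff, and a 14-point personal improvement.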
Judging end product
Steps leading up to product
Laboratory Test
(Pure research) Requires highly specialized examiners & equipment
ex. blood test
Field Assessments
(applied research) Less expensive & less equipment required
ex. Paper test and 12 min run/walk
Lots of planning & research
What you have learned
Assessment taken during instruction to provide feedback
Assessment at end of instruction
Why Assess?
-Selection: Sports tryouts
-Classification: Group individuals to enhance learning
-Learning: feedback (tests are learning tools)
-Diagnostic assessment: identify weaknesses
-Prognostic assessment: predict the potential for development
-Proficiency assessment: determine placement or exemption
Edward Hitchcock
Measurements to identify the ideal man
Dudley Sargent
(Harvard) Defined anthropometric measures & used them to prescribe programs of exercise
Frederick Rand Rogers
Designed the first scientifically based strength test
Brigham & Sargent
Experimented with a dynamometer
Test based on changes in heart rate or blood pressure as subjects stood from a supine position
WW Tuttle
-Pulse-ratio test
-Involved block stepping
Harvard step test
-Treadmill test
-First maximal effort GXT
Kenneth Cooper
12 minute run/walk
Athletic Ability Testing
-Acquired level of learning in skills common to athletic performance
-Athletic Badge Test
-Began to replace strength tests because they were better indicators of performance
-Character assessment inventories for PE came about in the 1920s
-In most cases athletes scored lower than nonathletes
Sports Skills Tests
-Developed as early as 1913
-Continue to be popular means of assessing physical ability
-AAHPERD/ Graduate programs
Cognitive Knowledge Tests
Scientifically constructed & standardized tests are rare in Physical Education
Fitness Assessments
-WWII created a physical fitness testing boom
-After WWII, President Eisenhower advocated improved physical fitness for schoolchildren
-Results of Kraus-Weber test
-1988: AAHPERD switched to criterion-referenced tests
-1993: AAHPERD endorsed the Cooper Institute's Fitnessgram
Authentic Assessment
-Consistent with curriculum emphasis; logical extensions of work done in class
-Provide information for teachers to make further instructional choices
-Provide feedback
Academic specialization in testing & measurement
4 psychometric qualities
-Validity: truthfulness
-Reliability: consistency
-Freedom from Assessment bias
Valid tests
Accurately assesses what it claims to assess
3 categories to consider when measuring validity
-Content validity
-Criterion validity
-Construct validity
Criterion & Content
Content Validity
-The degree to which the content of the test represents an identified domain. Determined by a panel of experts
-Should represent objectives of the unit
What is the one-way relationship between the validity and reliability of a testing instrument?
Results of a test may be reliable but not valid. An instrument might give consistent results but not measure what it claims to measure. If a test cannot provide stable and repeatable results, it is not possible for it to be valid.
Criterion Validity
-Scores are related to one or more outcome criteria
-2 types of criterion validity (predictive & concurrent)
Concurrent Validity
-Degree to which results are comparable to other acceptable standards
-A reliability coefficient of .70 is usually considered acceptable
-Scores are related to one or more criteria
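A minimal sketch of checking concurrent validity, assuming the comparison is done with a Pearson correlation between the new test and an accepted standard (the notes do not name the statistic). The data and function names are hypothetical; the .70 cutoff is the one given in the notes.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def is_acceptable(coefficient, threshold=0.70):
    """Apply the .70 rule of thumb from the notes."""
    return coefficient >= threshold


# Hypothetical scores: a field test vs. an accepted laboratory measure.
field_test = [31, 35, 38, 42, 47, 50]
lab_measure = [33, 34, 40, 41, 49, 52]

r = pearson_r(field_test, lab_measure)
print(round(r, 2), is_acceptable(r))
```

A coefficient near 1.00 means the new test ranks people almost the same way the accepted standard does.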
Construct Validity
-Degree to which results measure an attribute that cannot be directly measured
-ex. People who are considered good at a sport would be expected to score higher on a test measuring skills of that sport than a person who isn’t
What is the single most important psychometric quality of an assessment instrument?
The consistency with which an assessment instrument measures what it is intended to measure
Reliability Coefficient
-A scale from .00 to 1.00 with 1.00 being the high point
-Measures of reliability of norm-referenced tests include test-retest, alternate form & internal consistency
Test-Retest Reliability
-Compares test averages with retest averages
-Preferred measure to determine reliability of the test
Alternate Form or Parallel forms Reliability
-Two tests believed to measure the same trait or skill are administered to two groups in a different order
-Scores are compared to determine consistency
Internal Consistency Reliability
-Consistency of examinee performance across parts of a test or from trial to trial within a test
-A single administration of a test
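A minimal sketch of one common internal consistency estimate, Cronbach's alpha, computed from a single administration. The notes do not name a specific statistic, so the choice of alpha is an assumption, and the trial scores below are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha.

    rows: one list per examinee, one score per item (or trial) in each list.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 examinees")
    columns = list(zip(*rows))  # regroup scores by item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical data: four examinees, three trials each.
trials = [
    [7, 8, 7],
    [5, 5, 6],
    [9, 9, 8],
    [4, 5, 4],
]

print(round(cronbach_alpha(trials), 2))
```

Values closer to 1.00 mean examinees perform consistently from trial to trial, on the same .00 to 1.00 scale described for the reliability coefficient.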
-Consistency of the scoring instrument
-No grey area (right or wrong) ex. multiple choice questions
-Subjectivity has a grey area ex. essay questions
Freedom from Assessment Bias

Post Author: admin

