Public Use File (PUF) Requests
The California Office of Statewide Health Planning and Development (OSHPD) currently provides public data sets of patient discharge (PDD), emergency department (ED), and ambulatory surgery (AS) data for the years 2010-2014. The data are collected from licensed California hospitals, hospital emergency departments, and licensed freestanding ambulatory surgery clinics in California. Each record within the data sets consists of either one inpatient discharge or one outpatient encounter, also known as a service visit. The public data sets include clinical, payer, and facility information. You may be eligible to request some types of non-public patient data from OSHPD; please see Requesting OSHPD Patient Data Files for information on your eligibility. University-sponsored researchers, California-licensed hospitals, and California local health officers and local health departments may request non-public data.
Data Availability and Ordering Instructions
- Review file documentation and read the What’s New notes. Does the data set meet your analytical needs?
- Submit a complete Public Use File Request Form.
Submit the signed form to the Healthcare Information Resource Center (HIRC) via fax (916) 324-9242 or HIRC E-mail.
- Complimentary Data - If your organization is a federal, state, city, or county government agency, nonprofit organization with 501(c)(3) status, nonprofit educational institution, or public library, you may qualify to receive the most recent three years of data at no charge.
Requests for complimentary data must:
- Be submitted on official letterhead
- Be signed by the director of the organization or chair of the department
- Specifically identify the data product needed
- State the anticipated use of the data
- Include a copy of the official IRS document indicating the organization has a nonprofit 501(c)(3) status if applicable
- You will be contacted about payment and shipping options.
Revisions to the Public Use File to Protect Patient Confidentiality
Beginning in 2013, with the release of 2012 data, the Public Use File has been modified to protect patient confidentiality and minimize the risk of disclosure of confidential patient data while preserving most of the file’s clinical information.
This has resulted in significant changes to the patient record-level data files. For the PDD, ED and AS, the 5-digit Patient ZIP Code was replaced with a 3-digit Patient ZIP Code. In addition, for the PDD, Total Charges was rounded to the nearest $1000; note that Total Charges is not collected for the ED and AS. Lastly, the following demographic, date, and other variables that were included in earlier public use files have been removed from the new versions:
- Admission Quarter and Service Quarter
- Admission Year or Service Year
- Age Range (20 categories)
- Age Range (5 categories)
- Patient County
- Age in Years (at Admission or Date of Service)
- Do Not Resuscitate (DNR) - on PDD only
- Expected Source of Payment – Plan Code Number
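The ZIP Code truncation and Total Charges rounding described before the list above can be illustrated with a minimal sketch. The function names are hypothetical, and the tie-breaking behavior (amounts ending in exactly $500 round up) is an assumption, since the rounding convention is not stated here.

```python
def zip3(zip5: str) -> str:
    """Return the 3-digit USPS prefix (first three digits) of a 5-digit ZIP Code."""
    return zip5.strip()[:3]


def round_charges(total_charges: int) -> int:
    """Round a non-negative dollar amount to the nearest $1000.

    Assumption: exact ties (e.g. $12,500) round up.
    """
    return (total_charges + 500) // 1000 * 1000


print(zip3("95811"))         # 958
print(round_charges(12499))  # 12000
print(round_charges(12500))  # 13000
```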
A masking rule was applied to remove disclosure risk from unique Principal Diagnoses, Principal Procedures and Principal E-Codes. First, these codes were examined for any single occurrence for each hospital. If, for a given hospital, a single instance of either a Principal Diagnosis or Principal Procedure was found, the Principal codes were preserved, but all E-Codes, secondary diagnoses and related variables, and secondary procedures and related variables in the record were masked (*). If, for the given hospital, a single instance of a Principal E-Code was found, all E-Codes and E-Code Present on Admission (POA) variables on the record were masked (*).
For each record that is the only record for a facility for a report year, all data elements other than Hospital ID, Hospital Name, and Discharge Year were masked (*).
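As a rough illustration of the single-occurrence masking rule described above, the sketch below counts each principal code per hospital and masks the related fields when a code appears only once. It is not OSHPD's actual procedure: the field names are hypothetical, each group of secondary variables is collapsed into one field for brevity, and "all E-Codes" is read as including the Principal E-Code. The separate rule for a facility with only one record in a report year is not covered.

```python
from collections import Counter

MASK = "*"
PRINCIPAL_FIELDS = ("principal_dx", "principal_proc", "principal_ecode")
ECODE_FIELDS = ("principal_ecode", "other_ecodes", "ecode_poa")
SECONDARY_FIELDS = ("secondary_dx", "secondary_proc")


def mask_unique_codes(records):
    """Return masked copies of the records; the input list is not modified."""
    # Count how often each principal code occurs within each hospital.
    counts = {
        field: Counter((r["hospital_id"], r[field]) for r in records if r.get(field))
        for field in PRINCIPAL_FIELDS
    }

    def is_single(record, field):
        value = record.get(field)
        return bool(value) and counts[field][(record["hospital_id"], value)] == 1

    masked = []
    for record in records:
        out = dict(record)
        if is_single(record, "principal_dx") or is_single(record, "principal_proc"):
            # Principal diagnosis and procedure are kept; everything else is masked.
            fields_to_mask = ECODE_FIELDS + SECONDARY_FIELDS
        elif is_single(record, "principal_ecode"):
            fields_to_mask = ECODE_FIELDS
        else:
            fields_to_mask = ()
        for field in fields_to_mask:
            if field in out:
                out[field] = MASK
        masked.append(out)
    return masked
```

Counting is done per hospital rather than across the whole file because the rule is stated in terms of a single occurrence "for each hospital".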
Data Set Cost
- 1. How much does the Public Use File cost?
$200 per year for each type of data (PDD, ED, AS). Nonprofit entities may be eligible to get the current three years for free.
- 2. What are the qualifications for complimentary data?
For organizations that are nonprofits (per Section 501(c)(3) of the Internal Revenue Code), the three most recent years of data may be provided at no charge. This includes California-licensed hospitals. Similarly, California state and local governments may request the data at no charge.
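The pricing above amounts to simple arithmetic, sketched below for estimating an order. The function name and the `complimentary_years` parameter are hypothetical; which years actually qualify as complimentary is determined by OSHPD, so the caller's choice here is only an assumption.

```python
PRICE_PER_YEAR_PER_TYPE = 200  # US dollars, per data type (PDD, ED, AS) per year


def puf_cost(data_types, years, complimentary_years=()):
    """Estimate the cost of an order in US dollars.

    Years listed in complimentary_years are treated as free of charge.
    """
    free = set(complimentary_years)
    billable_years = {year for year in years if year not in free}
    return PRICE_PER_YEAR_PER_TYPE * len(set(data_types)) * len(billable_years)


# Two data types for five years at full price.
print(puf_cost(["PDD", "ED"], range(2010, 2015)))  # 2000

# The same order, assuming 2012-2014 are provided at no charge.
print(puf_cost(["PDD", "ED"], range(2010, 2015), complimentary_years=(2012, 2013, 2014)))  # 800
```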
Data Set Shipping
- 3. How long does data usually take to arrive?
Depending on location:
- GSO in California (overnight)
- FedEx Ground if outside of California (3-5 business days)
- 4. Can OSHPD send data files via Secured File Transfer Protocol?
Yes. See shipping options.
Data Set Availability
- 5. What is the earliest year you have for public data sets?
We have PDD, ED and AS starting in 2010 and will add earlier years as demand and resources permit. Currently, only 2010-2012 have been revised to the new standard for data disclosure.
Data Set Content
- 6. What is the difference between PDD, AS, and ED Public Use Files (PUF)?
The PDD, ED and AS data files represent data submissions from different types of California provider organizations. Patient discharge data is submitted to OSHPD by hospitals (general acute care, acute psychiatric, chemical dependency recovery, and psychiatric health facilities), emergency department data is submitted by hospital emergency departments, and ambulatory surgery data is submitted by general acute care hospitals and licensed freestanding ambulatory surgery clinics.
- 7. Does the PUF contain demographic variables (age, gender, race, ethnicity, etc.)?
No. However, the federal Agency for Healthcare Research and Quality (AHRQ), as part of its Healthcare Cost and Utilization Project (HCUP), makes available de-identified files from the OSHPD patient-level data sets that have been statistically manipulated to render them un-linkable to other OSHPD patient-level data sets. Geographical identifiers (ZIP Code and county) have been removed from these files, but demographic identifiers have not. Access to these files requires signing a detailed data use agreement and taking a short online training course on data use. More information and application kits are available at the HCUP Central Distributor Technical Assistance Center.
- 8. Does the PUF contain financial data?
No, not to any significant degree. The PDD contains a Total Charges variable, but the ED and AS contain no financial data. However, detailed facility-level financial and utilization data are freely available for hospitals and other healthcare facilities on the OSHPD Web site. Specifically, the Hospital Annual Financial Disclosure Reports and the Long Term Care Annual Financial Disclosure Reports provide very detailed account-level information for these facilities. Quarterly Financial and Utilization reports provide additional information. Please see the OSHPD Hospital Financial Data and Hospital Utilization webpages.
- 9. Are 3-digit ZIP Codes available for all records? Is there a masking rule based on population?
Three-digit ZIP Codes are available on all records. This variable is masked only when a facility has a single record, which happens for only about one record per file.
- 10. Is it possible to get County added to the Public Use File, given that Gender, Race, and Age are removed?
The new version of the Public Use File will not be modified; however, all feedback will be considered in the future. There currently are several county-level products available and more coming soon.
- 11. Is the revised Public Use File 3-digit ZIP Code the first three digits or the last three digits of the 5-digit ZIP Code?
The 3-digit ZIP Code is the USPS prefix, or the first three digits of the 5-digit ZIP Code.
Sharing the Public Use File
- 12. Can I show the PUF to my co-workers/affiliates or is this strictly for my own use?
The PUF Data Use Agreement specifies:
In accessing patient level data, I agree to the following:
- I will not further distribute any patient-level data or individual patient records, and I will not permit others to do so.
- I will not use or permit others to use the data to learn the identity of any individual patient.
- I will not link or permit others to link the data with any other individual level data that would increase the potential for patient identification.
Confidential Data Set Eligibility
- 13-a. Would you be able to provide the rules on non-California hospitals acquiring the private data set? (e.g. hospitals in Oregon, Nevada, etc.)
- 13-b. Will data users other than hospitals only have access to any type of file other than the public data set? (e.g. the general public, architectural/consulting/investment firms, etc.)
The only record-level product non-hospital entities (such as the general public, architectural/consulting/investment firms, etc.) are eligible to receive is the Public Use File.
Nonprofit university sponsored researchers can also apply for confidential ("IPA") Files.
Non-California hospitals are not eligible for the AB 2876/confidential files.
- 14. What are the differences between OSHPD's public use files and confidential data?
Confidential data (protected health information [PHI], as defined by HIPAA) included in file
- IPA: Date of Discharge / Visit; Date of Birth
- AB 2876 ("Model Data Set"): Month of Admit / Discharge; Age in days (if less than 365)
- Public Use File: None
Eligibility
- IPA: Researchers sponsored by a nonprofit university
- AB 2876 ("Model Data Set"): California-licensed hospital, or local health department or health officer
- Public Use File: Anyone can request
Institutional Review Board (IRB) Approval Process
- IPA: Committee for the Protection of Human Subjects (CPHS)
- AB 2876 ("Model Data Set"): CPHS (if data used for research)
- Public Use File: No IRB approval required
Data Use Agreement
- IPA: Required for everyone handling the record-level data
- AB 2876 ("Model Data Set"): Required for everyone handling the record-level data
- Public Use File: Included in completed application
Unit of Analysis
- IPA: Discharge / Visit
- AB 2876 ("Model Data Set"): Discharge / Visit
- Public Use File: Discharge / Visit
- 15. What if I need data, but I do not qualify for the Non-Public version, either IPA or AB 2876?
Custom Run Requests are available for aggregate-level data for specified variables (data through 2012). There may be a product on the OSHPD website that you can use, or you can purchase California discharge data from the federal HCUP program.
- 16. How do I apply for Non-Public Data?
If you are a university-sponsored researcher, you may qualify for an "IPA" request, i.e. a request for research data authorized under the Information Practices Act. These requests require, among other things, sponsorship of the researcher by a nonprofit educational institution. E-mail HIRC for more information.
If you are from a California Hospital or Public Health Department you may qualify for an "AB 2876" request; E-mail HIRC for more information.
- 17. I'm a graduate student who needs demographic variables that aren't in the Public Use File. How can I access record-level data?
Please contact HIRC to talk to an analyst about your research questions and data options.
Data Set Format
- 18. What is the difference between comma-delimited text format (.txt) and SAS (.sas7bdat) format for the public data sets?
Comma-delimited text format provides the data as ASCII text. SAS formatted files are created using SAS (originally "Statistical Analysis System"), a widely used statistical data analysis software package, in a format native to the SAS program.
- 19. Why can't I see the data? Do I need a certain version of Excel or SAS software to see the data?
Statistical analysis software (SAS, SPSS, etc.) is required to open .sas7bdat formatted files. Comma-delimited files (.txt) can be opened by multiple software programs, including Excel, Access, SAS, and SPSS.
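Because the comma-delimited files are plain text, they can also be read programmatically. The sketch below uses Python's standard `csv` module and converts masked values (*) to `None`. The column names and the two sample records are hypothetical, and the presence of a header row is an assumption; the file documentation lists the actual layout.

```python
import csv
import io

MASK = "*"


def read_puf_rows(text_stream):
    """Yield each row as a dict, converting masked values ("*") to None.

    All other values are kept as strings, so ZIP Codes are not
    accidentally converted to numbers.
    """
    for row in csv.DictReader(text_stream):
        yield {name: (None if value == MASK else value) for name, value in row.items()}


# Hypothetical two-record sample standing in for a real .txt file.
sample = io.StringIO(
    "hospital_id,patient_zip3,total_charges\n"
    "H001,958,12000\n"
    "H002,*,*\n"
)
rows = list(read_puf_rows(sample))
print(rows[0]["patient_zip3"])  # 958
print(rows[1]["total_charges"])  # None
```

To read a file from disk, pass an open file object instead of the sample, for example `open(path, newline="")`. For the .sas7bdat files, the pandas library provides `pandas.read_sas`, which may be an option when SAS itself is not available.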
This page was last updated on Friday, September 30, 2016.
Healthcare Information Resource Center
400 R Street, Suite 250
Sacramento, CA 95811-6213
Tel: (916) 326-3802
Fax: (916) 324-9242
Hours: Monday-Friday 8 a.m. to 5 p.m. (PST)