Replication Data for: Follicle Identification in Primate Ovaries via Machine Learning (doi:10.48349/ASU/BOK3VO)

Document Description
Citation
Title:	Replication Data for: Follicle Identification in Primate Ovaries via Machine Learning
Identification Number:	doi:10.48349/ASU/BOK3VO
Distributor:	ASU Library Research Data Repository
Date of Distribution:	2025-09-19
Version:	1
Bibliographic Citation:	Sluka, James P.; Zelinski, Mary B.; Watanabe, Karen H.; Dietrich, Suzanne W.; Riley Israels, 2025, "Replication Data for: Follicle Identification in Primate Ovaries via Machine Learning", https://doi.org/10.48349/ASU/BOK3VO, ASU Library Research Data Repository, V1
Study Description
Citation
Title:	Replication Data for: Follicle Identification in Primate Ovaries via Machine Learning
Identification Number:	doi:10.48349/ASU/BOK3VO
Authoring Entity:	Sluka, James P. (Indiana University Bloomington)
	Zelinski, Mary B. (Oregon Health & Science University)
	Watanabe, Karen H. (Arizona State University)
	Dietrich, Suzanne W. (Arizona State University)
	Riley Israels (Arizona State University)
Other identifications and acknowledgements:	Rao, Parth Ravindra
Other identifications and acknowledgements:	Nagda, Param
Other identifications and acknowledgements:	Daniele, Alessia
Other identifications and acknowledgements:	Jurado Gutierrez, Aleli
Other identifications and acknowledgements:	Egusquiza Diaz, Eliany
Other identifications and acknowledgements:	Villanueva, Edmundo
Other identifications and acknowledgements:	Hernandez, Gabriella
Other identifications and acknowledgements:	Shah, Gaurika
Other identifications and acknowledgements:	Azooz, Masara
Other identifications and acknowledgements:	Ding, Yian
Grant Number:	NSF DBI-2054061
Grant Number:	P51 OD011092
Distributor:	ASU Library Research Data Repository
Access Authority:	Sluka, James P.
Access Authority:	Watanabe, Karen H.
Holdings Information:	https://doi.org/10.48349/ASU/BOK3VO
Study Scope
Keywords:	Medicine, Health and Life Sciences, Ovarian Follicle, Artificial intelligence, Machine Learning, Image Processing, Computer-Assisted, Transfer Machine Learning
Topic Classification:	Image Processing, Computer-Assisted, Ovarian Follicle, Ovarian Follicle Development, Macaca mulatta, Macaca fuscata, Macaca fascicularis, Female Reproductive System
Abstract:	Overview:
	The number and types of follicles present in the ovary are key indicators of reproductive health and capacity in females. This data set contains annotated H&E histology images from Rhesus (n=14), Cynomolgus (n=3) and Japanese (n=1) macaque (monkey) ovaries. The follicle images span the 6 preantral stages of primate ovarian follicle development: primordial, transitional primordial, primary, transitional primary, secondary, and multilayer. Follicle types were assigned by human experts. This data set is suitable for training machine learning algorithms to automatically identify and count follicles across these six developmental stages in ovarian histology images from non-human primates. In total, the dataset contains approximately 7,700 annotated follicles. These data were generated as part of the MOTHER-DB.org project.
	The data are partitioned across multiple zip archives, which are described in detail below. Within these zip files, each sub-image was extracted from a full-size histology image, is centered on a classified follicle, and is 200 by 200 pixels (138 by 138 micrometers) in size. The source histology image, the follicle type, and any manipulations applied to the sub-image are encoded in the folder and sub-image file names. See 'README_FileNamingConventions' for details on interpreting the folder and file names. The zip files include a folder for each of the follicle classes. The individual sub-image file names also contain the follicle class, so you can combine all of the sub-images into a single folder without losing the follicle type assignments.
	The complete data set, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.00N”:
	Within this Zip archive, individual images are partitioned in folders by follicle type and in the Train, Test and Validate subfolders used for training our machine learning algorithm. In addition, various image augmentations are included, such as color inversion and image rotations. Each annotation of a particular follicle yields a set of 48 images (the original plus its augmentations), and the full set for a particular annotation is always in the same Train, Test or Validate folder. The data set also contains an extensive set of images representing non-follicle portions of the ovary. These images can be used as counterexamples to the preantral follicle classification sets. The image file names identify the name of the full-size histology image, the follicle type, the location of the annotation in the full-size image and information about how it was augmented. The Train, Test, and Validate partition was done randomly in a 75:20:5 ratio. If desired, these three folders can be combined and the data repartitioned.
	In total, the dataset contains 1.7 million images based on approximately 7,700 annotated follicles. This is a large dataset at ~120GB. You need to download the entire set of zip archives with the “.zip.00N” extensions, where N is a digit from 1 to 6. Each zip file is about 20GB. A stable high-speed network is needed. It will likely take several hours to download all six zip files.
	Zip software will reconstruct the complete zip archive if you open the first file in the series. We have tested unpacking these multipart zip files using The Unarchiver for Mac (https://the-unarchiver.macupdate.com/) and 7-Zip for Windows and Linux (https://www.7-zip.org/download.html).
	Smaller data set that omits the Negatives, “MOTHER_Macaque_Monkey_Preantral_Follicles_NoNegatives.zip”:
	This data set omits the “Negative” images and contains only the sub-images of annotated follicles and their augmentations. The zip file contains ~370K images based on ~7,700 annotated follicles and is about 20% as large as the complete data set described above.
	Smallest data set that omits the Negatives and Augmentations, “MOTHER_Macaque_Monkey_Preantral_Follicles_NoNegatives_NoAugmentations.zip”:
	This data set omits the “Negatives” and all augmentation images and contains only the sub-images of annotated follicles. The zip file contains ~7,700 images, one for each of our expert-annotated follicles.
	Complete set of original histology images and annotations files, “MOTHER_TrainingData_HistoSlides_AnnotTables_20250812.zip”:
	The “MOTHER_TrainingData_HistoSlides_AnnotTables_20250812.zip” file (3.8GB) contains paired full-size histology images and follicle annotation files. The follicle annotation files give the location and follicle type of every identified follicle in the image. All images in this dataset have a resolution of 0.69 micrometer/pixel and are in ome.tif format. The image files range in size from 130MB to 620MB each. The annotations files were output from QuPath, have a “.txt” extension, and are tab-delimited text files. Each histology image file has an associated annotations file. For more information see the README_Training_Data_20250409.pdf file included in the zip file.
	README File Naming Conventions:
	The "README_FileNamingConventions.pdf" file contains a detailed description of the naming conventions used for the folders and sub-image file names. The file names contain all the information needed to identify the assigned follicle type, the original histology slide it was derived from, and any augmentation details.
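The size and count figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract (0.69 micrometer/pixel, 200-pixel sub-images, approximately 7,700 annotations, 48 images per annotation); the variable names are illustrative and not part of the data set.

```python
# Cross-check of figures quoted in the abstract. All inputs come from the text;
# the variable names are illustrative only.
MICROMETERS_PER_PIXEL = 0.69   # stated resolution of the histology images
SUB_IMAGE_PIXELS = 200         # sub-images are 200 by 200 pixels
ANNOTATED_FOLLICLES = 7_700    # approximate number of expert annotations
IMAGES_PER_ANNOTATION = 48     # the original plus its augmentations

# Physical edge length of one sub-image.
sub_image_micrometers = SUB_IMAGE_PIXELS * MICROMETERS_PER_PIXEL

# Approximate image count of the archive without the "Negative" images.
images_without_negatives = ANNOTATED_FOLLICLES * IMAGES_PER_ANNOTATION

print(f"Sub-image edge: {sub_image_micrometers:.0f} micrometers")
print(f"Follicle images including augmentations: {images_without_negatives:,}")
```

The results agree with the stated 138 micrometer sub-image size and the ~370K images of the NoNegatives archive.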
Date of Collection:	2021-03-01 to 2025-02-28
Kind of Data:	H & E Histology images
Methodology and Processing
Sources Statement
Data Sources:	The ovary histology images from which the individual follicle sub-images were produced can be found at https://mother-db.org/search/
Documentation and Access to Sources:	Original ovary histology images are available at https://mother-db.org
	Python code to generate the individual follicle sub-images from annotated ovary histology sections is available in the MOTHER GitHub repository. See https://github.com/mother-db/MOTHER-DB-annotation-tools
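As a rough illustration of what extracting a follicle-centered sub-image involves, the sketch below computes a 200 by 200 pixel window around an annotation point. It is not the code from the MOTHER GitHub repository; the function names `crop_box` and `fits_inside` are hypothetical, and how the actual tools treat annotations near the image border is not described here, so the sketch only reports whether the window fits.

```python
def crop_box(center_x, center_y, size=200):
    """Return (left, top, right, bottom) of a size-by-size window centered on a point.

    Hypothetical helper: coordinates are in pixels of the full-size image.
    """
    half = size // 2
    left = int(round(center_x)) - half
    top = int(round(center_y)) - half
    return left, top, left + size, top + size


def fits_inside(box, image_width, image_height):
    """Report whether the window lies entirely within the image bounds."""
    left, top, right, bottom = box
    return left >= 0 and top >= 0 and right <= image_width and bottom <= image_height


box = crop_box(1000, 500)
print(box, fits_inside(box, image_width=2000, image_height=2000))
```

A window that does not fit (for example one centered 50 pixels from a corner) would need padding or exclusion; which of these the original tools apply is an open assumption.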
Data Access
Notes:	CC BY-NC 4.0 (http://creativecommons.org/licenses/by-nc/4.0)
Other Study Description Materials
Related Materials
	Sluka, J., Watanabe, K. & Ding, Y. (2023). MOTHER Ovarian Follicle Annotation using QuPath. Available at https://hdl.handle.net/2022/29016.
	Sluka, J., Watanabe, K., Ding, Y., Zelinski, M. & Dietrich, S. (2023). MOTHER Step-by-step supplementary files. Available at https://hdl.handle.net/2022/29015.
Related Studies
	Multispecies Ovary Tissue Histology Electronic Repository (MOTHER) (2022). Zelinski Lab: Cynomolgus Macaque Ovary. Available at https://doi.org/10.48349/ASU/BUAU3E, ASU Library Research Data Repository.
	Multispecies Ovary Tissue Histology Electronic Repository (MOTHER) (2022). Zelinski Lab: Rhesus Macaque Ovary. Available at https://doi.org/10.48349/ASU/46JWEX, ASU Library Research Data Repository.
	Multispecies Ovary Tissue Histology Electronic Repository (MOTHER) (2022). Zelinski Lab: Japanese Macaque Ovary. Available at https://doi.org/10.48349/ASU/KM2QZQ, ASU Library Research Data Repository.
Other Reference Note(s)
	Multispecies Ovary Tissue Histology Electronic Repository (MOTHER) Web portal contains project-wide information. See https://mother-db.org
Other Study-Related Materials
Label:	MOTHER_Macaque_Monkey_Preantral_Follicles_NoNegatives.zip
Text:	This data set omits the “Negative” images and only contains the sub-images of annotated follicles and their augmentations. The zip file contains ~370K images based on ~7,700 annotated follicles and is about 20% as large as the complete data set described above.
Notes:	application/zip
Other Study-Related Materials
Label:	MOTHER_Macaque_Monkey_Preantral_Follicles_NoNegatives_NoAugmentations.zip
Text:	The smallest data set. It omits the “Negatives” and all augmentation images and contains only the sub-images of annotated follicles. The zip file contains ~7,700 images, one for each of our expert-annotated follicles. The README_FileNamingConventions.pdf file contains a detailed description of the naming conventions used for the folders and sub-image file names. The file names contain all the information needed to identify the assigned follicle type, the original histology slide it was derived from, and any augmentation details.
Notes:	application/zip
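After unpacking one of the archives, a quick tally of files per folder helps confirm that the extraction is complete and shows how the images are distributed across follicle classes. The sketch below is a minimal example under assumptions: the function name `count_files_by_folder` is hypothetical, it counts every non-hidden file regardless of extension, and it groups by the immediate parent folder, which may be a class folder or a Train, Test or Validate subfolder depending on the archive layout.

```python
from collections import Counter
from pathlib import Path


def count_files_by_folder(root):
    """Count non-hidden files under root, keyed by parent folder relative to root."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        # Skip directories and hidden files such as .DS_Store.
        if path.is_file() and not path.name.startswith("."):
            counts[path.parent.relative_to(root).as_posix()] += 1
    return counts
```

For example, `count_files_by_folder("path/to/unzipped/archive")` returns a `Counter` whose values should sum to roughly 7,700 for the NoNegatives_NoAugmentations archive; the path shown is a placeholder.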
Other Study-Related Materials
Label:	MOTHER_TrainingData_HistoSlides_AnnotTables_20250812.zip
Text:	This zip file contains paired images and follicle annotation files exported from QuPath. For example, the image file “14736_UN_050a.ome.tif” should be used with the annotations file “14736_UN_050a.annotations.txt”.
	Format of the histology slide files: All images in this dataset have a resolution of 0.69 micrometer/pixel and are in ome.tif format. The files range in size from 130MB to 620MB each.
	Format of the QuPath annotation files: The annotations files were output from QuPath, have a “.txt” extension, and are tab-delimited text files.
Notes:	application/zip
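The pairing rule in the example above (“.ome.tif” image, “.annotations.txt” table) and the tab-delimited format can be handled with the Python standard library. This is a minimal sketch: the helper names are hypothetical, and the column names in the in-memory sample ("Class", "Centroid X", "Centroid Y") are placeholders rather than the documented columns, which are described in the README included in the zip file.

```python
import csv
import io


def annotation_name_for(image_name):
    """Map an image file name to its paired annotations file name."""
    suffix = ".ome.tif"
    if not image_name.endswith(suffix):
        raise ValueError(f"expected a name ending in {suffix}: {image_name}")
    return image_name[: -len(suffix)] + ".annotations.txt"


def read_annotations(text_stream):
    """Read a tab-delimited annotations table into a list of dicts."""
    return list(csv.DictReader(text_stream, delimiter="\t"))


# Placeholder table: these column names are assumptions for illustration only.
sample = io.StringIO("Class\tCentroid X\tCentroid Y\nPrimordial\t1000.5\t500.25\n")
rows = read_annotations(sample)

print(annotation_name_for("14736_UN_050a.ome.tif"))
print(rows[0]["Class"], rows[0]["Centroid X"])
```

For a real file, pass an open handle instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`; the encoding is also an assumption.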
Other Study-Related Materials
Label:	README_FileNamingConventions.pdf
Text:	Description of the subimage, augmentations, and file naming conventions
Notes:	application/pdf
Other Study-Related Materials
Label:	MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001
Text:	The complete data set, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.00N”: This is part of a multi-Zip archive with 6 parts. Most Zip software packages will automatically unzip all six files if you unzip the first file, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001”. It will take ~2 hours to download this file using a stable high-speed network. Individual images are partitioned in folders by follicle type and in the Train, Test and Validate subfolders used for training our machine learning algorithm. In addition, various image augmentations are included, such as color inversion and image rotations. Each annotation of a particular follicle yields a set of 48 images (the original plus its augmentations), and the full set for a particular annotation is always in the same Train, Test or Validate folder. The data set also contains an extensive set of images representing non-follicle portions of the ovary. These images can be used as counterexamples to the preantral follicle classification sets. The image file names identify the name of the full-size histology image, the follicle type, the location of the annotation in the full-size image and information about how it was augmented. The Train, Test, and Validate partition was done randomly in a 75:20:5 ratio. If desired, these three folders can be combined and the data repartitioned. In total, the dataset contains 1.7 million images based on approximately 7,700 annotated follicles. This is a large dataset at ~120GB. You need to download the entire set of zip archives with the “.zip.00N” extensions, where N is a digit from 1 to 6. Zip software will reconstruct the complete zip archive if you open the first file in the series.
Notes:	application/octet-stream
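If the Train, Test and Validate folders are combined and repartitioned, the images derived from one annotation should stay together so that augmented copies of a follicle do not leak across partitions. The sketch below shows one way to do this under assumptions: it splits at the level of annotation groups rather than individual images, the function name `split_groups` is hypothetical, and deriving a group identifier from each file name is left to the reader (see README_FileNamingConventions.pdf). It is not necessarily the procedure used to create the original partition.

```python
import random


def split_groups(group_ids, ratios=(75, 20, 5), seed=0):
    """Assign each unique group id to Train, Test or Validate.

    ratios follow the 75:20:5 split described for the data set; the seed makes
    the random assignment reproducible.
    """
    unique = sorted(set(group_ids))
    random.Random(seed).shuffle(unique)

    total = sum(ratios)
    n_train = round(len(unique) * ratios[0] / total)
    n_test = round(len(unique) * ratios[1] / total)

    assignment = {}
    for index, group in enumerate(unique):
        if index < n_train:
            assignment[group] = "Train"
        elif index < n_train + n_test:
            assignment[group] = "Test"
        else:
            assignment[group] = "Validate"
    return assignment


# Illustrative group ids; real ids would be derived from the image file names.
example = split_groups([f"annotation_{i}" for i in range(100)])
print(sum(1 for part in example.values() if part == "Train"))
```

Every image is then moved to the folder assigned to its group, so all 48 images of an annotation land in the same partition.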
Other Study-Related Materials
Label:	MOTHER_Macaque_Monkey_Preantral_Follicles.zip.002
Text:	The complete data set, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.00N”: This is part 2 of a multi-Zip archive with 6 parts. Most Zip software packages will automatically unzip all six files if you unzip the first file, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001”. It will take ~2 hours to download this file using a stable high-speed network. You need to download the entire set of zip archives with the “.zip.00N” extensions, where N is a digit from 1 to 6. The archive contents are described under MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001.
Notes:	application/octet-stream
Other Study-Related Materials
Label:	MOTHER_Macaque_Monkey_Preantral_Follicles.zip.003
Text:	The complete data set, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.00N”: This is part 3 of a multi-Zip archive with 6 parts. Most Zip software packages will automatically unzip all six files if you unzip the first file, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001”. It will take ~2 hours to download this file using a stable high-speed network. You need to download the entire set of zip archives with the “.zip.00N” extensions, where N is a digit from 1 to 6. The archive contents are described under MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001.
Notes:	application/octet-stream
Other Study-Related Materials
Label:	MOTHER_Macaque_Monkey_Preantral_Follicles.zip.004
Text:	The complete data set, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.00N”: This is part 4 of a multi-Zip archive with 6 parts. Most Zip software packages will automatically unzip all six files if you unzip the first file, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001”. It will take ~2 hours to download this file using a stable high-speed network. You need to download the entire set of zip archives with the “.zip.00N” extensions, where N is a digit from 1 to 6. The archive contents are described under MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001.
Notes:	application/octet-stream
Other Study-Related Materials
Label:	MOTHER_Macaque_Monkey_Preantral_Follicles.zip.005
Text:	The complete data set, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.00N”: This is part 5 of a multi-Zip archive with 6 parts. Most Zip software packages will automatically unzip all six files if you unzip the first file, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001”. It will take ~2 hours to download this file using a stable high-speed network. You need to download the entire set of zip archives with the “.zip.00N” extensions, where N is a digit from 1 to 6. The archive contents are described under MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001.
Notes:	application/octet-stream
Other Study-Related Materials
Label:	MOTHER_Macaque_Monkey_Preantral_Follicles.zip.006
Text:	The complete data set, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.00N”: This is part 6 of a multi-Zip archive with 6 parts. Most Zip software packages will automatically unzip all six files if you unzip the first file, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001”. It will take ~2 hours to download this file using a stable high-speed network. You need to download the entire set of zip archives with the “.zip.00N” extensions, where N is a digit from 1 to 6. The archive contents are described under MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001.
Notes:	application/octet-stream
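Because the multipart archive can only be reconstructed when all six parts are present, it can be worth checking the download folder before opening the first part. The sketch below builds the six expected file names from the labels listed above and reports any that are missing; the function names and the idea of a single download folder are assumptions for illustration.

```python
from pathlib import Path

# Base name taken from the file labels in this record.
BASE_NAME = "MOTHER_Macaque_Monkey_Preantral_Follicles.zip"


def expected_parts(base_name=BASE_NAME, count=6):
    """Return the part names base.001 ... base.00N for a multipart archive."""
    return [f"{base_name}.{index:03d}" for index in range(1, count + 1)]


def missing_parts(download_dir, base_name=BASE_NAME, count=6):
    """List the expected part names that are not present in download_dir."""
    folder = Path(download_dir)
    return [
        name
        for name in expected_parts(base_name, count)
        if not (folder / name).is_file()
    ]


print(expected_parts()[0])
```

An empty list from `missing_parts("path/to/downloads")` (a placeholder path) indicates that all parts are in place; it does not verify that each file downloaded completely.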