Replication Data for: Follicle Identification in Primate Ovaries via Machine Learning

DOI: doi:10.48349/ASU/BOK3VO
Repository: ASU Library Research Data Repository
Date: 2025-09-19
Version: 1
Citation: Sluka, James P.; Zelinski, Mary B.; Watanabe, Karen H.; Dietrich, Suzanne W.; Riley Israels, 2025, "Replication Data for: Follicle Identification in Primate Ovaries via Machine Learning", https://doi.org/10.48349/ASU/BOK3VO, ASU Library Research Data Repository, V1

Authors: Sluka, James P.; Zelinski, Mary B.; Watanabe, Karen H.; Dietrich, Suzanne W.; Riley Israels
Other listed names: Rao, Parth Ravindra; Nagda, Param; Daniele, Alessia; Jurado Gutierrez, Aleli; Egusquiza Diaz, Eliany; Villanueva, Edmundo; Hernandez, Gabriella; Shah, Gaurika; Azooz, Masara; Ding, Yian
Grant numbers: NSF DBI--2054061; P51 OD011092
Distributor: ASU Library Research Data Repository
Contact/depositor: Sluka, James P.; Karen Watanabe
Subject: Medicine, Health and Life Sciences
Keywords: Ovarian Follicle; Artificial intelligence; Machine Learning; Image Processing, Computer-Assisted; Transfer Machine Learning
Topic terms: Image Processing, Computer-Assisted; Ovarian Follicle; Ovarian Follicle Development; Macaca mulatta; Macaca fuscata; Macaca fascicularis; Female Reproductive System

Overview: The number and types of follicles present in the ovary are key indicators of reproductive health and capacity in females. This data set contains annotated H&E histology images from Rhesus (n=14), Cynomolgus (n=3) and Japanese (n=1) macaque (monkey) ovaries. The follicle images span the 6 preantral stages of primate ovarian follicle development: primordial, transitional primordial, primary, transitional primary, secondary, and multilayer. Follicle types were assigned by human experts. This data set is suitable for training machine learning algorithms to automatically identify and count follicles across these six developmental stages in ovarian histology images from non-human primates. In total, the dataset contains approximately 7,700 annotated follicles.
These data were generated as part of the MOTHER-DB.org project. The data are partitioned across multiple zip archives, which are described in detail below. Within these zip files, the individual sub-images, which were extracted from full-size histology images, are each centered on a classified follicle and measure 200 by 200 pixels (138 by 138 micrometers). The source histology image, the follicle type, and any manipulations applied to the sub-image are encoded in the folder and sub-image file names. See 'README_FileNamingConventions' for details on interpreting the folder and file names. The zip files include a folder for each of the follicle classes. Because the individual sub-image file names also contain the follicle class, you can combine all of the sub-images into a single folder without losing the follicle type assignments.

The complete data set, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.00N”: Within this zip archive, individual images are partitioned into folders by follicle type and into the Train, Test, and Validate subfolders used for training our machine learning algorithm. Various image augmentations are also included, such as color inversion and image rotations. Each annotated follicle yields a set of 48 images (the original plus its augmentations), and the full set for a particular annotation is always in the same Train, Test, or Validate folder. The data set also contains an extensive set of images representing non-follicle portions of the ovary. These images can be used as counterexamples to the preantral follicle classification sets. The image file names identify the full-size histology image, the follicle type, the location of the annotation in the full-size image, and how the image was augmented. The Train, Test, and Validate partition was done randomly to give partitions of 75:20:5. If desired, these three folders can be combined and the data repartitioned.
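The repartitioning mentioned above can be sketched as follows. This is a minimal illustration, not the project's own code: it assumes you want to keep all 48 images of one annotation in the same partition, as the published split does. The `repartition` function, the `group_key` callback, and the file names in the example are hypothetical; the real naming scheme is the one documented in README_FileNamingConventions, and a real `group_key` would have to be written against it.

```python
import random
from collections import defaultdict


def repartition(filenames, group_key, ratios=(0.75, 0.20, 0.05), seed=0):
    """Split file names into Train/Test/Validate, keeping each group intact."""
    # Collect every file that belongs to the same annotated follicle.
    groups = defaultdict(list)
    for name in filenames:
        groups[group_key(name)].append(name)

    # Shuffle whole groups, not individual files, so the augmentations of one
    # follicle can never land in two different partitions.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    n_train = round(ratios[0] * len(keys))
    n_test = round(ratios[1] * len(keys))
    assigned = {
        "Train": keys[:n_train],
        "Test": keys[n_train:n_train + n_test],
        "Validate": keys[n_train + n_test:],
    }
    return {split: [f for key in ks for f in groups[key]]
            for split, ks in assigned.items()}


# Made-up file names for illustration only: 100 follicles x 48 images each.
names = [f"follicle{i:03d}_aug{j:02d}.png" for i in range(100) for j in range(48)]
splits = repartition(names, group_key=lambda n: n.split("_aug")[0])
print({split: len(files) for split, files in splits.items()})
```

Splitting by group rather than by file avoids leaking near-identical augmented copies of a training follicle into the test or validation data.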
In total, the dataset contains 1.7 million images based on approximately 7,700 annotated follicles. This is a large dataset at ~120GB. You need to download the entire set of zip archives with the “.zip.00N” extensions, where N is a digit from 1 to 6. Each zip file is about 20GB. A stable high-speed network is needed. It will likely take several hours to download all six zip files. Zip software will reconstruct the complete zip archive if you open the first file in the series. We have tested unpacking these multipart zip files using The Unarchiver for Mac (https://the-unarchiver.macupdate.com/) and 7-Zip for Windows and Linux (https://www.7-zip.org/download.html).

Smaller data set that omits the Negatives, “MOTHER_Macaque_Monkey_Preantral_Follicles_NoNegatives.zip”: This data set omits the “Negative” images and contains only the sub-images of annotated follicles and their augmentations. The zip file contains ~370K images based on ~7,700 annotated follicles and is about 20% as large as the complete data set described above.

Smallest data set that omits the Negatives and Augmentations, “MOTHER_Macaque_Monkey_Preantral_Follicles_NoNegatives_NoAugmentations.zip”: This data set omits the “Negatives” and all augmentation images and contains only the sub-images of annotated follicles. The zip file contains ~7,700 images, one for each of our expert-annotated follicles.

Complete set of original histology images and annotation files, “MOTHER_TrainingData_HistoSlides_AnnotTables_20250812.zip”: This file (3.8GB) contains paired full-size histology images and follicle annotation files. The follicle annotation files give the location and follicle type of every identified follicle in the image. All images in this dataset have a resolution of 0.69 micrometer/pixel and are in ome.tif format. The image files range in size from 130MB to 620MB each.
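The pairing of histology images with their annotation files can be sketched as follows. The record gives one example pair, “14736_UN_050a.ome.tif” with “14736_UN_050a.annotations.txt”, and states that the annotation files are tab-delimited QuPath exports. The helper names below are hypothetical, the code assumes every pair follows that one naming pattern, and it assumes the first row of each annotation file is a header. The column names in the example are invented, since this record does not list the actual columns.

```python
import csv
from pathlib import Path

IMAGE_SUFFIX = ".ome.tif"
ANNOTATION_SUFFIX = ".annotations.txt"


def annotation_path_for(image_path):
    """Return the annotation file path that shares the image's base name."""
    image_path = Path(image_path)
    if not image_path.name.endswith(IMAGE_SUFFIX):
        raise ValueError(f"not an {IMAGE_SUFFIX} file: {image_path.name}")
    base = image_path.name[: -len(IMAGE_SUFFIX)]
    return image_path.with_name(base + ANNOTATION_SUFFIX)


def parse_annotations(lines):
    """Parse tab-delimited text lines into one dict per annotation row."""
    # Assumes the first line is a header row naming the columns.
    return list(csv.DictReader(lines, delimiter="\t"))


def read_annotations(path):
    """Read a tab-delimited annotation file from disk."""
    with open(path, newline="", encoding="utf-8") as handle:
        return parse_annotations(handle)


print(annotation_path_for("14736_UN_050a.ome.tif"))

# Invented header and row, for illustration only.
rows = parse_annotations(["Name\tClass\n", "a1\tPrimary\n"])
print(rows)
```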
The annotation files were output from QuPath, have a “.txt” extension, and are tab-delimited text files. Each histology image file has an associated annotations file. For more information see the README_Training_Data_20250409.pdf file included in the zip file.

README File Naming Conventions: The "README_FileNamingConventions.pdf" contains a detailed description of the naming conventions used for the folders and sub-image file names. The file names contain all the information needed to identify the assigned follicle type, the original histology slide it was derived from, and any augmentation details.

Date range: 2021-03-01 to 2025-02-28
Kind of data: H&E histology images
Data sources: The ovary histology images from which the individual follicle sub-images were produced can be found at https://mother-db.org/search/. Original ovary histology images are available at https://mother-db.org. Python code to generate the individual follicle sub-images from annotated ovary histology sections is available in the MOTHER GitHub repository. See https://github.com/mother-db/MOTHER-DB-annotation-tools
License: CC BY-NC 4.0 (http://creativecommons.org/licenses/by-nc/4.0)

Related materials:
Sluka, J., Watanabe, K. & Ding, Y. (2023). MOTHER Ovarian Follicle Annotation using QuPath. Available at https://hdl.handle.net/2022/29016.
Sluka, J., Watanabe, K., Ding, Y., Zelinski, M. & Dietrich, S. (2023). MOTHER Step-by-step supplementary files. Available at https://hdl.handle.net/2022/29015.
Multispecies Ovary Tissue Histology Electronic Repository (MOTHER) (2022). Zelinski Lab: Cynomolgus Macaque Ovary. Available at https://doi.org/10.48349/ASU/BUAU3E, ASU Library Research Data Repository.
Multispecies Ovary Tissue Histology Electronic Repository (MOTHER) (2022). Zelinski Lab: Rhesus Macaque Ovary. Available at https://doi.org/10.48349/ASU/46JWEX, ASU Library Research Data Repository.
Multispecies Ovary Tissue Histology Electronic Repository (MOTHER) (2022). Zelinski Lab: Japanese Macaque Ovary. Available at https://doi.org/10.48349/ASU/KM2QZQ, ASU Library Research Data Repository.
The Multispecies Ovary Tissue Histology Electronic Repository (MOTHER) Web portal contains project-wide information. See https://mother-db.org

File: MOTHER_Macaque_Monkey_Preantral_Follicles_NoNegatives.zip
Description: This data set omits the “Negative” images and contains only the sub-images of annotated follicles and their augmentations. The zip file contains ~370K images based on ~7,700 annotated follicles and is about 20% as large as the complete data set described above.
Content type: application/zip

File: MOTHER_Macaque_Monkey_Preantral_Follicles_NoNegatives_NoAugmentations.zip
Description: Smallest data set that omits the Negatives and Augmentations. This data set omits the “Negatives” and all augmentation images and contains only the sub-images of annotated follicles. The zip file contains ~7,700 images, one for each of our expert-annotated follicles. The README file contains a detailed description of the naming conventions used for the folders and sub-image file names. The file names contain all the information needed to identify the assigned follicle type, the original histology slide it was derived from, and any augmentation details.
Content type: application/zip

File: MOTHER_TrainingData_HistoSlides_AnnotTables_20250812.zip
Description: This zip file contains paired images and follicle annotation files exported from QuPath. For example, the image file “14736_UN_050a.ome.tif” should be used with the annotations file “14736_UN_050a.annotations.txt”. Format of the histology slide files.
All images in this dataset have a resolution of 0.69 micrometer/pixel and are in ome.tif format. The files range in size from 130MB to 620MB each. Format of the QuPath annotation files. The annotation files were output from QuPath and have a “.txt” extension. These are tab-delimited text files.
Content type: application/zip

File: README_FileNamingConventions.pdf
Description: Description of the sub-image, augmentation, and file naming conventions.
Content type: application/pdf

File: MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001
Description: The complete data set, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.00N”: This is part of a multi-part zip archive with 6 parts. Most zip software packages will automatically unzip all six files if you unzip the first file, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001”. It will take ~2 hours to download this file using a stable high-speed network. Individual images are partitioned into folders by follicle type and into the Train, Test, and Validate subfolders used for training our machine learning algorithm. Various image augmentations are also included, such as color inversion and image rotations. Each annotated follicle yields a set of 48 images (the original plus its augmentations), and the full set for a particular annotation is always in the same Train, Test, or Validate folder. The data set also contains an extensive set of images representing non-follicle portions of the ovary. These images can be used as counterexamples to the preantral follicle classification sets. The image file names identify the full-size histology image, the follicle type, the location of the annotation in the full-size image, and how the image was augmented. The Train, Test, and Validate partition was done randomly to give partitions of 75:20:5. If desired, these three folders can be combined and the data repartitioned. In total, the dataset contains 1.7 million images based on approximately 7,700 annotated follicles. This is a large dataset at ~120GB.
You need to download the entire set of zip archives with the “.zip.00N” extensions, where N is a digit from 1 to 6. Zip software will reconstruct the complete zip archive if you open the first file in the series.
Content type: application/octet-stream

File: MOTHER_Macaque_Monkey_Preantral_Follicles.zip.002
Description: Part 2 of the 6-part zip archive “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.00N”. It will take ~2 hours to download this file using a stable high-speed network. The contents and unpacking instructions are the same as those given for “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001”.
Content type: application/octet-stream

File: MOTHER_Macaque_Monkey_Preantral_Follicles.zip.003
Description: Part 3 of the 6-part zip archive “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.00N”. It will take ~2 hours to download this file using a stable high-speed network. The contents and unpacking instructions are the same as those given for “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001”.
Content type: application/octet-stream

File: MOTHER_Macaque_Monkey_Preantral_Follicles.zip.004
Description: Part 4 of the 6-part zip archive “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.00N”. It will take ~2 hours to download this file using a stable high-speed network. The contents and unpacking instructions are the same as those given for “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001”.
Content type: application/octet-stream

File: MOTHER_Macaque_Monkey_Preantral_Follicles.zip.005
Description: Part 5 of the 6-part zip archive “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.00N”. It will take ~2 hours to download this file using a stable high-speed network. The contents and unpacking instructions are the same as those given for “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001”.
Content type: application/octet-stream

File: MOTHER_Macaque_Monkey_Preantral_Follicles.zip.006
Description: Part 6 of the 6-part zip archive “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.00N”. It will take ~2 hours to download this file using a stable high-speed network. The contents and unpacking instructions are the same as those given for “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001”.
Content type: application/octet-stream
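Because the archive can only be reconstructed when every part is present, it may help to confirm that all six files finished downloading before opening the first one. The sketch below is one way to do that under simple assumptions: the helper names are hypothetical, the part names follow the “.zip.001” to “.zip.006” pattern listed in this record, and the check looks only at file names, not at file sizes or checksums.

```python
from pathlib import Path

ARCHIVE_BASE = "MOTHER_Macaque_Monkey_Preantral_Follicles.zip"
PART_COUNT = 6


def expected_parts(base=ARCHIVE_BASE, count=PART_COUNT):
    """List the part file names: base.001, base.002, ..., base.00N."""
    return [f"{base}.{index:03d}" for index in range(1, count + 1)]


def missing_parts(present_names, base=ARCHIVE_BASE, count=PART_COUNT):
    """Return the expected part names that are absent from present_names."""
    present = set(present_names)
    return [name for name in expected_parts(base, count) if name not in present]


def missing_parts_in(directory):
    """Check a download folder on disk for missing archive parts."""
    return missing_parts(entry.name for entry in Path(directory).iterdir())


# Example with a partial download: parts 003 and 006 have not arrived yet.
downloaded = [f"{ARCHIVE_BASE}.{n:03d}" for n in (1, 2, 4, 5)]
print(missing_parts(downloaded))
```

An empty list from `missing_parts_in` means all six part names are present; it does not guarantee that each download completed without truncation.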