|
View: |
Part 1: Document Description
|
|
Citation |
|
|---|---|
|
Title: |
Replication Data for: Follicle Identification in Primate Ovaries via Machine Learning |
|
Identification Number: |
doi:10.48349/ASU/BOK3VO |
|
Distributor: |
ASU Library Research Data Repository |
|
Date of Distribution: |
2025-09-19 |
|
Version: |
1 |
|
Bibliographic Citation: |
Sluka, James P.; Zelinski, Mary B.; Watanabe, Karen H.; Dietrich, Suzanne W.; Riley Israels, 2025, "Replication Data for: Follicle Identification in Primate Ovaries via Machine Learning", https://doi.org/10.48349/ASU/BOK3VO, ASU Library Research Data Repository, V1 |
|
Citation |
|
|
Title: |
Replication Data for: Follicle Identification in Primate Ovaries via Machine Learning |
|
Identification Number: |
doi:10.48349/ASU/BOK3VO |
|
Authoring Entity: |
Sluka, James P. (Indiana University Bloomington) |
|
Zelinski, Mary B. (Oregon Health & Science University) |
|
|
Watanabe, Karen H. (Arizona State University) |
|
|
Dietrich, Suzanne W. (Arizona State University) |
|
|
Riley Israels (Arizona State University) |
|
|
Other identifications and acknowledgements: |
Rao, Parth Ravindra |
|
Other identifications and acknowledgements: |
Nagda, Param |
|
Other identifications and acknowledgements: |
Daniele, Alessia |
|
Other identifications and acknowledgements: |
Jurado Gutierrez, Aleli |
|
Other identifications and acknowledgements: |
Egusquiza Diaz, Eliany |
|
Other identifications and acknowledgements: |
Villanueva, Edmundo |
|
Other identifications and acknowledgements: |
Hernandez, Gabriella |
|
Other identifications and acknowledgements: |
Shah, Gaurika |
|
Other identifications and acknowledgements: |
Azooz, Masara |
|
Other identifications and acknowledgements: |
Ding, Yian |
|
Grant Number: |
NSF DBI-2054061 |
|
Grant Number: |
P51 OD011092 |
|
Distributor: |
ASU Library Research Data Repository |
|
Access Authority: |
Sluka, James P. |
|
Access Authority: |
Watanabe, Karen H. |
|
Holdings Information: |
https://doi.org/10.48349/ASU/BOK3VO |
|
Study Scope |
|
|
Keywords: |
Medicine, Health and Life Sciences, Ovarian Follicle, Artificial intelligence, Machine Learning, Image Processing, Computer-Assisted, Transfer Machine Learning |
|
Topic Classification: |
Image Processing, Computer-Assisted, Ovarian Follicle, Ovarian Follicle Development, Macaca mulatta, Macaca fuscata, Macaca fascicularis, Female Reproductive System |
|
Abstract: |
<b>Overview:</b> <p> The number and types of follicles present in the ovary are key indicators of female reproductive health and capacity. This data set contains annotated H&E histology images from Rhesus (n=14), Cynomolgus (n=3) and Japanese (n=1) macaque (monkey) ovaries. The follicle images span the 6 preantral stages of primate ovarian follicle development: primordial, transitional primordial, primary, transitional primary, secondary, and multilayer. Follicle types were assigned by human experts. This data set is suitable for training machine learning algorithms to automatically identify and count follicles across these six developmental stages in ovarian histology images from non-human primates. In total, the dataset contains approximately 7,700 annotated follicles. These data were generated as part of the MOTHER-DB.org project. <p> The data are partitioned across multiple zip archives, which are described in detail below. Within these zip files, each sub-image was extracted from a full-size histology image, is centered on a classified follicle, and is 200 by 200 pixels (138 by 138 micrometers) in size. The source histology image, the follicle type, and any manipulations of the sub-image are encoded in the folder and sub-image file names. See “README_FileNamingConventions.pdf” for details on interpreting the folder and file names. The zip files include a folder for each follicle class. The individual sub-image file names also contain the follicle class, so you can combine all of the sub-images into a single folder without losing the follicle type assignments. <p> <b>The complete data set, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.00N”: </b> <p> Within this Zip archive, individual images are partitioned into folders by follicle type and into the Train, Test and Validate subfolders used for training our machine learning algorithm. 
In addition, various image augmentations are included, such as color inversion and image rotations. Each annotated follicle yields a set of 48 images (the original plus its augmentations), and the whole set is always placed in the same Train, Test or Validate folder. The data set also contains an extensive set of images representing non-follicle portions of the ovary. These images can be used as counterexamples to the preantral follicle classes. The image file names identify the full-size histology image, the follicle type, the location of the annotation in the full-size image, and how the image was augmented. The Train, Test, and Validate partition was done randomly in a 75:20:5 ratio. If desired, these three folders can be combined and the data repartitioned. <p> In total, the <b>dataset contains 1.7 million images</b> based on approximately 7,700 annotated follicles. <b>This is a large dataset at ~120GB.</b> You need to download the entire set of zip archives with the “.zip.00N” extensions, where N is a digit from 1 to 6. Each zip file is about 20GB. <b>A stable high-speed network is needed.</b> It will likely take several hours to download all six zip files. <p> Zip software will reconstruct the complete zip archive if you open the first file in the series. We have tested unpacking these multipart zip files using The Unarchiver for Mac (https://the-unarchiver.macupdate.com/) and 7-Zip for Windows and Linux (https://www.7-zip.org/download.html). <p> <b>Smaller data set that omits the Negatives, “MOTHER_Macaque_Monkey_Preantral_Follicles_NoNegatives.zip”:</b> <p> This data set omits the “Negative” images and contains only the sub-images of annotated follicles and their augmentations. The zip file contains ~370K images based on ~7,700 annotated follicles and is about 20% as large as the complete data set described above. 
<p> <b>Smallest data set that omits the Negatives and Augmentations, “MOTHER_Macaque_Monkey_Preantral_Follicles_NoNegatives_NoAugmentations.zip”: </b> <p> This data set omits the “Negatives” and all augmentation images and contains only the sub-images of annotated follicles. The zip file contains ~7,700 images, one for each of our expert-annotated follicles. <p> <b> Complete set of original histology images and annotation files, “MOTHER_TrainingData_HistoSlides_AnnotTables_20250812.zip”:</b> </p> <p>The “MOTHER_TrainingData_HistoSlides_AnnotTables_20250812.zip” file (3.8GB) contains paired full-size histology images and follicle annotation files. The follicle annotation files give the location and follicle type of every identified follicle in the image. All images in this dataset have a resolution of 0.69 micrometer/pixel and are in ome.tif format. The image files range in size from 130MB to 620MB each. The annotation files were output from QuPath, have a “.txt” extension, and are tab-delimited text files. Each histology image file has an associated annotation file. For more information, see the README_Training_Data_20250409.pdf file included in the zip file. </p> <p> <b>README File Naming Conventions:</b> </p> <p> The “README_FileNamingConventions.pdf” file contains a detailed description of the naming conventions used for the folders and sub-image file names. The file names contain all the information needed to identify the assigned follicle type, the original histology slide the sub-image was derived from, and any augmentation details. |
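The abstract notes that the Train, Test, and Validate folders can be combined and repartitioned, and that all 48 images derived from one annotation share a folder. The following Python sketch shows one way to repartition while preserving that grouping. It is not the authors' partitioning code: the function name `repartition_by_group`, the `group_key` callback, and the demo file names are assumptions made for illustration. The real rule for recovering the annotation from a file name is given in README_FileNamingConventions.pdf and would have to be supplied as `group_key`.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def repartition_by_group(
    filenames: Iterable[str],
    group_key: Callable[[str], str],
    weights: Dict[str, float],
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Assign whole groups of files to partitions in proportion to weights."""
    # Collect every file under the key of the annotation it came from.
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        groups[group_key(name)].append(name)

    keys = sorted(groups)  # fixed order before shuffling, for reproducibility
    random.Random(seed).shuffle(keys)

    # Cut the shuffled list of groups at the cumulative weight boundaries.
    total = sum(weights.values())
    result: Dict[str, List[str]] = {}
    start = 0
    cumulative = 0.0
    for part, weight in weights.items():
        cumulative += weight
        end = round(cumulative * len(keys) / total)
        result[part] = [name for key in keys[start:end] for name in groups[key]]
        start = end
    return result


# Hypothetical file names: "<annotation>__<augmentation>.png" is NOT the
# dataset's real convention; it only stands in for it in this demo.
demo = [f"follicle{i:03d}__aug{j:02d}.png" for i in range(100) for j in range(48)]
parts = repartition_by_group(
    demo,
    group_key=lambda name: name.split("__")[0],
    weights={"Train": 75, "Test": 20, "Validate": 5},
)
print({part: len(files) for part, files in parts.items()})
```

Splitting by annotation rather than by file keeps augmented copies of the same follicle out of different partitions, which would otherwise leak training examples into the test set.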
|
Date of Collection: |
2021-03-01 to 2025-02-28 |
|
Kind of Data: |
H & E Histology images |
|
Methodology and Processing |
|
|
Sources Statement |
|
|
Data Sources: |
The ovary histology images from which the individual follicle sub-images were extracted can be found at <a href="https://mother-db.org/search/">https://mother-db.org/search/</a> |
|
Documentation and Access to Sources: |
<p><b>Original ovary histology images</b> are available at <a href="https://mother-db.org">https://mother-db.org</a></p> <p><b>Python code</b> to generate the individual follicle subimages from annotated ovary histology sections is available in the MOTHER GitHub repository. See <a href="https://github.com/mother-db/MOTHER-DB-annotation-tools">https://github.com/mother-db/MOTHER-DB-annotation-tools</a></p> |
|
Data Access |
|
|
Notes: |
<a href="http://creativecommons.org/licenses/by-nc/4.0">CC BY-NC 4.0</a> |
|
Other Study Description Materials |
|
|
Related Materials |
|
|
<p>Sluka, J., Watanabe, K. & Ding, Y. (2023). MOTHER Ovarian Follicle Annotation using QuPath. Available at <a href="https://hdl.handle.net/2022/29016">https://hdl.handle.net/2022/29016</a>.</p> <p>Sluka, J., Watanabe, K., Ding, Y., Zelinski, M. & Dietrich, S. (2023). MOTHER Step-by-step supplementary files. Available at <a href="https://hdl.handle.net/2022/29015">https://hdl.handle.net/2022/29015</a>.</p> |
|
|
Related Studies |
|
|
<p>Multispecies Ovary Tissue Histology Electronic Repository (MOTHER) (2022). Zelinski Lab: Cynomolgus Macaque Ovary. Available at <a href="https://doi.org/10.48349/ASU/BUAU3E">https://doi.org/10.48349/ASU/BUAU3E</a>, ASU Library Research Data Repository.</p> <p>Multispecies Ovary Tissue Histology Electronic Repository (MOTHER) (2022). Zelinski Lab: Rhesus Macaque Ovary. Available at <a href="https://doi.org/10.48349/ASU/46JWEX">https://doi.org/10.48349/ASU/46JWEX</a>, ASU Library Research Data Repository.</p> <p>Multispecies Ovary Tissue Histology Electronic Repository (MOTHER) (2022). Zelinski Lab: Japanese Macaque Ovary. Available at <a href="https://doi.org/10.48349/ASU/KM2QZQ">https://doi.org/10.48349/ASU/KM2QZQ</a>, ASU Library Research Data Repository.</p> |
|
|
Other Reference Note(s) |
|
|
Multispecies Ovary Tissue Histology Electronic Repository (MOTHER) Web portal contains project-wide information. See https://mother-db.org |
|
|
Label: |
MOTHER_Macaque_Monkey_Preantral_Follicles_NoNegatives.zip |
|
Text: |
This data set omits the “Negative” images and only contains the sub-images of annotated follicles and their augmentations. The zip file contains ~370K images based on ~7,700 annotated follicles and is about 20% as large as the complete data set described above. |
|
Notes: |
application/zip |
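The image count quoted for the NoNegatives archive can be checked against two figures from the abstract: roughly 7,700 annotated follicles and 48 images per annotation. The short Python sketch below does that arithmetic. The constant names are illustrative, the inputs are the approximate values stated in this record, and the share is computed by image count, which may differ from the "about 20%" size comparison if that refers to file size.

```python
ANNOTATED_FOLLICLES = 7_700       # approximate count stated in the abstract
IMAGES_PER_ANNOTATION = 48        # original plus its augmentations
COMPLETE_SET_IMAGES = 1_700_000   # approximate size of the complete data set

# Images expected when every annotation contributes its full set of 48.
follicle_images = ANNOTATED_FOLLICLES * IMAGES_PER_ANNOTATION

# Share of the complete data set, by image count.
share = follicle_images / COMPLETE_SET_IMAGES

print(follicle_images)   # 369600, consistent with the quoted ~370K
print(f"{share:.0%}")
```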
|
Label: |
MOTHER_Macaque_Monkey_Preantral_Follicles_NoNegatives_NoAugmentations.zip |
|
Text: |
The smallest data set. It omits the “Negatives” and all augmentation images and contains only the sub-images of annotated follicles. The zip file contains ~7,700 images, one for each of our expert-annotated follicles. The README file contains a detailed description of the naming conventions used for the folders and sub-image file names. The file names contain all the information needed to identify the assigned follicle type, the original histology slide the sub-image was derived from, and any augmentation details. |
|
Notes: |
application/zip |
|
Label: |
MOTHER_TrainingData_HistoSlides_AnnotTables_20250812.zip |
|
Text: |
This zip file contains paired images and follicle annotation files exported from QuPath. For example, the image file “14736_UN_050a.ome.tif” should be used with the annotation file “14736_UN_050a.annotations.txt”. Format of the histology slide files: all images in this dataset have a resolution of 0.69 micrometer/pixel and are in ome.tif format. The files range in size from 130MB to 620MB each. Format of the QuPath annotation files: the annotation files were output from QuPath, have a “.txt” extension, and are tab-delimited text files. |
|
Notes: |
application/zip |
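The pairing rule in the description above (“14736_UN_050a.ome.tif” goes with “14736_UN_050a.annotations.txt”) and the tab-delimited format can be sketched in Python as follows. The helper names are illustrative, and placing the annotation file in the same folder as the image is an assumption. The column names in the sample table are invented, because this record does not list the columns QuPath exported.

```python
import csv
import io
from pathlib import Path
from typing import Dict, List, TextIO

IMAGE_SUFFIX = ".ome.tif"
ANNOTATION_SUFFIX = ".annotations.txt"


def annotation_path_for(image_path: Path) -> Path:
    """Map '<name>.ome.tif' to '<name>.annotations.txt' in the same folder."""
    if not image_path.name.endswith(IMAGE_SUFFIX):
        raise ValueError(f"not an {IMAGE_SUFFIX} file: {image_path.name}")
    stem = image_path.name[: -len(IMAGE_SUFFIX)]
    return image_path.with_name(stem + ANNOTATION_SUFFIX)


def read_annotations(handle: TextIO) -> List[Dict[str, str]]:
    """Read a tab-delimited table, using its first row as column names."""
    return list(csv.DictReader(handle, delimiter="\t"))


print(annotation_path_for(Path("14736_UN_050a.ome.tif")))

# Invented two-column table for illustration; real column names may differ.
sample = "Name\tClass\nrow1\tprimordial\n"
rows = read_annotations(io.StringIO(sample))
print(rows)
```

For a file on disk, pass `open(path, newline="", encoding="utf-8")` to `read_annotations`; the encoding is also an assumption and may need adjusting.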
|
Label: |
README_FileNamingConventions.pdf |
|
Text: |
Description of the subimage, augmentations, and file naming conventions |
|
Notes: |
application/pdf |
|
Label: |
MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001 |
|
Text: |
The complete data set, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.00N”: This is part of a multi-Zip archive with 6 parts. Most Zip software packages will automatically unzip all six files if you unzip the first file, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001”. It will take ~2 hours to download this file using a stable high-speed network. Individual images are partitioned into folders by follicle type and into the Train, Test and Validate subfolders used for training our machine learning algorithm. In addition, various image augmentations are included, such as color inversion and image rotations. Each annotated follicle yields a set of 48 images (the original plus its augmentations), and the whole set is always placed in the same Train, Test or Validate folder. The data set also contains an extensive set of images representing non-follicle portions of the ovary. These images can be used as counterexamples to the preantral follicle classes. The image file names identify the full-size histology image, the follicle type, the location of the annotation in the full-size image, and how the image was augmented. The Train, Test, and Validate partition was done randomly in a 75:20:5 ratio. If desired, these three folders can be combined and the data repartitioned. In total, the dataset contains 1.7 million images based on approximately 7,700 annotated follicles. This is a large dataset at ~120GB. You need to download the entire set of zip archives with the “.zip.00N” extensions, where N is a digit from 1 to 6. Zip software will reconstruct the complete zip archive if you open the first file in the series. |
|
Notes: |
application/octet-stream |
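Because the archive can only be reconstructed when all six parts are present, it may help to check the download folder before unpacking. The Python sketch below lists any parts that are missing. The function names are illustrative, and it checks only that the files exist, not that they downloaded completely or are uncorrupted.

```python
from pathlib import Path
from typing import List

ARCHIVE_BASE = "MOTHER_Macaque_Monkey_Preantral_Follicles.zip"
PART_COUNT = 6  # parts .001 through .006, as listed in this record


def expected_parts(base: str = ARCHIVE_BASE, count: int = PART_COUNT) -> List[str]:
    """Return the part file names, e.g. '<base>.001' ... '<base>.006'."""
    return [f"{base}.{n:03d}" for n in range(1, count + 1)]


def missing_parts(folder: Path) -> List[str]:
    """Return the names of expected parts that are not files in folder."""
    return [name for name in expected_parts() if not (folder / name).is_file()]


# Check the current directory; an empty list means all six parts are present.
print(missing_parts(Path(".")))
```

Once the list is empty, open the `.001` file with a tool that supports multi-part archives, such as the ones named in the abstract.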
|
Label: |
MOTHER_Macaque_Monkey_Preantral_Follicles.zip.002 |
|
Text: |
The complete data set, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.00N”: This is part of a multi-Zip archive with 6 parts. Most Zip software packages will automatically unzip all six files if you unzip the first file, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001”. It will take ~2 hours to download this file using a stable high-speed network. |
|
Notes: |
application/octet-stream |
|
Label: |
MOTHER_Macaque_Monkey_Preantral_Follicles.zip.003 |
|
Text: |
The complete data set, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.00N”: This is part of a multi-Zip archive with 6 parts. Most Zip software packages will automatically unzip all six files if you unzip the first file, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001”. It will take ~2 hours to download this file using a stable high-speed network. |
|
Notes: |
application/octet-stream |
|
Label: |
MOTHER_Macaque_Monkey_Preantral_Follicles.zip.004 |
|
Text: |
The complete data set, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.00N”: This is part of a multi-Zip archive with 6 parts. Most Zip software packages will automatically unzip all six files if you unzip the first file, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001”. It will take ~2 hours to download this file using a stable high-speed network. |
|
Notes: |
application/octet-stream |
|
Label: |
MOTHER_Macaque_Monkey_Preantral_Follicles.zip.005 |
|
Text: |
The complete data set, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.00N”: This is part of a multi-Zip archive with 6 parts. Most Zip software packages will automatically unzip all six files if you unzip the first file, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001”. It will take ~2 hours to download this file using a stable high-speed network. |
|
Notes: |
application/octet-stream |
|
Label: |
MOTHER_Macaque_Monkey_Preantral_Follicles.zip.006 |
|
Text: |
The complete data set, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.00N”: This is part of a multi-Zip archive with 6 parts. Most Zip software packages will automatically unzip all six files if you unzip the first file, “MOTHER_Macaque_Monkey_Preantral_Follicles.zip.001”. It will take ~2 hours to download this file using a stable high-speed network. |
|
Notes: |
application/octet-stream |