TDM Studio, by ProQuest, is a text and data mining solution for research, teaching and learning.
Proficiency in R or Python programming languages is useful but not necessary for text and data mining with TDM Studio.
Anyone with a valid UPenn email address can access a workbench by logging in. By default, each workbench can support 1-5 users.
Go to ProQuest TDM Studio (https://tdmstudio.proquest.com/home)
Select “Create Account” in the top right corner
After entering your UPenn email address, the system should automatically select “University of Pennsylvania Libraries” as your institution. Create your password, read and accept ProQuest TDM Studio’s Privacy Policy and Terms and Conditions, and click “Create Account”.
A confirmation email should be sent to your UPenn email inbox. Continue by clicking “Verify Email and Log In” to verify your email address. Then click “Log in to TDM Studio” and log in by entering your email address and password.
You can create your own dataset by first generating a new Workbench Dashboard at https://tdmstudio.proquest.com/workbenchdashboard.
From the Workbench, select the “Create New Dataset” button to get started. You will then be able to choose between two options from a drop-down menu. Please note that:
Selecting “Publication Titles” allows you to limit your search to individual publication titles, such as The New York Times or The Washington Post.
Selecting “ProQuest Databases” allows you to limit your search to individual ProQuest databases.
If there is a specific title you would like to include in your dataset, use the search box in the upper right-hand corner to filter the list by name.
ProQuest has some tips to help you select the correct content.
Sometimes, there will be multiple entries for the same publication title. The “Source Type” column will help you select the right one.
You can use the “Full Text” column on the far right to determine whether your selected publication contains full text or not.
Make sure that the publications you select cover the period you want. Some publications are split between historical and current versions, so it may be necessary to select different or multiple publications depending on the time span you want to cover.
If you are selecting multiple publications of the same name (their current and historical versions), try to generate your dataset starting from the most recent publication and going back chronologically.
ProQuest offers an online module, “What content is available?”, that covers content selection in more detail.
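The coverage tips above can be sketched in Python as a quick planning aid. This is only an illustration, not a TDM Studio feature: the Publication class, the plan_selection function, the publication names, and the coverage years are all made up for the example. In practice you would read the coverage dates from the publication list yourself.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    """A hypothetical record of one publication entry and the years it covers."""
    title: str
    start_year: int
    end_year: int


def plan_selection(publications, want_start, want_end):
    """Return the entries overlapping the wanted span (most recent first)
    and a list of any wanted years that no selected entry covers."""
    relevant = [
        p for p in publications
        if p.end_year >= want_start and p.start_year <= want_end
    ]
    # Start from the most recent version and work back chronologically.
    relevant.sort(key=lambda p: p.end_year, reverse=True)

    covered = set()
    for p in relevant:
        first = max(p.start_year, want_start)
        last = min(p.end_year, want_end)
        covered.update(range(first, last + 1))

    missing = [y for y in range(want_start, want_end + 1) if y not in covered]
    return relevant, missing


# Made-up entries and coverage years, for illustration only.
pubs = [
    Publication("Example Gazette (historical)", 1900, 1989),
    Publication("Example Gazette (current)", 1990, 2023),
    Publication("Unrelated Weekly", 1850, 1880),
]

ordered, missing = plan_selection(pubs, 1975, 2005)
for p in ordered:
    print(p.title, p.start_year, p.end_year)
print("Years not covered:", missing)
```

If the list of uncovered years is not empty, the time span you want is not fully covered, and you may need to select another version of the publication.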
There is no maximum number of publications that you can include in the dataset you create, but you can monitor the number of selected publications at the bottom of the page. Once you have selected all the publications, click the “Next: Refine Content” button to proceed to the next step.
Refining your results is an important step since a dataset created by ProQuest TDM Studio can only contain up to 2 million records.
You can use Boolean operators (such as AND, OR, and NOT) in the search box to control the search results. You can also refine by date published, source type, and document type.
ProQuest offers the module “Best practices on searching ProQuest content by using search mnemonics and search tips”, which covers results refinement in more detail.
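If you are comfortable with Python, a small helper like the one below can make it easier to assemble a Boolean search string before pasting it into the search box. This is a minimal sketch: the function names and the search terms are hypothetical, and it assumes that the search box accepts quoted phrases and parentheses for grouping. Check the ProQuest search tips to confirm the syntax you need.

```python
def quote(term: str) -> str:
    """Wrap multi-word phrases in double quotes so they are searched as a phrase."""
    term = term.strip()
    return f'"{term}"' if " " in term else term


def build_query(all_of=(), any_of=(), none_of=()) -> str:
    """Combine terms into one Boolean search string using AND, OR, and NOT."""
    parts = []
    if all_of:
        # Every one of these terms must appear.
        parts.append(" AND ".join(quote(t) for t in all_of))
    if any_of:
        # At least one of these terms must appear.
        parts.append("(" + " OR ".join(quote(t) for t in any_of) + ")")
    query = " AND ".join(parts)
    if none_of:
        # None of these terms may appear.
        excluded = "(" + " OR ".join(quote(t) for t in none_of) + ")"
        query = f"{query} NOT {excluded}" if query else f"NOT {excluded}"
    return query


# Example terms are placeholders; replace them with your own.
query = build_query(
    all_of=["climate change"],
    any_of=["policy", "legislation"],
    none_of=["editorial"],
)
print(query)
# prints: "climate change" AND (policy OR legislation) NOT (editorial)
```

Building the string in pieces keeps the grouping explicit, which helps when a narrower query is needed to stay under the 2 million record limit.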
When you are satisfied with your selections and refinements, click the “Next: Review Dataset” button on the bottom right to begin creating your dataset.
Next, name the dataset. You can also add a description that will help you identify this dataset in your workbench once it has been processed. Then click “Create Dataset” at the bottom. A pop-up window should appear confirming that your dataset is now being created.
Please note that the dataset takes some time to process. Depending on the size of the dataset, processing can take anywhere from an hour to just under a day.
Closing this pop-up window will bring you back to the dashboard where you can see the dataset being queued for processing. Check back in a few hours to see if your dataset is ready!
Analyzing and Visualizing Text with Constellate and ProQuest TDM Studio: A guide introducing text analysis as a research method and a demonstration of Constellate and ProQuest TDM Studio. The guide contains slides, a recorded workshop, and instructions for using the platforms.