In version 11 we introduced structured data validation, the first for any crawler. For version 12, we’ve listened to user feedback and improved upon existing features, as well as introduced some exciting new ones. Let’s take a look.
1) PageSpeed Insights Integration – Lighthouse Metrics, Opportunities & CrUX Data
You’re now able to gain valuable insights about page speed during a crawl. We’ve introduced a new ‘PageSpeed’ tab and integrated the PSI API, which uses Lighthouse. This allows you to pull in Chrome User Experience Report (CrUX) data and Lighthouse metrics, as well as analyse speed opportunities and diagnostics at scale.
You’re able to choose and configure over 75 metrics, opportunities and diagnostics (under ‘Config > API Access > PageSpeed Insights > Metrics’) to help analyse and make smarter decisions related to page speed.
(The irony of releasing pagespeed auditing, and then including a gif in the blog post.)
In the PageSpeed tab, you’re able to view metrics such as performance score, TTFB, first contentful paint, speed index, time to interactive, as well as total requests, page size, counts for resources and potential savings in size and time – and much, much more.
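If you’re curious what sits behind these columns, here is a minimal Python sketch of building a PageSpeed Insights v5 request and reading a few Lighthouse values from the JSON it returns. This is not how the SEO Spider itself is implemented. The helper names, the `YOUR_API_KEY` placeholder and the hand-written sample response are all assumptions for illustration, and the audit IDs shown can differ between Lighthouse versions.

```python
from urllib.parse import urlencode

# Public PageSpeed Insights v5 endpoint.
PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_psi_url(page_url, api_key, strategy="mobile"):
    """Return the request URL for a single page (hypothetical helper)."""
    params = {"url": page_url, "key": api_key, "strategy": strategy}
    return f"{PSI_ENDPOINT}?{urlencode(params)}"


def extract_metrics(response):
    """Pull a few headline Lighthouse values out of a parsed JSON response."""
    lighthouse = response.get("lighthouseResult", {})
    audits = lighthouse.get("audits", {})
    score = lighthouse.get("categories", {}).get("performance", {}).get("score")

    def numeric(audit_id):
        # Missing audits simply come back as None.
        return audits.get(audit_id, {}).get("numericValue")

    return {
        # The API reports the score on a 0-1 scale; convert it to 0-100.
        "performance_score": None if score is None else round(score * 100),
        "first_contentful_paint_ms": numeric("first-contentful-paint"),
        "speed_index_ms": numeric("speed-index"),
        "time_to_interactive_ms": numeric("interactive"),
    }


# Hand-written fragment for illustration only, not real measurement data.
sample_response = {
    "lighthouseResult": {
        "categories": {"performance": {"score": 0.87}},
        "audits": {
            "first-contentful-paint": {"numericValue": 1200.0},
            "speed-index": {"numericValue": 2500.0},
            "interactive": {"numericValue": 3100.0},
        },
    }
}

request_url = build_psi_url("https://example.com/", "YOUR_API_KEY")
metrics = extract_metrics(sample_response)
```

The sketch only builds the URL and parses a dictionary, so it runs without a network connection or a real API key.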
There are 19 filters for opportunities and diagnostics to help identify potential speed improvements from Lighthouse.
Click on a URL in the top window and then the ‘PageSpeed Details’ tab at the bottom, and the lower window populates with metrics for that URL and orders opportunities by those that will make the most impact at page level, based upon Lighthouse savings.
By clicking on an opportunity in the lower left-hand window panel, the right-hand window panel then displays more information on the issue, such as the specific resources with potential savings.
As you would expect, all of the data can be exported in bulk via ‘Reports‘ in the top-level menu.
There’s also a very cool ‘PageSpeed Opportunities Summary’ report, which summarises all the opportunities discovered across the site, the number of URLs each one affects, and the average and total potential saving in size and milliseconds to help prioritise them at scale, too.
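To make the arithmetic behind a summary like this concrete, here is a small Python sketch that groups per-URL opportunities and works out URLs affected, plus total and average savings. The rows, field order and function name are hypothetical and are not the report’s actual export format. It also assumes each URL appears at most once per opportunity.

```python
from collections import defaultdict

# Hypothetical per-URL rows: (url, opportunity, saving_ms, saving_bytes).
rows = [
    ("https://example.com/", "Eliminate Render-Blocking Resources", 300, 0),
    ("https://example.com/a", "Eliminate Render-Blocking Resources", 500, 0),
    ("https://example.com/a", "Properly Size Images", 200, 40000),
]


def summarise(rows):
    """Aggregate savings per opportunity across all URLs."""
    grouped = defaultdict(list)
    for _url, opportunity, saving_ms, saving_bytes in rows:
        grouped[opportunity].append((saving_ms, saving_bytes))

    summary = {}
    for opportunity, savings in grouped.items():
        total_ms = sum(ms for ms, _ in savings)
        total_bytes = sum(size for _, size in savings)
        summary[opportunity] = {
            "urls_affected": len(savings),
            "total_saving_ms": total_ms,
            "average_saving_ms": total_ms / len(savings),
            "total_saving_bytes": total_bytes,
            "average_saving_bytes": total_bytes / len(savings),
        }
    return summary


summary = summarise(rows)

# Rank opportunities by total time saving, largest first.
ranked = sorted(
    summary, key=lambda name: summary[name]["total_saving_ms"], reverse=True
)
```

Ranking by total saving rather than average favours fixes that touch many URLs, which is usually what you want when prioritising at scale.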
As well as bulk exports for each opportunity, there’s a CSS coverage report which highlights how much of each CSS file is unused across a crawl and the potential savings.
Please note, using the PageSpeed Insights API (like the interface) can currently affect analytics. Google are aware of the issue and we have included an FAQ on how to set up an exclude filter to prevent it from inflating analytics data.
2) Database Storage Crawl Auto Saving & Rapid Opening
Last year we introduced database storage mode, which allows users to choose to save all data to disk in a database rather than just keep it in RAM, which enables the SEO Spider to crawl very large websites.
Based upon user feedback, we’ve improved the experience further. In database storage mode, you no longer need to save crawls (as an .seospider file). They will automatically be saved in the database and can be accessed and opened via the ‘File > Crawls…’ top-level menu.
The ‘Crawls’ menu displays an overview of stored crawls, allows you to open them, rename, organise into project folders, duplicate, export, or delete in bulk.
The main benefit of this switch is that re-opening database files is significantly quicker, often instant, compared with opening an .seospider crawl file in database storage mode. You won’t need to load in .seospider files anymore, which previously could take some time for very large crawls.
You also don’t need to save anymore, as crawls will automatically be committed to the database. It does mean you will need to delete crawls you don’t want to keep from time to time (this can be done in bulk).
You can export the database crawls to share with colleagues or, if you’d prefer, export as an .seospider file for anyone still using memory storage mode. You can also still open .seospider files in database storage mode, which will take time to convert to a database (in the same way as version 11) before they are compiled and available to re-open almost instantly each time.
Export and import options are available under the ‘File’ menu in database storage mode.
To avoid accidentally wiping crawls, the crawl is stored every time you ‘clear’, start a new crawl from an existing crawl, or close the program. This leads us nicely onto the next enhancement.
3) Resume Previously Lost or Crashed Crawls
Due to the feature above, you’re now able to resume from an otherwise ‘lost’ crawl in database storage mode.
Do any other @screamingfrog users frequently shut down their machine at the end of the day and then remember they had a monster crawl running. Twice this week 🙁
Previously if Windows had kindly decided to perform an update and restart your machine mid crawl, there was a power-cut, software crash, or you just forgot you were running a week-long crawl and switched off your machine, the crawl would sadly be lost forever.
The difference between me and AI is that AI wouldn’t be stupid enough to close a two day crawl that is 92% complete on @screamingfrog
We’ve all been there and we didn’t feel this was user error, we could do better! So if any of the above happens, you should now be able to just open it back up via the ‘File > Crawls’ menu and resume the crawl.
Unfortunately this can’t be completely guaranteed, but it will provide a very robust safety net as the crawl is always stored, and generally retrievable – even when pulling the plug directly from a machine mid-crawl.
4) Configurable Tabs
You can now select precisely what tabs are displayed and how they are ordered in the GUI. Goodbye forever meta keywords.
The tabs can be dragged and moved in order, and they can be configured via the down arrow icon to the right-hand side of the top-level tabs menu.
This only affects how they are displayed in the GUI, not whether the data is stored. However…
5) Configurable Page Elements
You can de-select specific page elements so they are not crawled or stored at all, which helps save memory. These options are available under ‘Config > Spider > Extraction’. For example, if you wanted to stop storing meta keywords, that option could be disabled.
This allows users to run a ‘bare bones’ crawl when required.
6) Configurable Link Elements For Focused Auditing In List Mode
You can now choose whether to specifically store and crawl link elements as well (under ‘Config > Spider > Crawl’).
This enables the SEO Spider to be infinitely more flexible, particularly with the new ‘Internal hyperlinks’ configuration option. It becomes really powerful in list mode in particular, although it might not be immediately clear why.
However, if you deselect ‘Crawl’ and ‘Store’ options for all ‘Resource Links’ and ‘Page Links’, switch to list mode, go to ‘Config > Spider > Limits’ and remove the crawl depth that gets applied in list mode, you can then choose to audit any link element you wish alongside the URLs you upload.
For example, you can supply a list of URLs in list mode, and crawl only them and their hreflang links.
Or you could supply a list of URLs and audit their AMP versions only. You could upload a list of URLs, and just audit the images on them. You could upload a list of URLs and only crawl the external links on them for broken link building. You get the picture.
Previously this level of control and focus just wasn’t available, as removing the crawl depth in list mode would mean internal links would also be crawled.
This advanced configurability allows for laser-focused auditing of precisely the link elements you require, saving time and effort.
7) More Extractors!
You wanted more extractors, so you can now configure up to 100 in custom extraction. Just click ‘Add’ each time you need another extractor.
Custom extraction also now has its own tab for more granular filtering, which leads us onto the next point.
8) Custom Search Improvements
Custom Search has been separated from extraction into its own tab, and you can now have up to 100 search filters.
A dedicated tab allows the SEO Spider to display all filter data together, so you can combine filters and export combined.
We have more plans for this feature in the future, too.
9) Redirect Chain Report Improvements
Based upon user feedback, we’ve split up the ‘Redirect & Canonical Chains’ report into three.
You can now choose to export ‘All Redirects’ (1:1 redirects and chains together), ‘Redirect Chains’ (just redirects with 2+ redirects) and ‘Redirect & Canonical Chains’ (2+ redirects, or canonicals in a chain).
All of these will work in list mode when auditing redirects. This should cover the different scenarios where having the data combined or separated is useful.
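The distinction between a 1:1 redirect and a chain comes down to counting hops. The Python sketch below illustrates that idea with a hypothetical mapping of URL to redirect target. It is a simplification that ignores canonicals, and the function names and sample URLs are assumptions rather than anything taken from the reports themselves.

```python
def follow_redirects(start, redirects, max_hops=20):
    """Return every URL visited from `start`, following the redirect map."""
    path = [start]
    seen = {start}
    while path[-1] in redirects and len(path) - 1 < max_hops:
        target = redirects[path[-1]]
        path.append(target)
        if target in seen:  # redirect loop, stop here
            break
        seen.add(target)
    return path


def classify(start, redirects):
    """Label a URL by the number of redirect hops it takes to resolve."""
    hops = len(follow_redirects(start, redirects)) - 1
    if hops == 0:
        return "no redirect"
    return "1:1 redirect" if hops == 1 else "redirect chain"


# Hypothetical redirect map: source URL -> target URL.
redirects = {
    "http://example.com/": "https://example.com/",
    "https://example.com/": "https://www.example.com/",
    "https://www.example.com/old": "https://www.example.com/new",
}

chain_path = follow_redirects("http://example.com/", redirects)
labels = {url: classify(url, redirects) for url in redirects}
```

In this sketch, an ‘All Redirects’ style view would include every key in the map, while a ‘Redirect Chains’ style view would keep only those labelled as a chain.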
Version 12.0 also includes a number of smaller updates and bug fixes, outlined below.
- There’s a new ‘Link Attributes’ column for inlinks and outlinks. This will detail whether a link has a nofollow, sponsored or ugc value. ‘Follow Internal Nofollow‘ and ‘Follow External Nofollow‘ configuration options will apply to links which have sponsored or ugc, similar to a normal nofollow link.
- The SEO Spider will pick up the new max-snippet, max-video-preview and max-image-preview directives and there are filters for these within the ‘Directives‘ tab. We plan to add support for data-nosnippet at a later date, however this can be analysed using custom extraction for now.
- We’re committed to making the tool as reliable as possible and encouraging user reporting, so we’ve introduced in-app crash reporting. You don’t even need to bring up your own email client or download the logs manually to send them to us. Our support team may get back to you if we require more information.
- The crawl name is now displayed in the title bar of the application. If you haven’t named the crawl (or saved a name for the .seospider crawl file), then we will use a smart name based upon your crawl. This should help when comparing two crawls in separate windows.
- Structured data validation has been updated to use Schema.org 3.9 and now supports FAQ, How To, Job Training and Movie Google features. We’ve also updated nearly a dozen features with changing required and recommended properties.
- ga:users metric has now been added to the Google Analytics integration.
- ‘Download XML Sitemap’ and ‘Download XML Sitemap Index’ options in list mode have been combined into a single ‘Download XML Sitemap’ option.
- The exclude configuration now applies when in list mode, and to robots.txt files.
- Scroll bars have now been removed from rendered page screenshots.
- Our SERP snippet emulator has been updated with Google’s latest change to a larger font on desktop, which has resulted in fewer characters being displayed before truncation in the SERPs. The ‘Over 65 Characters’ default filter for page titles has been amended to 60. This can of course be adjusted under ‘Config > Preferences’.
- We’ve significantly sped up robots.txt parsing.
- Custom extraction has been improved to use less memory.
- We’ve added support for x-gzip content encoding, and content type ‘application/gzip’ for sitemap crawling.
- We removed the descriptive export name text from the first row of all exports as it was annoying.
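For the ‘Link Attributes’ column mentioned in the list above, the underlying idea is simply reading the rel attribute of each link and checking it for nofollow, sponsored or ugc. Here is a rough Python sketch using the standard library HTML parser. The class name and sample markup are made up for illustration, and this is not the SEO Spider’s own parsing logic.

```python
from html.parser import HTMLParser

# The rel values this sketch looks for.
TRACKED_VALUES = ("nofollow", "sponsored", "ugc")


class LinkAttributeParser(HTMLParser):
    """Collect (href, [tracked rel values]) for every anchor with an href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel is a space-separated token list; compare case-insensitively.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        found = [value for value in TRACKED_VALUES if value in rel_tokens]
        self.links.append((href, found))


# Made-up markup for illustration.
sample_html = (
    '<a href="/a" rel="nofollow ugc">A</a>'
    '<a href="/b">B</a>'
    '<a href="/c" rel="Sponsored">C</a>'
)

parser = LinkAttributeParser()
parser.feed(sample_html)
links = parser.links
```

A link with an empty list here would be treated as a normal followed link, while any of the three values would put it in the same bucket as a nofollow link for the ‘Follow Internal Nofollow’ and ‘Follow External Nofollow’ style options.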
That’s everything. If you experience any problems with the new version, then please do just let us know via our support and we’ll help as quickly as possible.
Thank you to everyone for all their feature requests, feedback, and bug reports. We appreciate each and every one of them.
Now, go and download version 12.0 of the Screaming Frog SEO Spider and let us know what you think!
Small Update – Version 12.1 Released 25th October 2019
We have just released a small update to version 12.1 of the SEO Spider. This release is mainly bug fixes and small improvements –
- Fix bug preventing saving of .seospider files when PSI is enabled.
- Fix crash in database mode when crawling URLs with more than 2,000 characters.
- Fix issue with Majestic not requesting data after a clear/pause.
- Fix ‘inlinks’ tab flickering during crawl if a URL is selected.
- Fix crash re-spidering a URL.
- Fix crash editing text input fields with special characters.
- Fix crash when renaming a crawl in database mode.