SEC's cloud-based data science workstations
By Sara Friedman
In its mission to protect investors and ensure market integrity, the Securities and Exchange Commission analyzes up to 30 billion records a day. Historically, the SEC didn’t have the tools to process and manage all of this data.
At first, a small group of financial analysts used specialized workstations “to help increase their processing speed for very large datasets,” SEC CIO Pamela Dyson told GCN. “In that journey, we saw that we had to constantly upgrade the processing speeds and our storage needs. And these server-class workstations were expensive to maintain since we had to build redundancies into the workstations.”
In January 2013, the SEC rolled out its Market Information Data Analytics System, a cloud-based analytics platform in the Amazon Web Services public cloud that allowed it to store and analyze billions of market exchange records in seconds and scale processing power as needed.
In May, the SEC launched a new data science workstation platform in the AWS cloud to give quantitative analysts better performance, lower latency and access to the latest financial analytics tools. Dyson first announced the workstations program during a June 20 keynote presentation at the AWS Public Sector Summit.
“One of the biggest advantages that we saw in planning the data science workstation program is the ability to use the cloud to turn things off when we don’t need them,” said Laura Kurup, senior analyst for data strategy in the SEC’s Office of the CIO. “Quantitative analysts [may] need more computing power for a particular day or week, and in the cloud we can turn on those resources and scale back to see cost savings when they are not needed.”
The Office of Compliance, Inspections and Examinations uses a combination of open source tools to help analysts with unstructured data, time series data, data visualizations and machine learning. This helps analysts determine what datasets should be examined first.
“Machine learning helps us to look at activities with areas of risk so we can focus our research,” Kurup said. “It doesn’t replace the activities of the SEC staff, but it can supplement and save time in deciding where to focus their efforts.”
Part of the rationale for creating the data science workstations program was to bring more skilled analysts to the SEC from the private sector, where they have access to state-of-the-art systems and tools, Dyson said.
“A lot of the quantitative analysts that come into the SEC have developed their own tools, but we want to be flexible here too,” Dyson said. “We are trying to establish a collaborative environment so we can share [the] same tools across our 20 groups of analysts.”
Dyson is also interested in bringing the power of the data science workstations program to other parts of the agency such as the Office of Financial Management and departments that do economic analysis for SEC rulemaking.
Based on the guidance of a cloud governance team, Dyson said the SEC wants to migrate about 25 percent of its applications to the cloud within the next 18 months.
Editor's note: This article was changed July 18. The 30 billion data exchanges mentioned refer to the workload of the SEC as a whole, not the workload of the Office of Compliance, Inspections and Examinations as originally stated. SEC staff has historically had tools for analysis, but it needed better ones to meet increasing demands.
Sara Friedman is a reporter/producer for GCN, covering cloud, cybersecurity and a wide range of other public-sector IT topics.
Before joining GCN, Friedman was a reporter for Gambling Compliance, where she covered state issues related to casinos, lotteries and fantasy sports. She has also written for Communications Daily and Washington Internet Daily on state telecom and cloud computing. Friedman is a graduate of Ithaca College, where she studied journalism, politics and international communications.
Friedman can be contacted at firstname.lastname@example.org or followed on Twitter @SaraEFriedman.