Researchers at MIT launch the Synthetic Data Vault, a set of open source tools aimed at expanding access to data without compromising privacy.
Every year, the world generates more data than the previous year. In 2020 alone, an estimated 59 zettabytes of data will be “created, captured, copied and consumed,” according to the International Data Corporation, enough to fill about a trillion 64-gigabyte hard drives.
But just because data is proliferating doesn’t mean everyone can actually use it. Companies and institutions, rightly concerned about their users’ privacy, often restrict access to datasets – sometimes even within their own teams. And now that the Covid-19 pandemic has shut down labs and offices, preventing people from visiting centralized data stores, sharing information safely is even more difficult.
Without access to the data, it is hard to build tools that actually work. Enter synthetic data: artificial information that developers and engineers can use as a stand-in for real data.
Synthetic data is a bit like diet soda. To be effective, it has to resemble the “real thing” in certain ways. Diet soda should look, taste, and fizz like regular soda. Similarly, a synthetic dataset must have the same mathematical and statistical properties as the real-world dataset it stands in for. “It looks similar, and has a similar format,” says Kalyan Veeramachaneni, principal investigator of the Data to AI (DAI) Lab and a principal research scientist in MIT’s Laboratory for Information and Decision Systems (LIDS). If it is run through a model, or used to build or test an application, it performs the way real-world data would.
But – just as diet soda should have fewer calories than the regular variety – a synthetic dataset must also differ from a real one in crucial ways. If it is based on a real dataset, for example, it shouldn’t contain, or even hint at, any of the identifying information in that dataset.
Threading that needle is tricky. After years of work, Veeramachaneni and his collaborators recently unveiled a set of open-source data generation tools – a one-stop shop where users can get as much data as they need for their projects, in formats ranging from tables to time series. They call it the Synthetic Data Vault.
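As a rough sketch of what that one-stop shop looks like in practice, the snippet below fits a model to a small table and samples new rows from it. It is a minimal sketch that assumes the early, 0.x-era tabular API of the open-source sdv package (a GaussianCopula model with fit and sample methods); newer releases reorganize this interface, and the patient-style table here is invented purely for illustration.

```python
# Minimal sketch of a Synthetic Data Vault-style workflow, assuming the
# early sdv tabular API (GaussianCopula with fit/sample). The tiny
# patient-style table below is invented for illustration.
import pandas as pd
from sdv.tabular import GaussianCopula

# A small "real" table standing in for sensitive records.
real_data = pd.DataFrame({
    "age": [34, 61, 47, 29, 55],
    "systolic_bp": [118, 140, 130, 115, 135],
    "heart_rate": [72, 80, 76, 68, 79],
})

# Learn the joint distribution of the columns from the real table...
model = GaussianCopula()
model.fit(real_data)

# ...then sample brand-new rows that mimic it statistically,
# without reproducing any original record.
synthetic_data = model.sample(200)
print(synthetic_data.head())
```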
Maximize access while maintaining privacy
Veeramachaneni and his team first tried to create synthetic data in 2013. They had been tasked with analyzing a large amount of data from the edX online learning program, and they wanted to bring in some MIT students to help. The data was sensitive and couldn’t be shared with these new hires, so the team decided to create artificial data that the students could work with instead – thinking that “once they wrote the processing software, we could use it on the real data,” says Veeramachaneni.
This is a common scenario. Imagine that you are a software developer hired by a hospital. You have been asked to build a dashboard that allows patients to access their test results, prescriptions, and other health information. But you are not allowed to see any actual patient data, because it is private.
Most developers in this situation will make “a very simplistic version” of the data they need, and do the best they can, says Carles Sala, a researcher in the DAI Lab. But when the dashboard goes live, there’s a good chance that “everything will fall apart,” he says, “because there are some edge cases they weren’t taking into account.”
High-quality synthetic data – as complex as the data it is meant to replace – would help solve this problem. Companies and institutions could share it freely, allowing teams to work more collaboratively and efficiently. Developers could even carry it around on their laptops, knowing they weren’t putting any sensitive information at risk.
Refine the formula – and handle the constraints
In 2013, Veeramachaneni’s team gave themselves two weeks to create a data pool they could use for that edX project. The timeline “seemed really reasonable,” says Veeramachaneni. “But we completely failed.” They soon realized, though, that if they built a series of synthetic data generators, they could make the process faster for everyone who came after them.
In 2016, the team completed an algorithm that accurately captures correlations between the different fields in a real dataset – think of a patient’s age, blood pressure, and heart rate – and creates a synthetic dataset that preserves those relationships, without any identifying information. When data scientists were asked to solve problems using this synthetic data, their solutions were as effective as those made with real data 70 percent of the time. The team presented this research at the 2016 IEEE International Conference on Data Science and Advanced Analytics.
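One simple way to picture what “preserving those relationships” means is to compare how the columns co-vary in the real table and in its synthetic counterpart. The helper below is a hedged sketch using pandas; real_data and synthetic_data are assumed to be DataFrames with the same numeric columns, such as the patient-style table from the earlier sketch.

```python
# Sketch: sanity-check that pairwise relationships survive the trip
# to synthetic data by comparing correlation matrices.
# Assumes real_data and synthetic_data are pandas DataFrames with the
# same numeric columns (e.g., age, systolic_bp, heart_rate).
import pandas as pd

def correlation_gap(real: pd.DataFrame, synthetic: pd.DataFrame) -> float:
    """Mean absolute difference between the two correlation matrices."""
    real_corr = real.corr()
    synth_corr = synthetic[real.columns].corr()
    return (real_corr - synth_corr).abs().mean().mean()

# A small gap suggests the synthetic table preserves the same
# statistical relationships (age vs. blood pressure, and so on):
# print(correlation_gap(real_data, synthetic_data))
```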
For the next step, the team reached into the machine learning toolbox. In 2019, PhD student Lei Xu presented his new algorithm, CTGAN, at the 33rd Conference on Neural Information Processing Systems in Vancouver. CTGAN (for “conditional tabular generative adversarial networks”) uses GANs to build and perfect synthetic data tables. GANs are pairs of neural networks that “play against each other,” says Xu. The first network, called the generator, creates something – in this case, a row of synthetic data – and the second, called the discriminator, tries to tell whether it is real or not.
“Eventually, the generator can generate perfect [data], and the discriminator cannot tell the difference,” says Xu. GANs are most often used to create artificial images, but they also work well for synthetic data: CTGAN outperformed classic synthetic data creation techniques in 85 percent of the cases tested in Xu’s study.
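To make the generator-versus-discriminator game concrete, here is a minimal, self-contained sketch of a vanilla GAN trained on numeric rows, written in PyTorch. It is not CTGAN itself, which adds conditional sampling and tabular-specific normalization; the network sizes, learning rate, and toy “real” rows are assumptions chosen only for illustration.

```python
# Minimal vanilla-GAN sketch on numeric rows (not CTGAN itself, which
# adds conditional sampling and tabular-specific normalization).
import torch
import torch.nn as nn

torch.manual_seed(0)
n_features, noise_dim = 3, 8

# Generator: noise in, a synthetic "row" out.
generator = nn.Sequential(
    nn.Linear(noise_dim, 32), nn.ReLU(), nn.Linear(32, n_features)
)
# Discriminator: a row in, probability that it is real out.
discriminator = nn.Sequential(
    nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid()
)

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

# Toy "real" rows drawn from a fixed Gaussian (an assumption standing
# in for a real table).
real_rows = torch.randn(256, n_features) * 2.0 + 5.0

for step in range(500):
    # Train the discriminator to tell real rows from generated ones.
    fake_rows = generator(torch.randn(64, noise_dim)).detach()
    batch = real_rows[torch.randint(0, len(real_rows), (64,))]
    d_loss = (loss_fn(discriminator(batch), torch.ones(64, 1)) +
              loss_fn(discriminator(fake_rows), torch.zeros(64, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Train the generator to fool the discriminator.
    g_loss = loss_fn(discriminator(generator(torch.randn(64, noise_dim))),
                     torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

# After training, the generator emits rows the discriminator struggles
# to distinguish from the real ones.
print(generator(torch.randn(5, noise_dim)))
```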
Statistical similarity is crucial. But depending on what they represent, datasets also come with their own vital context and constraints, which must be preserved in the synthetic data. DAI Lab researcher Sala gives the example of a hotel ledger: a guest always checks out after checking in. The dates in a synthetic hotel reservation dataset must follow that rule too: “They need to be in the right order,” he says.
Large datasets may contain a number of different relationships like this, each strictly defined. “Models cannot learn the constraints, because constraints are very context-dependent,” says Veeramachaneni. So the team recently finalized an interface that allows people to tell a synthetic data generator where those constraints are. “The data is generated within those constraints,” Veeramachaneni says.
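The sketch below illustrates the general idea of enforcing such a constraint by reject sampling: declare the rule, then discard sampled rows that violate it. This is an illustration of the concept rather than the Synthetic Data Vault’s actual constraint interface, and the hotel-ledger column names and helper functions are invented for the example.

```python
# Sketch of constraint handling via reject sampling: keep only sampled
# rows that satisfy a user-declared rule. Illustrative only; not the
# Synthetic Data Vault's exact interface.
import pandas as pd

def checkout_after_checkin(rows: pd.DataFrame) -> pd.Series:
    """A guest always checks out after checking in."""
    return rows["checkout_date"] > rows["checkin_date"]

def sample_with_constraint(model, n_rows, constraint, max_tries=20):
    """Sample from any model exposing .sample(n), discarding rows that
    violate the constraint until n_rows valid rows are collected."""
    collected = []
    for _ in range(max_tries):
        candidate = model.sample(n_rows)
        collected.append(candidate[constraint(candidate)])
        valid = pd.concat(collected, ignore_index=True)
        if len(valid) >= n_rows:
            return valid.head(n_rows)
    raise RuntimeError("Too many rejected rows; constraint not satisfied.")

# Usage (assuming `model` was fit on a hotel ledger with checkin_date
# and checkout_date columns):
# bookings = sample_with_constraint(model, 1000, checkout_after_checkin)
```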
Such accurate synthetic data could help companies and organizations across many different industries. One example is banking, where increased digitization, along with new data privacy rules, has “sparked a growing interest in ways to generate synthetic data,” says Wim Blommaert, a team leader of financial services at ING. Existing solutions, like data masking, often destroy valuable information that banks could otherwise use to make decisions, he said. A tool like SDV has the potential to sidestep the sensitive aspects of data while preserving these important constraints and relationships.
One vault to rule them all
The Synthetic Data Vault combines everything the group has built so far into “a whole ecosystem,” says Veeramachaneni. The idea is that stakeholders – from students to professional software developers – can come to the vault and get what they need, be it a large table, a small amount of time series data, or a mix of many different types of data.
The vault is open source and expandable. “There are a lot of different areas where we find that synthetic data can also be used,” says Sala. For example, if a particular group is underrepresented in a sample dataset, synthetic data can be used to fill in those gaps – a sensitive endeavor that requires a lot of finesse. Or companies might want to use synthetic data to plan for scenarios they have not yet experienced, like a big surge in user traffic.
As use cases keep coming up, more tools will be developed and added to the vault, Veeramachaneni says. That may keep the team busy for at least another seven years, but they are ready: “We’re only touching the tip of the iceberg.”