Oct 18

Reported by Caroline Perry, Harvard University, 16 Oct. 2013.

Computer scientists at Harvard and cognitive scientists at MIT team up to settle a debate over “chart junk”.

Which of these visualizations will you remember later? (Images courtesy of Michelle Borkin, Harvard SEAS.)

It’s easy to spot a “bad” data visualization—one packed with too much text, excessive ornamentation, gaudy colors, and clip art. Design guru Edward Tufte derided such decorations as redundant at best, useless at worst, labeling them “chart junk.” Yet a debate still rages among visualization experts: Can these reviled extra elements serve a purpose?

Taking a scientific approach to design, researchers from Harvard University and Massachusetts Institute of Technology are offering a new take on that debate. The same design elements that attract so much criticism, they report, can also make a visualization more memorable.

Detailed results were presented this week at the IEEE Information Visualization (InfoVis) conference in Atlanta, hosted by the Institute of Electrical and Electronics Engineers.

Oct 16

Reported by Jennifer Ouellette, in Quanta Magazine (Simons Foundation), 9 Oct. 2013.

(Illustration: Tang Yau Hoong)

The nature of computing has changed dramatically over the last decade, and more innovation is needed to weather the gathering data storm.

When subatomic particles smash together at the Large Hadron Collider in Switzerland, they create showers of new particles whose signatures are recorded by four detectors. The LHC captures 5 trillion bits of data — more information than all of the world’s libraries combined — every second. After the judicious application of filtering algorithms, more than 99 percent of those data are discarded, but the four experiments still produce a whopping 25 petabytes (25×10¹⁵ bytes) of data per year that must be stored and analyzed. That is a scale far beyond the computing resources of any single facility, so the LHC scientists rely on a vast computing grid of 160 data centers around the world, a distributed network that is capable of transferring as much as 10 gigabytes per second at peak performance.
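To make the scale of that filtering concrete, here is a minimal, purely illustrative sketch of a threshold-style event filter, not the LHC's actual trigger software: toy "events" with invented energy values are generated, and only the rare high-energy ones are kept while the rest are discarded.

```python
# Purely illustrative threshold filter: toy events with invented "energy"
# readings are generated, and only the rare high-energy ones are kept.
# The event format, distribution, and threshold are assumptions for the sketch.
import random

def passes_trigger(event, threshold=1600.0):
    """Keep an event only if its summed energy exceeds the threshold."""
    return sum(event["energies"]) > threshold

def toy_events(n):
    """Generate simplistic stand-in events; real detector data is far richer."""
    for _ in range(n):
        yield {"energies": [random.expovariate(1 / 100.0) for _ in range(8)]}

n = 100_000
kept = sum(passes_trigger(e) for e in toy_events(n))
print(f"kept {kept} of {n} toy events ({100 * kept / n:.2f}%)")  # roughly 1%
```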

Apr 14

Reported by Universidad Carlos III de Madrid – Oficina de Información Científica, 11 April 2011.

Artificial Intelligence offers many possibilities for developing data processing systems that are more precise and robust. That is one of the main conclusions drawn from an international meeting of experts in this scientific area, recently held at Universidad Carlos III de Madrid (UC3M).

Artificial Intelligence offers many possibilities for developing data processing systems that are more precise and robust. (Credit: UC3M)

Within this framework, five leading scientists presented the latest advances in their research on different aspects of AI. The speakers tackled issues ranging from the more theoretical, such as algorithms capable of solving combinatorial problems, to robots that can reason about emotions, systems that use vision to monitor activities, and automated players that learn how to win in a given situation. “Inviting speakers from leading research groups allows us to offer a panoramic view of the main problems and open techniques in the area, including advances in video and multi-sensor systems, task planning, machine learning, games, and artificial consciousness and reasoning,” the experts noted.

The participants from the AVIRES (Artificial Vision and Real Time Systems) research group at the University of Udine gave a seminar introducing data fusion techniques and distributed artificial vision. In particular, they dealt with automated surveillance systems based on visual sensor networks, from basic techniques for image processing and object recognition to Bayesian reasoning for understanding activities, and machine learning and data fusion for building high-performance systems. Dr. Simon Lucas, professor at the University of Essex, editor-in-chief of IEEE Transactions on Computational Intelligence and AI in Games, and a researcher focusing on the application of AI techniques to games, presented the latest trends in algorithms for generating game strategies. During his presentation, he pointed out UC3M’s strength in this area, citing its victory in two of the international competitions held at the most recent edition of the Conference on Computational Intelligence and Games.
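As a rough illustration of the kind of Bayesian reasoning mentioned above, the following sketch combines reports from two sensors to infer an activity. The activities, priors, and likelihood values are invented for the example, and the sensors are assumed to be conditionally independent; nothing here reproduces the AVIRES systems themselves.

```python
# Minimal sketch of Bayesian data fusion for activity recognition:
# two independent sensors each report how likely their observation is
# under each candidate activity, and Bayes' rule combines them.
# Activities, priors, and likelihood values are invented for illustration.

activities = ["walking", "loitering", "running"]
prior = {"walking": 0.6, "loitering": 0.3, "running": 0.1}

# P(observation | activity) for each sensor (assumed conditionally independent)
camera_likelihood = {"walking": 0.5, "loitering": 0.2, "running": 0.9}
motion_likelihood = {"walking": 0.4, "loitering": 0.1, "running": 0.8}

unnormalised = {
    a: prior[a] * camera_likelihood[a] * motion_likelihood[a] for a in activities
}
total = sum(unnormalised.values())
posterior = {a: p / total for a, p in unnormalised.items()}

for a in activities:
    print(f"P({a} | camera, motion sensor) = {posterior[a]:.3f}")
```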

In addition, Enrico Giunchiglia, professor at the University of Genoa and former president of the Council of the International Conference on Automated Planning and Scheduling (ICAPS), described the most recent work in the area of logical satisfiability (SAT), which is growing rapidly thanks to its applications in circuit design and task planning.
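For readers unfamiliar with satisfiability, the toy sketch below brute-forces a satisfying assignment for a tiny formula in conjunctive normal form. The formula is invented, and real applications in circuit design and planning rely on dedicated SAT solvers rather than exhaustive search.

```python
# Toy illustration of propositional satisfiability (SAT): brute-force search
# for an assignment satisfying a small CNF formula. Clauses are lists of
# literals; a positive integer i means variable i, a negative one its negation.
# Real SAT work uses dedicated solvers; the formula below is invented.
from itertools import product

clauses = [[1, 2], [-1, 3], [-2, -3]]  # (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
num_vars = 3

def satisfies(assignment, clauses):
    """assignment maps variable index -> bool; every clause needs a true literal."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses
    )

for values in product([False, True], repeat=num_vars):
    assignment = dict(enumerate(values, start=1))
    if satisfies(assignment, clauses):
        print("satisfiable, e.g.", assignment)
        break
else:
    print("unsatisfiable")
```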

Artificial Intelligence (AI) is as old as computer science and has generated ideas, techniques, and applications that make it possible to solve difficult problems. The field is very active and offers solutions to very diverse sectors. The number of industrial applications that incorporate an AI technique is very high, and from the scientific point of view there are many specialized journals and conferences. Furthermore, new lines of research are constantly being opened, and there is still great room for improvement in knowledge transfer between researchers and industry. These are some of the main ideas gathered at the 4th International Seminar on New Issues in Artificial Intelligence, organized by the SCALAB group in the UC3M Computer Engineering Department at the Leganés campus of this Madrid university.

The future of Artificial Intelligence

This seminar also included a talk on the promising future of AI. “The tremendous surge in the number of devices capable of capturing and processing information, together with the growth of computing capacity and advances in algorithms, enormously boosts the possibilities for practical application,” the researchers from the SCALAB group pointed out. “Among them we can cite the construction of computer programs that make life easier, that take decisions in complex environments, or that allow problems to be solved in environments that are difficult for people to access,” they noted. From the point of view of these research trends, more and more emphasis is being placed on developing systems capable of learning and demonstrating intelligent behavior without being tied to replicating a human model.

AI will allow advances in the development of systems capable of automatically understanding a situation and its context using sensor data and information systems, as well as establishing plans of action, from support applications to decision making in dynamic situations. According to the researchers, this is possible thanks to rapid advances in, and the availability of, sensor technology, which provides a continuous flow of data about the environment, information that must be handled appropriately in a data and information fusion node. Likewise, the development of sophisticated techniques for task planning allows plans of action to be composed, executed, monitored for correct execution, and rectified in case of failure, and finally allows the system to learn from the mistakes made.
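A schematic sketch of that compose-execute-monitor-rectify cycle might look like the following. The "planner", the actions, and the failure probability are placeholders invented for illustration, not any system presented at the seminar.

```python
# Schematic sketch of a plan-execute-monitor-rectify loop.
# The planner and the failure model are invented stand-ins, not a real system.
import random

def make_plan(goal):
    """Stand-in planner: returns a fixed sequence of named steps toward the goal."""
    return [f"step {i} toward {goal}" for i in range(1, 4)]

def execute(step):
    """Stand-in executor: each step succeeds with 80% probability."""
    return random.random() < 0.8

def achieve(goal, max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        plan = make_plan(goal)                    # compose a plan of action
        if all(execute(step) for step in plan):   # execute and monitor each step
            print(f"goal '{goal}' achieved on attempt {attempt}")
            return True
        print(f"attempt {attempt} failed; replanning")  # rectify after failure
    return False

achieve("deliver package")
```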

This technology has enabled a wide range of applications, such as integrated systems for surveillance, monitoring and anomaly detection, activity recognition, tele-assistance systems, transport logistics planning, and more. According to Antonio Chella, Full Professor at the University of Palermo and an expert in Artificial Consciousness, the future of AI will involve discovering a new meaning for the word “intelligence.” Until now, it has been equated with automated reasoning in software systems, but in the future AI will tackle more daring concepts such as the embodiment of intelligence in robots, as well as emotions and, above all, consciousness.

Feb 28

Reported by Josh Fischman, in The Chronicle of Higher Education, 10 February 2011.

Scientists are wasting much of the data they are creating. Worldwide computing capacity grew at 58 percent every year from 1986 to 2007, and people sent almost two quadrillion megabytes of data to one another, according to a study published on Thursday in Science. But scientists are losing a lot of the data, say researchers in a wide range of disciplines.

In 10 new articles, also published in Science, researchers in fields as diverse as paleontology and neuroscience say the lack of data libraries, insufficient support from federal research agencies, and the lack of academic credit for sharing data sets have created a situation in which money is wasted and information that could reveal better cancer treatments or the causes of climate change goes by the wayside.

“Everyone bears a certain amount of responsibility and blame for this situation,” said Timothy B. Rowe, a professor of geological sciences at the University of Texas at Austin, who wrote one of the articles.

A big problem is the many forms of data and the difficulty of comparing them. In neuroscience, for instance, researchers collect data on scales of time that range from nanoseconds, if they are looking at rates of neuron firing, to years, if they are looking at developmental changes. There are also differences between the kinds of data that come from optical microscopes and those that come from electron microscopes, and between data on a cellular scale and data from a whole organism.

“I have struggled to cope with this diversity of data,” said David C. Van Essen, chair of the department of anatomy and neurobiology at the Washington University School of Medicine, in St. Louis. Mr. Van Essen co-authored the Science article on the challenges data present to brain scientists. “For atmospheric scientists, they have one earth. We have billions of individual brains. How do we represent that? It’s precisely this diversity that we want to explore.”

He added that he was limited by how data are published. “When I see a figure in a paper, it’s just the tip of the iceberg to me. I want to see it in a different form in order to do a different kind of analysis.” But the data are not available in a public, searchable format.

Ecologists also struggle with data diversity. “Some measurements, like temperature, can be taken in many places and in many ways,” said O.J. Reichman, a researcher at the National Center for Ecological Analysis and Synthesis, at the University of California at Santa Barbara. “It can be done with a thermometer, and also by how fast an organ grows in a crayfish” because growth is temperature-sensitive, said Mr. Reichman, a co-author of another of the Science articles.

A Big Success Story

The situation criticized in the Science articles contrasts with the big success story in scientific data libraries, GenBank, the gene-sequence repository, said Mr. Reichman and several other scientists. GenBank created a common format for data storage and made it easy for researchers to access it. But Mr. Reichman added that GenBank did not have to deal with the diversity issue.

“GenBank basically had four molecules in different arrangements,” he said. “We have way more than four things in ecology,” he continued, echoing Mr. Van Essen’s lament.

But even gene scientists today say they are struggling with the many permutations of those four molecules. In another Science article, Scott D. Kahn, chief information officer at Illumina, a leading maker of DNA-analysis equipment, notes that output from a single gene-sequencing machine has grown from 10 megabytes to 10 gigabytes per day, and 10 to 20 major labs now use 10 of those machines each. One solution being contemplated, he writes, is to store just one copy of a standard “reference genome” plus mutations that differ from the standard. That amounts to only 0.1 percent of the available data, possibly making it easier for researchers to store the information and analyze it.
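The “reference plus differences” idea Kahn describes can be illustrated with a small sketch: store only the positions where a sample deviates from a shared reference sequence, then rebuild the full sequence on demand. The sequences below are tiny toy strings, and real pipelines use standard variant formats rather than this ad hoc representation.

```python
# Sketch of the "reference genome plus differences" storage idea: instead of
# keeping every full sequence, store only the positions where a sample
# deviates from a shared reference. The sequences here are tiny toy strings.

reference = "ACGTACGTACGT"
sample    = "ACGTACGAACGT"   # differs from the reference at one position

def diff_against_reference(reference, sample):
    """Return a compact list of (position, reference base, sample base) changes."""
    return [
        (i, r, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s
    ]

def reconstruct(reference, variants):
    """Rebuild the full sample sequence from the reference and its variants."""
    bases = list(reference)
    for position, _, sample_base in variants:
        bases[position] = sample_base
    return "".join(bases)

variants = diff_against_reference(reference, sample)
print(variants)                                   # [(7, 'T', 'A')]
assert reconstruct(reference, variants) == sample
```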

To cope with data diversity, Mr. Reichman said scientists should develop a common language for tagging their data. “If you record data from a particular location, the tags about that location—latitude and longitude, for instance—need to be consistent from researcher to researcher,” he said. Ecology has grown into a relatively idiosyncratic science, and all researchers have their own methods, so a common language will require a culture shift. “It’s become more urgent to do this because of the pressing environmental questions, like the effects of climate change, that we are being called on to answer,” he said. “And the ability to access more than one set of measurements or interactions will make the science better.”
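As a concrete (and entirely hypothetical) example of the kind of consistent tagging Mr. Reichman calls for, a shared convention might require every record to carry the same location and measurement fields, so that data sets from different labs can be searched and combined. The field names below are invented for illustration.

```python
# Illustration of a shared tagging convention: every record carries the same
# location and measurement fields, so records from different labs can be
# searched and combined. Field names and values are invented for the example.
import json

REQUIRED_TAGS = {"latitude", "longitude", "variable", "units", "timestamp"}

record = {
    "latitude": 34.4140,
    "longitude": -119.8489,
    "variable": "water_temperature",
    "units": "degC",
    "timestamp": "2011-02-10T14:30:00Z",
    "value": 16.2,
}

missing = REQUIRED_TAGS - record.keys()
if missing:
    raise ValueError(f"record is missing required tags: {sorted(missing)}")
print(json.dumps(record, indent=2))
```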

Another factor that makes developing shared-data libraries urgent is that many scientists now store their own data. “And when they retire or die, their data goes with them,” said Mr. Rowe. In his field, which uses three-dimensional imaging machines like CT scanners to analyze fossils, the first people to do that work have already left the field, so there has already been a tremendous loss of data.

There is a financial cost to this, he added. “It costs money to do a CT scan, and the National Science Foundation pays for that with a grant. But if that scan isn’t curated, and disappears when the scientist retires or forgets about it, then the next scientist asks the NSF for money to do it again. That’s just a waste,” he said.

In all of the papers, scientists cited examples of small libraries of shared data that could be scaled up. Mr. Rowe helped to develop a project called DigiMorph, which contains three-dimensional scans of about 1,000 biological specimens and fossils. Those data sets have been viewed by about 80,000 visitors, he said, and have been used in 100 scientific papers. Sharing the data, he said, brings the cost to researchers, and their grant-giving agencies, way down. Another project, the Neuroscience Information Framework, contains many more data sets and has been used by even more scientists.

Mr. Rowe thinks agencies like the NSF and the National Institutes of Health should get behind efforts like this to a much greater extent than they have done. “Right now they are financing data generation, but not the release of that data, or the ability of other scientists to analyze it. I think, with all respect, that they are really missing the boat.”
